Controlling register pressure on GPUs is critical for performance. CUDA exposes a per-kernel mechanism that makes the compiler enforce a maximum number of registers per thread. Is there an equivalent in Mojo?
Reference: 5.4. C/C++ Language Extensions — CUDA Programming Guide
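For concreteness, here is a minimal sketch of the CUDA side of the question. The kernel names and the specific numbers (256, 2, 32) are made up for illustration, and `__maxnreg__` is, as far as I know, only available in recent toolkits (CUDA 12.4 or later), so treat that part as an assumption about the toolchain.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor):
// the compiler derives a per-thread register budget from these bounds.
// (Values are illustrative, not tuned.)
__global__ void __launch_bounds__(256, 2)
scale_bounded(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// __maxnreg__(N): explicit cap on registers per thread for this kernel.
// Assumes a toolkit that supports it; it cannot be combined with
// __launch_bounds__ on the same kernel.
__global__ void __maxnreg__(32)
scale_capped(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Query what the compiler actually allocated; no kernel launch needed.
    cudaFuncAttributes attr;

    if (cudaFuncGetAttributes(&attr, scale_bounded) == cudaSuccess)
        printf("scale_bounded: %d registers/thread\n", attr.numRegs);

    if (cudaFuncGetAttributes(&attr, scale_capped) == cudaSuccess)
        printf("scale_capped:  %d registers/thread\n", attr.numRegs);

    return 0;
}
```

What I am looking for in Mojo is an annotation or compile-time parameter that plays the same role as either qualifier above, applied to a single kernel rather than to the whole compilation unit.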
Thank you.