Launch_bounds support for GPU code

massimim · February 11, 2026, 7:08pm

Hello, I am porting some CUDA code to Mojo.
Does Mojo support a mechanism like CUDA's __launch_bounds__ to give the compiler more information on how to manage SM resources?
Are there plans to support it in the future?
Thank you

emil.martens · February 11, 2026, 7:32pm

Hi, you can check out this file for reference.

Caroline · February 17, 2026, 4:52pm

Hi @massimim, thank you for sharing this question! We do support this via the @__llvm_metadata(MAX_THREADS_PER_BLOCK_METADATA=StaticTuple[Int32, 1](256)) decorator on the kernel function. Here is an example: https://github.com/modularml/modular/blob/main/max/examples/custom_ops/kernels/histogram.mojo#L46-L50
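
For anyone comparing against the CUDA side of the port, here is a minimal sketch of the kernel qualifier that the decorator above is standing in for. The kernel name (scale), the problem size, and the bound of 256 are illustrative choices made up for this example, not taken from the linked histogram kernel. It only shows the max-threads-per-block form of __launch_bounds__; the optional min-blocks-per-SM argument is not covered, and whether Mojo exposes a counterpart for it is not addressed in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration. __launch_bounds__(256) promises the
// compiler that this kernel is never launched with more than 256 threads per
// block, which lets it budget registers for that block size.
__global__ void __launch_bounds__(256) scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1024;
    const int threads = 256;  // must not exceed the launch bound declared above
    const int blocks = (n + threads - 1) / threads;

    float* data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) {
        std::printf("allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        data[i] = 1.0f;
    }

    scale<<<blocks, threads>>>(data, 2.0f, n);

    // A launch that violates the bound is reported here as a launch error.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    std::printf("data[0] = %.1f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

In the Mojo version, the value 256 passed to StaticTuple[Int32, 1] plays the same role as the 256 in __launch_bounds__(256) here, so the block size used at launch should stay at or below it.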

massimim · February 18, 2026, 8:34am

Thank you Emil

massimim · February 18, 2026, 8:43am

Thank you Caroline.
