Hello, I am porting some CUDA code to Mojo.
Does Mojo support a mechanism like CUDA's __launch_bounds__ to give the compiler more information on how to manage SM resources?
Are there plans to support it in the future?
Thank you
Hi, you can check out this file for reference.
Hi @massimim, thank you for sharing this question! We do support this, via the @__llvm_metadata(MAX_THREADS_PER_BLOCK_METADATA=StaticTuple[Int32, 1](256)) decorator on the kernel function. Here is an example: https://github.com/modularml/modular/blob/main/max/examples/custom_ops/kernels/histogram.mojo#L46-L50
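For anyone comparing against the CUDA side of the port, here is a minimal sketch of the construct the question refers to. The kernel name `scale`, its parameters, and the problem size are hypothetical and chosen only for illustration; the 256 matches the value in the decorator above. It shows `__launch_bounds__` telling the compiler the maximum block size so it can budget registers accordingly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: scales a vector in place.
// __launch_bounds__(256) tells the compiler this kernel will never be
// launched with more than 256 threads per block, which lets it plan
// register usage for that block size.
__global__ void __launch_bounds__(256) scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;  // illustrative problem size
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    const int threads = 256;  // must not exceed the declared launch bound
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, 2.0f, n);

    // Report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Launching this kernel with more than 256 threads per block would fail, so the block size at the launch site has to stay within the bound declared on the kernel.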
Thank you Emil
Thank you Caroline.