Launch_bounds support for GPU code

Hello, I am porting some CUDA code to Mojo.
Does Mojo support a mechanism like CUDA's __launch_bounds__ to give the compiler more information about how to manage SM resources?
Are there plans to support it in the future?
Thank you

Hi, you can check out this file for reference.

Hi @massimim, thank you for sharing this question! We do support this, via the @__llvm_metadata(MAX_THREADS_PER_BLOCK_METADATA=StaticTuple[Int32, 1](256)) decorator on the kernel function. Here is an example: https://github.com/modularml/modular/blob/main/max/examples/custom_ops/kernels/histogram.mojo#L46-L50
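
For anyone comparing the two sides while porting, here is a minimal sketch of the CUDA pattern the question refers to, with the same 256-thread limit as the decorator above. The kernel name scale and the helpers blocks_for, launch_scale, and kMaxThreadsPerBlock are hypothetical and only for illustration; this is not taken from the linked histogram example.

```cuda
#include <cuda_runtime.h>

// Hypothetical constant: the per-block thread limit promised to the compiler.
constexpr int kMaxThreadsPerBlock = 256;

// Hypothetical helper: number of blocks needed to cover n elements (ceiling division).
constexpr int blocks_for(int n, int threads) {
    return (n + threads - 1) / threads;
}

// __launch_bounds__(256) tells the compiler this kernel will never be launched
// with more than 256 threads per block, so it can budget registers accordingly.
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host wrapper: launches scale within the declared bound and
// returns the status reported once the device has finished.
cudaError_t launch_scale(float* device_data, float factor, int n) {
    scale<<<blocks_for(n, kMaxThreadsPerBlock), kMaxThreadsPerBlock>>>(
        device_data, factor, n);
    return cudaDeviceSynchronize();
}
```

Launching with a block size above the declared bound makes the launch fail, so the value passed to the decorator on the Mojo side presumably needs to match the block size used at launch in the same way.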

Thank you Emil

Thank you Caroline.