Hi,
I am writing a set of Mojo bindings for a CUDA/HIP library that expects applications to launch their kernels on a given stream, because the library may launch dependent kernels on the same stream.
Is there a way to specify the CUDA/HIP stream on which a Mojo kernel is launched? If not, we can work around it, but the workaround requires a full device synchronization before and after the Mojo kernel launch, which costs performance.
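
To make the two cases concrete, here is a rough CUDA C++ sketch (not Mojo, and not the actual library). `app_kernel` and `lib_kernel` are hypothetical stand-ins for the Mojo kernel and the library's dependent kernel. `other_stream` stands in for whatever stream the Mojo runtime launches on, which is an assumption on my part.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the application (Mojo) kernel.
__global__ void app_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical stand-in for the library's dependent kernel.
__global__ void lib_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // lib_stream: the stream the library expects us to use.
    // other_stream: assumed stand-in for a stream we cannot choose.
    cudaStream_t lib_stream, other_stream;
    cudaStreamCreateWithFlags(&lib_stream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&other_stream, cudaStreamNonBlocking);

    // Desired: both kernels on the same stream, so they are ordered
    // on the device with no host-side synchronization.
    app_kernel<<<blocks, threads, 0, lib_stream>>>(d, n);
    lib_kernel<<<blocks, threads, 0, lib_stream>>>(d, n);

    // Workaround: the app kernel runs on a different stream, so the
    // whole device is synchronized before and after it to keep ordering.
    cudaDeviceSynchronize();
    app_kernel<<<blocks, threads, 0, other_stream>>>(d, n);
    cudaDeviceSynchronize();
    lib_kernel<<<blocks, threads, 0, lib_stream>>>(d, n);

    cudaStreamSynchronize(lib_stream);

    // Copy one element back just to confirm the kernels ran.
    float first = 0.0f;
    cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost);
    printf("d[0] = %f\n", first);

    cudaStreamDestroy(lib_stream);
    cudaStreamDestroy(other_stream);
    cudaFree(d);
    return 0;
}
```

The first pair of launches is what I would like to express from Mojo. The second pair is the workaround, where each `cudaDeviceSynchronize()` stalls the host and drains all outstanding work on the device.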
Thanks,
Ben