I am currently porting some CUDA code to Mojo and was wondering about memory management support.
Does Mojo currently support unified memory allocations similar to cudaMallocManaged? If not, are there recommended alternatives for handling data that needs to be accessed from both the host and accelerator?
Additionally, are there any plans or a roadmap to support unified memory (or an equivalent abstraction) in the future?
I’ve been trying my hand at writing some general allocator abstractions. I have some things that function, but they currently require fairly drastic changes to the standard library that break it for anyone without an NVIDIA GPU. This is a spare-time activity for me, so it’s progressing slowly.
The current recommendation is explicit data movement. I know it is annoying to implement, but it does have performance benefits, since you decide exactly when each transfer happens instead of relying on on-demand page migration. Alternatively, Mojo doesn’t actually stop you from calling cudaMallocManaged via FFI and using that pointer, but there be dragons.
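For reference, here is a minimal sketch of the two patterns on the CUDA side, since that is what you are porting from. It is plain CUDA C++, not Mojo. The `scale` kernel, the `CUDA_CHECK` macro, and the buffer size are made up for illustration. The first half does the explicit copy in, launch, copy out sequence; the second half does the same work on a `cudaMallocManaged` pointer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s failed: %s\n", #call,                \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Illustrative kernel: multiply every element by a constant factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Pattern 1: explicit data movement with separate host and device buffers.
    float* host = static_cast<float*>(std::malloc(bytes));
    if (host == nullptr) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    scale<<<blocks, threads>>>(dev, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());
    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    std::printf("explicit: host[0] = %f\n", host[0]);
    CUDA_CHECK(cudaFree(dev));
    std::free(host);

    // Pattern 2: unified memory, one pointer valid on both host and device.
    float* managed = nullptr;
    CUDA_CHECK(cudaMallocManaged(&managed, bytes));
    for (int i = 0; i < n; ++i) managed[i] = 1.0f;
    scale<<<blocks, threads>>>(managed, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());
    // Synchronize before the host reads the managed buffer again.
    CUDA_CHECK(cudaDeviceSynchronize());
    std::printf("managed: managed[0] = %f\n", managed[0]);
    CUDA_CHECK(cudaFree(managed));

    return 0;
}
```

The explicit version makes every transfer visible in the code, which is the part you would be reproducing by hand in Mojo. The managed version is shorter, but the host must not touch the buffer until the device has been synchronized.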