Thank you for saying this - this is exactly what we’re trying to achieve with Mojo, and I’m thrilled it is working well for you!
I’ve been super impressed with Mojo so far, so nice work! Even though we’re not doing anything remotely related to genAI, Mojo is quickly proving to be the best tool available for general scientific HPC anyway.
If you are interested, @martinvuyk has done some really impressive work on GPU FFTs, beating cuFFT by quite a lot.
Yes! I’ve been talking with Martin about how I can use their new FFT code. Unfortunately, the dynamic sizes issue has been a blocker for my work, but I’m hoping we can find a solution (one that doesn’t sacrifice performance) by exploring some kind of JIT approach. It’s encouraging to hear that MAX already does something similar, so maybe there’s hope we can get it working without too much trouble.
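To make the JIT idea a bit more concrete, here is a rough Python sketch of the pattern I have in mind. To be clear, this is not Mojo or MAX code and it is not Martin's FFT implementation. The names `specialize_fft` and `fft_dynamic` are made up for illustration, and the "compile" step is just a stand-in: it precomputes the twiddle factors and bit-reversal permutation for one size and returns a closure. The point is only that the size-dependent work happens once per distinct size at runtime and is then reused from a cache.

```python
import cmath
from functools import lru_cache


@lru_cache(maxsize=None)
def specialize_fft(n):
    """Build an FFT routine fixed to length n (hypothetical stand-in for a JIT compile)."""
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch only handles power-of-two sizes")

    # Size-dependent setup, done once per distinct n and then cached.
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    bits = n.bit_length() - 1
    perm = [int(format(i, f"0{bits}b")[::-1], 2) if bits else 0 for i in range(n)]

    def fft(x):
        if len(x) != n:
            raise ValueError(f"this routine is specialized for length {n}")
        # Iterative radix-2 decimation-in-time FFT on a bit-reversed copy.
        a = [complex(x[p]) for p in perm]
        size = 2
        while size <= n:
            half = size // 2
            step = n // size
            for start in range(0, n, size):
                for k in range(half):
                    w = twiddles[k * step]
                    u = a[start + k]
                    v = a[start + k + half] * w
                    a[start + k] = u + v
                    a[start + k + half] = u - v
            size *= 2
        return a

    return fft


def fft_dynamic(x):
    """Entry point for runtime-sized input: look up (or build) the specialized routine."""
    return specialize_fft(len(x))(x)
```

Calling `fft_dynamic` repeatedly with the same length only pays the setup cost the first time. In a real version the cached object would be a compiled kernel rather than a Python closure, and the open question is whether the first-call compile latency is acceptable for the sizes we actually see.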
I think much of the MAX API was open-sourced recently? So I’ll take a look and try to get an idea of how it works. Maybe using the graph API directly is what we need in this case? Or maybe there’s some lower-level tool in there somewhere that’s a better fit since we’re not actually doing anything with AI models here.