In the post "Modular: MAX is here! What does that mean for Mojo🔥?" it says "Mojo generalizes code to vastly different hardware, including CPUs, GPUs, and TPUs." This is the only mention I have found of optimization for TPUs, and I would like to know whether code written for GPUs is what should be used to fully utilize a TPU, whether for faster inference (as with GPUs) or for training (via third-party libraries).
Optimization for TPU (Cloud) (https://www.modular.com/blog/max-is-here-what-does-that-mean-for-mojo)
MAX and Mojo are built to be hardware-independent and to generalize across arbitrary accelerators. Several of us at Modular have worked on or with the TPU software stack, so we’re pretty familiar with what may be required to target that hardware.
We’ve been very deliberate in adding accelerator hardware support to MAX, picking individual architectures and making sure we provide the best experience with each before moving on to the next. We first rolled out initial NVIDIA GPU support in December for Ampere and Lovelace-class GPUs, tuned those for performance, and then added Hopper a few weeks ago. More support is coming, but we tend to talk about new accelerators only once they are usable via MAX.
While we don’t have support today for targeting TPUs in MAX, that’s not a limitation of the framework or language design, just a matter of the order in which we are adding support for each platform. As we add new hardware support, we tend to announce it in the nightly release notes first, so keep an eye on those and our release announcements to hear the latest.