Any chance we could get official support for NVIDIA’s Turing architecture (RTX 20xx series) in Mojo and MAX? These GPUs are still widely used by individuals, like myself, and small teams, and adding support would make MAX more accessible to a broader audience.
This is a situation where we're caught between two goals: we want to be conservative about what we commit to supporting, but aggressive about what we enable. I think this is already possible (though there may be some CUDA version limitations that are overly conservative). Are you interested in MAX enabling this architecture so you can do your own kernel programming on it, or are you asking us to do a deep dive into supporting it for a wide range of use cases?
To follow up on what Chris is asking, there are a couple of levels of GPU support: the simple ability to build Mojo code targeting a specific GPU architecture, and full kernel support for running complex models on a given architecture.
If you just want the ability to program GPUs using Mojo and basic Python MAX graphs, there's good news: you may be able to add this support for Turing GPUs yourself. As of the latest MAX nightly, the gpu module is now open-sourced within the Mojo standard library. It contains our basic support for targeting specific architectures, and you can now build on that.
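For anyone following along, here's roughly what basic GPU programming through the open-sourced gpu module looks like. This is a minimal sketch against a recent nightly; the exact module paths and `DeviceContext` signatures have been shifting between releases, so treat the details as illustrative rather than definitive:

```mojo
from gpu.id import block_dim, block_idx, thread_idx
from gpu.host import DeviceContext

alias dtype = DType.float32
alias length = 1024
alias block_size = 256

fn vector_add(
    out: UnsafePointer[Scalar[dtype]],
    lhs: UnsafePointer[Scalar[dtype]],
    rhs: UnsafePointer[Scalar[dtype]],
):
    # One thread per element.
    var i = block_idx.x * block_dim.x + thread_idx.x
    if i < length:
        out[i] = lhs[i] + rhs[i]

def main():
    var ctx = DeviceContext()

    var lhs_buf = ctx.enqueue_create_buffer[dtype](length)
    var rhs_buf = ctx.enqueue_create_buffer[dtype](length)
    var out_buf = ctx.enqueue_create_buffer[dtype](length)

    # Fill the inputs from the host.
    with lhs_buf.map_to_host() as host:
        for i in range(length):
            host[i] = Float32(i)
    with rhs_buf.map_to_host() as host:
        for i in range(length):
            host[i] = Float32(2 * i)

    # Launch one thread per element, then wait for the GPU to finish.
    ctx.enqueue_function[vector_add](
        out_buf.unsafe_ptr(),
        lhs_buf.unsafe_ptr(),
        rhs_buf.unsafe_ptr(),
        grid_dim=(length + block_size - 1) // block_size,
        block_dim=block_size,
    )
    ctx.synchronize()

    with out_buf.map_to_host() as host:
        print("out[1] =", host[1])  # expect 3.0
```

On a Turing card, this only compiles and runs once the architecture is recognized by the standard library, which is where the next step comes in.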
You could potentially build a local custom Mojo standard library containing support for Turing (sm_75) GPUs, which would let you build Mojo code that runs on your RTX 20xx GPU. Architecture support is largely contained within the gpu/host/info.mojo source file. If you want an example of how to add a new architecture, here's how we enabled basic GPU programming support for Jetson Orin devices. Some of the data structures have changed a bit since then, but it should give you the general idea.
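To make the shape of the change concrete, here's a hypothetical sketch of what a Turing entry in gpu/host/info.mojo might look like. The field names and the exact set of fields are assumptions and need to be matched against the current `GPUInfo` struct definition when you edit the file; the numbers below are for an RTX 2060:

```mojo
# Hypothetical entry for gpu/host/info.mojo; field names are illustrative
# and must be checked against the current GPUInfo struct definition.
alias RTX2060 = GPUInfo(
    name="RTX2060",
    vendor=Vendor.NVIDIA_GPU,
    api="cuda",
    arch_name="turing",
    compute=7.5,          # Turing compute capability
    version="sm_75",
    sm_count=30,          # 30 SMs on an RTX 2060
    warp_size=32,
    threads_per_sm=1024,  # Turing caps resident threads per SM at 1024
)
```

You'd also need to teach whatever lookup function maps target names to `GPUInfo` values about "sm_75", so the new entry is actually reachable.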
If you got that working locally and tested it on your GPU, I think we’d welcome a PR to add this support.
Beyond basic support, I can’t guarantee that our kernels will all work well on a Turing GPU, and they certainly won’t be tuned well for that GPU class out of the box. Stay tuned for more on how you might be able to tune the kernels used in MAX models for your specific GPU architecture.
Thanks for the explanation! I could try writing kernels for Turing GPUs once I've learned enough. I'll definitely play around with this and see what I can figure out.
So the kernels might not work well, but they would work? For example, if I want to simply run max-recipes, would I need to rebuild the standard library from source, rather than using the one installed with magic?
Without extending the standard library, either your Turing GPU won’t be recognized as a MAX-supported GPU, or you’ll get a Mojo compilation failure (“unable to satisfy a constraint: sm_75 not recognized as an architecture”, or something like that).
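For context on where that compilation failure comes from: Mojo's compile-time `constrained` check is the mechanism behind errors of that shape. This is a hedged illustration, not the actual stdlib code, but it shows why an unlisted architecture fails at compile time instead of miscompiling:

```mojo
fn require_supported_arch[arch: StringLiteral]():
    # Hypothetical guard: an arch outside the allowed set fails at
    # compile time with an "unable to satisfy constraint" diagnostic.
    constrained[
        arch == "sm_80" or arch == "sm_90",  # no "sm_75" in the set yet
        "unable to satisfy constraint: unrecognized GPU architecture",
    ]()
```

Adding "sm_75" to the recognized set (alongside the gpu/host/info.mojo entry) is what lifts that class of error.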
If you build a modified local version of the Mojo standard library with Turing GPUs added, you may be able to use that to run MAX models. However, none of these kernels have been tested on Turing, so architecture-specific differences may cause them to crash or otherwise not work correctly. Hard to tell what will happen without testing.
Hi Brad,
I think I've hit a little issue with the solution you showed: when I build the Mojo stdlib from source, I get an error, `unable to locate module 'layout'`, and I was also unable to find that module in the repository. Does this mean I need to somehow combine the library that contains layout with the one I built from source?
Sorry about that; we're open-sourcing these libraries in sequence, and the layout module is in one of the next sets to make it into the nightlies. Due to cross-module dependencies, there may be occasional challenges in getting these to build until all of them have landed. That hopefully won't last more than a few days. I'll check on this one in particular, since I know it's referenced in a few places.
Full MAX models like the Llama architectures aren't fully functional on this GPU yet, because some of the kernels need to be updated to account for the Tensor Cores and other architectural differences on Turing GPUs. We definitely welcome patches to help extend the relevant kernels to work well on Turing GPUs.