Thanks for asking! GPU programming is an incredibly broad space with a reputation for being extremely hard to get into. One of our goals is to make it far more accessible via a more approachable programming model. However, we're just getting started on providing content and examples that show the capabilities of MAX for GPU programming.
What we’ve shown so far is how we at Modular use MAX for accelerating AI models and other large computational workloads on GPUs. From our experience with different frameworks, we strongly believe that representing large-scale models as computational graphs provides the best opportunities for optimizing these workloads. As a result, we’ve built a very capable graph compiler that can take a graph you’ve provided, find opportunities for optimization (such as fusing kernels), and run the result on an accelerator. These graphs are designed to handle large data structures, like the massive number of weights in a neural network, and to efficiently parallelize calculations across a range of hardware (CPUs, GPUs, and more).
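To give a rough feel for what an optimization like kernel fusion buys you, here's a toy sketch in plain Python/NumPy (purely illustrative; this is not the MAX compiler's API, and a real graph compiler does this transformation automatically):

```python
import numpy as np

# Toy "graph": y = relu(x * scale + bias), expressed as three separate ops.
def unfused(x, scale, bias):
    t1 = x * scale              # kernel 1: one full pass over memory
    t2 = t1 + bias              # kernel 2: a second full pass
    return np.maximum(t2, 0.0)  # kernel 3: a third full pass

# A graph compiler can fuse the three elementwise ops into a single kernel,
# reading and writing each element only once -- same math, less memory traffic.
def fused(x, scale, bias):
    return np.maximum(x * scale + bias, 0.0)

x = np.array([-2.0, -0.5, 1.0, 3.0])
assert np.allclose(unfused(x, 2.0, 1.0), fused(x, 2.0, 1.0))
```

On memory-bandwidth-bound GPU workloads, avoiding those intermediate round trips to memory is often where the biggest wins come from.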
Really complex structures (like the entire Llama family of large language models) can be built from graphs of simpler operations, such as adding tensors or multiplying matrices. We’ve provided APIs in Python and Mojo for assembling computational graphs from these basic building blocks. Try out a tutorial on building a simple graph that can run on a GPU here.
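As a minimal sketch of the idea (a hypothetical toy graph structure, not the MAX Graph API): each node records an operation and its inputs, and a whole model layer falls out of composing a couple of simple ops.

```python
import numpy as np

# Toy computational-graph node: an op name plus its input nodes.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, feeds):
        if self.op == "input":
            return feeds[self.inputs[0]]       # look up a fed-in value by name
        args = [n.eval(feeds) for n in self.inputs]
        if self.op == "matmul":
            return args[0] @ args[1]
        if self.op == "add":
            return args[0] + args[1]
        raise ValueError(f"unknown op: {self.op}")

# One "layer" of a model, y = x @ w + b, built from two simple ops.
x = Node("input", "x")
w = Node("input", "w")
b = Node("input", "b")
y = Node("add", Node("matmul", x, w), b)

out = y.eval({"x": np.ones((1, 3)), "w": np.ones((3, 2)), "b": np.zeros((1, 2))})
# out == [[3., 3.]]
```

A real graph API separates building the graph from executing it, which is exactly what lets the compiler inspect and optimize the whole structure before anything runs.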
The nodes in any MAX Graph are composed of operations written in Mojo. Prior to our latest releases, all of those operations were written in-house by Modular. However, starting with the 25.1 release (and the nightlies that preceded it), anyone can write their own custom operations in Mojo in the same fashion as Modular developers. This is another level at which you can program GPUs in MAX.
As a first step in unveiling this GPU programming model, we’ve released a series of examples of custom operations written in Mojo, and we add to them regularly in the MAX nightly releases. To explain how they work, we have an initial tutorial on writing a custom operation. We’re also filling out our public API documentation for the types and functions you’ll see in those examples, such as the gpu module and the @compiler.register attribute for registering operations.
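If the registration idea is new to you, here's a conceptual analogue in plain Python (a hypothetical sketch of the general pattern; MAX's actual mechanism is the @compiler.register attribute in Mojo, with its own signatures):

```python
# A name-to-implementation registry: the runtime can look up custom
# operations by the name they were registered under.
OPS = {}

def register(name):
    """Decorator that records a function in the registry under `name`."""
    def wrap(fn):
        OPS[name] = fn
        return fn
    return wrap

@register("add_one")
def add_one(values):
    # Toy elementwise operation.
    return [v + 1 for v in values]

# A graph node naming the "add_one" op can now be dispatched to our code:
result = OPS["add_one"]([1, 2, 3])
# result == [2, 3, 4]
```

The point of the pattern is that the graph only stores the op's *name*; registration is what connects that name to your implementation at load time.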
You can take each of those examples, run them on a MAX-supported GPU (or locally on CPU), and start to pick them apart to see how they tick. The API docs linked above begin to explain some of the Mojo code that targets the GPU in those examples, and we’ll be expanding on those explanations.
For an additional outside-of-AI example, I have a long-standing fascination with accelerated image processing, and have been exploring some of our GPU acceleration concepts as part of the MAX-CV framework.
Also, for background on the current GPU programming landscape and how we got here, Chris is in the process of publishing a series on democratizing AI computing that I highly recommend.
Again, we have a lot more planned for documentation, tutorials, and examples to make it that much easier to get started with GPU programming. If there are specific areas we could fill in, or that you’re struggling with, please let us know and we’ll try to focus on those as we assemble this content.