Thanks for asking! GPU programming is an incredibly broad space with a reputation for being extremely hard to get into. One of our goals is to make it far more accessible via a more approachable programming model. However, we're just getting started on providing content and examples that show the capabilities of MAX for GPU programming.
What we’ve shown so far is how we at Modular use MAX for accelerating AI models and other large computational workloads on GPUs. From our experience with different frameworks, we strongly believe that representing large-scale models as computational graphs provides the best opportunities for optimizing these workloads. As a result, we’ve built a very capable graph compiler that can take a graph you’ve provided, find opportunities for optimization (such as fusing kernels), and run the result on an accelerator. These graphs are designed to handle large data structures, like the massive number of weights in a neural network, and to efficiently parallelize calculations across a range of hardware (CPUs, GPUs, and more).
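To give a rough feel for what an optimization like kernel fusion buys you, here's a toy sketch in plain Python/NumPy (purely illustrative; this is not the MAX compiler's API, and a real graph compiler does this transformation automatically):

```python
import numpy as np

# Toy "graph": y = relu(x * scale + bias), expressed as three separate ops.
def unfused(x, scale, bias):
    t1 = x * scale              # kernel 1: one full pass over memory
    t2 = t1 + bias              # kernel 2: a second full pass
    return np.maximum(t2, 0.0)  # kernel 3: a third full pass

# A graph compiler can fuse the three elementwise ops into a single kernel,
# reading and writing each element only once -- same math, less memory traffic.
def fused(x, scale, bias):
    return np.maximum(x * scale + bias, 0.0)

x = np.array([-2.0, -0.5, 1.0, 3.0])
assert np.allclose(unfused(x, 2.0, 1.0), fused(x, 2.0, 1.0))
```

On memory-bandwidth-bound GPU workloads, avoiding those intermediate round trips to memory is often where the biggest wins come from.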
Really complex structures (like the entire Llama family of large language models) can be built from graphs of simpler operations, such as adding tensors or multiplying matrices. We’ve provided APIs in Python and Mojo for assembling computational graphs from these basic building blocks. Try out a tutorial on building a simple graph that can run on a GPU here.
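As a minimal sketch of the idea (a hypothetical toy graph structure, not the MAX Graph API): each node records an operation and its inputs, and a whole model layer falls out of composing a couple of simple ops.

```python
import numpy as np

# Toy computational-graph node: an op name plus its input nodes.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self, feeds):
        if self.op == "input":
            return feeds[self.inputs[0]]       # look up a fed-in value by name
        args = [n.eval(feeds) for n in self.inputs]
        if self.op == "matmul":
            return args[0] @ args[1]
        if self.op == "add":
            return args[0] + args[1]
        raise ValueError(f"unknown op: {self.op}")

# One "layer" of a model, y = x @ w + b, built from two simple ops.
x = Node("input", "x")
w = Node("input", "w")
b = Node("input", "b")
y = Node("add", Node("matmul", x, w), b)

out = y.eval({"x": np.ones((1, 3)), "w": np.ones((3, 2)), "b": np.zeros((1, 2))})
# out == [[3., 3.]]
```

A real graph API separates building the graph from executing it, which is exactly what lets the compiler inspect and optimize the whole structure before anything runs.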
The nodes in any MAX Graph are composed of operations written in Mojo. Prior to our latest releases, all of those operations were written in-house by Modular. However, starting with the 25.1 release (and the nightlies that preceded it), anyone can write their own custom operations in Mojo in the same fashion as Modular developers. This is another level at which you can program GPUs in MAX.
As a first step in unveiling this GPU programming model, we’ve released a series of examples of custom operations written in Mojo, and we add to them regularly in the MAX nightly releases. To explain how they work, we have an initial tutorial on writing a custom operation. We’re also filling out our public API documentation for the types and functions you’ll see in those examples, such as the gpu module and the @compiler.register attribute for registering operations.
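If the registration idea is new to you, here's a conceptual analogue in plain Python (a hypothetical sketch of the general pattern; MAX's actual mechanism is the @compiler.register attribute in Mojo, with its own signatures):

```python
# A name-to-implementation registry: the runtime can look up custom
# operations by the name they were registered under.
OPS = {}

def register(name):
    """Decorator that records a function in the registry under `name`."""
    def wrap(fn):
        OPS[name] = fn
        return fn
    return wrap

@register("add_one")
def add_one(values):
    # Toy elementwise operation.
    return [v + 1 for v in values]

# A graph node naming the "add_one" op can now be dispatched to our code:
result = OPS["add_one"]([1, 2, 3])
# result == [2, 3, 4]
```

The point of the pattern is that the graph only stores the op's *name*; registration is what connects that name to your implementation at load time.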
You can take each of those examples, run them on a MAX-supported GPU (or locally on CPU), and start to pick them apart to see how they tick. The API docs linked above begin to explain some of the Mojo code that targets the GPU in those examples, and we’ll be expanding on those explanations.
For an additional outside-of-AI example, I have a long-standing fascination with accelerated image processing, and have been exploring some of our GPU acceleration concepts as part of the MAX-CV framework.
Also, for background on the current GPU programming landscape and how we got here, Chris is in the process of publishing a series on democratizing AI computing that I highly recommend.
Again, we have a lot more planned for documentation, tutorials, and examples to make it that much easier to get started with GPU programming. If there are specific areas we could fill in, or that you’re struggling with, please let us know and we’ll try to focus on those as we assemble this content.