In the latest MAX nightly, we’ve added a new group of GPU programming examples showing how to write Mojo functions that run on a GPU via the MAX Driver API. They demonstrate a programming model that may feel very familiar to CUDA C programmers: GPU functions are defined and dispatched within a single Mojo file. In fact, the initial examples recreate the first three CUDA samples from the popular textbook “Programming Massively Parallel Processors” in MAX to show how basic concepts translate from CUDA. And we threw in calculating the Mandelbrot set, because that’s just fun.
Building and running these examples requires a MAX-compatible GPU.
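For readers who haven’t met the Mandelbrot calculation before, here is a minimal CPU-only sketch in Python of the per-pixel escape-time iteration. It is not the Mojo example itself and does not use the MAX Driver API; the function names, grid bounds, and iteration limit are illustrative assumptions. The point is that each pixel is computed independently, which is what makes the problem a natural fit for one-GPU-thread-per-pixel execution.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render(width: int, height: int, max_iter: int = 100) -> list[list[int]]:
    """Evaluate the iteration count at the center of every pixel in a grid.

    The region of the complex plane (real -2..1, imaginary -1..1) is an
    assumed, commonly used viewport, not taken from the MAX examples.
    """
    x_min, x_max = -2.0, 1.0
    y_min, y_max = -1.0, 1.0
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            # Each (px, py) is independent of every other pixel; on a GPU,
            # this loop body is the work a single thread would perform.
            re = x_min + (px + 0.5) * (x_max - x_min) / width
            im = y_min + (py + 0.5) * (y_max - y_min) / height
            row.append(mandelbrot_iterations(complex(re, im), max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print a small ASCII rendering: '#' marks points that never escaped.
    for row in render(60, 20, max_iter=50):
        print("".join("#" if n == 50 else "." for n in row))
```

In a GPU version, the two nested loops in `render` disappear: the grid of threads replaces them, and each thread derives its own pixel coordinates from its thread and block indices.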
We’ve also published the API documentation for the Mojo max.driver module. One caution: we anticipate that the Mojo MAX Driver API will evolve before the next stable release. We will also be updating the API docs, so some items may be missing right now. Treat these as experimental examples and interfaces, subject to rapid change.
Previously, we’d released examples that demonstrated how to program custom MAX Graph operations using Mojo. This is still the path that we use at Modular for building the nodes in complex computational graphs, such as those in AI models, and it is what we recommend for large-scale applications. The MAX graph compiler is extremely powerful and can optimize data-parallel code in ways that aren’t accessible when building outside of a graph.
These examples show the flexibility of the MAX GPU programming model, from single-file eager execution of GPU functions all the way to complex graphs of operations in a large language model. Working directly with the MAX Driver API can provide a great on-ramp for rapid prototyping of GPU code in Mojo that can then be placed inside a larger computational graph when ready. We’ll expand the examples in the near future to better illustrate this particular GPU development journey.
Try out these examples today using the latest MAX nightly. We’d love to hear your thoughts or questions about them!