[Experimental] Examples of custom CPU / GPU operations in Mojo

We have a ton of stuff launching with MAX in 24.6, including a preview of GPU support. All of those new additions have great new docs and tutorials (for example, how to serve the Llama 3.1 model on GPUs at scale).

One undocumented, experimental capability that we’re sharing with the community is very early support for writing custom GPU operations in Mojo. There are currently two examples of this in the nightly branch of the MAX GitHub repository under examples/custom_ops: a very basic “hello world” sample that adds 1 to each element of a tensor, and a kernel that calculates, for each point, the number of iterations it takes to escape in the Mandelbrot set. Both define custom operations through a Mojo API, then show how to construct simple computational graphs in Python and use those operations within them.
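For anyone unfamiliar with what the two kernels compute, here is a rough plain-Python reference for the math only. It is not the Mojo code from the repository and does not use the MAX APIs. The function names, the escape radius of 2, and the default iteration cap of 100 are my own assumptions for illustration, and the real example may use different parameters.

```python
# Plain-Python reference for what the two example kernels compute.
# Names and parameters here are hypothetical, not taken from the MAX repo.

def add_one(values):
    """Return a new list with 1 added to each element."""
    return [v + 1 for v in values]


def mandelbrot_escape_iterations(c, max_iterations=100):
    """Count iterations of z = z*z + c before |z| exceeds 2.

    Returns max_iterations if the point never escapes within the cap
    (assumed escape radius: 2, assumed cap: 100).
    """
    z = 0j
    for i in range(max_iterations):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iterations
```

A GPU kernel would evaluate something like the second function once per pixel in parallel, which is what makes it a natural fit for a custom op.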

I’ll caution that these examples are very much subject to change or breakage as we work towards our next stable release, and have little to no documentation, but we wanted to provide an early preview of this capability to the MAX community. As people have already started to try out these GPU programming examples, I felt it would be useful to start a thread to discuss them and to gather issues and requests.


One issue that surfaced in a Discord discussion, and that I wanted to document here for reference: to run custom Mojo operations built on our in-development extensibility API, you need to make sure the environment variable MODULAR_ONLY_USE_NEW_EXTENSIBILITY_API is set to true. When you run an example through Magic, the task definition sets this for you:

[tasks]
addition = { cmd="mojo package kernels/ -o kernels.mojopkg && python addition.py", env={ MODULAR_ONLY_USE_NEW_EXTENSIBILITY_API="true" }}

If you were building and running one of these examples directly from the shell, the approximate sequence of commands would be:

export MODULAR_ONLY_USE_NEW_EXTENSIBILITY_API=true
mojo package kernels/ -o kernels.mojopkg
python addition.py

This activates the new extensibility API, builds the Mojo custom ops into a package, and then uses the ops within a new graph on the GPU or CPU.
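If you would rather drive the same steps from a Python script, a minimal sketch could look like the following. The environment variable and the two commands are the ones shown above. The helper names (`extensibility_env`, `build_and_run`) are hypothetical, and this assumes `mojo` and `python` are on your PATH and that you run it from the example directory.

```python
import os
import subprocess

# Environment variable that switches on the in-development extensibility API.
ENV_FLAG = "MODULAR_ONLY_USE_NEW_EXTENSIBILITY_API"

# The same two commands as the shell sequence above.
BUILD_AND_RUN = [
    ["mojo", "package", "kernels/", "-o", "kernels.mojopkg"],
    ["python", "addition.py"],
]


def extensibility_env(base=None):
    """Return a copy of the environment with the flag set to "true"."""
    env = dict(os.environ if base is None else base)
    env[ENV_FLAG] = "true"
    return env


def build_and_run(commands=BUILD_AND_RUN):
    """Run each command in order, stopping at the first failure."""
    env = extensibility_env()
    for cmd in commands:
        subprocess.run(cmd, env=env, check=True)
```

Calling `build_and_run()` packages the kernels and then runs the graph script. Passing the environment explicitly to each subprocess means the flag does not leak into the rest of your shell session.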
