Note: we’re working on exposing a nightly changelog for MAX in the same way as has been done for Mojo. When that is live, you’ll be able to read these feature updates within the MAX changelog. Stay tuned!
The latest MAX nightly adds a lot of new content for GPU programming in MAX:
- As a source-breaking change, `ManagedTensorSlice` and `foreach` have moved from the `tensor_utils` module to `max.tensor`.
- Two new custom operation programming examples have been added that show more of the power that Mojo provides for programming GPUs:
  - The `vector_addition` example demonstrates how to write device-specific code paths, as well as how to manually dispatch Mojo functions on the GPU within a custom operation. This mode of programming may be much more familiar to those used to CUDA C programming. Note that the `foreach` abstraction performs elementwise calculations far more efficiently than the manual functions here, due to its hardware optimization; this is merely an instructive example.
  - The `top_k` example shows a practical use case for a custom operation used today within large language model graphs: a top-K token sampler. The Mojo code contains a much more complex calculation and shows how to construct a custom shape function for the operation. The Python-side code also showcases how such an operation is used in practice.
- The `synchronous` parameter has been removed from the interfaces in the custom operation examples. It was only useful in a few cases, and we're evaluating removing it entirely; dropping it simplifies the overall operation interface.
- Once the nightly docs update (the team is working hard on deploying these right now), initial API docs will appear for the `gpu` and `layout` modules and their dependencies.
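To illustrate the distinction the `vector_addition` example draws, here is a minimal Python analogy (not the MAX API; both function names are hypothetical): with manual dispatch, each "thread" is handed an index and strides over its slice of the data, while an elementwise abstraction in the style of `foreach` only asks the caller for the per-element computation and handles the dispatch itself.

```python
def manual_kernel(out, a, b, thread_id, num_threads):
    """Manual-dispatch analogy: each 'thread' strides over its share of indices.

    This mirrors the CUDA-style pattern where the kernel body computes
    which elements belong to the current thread.
    """
    for i in range(thread_id, len(a), num_threads):
        out[i] = a[i] + b[i]


def elementwise(func, a, b):
    """foreach-style analogy: the caller supplies only the per-element op;
    the abstraction decides how the work is distributed."""
    return [func(x, y) for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Manual dispatch: simulate two "threads" covering interleaved indices.
out = [0] * len(a)
manual_kernel(out, a, b, 0, 2)
manual_kernel(out, a, b, 1, 2)

# Elementwise abstraction: only the per-element computation is specified.
result = elementwise(lambda x, y: x + y, a, b)
```

The real Mojo example runs on the GPU, of course; the point of the analogy is only where the index bookkeeping lives.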
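As background on what the `top_k` example computes, here is a pure-Python sketch of top-K token sampling (illustrative only, not the MAX/Mojo implementation; the function name and signature are hypothetical): keep the k highest-logit tokens, renormalize their probabilities with a softmax, and draw one token from that restricted distribution.

```python
import heapq
import math
import random

def top_k_sample(logits, k, rng=random.random):
    """Sample a token id from the k highest-logit entries.

    Sketch only: a production sampler runs on-device, in batch, and
    pairs this with a shape function as in the MAX example.
    """
    # Keep the k largest logits along with their token indices.
    top = heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])
    # Softmax over just the retained logits (subtract the max for stability).
    m = max(v for _, v in top)
    weights = [math.exp(v - m) for _, v in top]
    total = sum(weights)
    # Draw from the renormalized distribution.
    r = rng() * total
    acc = 0.0
    for (idx, _), w in zip(top, weights):
        acc += w
        if r <= acc:
            return idx
    return top[-1][0]
```

Passing a fixed `rng` makes the draw deterministic, which is handy when checking the renormalization logic.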