Note: we’re working on exposing a nightly changelog for MAX in the same way as has been done for Mojo. When that is live, you’ll be able to read these feature updates within the MAX changelog. Stay tuned!
The latest MAX nightly adds a lot of new content for GPU programming in MAX:
- As a source-breaking change, `ManagedTensorSlice` and `foreach` have moved from the `tensor_utils` module to `max.tensor`.
- Two new custom operation programming examples have been added that show more of the power that Mojo provides for programming GPUs:
  - The `vector_addition` example demonstrates how to write device-specific code paths, as well as how to manually dispatch Mojo functions on the GPU within a custom operation. This mode of programming may be much more familiar to those used to CUDA C programming. Note that the `foreach` abstraction performs elementwise calculations far more efficiently than the manual functions here, due to its hardware optimization; this is merely an instructive example.
  - The `top_k` example shows a practical use case for a custom operation used today within large language model graphs: a top-K token sampler. The Mojo code contains a much more complex calculation and shows how to construct a custom shape function for the operation. The Python-side code also showcases how such an operation is used in practice.
- The `synchronous` parameter has been removed from the interfaces in the custom operation examples. It was only useful in a few cases, and we're evaluating removing it entirely; dropping it simplifies the overall operation interface.
- Once the nightly docs update (the team is working hard on deploying these right now), initial API docs will appear for the `gpu` and `layout` modules and their dependencies.
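To illustrate the distinction the `vector_addition` example draws, here is a minimal Python analogy (not the MAX API; both function names are hypothetical): with manual dispatch, each "thread" is handed an index and strides over its slice of the data, while an elementwise abstraction in the style of `foreach` only asks the caller for the per-element computation and handles the dispatch itself.

```python
def manual_kernel(out, a, b, thread_id, num_threads):
    """Manual-dispatch analogy: each 'thread' strides over its share of indices.

    This mirrors the CUDA-style pattern where the kernel body computes
    which elements belong to the current thread.
    """
    for i in range(thread_id, len(a), num_threads):
        out[i] = a[i] + b[i]


def elementwise(func, a, b):
    """foreach-style analogy: the caller supplies only the per-element op;
    the abstraction decides how the work is distributed."""
    return [func(x, y) for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Manual dispatch: simulate two "threads" covering interleaved indices.
out = [0] * len(a)
manual_kernel(out, a, b, 0, 2)
manual_kernel(out, a, b, 1, 2)

# Elementwise abstraction: only the per-element computation is specified.
result = elementwise(lambda x, y: x + y, a, b)
```

The real Mojo example runs on the GPU, of course; the point of the analogy is only where the index bookkeeping lives.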
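As background on what the `top_k` example computes, here is a pure-Python sketch of top-K token sampling (illustrative only, not the MAX/Mojo implementation; the function name and signature are hypothetical): keep the k highest-logit tokens, renormalize their probabilities with a softmax, and draw one token from that restricted distribution.

```python
import heapq
import math
import random

def top_k_sample(logits, k, rng=random.random):
    """Sample a token id from the k highest-logit entries.

    Sketch only: a production sampler runs on-device, in batch, and
    pairs this with a shape function as in the MAX example.
    """
    # Keep the k largest logits along with their token indices.
    top = heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])
    # Softmax over just the retained logits (subtract the max for stability).
    m = max(v for _, v in top)
    weights = [math.exp(v - m) for _, v in top]
    total = sum(weights)
    # Draw from the renormalized distribution.
    r = rng() * total
    acc = 0.0
    for (idx, _), w in zip(top, weights):
        acc += w
        if r <= acc:
            return idx
    return top[-1][0]
```

Passing a fixed `rng` makes the draw deterministic, which is handy when checking the renormalization logic.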