We have a ton of new features launching with MAX 24.6, including a preview of GPU support. All of these additions come with new docs and tutorials (for example, how to serve the Llama 3.1 model on GPUs at scale).
One undocumented and experimental capability that we’re sharing with the community is very early support for writing custom GPU operations in Mojo. We currently have two examples of this in the nightly branch of the MAX GitHub repository under examples/custom_ops: a very basic “hello world” sample that adds 1 to each element of a tensor, and a kernel that calculates the number of iterations each point takes to escape the Mandelbrot set. Both define their custom operations with a Mojo API, then show how to build simple computational graphs in Python that use those operations.
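The MAX custom-op API itself is experimental and undocumented, so the sketch below makes no attempt to reproduce it. Instead, it is a plain-Python reference (not code from the repository) for the per-element computation that a Mandelbrot escape-iteration kernel performs, which can be handy for sanity-checking GPU output on a small grid. The escape radius of 2, the `max_iterations` cap, the grid sampling scheme, and the helper names `escape_iterations` and `escape_grid` are all hypothetical choices, not parameters taken from the example.

```python
def escape_iterations(c: complex, max_iterations: int = 100) -> int:
    """Hypothetical helper: count how many updates z = z*z + c run before |z| > 2.

    Points that never escape within max_iterations (i.e. likely members of the
    Mandelbrot set) return max_iterations.
    """
    z = 0j
    for iteration in range(max_iterations):
        if abs(z) > 2.0:  # assumed escape radius of 2
            return iteration
        z = z * z + c
    return max_iterations


def escape_grid(min_x: float, max_x: float, min_y: float, max_y: float,
                width: int, height: int, max_iterations: int = 100) -> list[list[int]]:
    """Hypothetical helper: evaluate escape_iterations over a height x width grid.

    Each cell samples the complex plane at its top-left corner; a GPU kernel
    would typically assign one thread to each of these independent cells.
    """
    dx = (max_x - min_x) / width
    dy = (max_y - min_y) / height
    return [
        [escape_iterations(complex(min_x + col * dx, min_y + row * dy), max_iterations)
         for col in range(width)]
        for row in range(height)
    ]


if __name__ == "__main__":
    for row in escape_grid(-2.0, 0.5, -1.25, 1.25, width=8, height=4, max_iterations=20):
        print(row)
```

Because every cell is computed independently of its neighbors, the problem is embarrassingly parallel, which is what makes it a natural first GPU kernel.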
I’ll caution that these examples have little to no documentation and are very much subject to change or breakage as we work toward our next stable release, but we wanted to give the MAX community an early preview of this capability. Since people have already started trying out these GPU programming examples, I thought it would be useful to start a thread to discuss them and to collect issues and feature requests.