Three years ago, we began reimagining AI development by rebuilding its infrastructure to be more performant, programmable, and portable. Today, we’re introducing MAX 24.6, featuring MAX GPU—a technology preview of the first vertically integrated generative AI serving stack that eliminates dependency on vendor-specific libraries like NVIDIA CUDA. MAX GPU is built on two groundbreaking technologies:
- MAX Engine: A high-performance AI model compiler and runtime supporting vendor-agnostic Mojo GPU kernels for NVIDIA GPUs.
- MAX Serve: A Python-native serving layer engineered for LLMs, handling complex request batching and scheduling for reliable performance under heavy workloads.
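Since MAX Serve speaks an OpenAI-compatible REST API, you can hit it with nothing but the standard library. The sketch below assumes a local instance on the default `http://localhost:8000` and an illustrative Llama 3.1 model name; adjust both to your deployment.

```python
import json
import urllib.request

# Build a chat-completions request against a local MAX Serve instance.
# Host, port, and model name below are assumptions for illustration.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # assumed model id
    "messages": [{"role": "user", "content": "What is MAX GPU?"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a running server, uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url, req.get_method())
```

Because the API surface matches OpenAI's, existing client SDKs and tooling should work against MAX Serve by swapping the base URL.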
Run `magic self-update` to get started, then check out the technology preview of MAX GPU.
Don’t miss:
- Our release announcement, including hints about what’s in store for 2025
- Our tutorial on building a continuous chat app using Llama 3.1 and MAX Serve
- Our benchmarking deep dive to help you understand how MAX Serve stacks up
Drop your thoughts, questions, and hype in the official MAX 24.6 forum thread: MAX 24.6 and MAX GPU feedback.