MAX 25.2: Unleash the power of your H200s without CUDA!

Building on last month’s release, we’re excited to unveil our next round of improvements! :flexed_biceps: MAX 25.2 delivers industry-leading performance on NVIDIA GPUs, built from the ground up without CUDA, to power faster, more responsive, and more customizable GenAI deployments at scale.

MAX 25.2 delivers state-of-the-art performance on H100 and H200 GPUs and supports more than 500 GenAI models. Multi-GPU support lets you run large models seamlessly, and enhanced LLM serving makes MAX 12% faster than vLLM 0.8 on Sonnet benchmarks.

Ultra-slim containers help you deploy to prod faster :person_running:, and GPU programming with Mojo :fire: is now unlocked: you can write custom, high-performance GPU code in Mojo that directly accesses NVIDIA GPUs.

Finally, advanced features like GPTQ quantization let even the largest models run efficiently. Dive into what’s new in 25.2 and get started today: Modular: MAX 25.2: Unleash the power of your H200's–without CUDA!
