MAX 25.2: Unleash the power of your H200s without CUDA!

Modular · March 25, 2025, 5:58pm

Building on last month’s release, we’re excited to unveil our next round of improvements! MAX 25.2 delivers industry-leading performance on NVIDIA GPUs, built from the ground up without CUDA, to power faster, more responsive, and more customizable GenAI deployments at scale.

25.2 includes state-of-the-art H100 and H200 performance and support for more than 500 GenAI models. Multi-GPU support enables you to run large LLMs seamlessly, and enhanced LLM serving means that MAX is now 12% faster than vLLM 0.8 on Sonnet benchmarks.

Ultra-slim containers help you deploy to production faster, and GPU programming with Mojo🔥 is unlocked: write custom, high-performance GPU code in Mojo that directly accesses NVIDIA GPUs.

Finally, thanks to advanced features like GPTQ quantization, even the biggest models run efficiently. Dive into what’s new with 25.2 and get started today: Modular: MAX 25.2: Unleash the power of your H200's–without CUDA!

