Modular Platform 25.5: Introducing Large Scale Batch Inference

Modular Platform 25.5 is built for developers who need performance at scale!

We’re introducing Large Scale Batch Inference: a high-throughput, OpenAI-compatible API powered by Mammoth. It runs on both NVIDIA and AMD hardware, and it’s already live in production through our partner, San Francisco Compute Company.
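Because the API is OpenAI-compatible, batch jobs can be prepared in the same JSONL request format the OpenAI Batch API uses. Here is a minimal sketch of building such a file; the model name, prompts, and endpoint path are illustrative placeholders, not documented Modular values.

```python
import json

def build_batch_lines(prompts, model="your-model-name"):
    """Return one JSONL line per prompt, each a self-contained request.

    Follows the OpenAI Batch API line format; an OpenAI-compatible
    batch endpoint would accept a file of these lines.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match responses back to requests
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

batch = build_batch_lines(["Summarize MAX graphs.", "What is Mojo?"])
print(len(batch))  # one request per prompt
```

Each line is an independent request, so a batch file can be split and scheduled across workers without shared state.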

This release also brings major updates across the platform:
• Mojo now ships as standalone Conda packages for easier development and deployment.
• The MAX Graph API is fully open source, including unit tests and reference code for building portable, GPU-accelerated graphs in Python.
• MAX graphs can now be used directly in PyTorch workflows with the new @graph_op decorator.

Other highlights include smaller and faster MAX serving containers, performance improvements across MAX, and expanded Mojo language features.

Explore the full release: Modular Platform 25.5: Introducing Large Scale Batch Inference