Modular Platform 25.5: Introducing Large Scale Batch Inference

Modular Platform 25.5 is built for developers who need performance at scale!

We’re introducing Large Scale Batch Inference: a high-throughput, OpenAI-compatible API powered by Mammoth. It runs on both NVIDIA and AMD hardware, and it’s already live in production through our partner, San Francisco Compute Company.
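Because the API is OpenAI-compatible, batch jobs can be prepared in the same JSONL request format the OpenAI Batch API uses. Here is a minimal sketch of building such a file; the model name, prompts, and endpoint path are illustrative placeholders, not documented Modular values.

```python
import json

def build_batch_lines(prompts, model="your-model-name"):
    """Return one JSONL line per prompt, each a self-contained request.

    Follows the OpenAI Batch API line format; an OpenAI-compatible
    batch endpoint would accept a file of these lines.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match responses back to requests
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

batch = build_batch_lines(["Summarize MAX graphs.", "What is Mojo?"])
print(len(batch))  # one request per prompt
```

Each line is an independent request, so a batch file can be split and scheduled across workers without shared state.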

This release also brings major updates across the platform:
• Mojo now ships as standalone Conda packages for easier development and deployment.
• The MAX Graph API is fully open source, including unit tests and reference code for building portable, GPU-accelerated graphs in Python.
• MAX graphs can now be used directly in PyTorch workflows with the new @graph_op decorator.

Other highlights include smaller and faster MAX serving containers, performance improvements across MAX, and expanded Mojo language features.

Explore the full release: Modular Platform 25.5: Introducing Large Scale Batch Inference