First-class autodiff/training in MAX: timeline, and supersede-or-compose with community autograd libraries?

Gles · June 4, 2026, 5:29pm

Hi there.

I’m considering investing serious time in a pure-Mojo autograd / tensor-NN library, and before I commit I’d like to align with Modular’s direction rather than guess at it.

Here’s what I’ve found so far (happy to be corrected). MAX today is inference-focused, and the public signals on training/autodiff are:

- Chris Lattner, on the Latent Space “The Shape of Compute” episode (2025-06-13): “We’re not focused on solving training yet. Maybe we’ll get there.”

- In the 2023 autodiff discussion (Discussion #188, “Automatic Differentiation in Mojo”, in modular/modular on GitHub), a Modular team response (+1’d by Lattner) noted that tape-based autograd “just works” without language support, and that the intended longer-term vehicle is metaprogramming, with the intuition that backward kernels could eventually be auto-generated from forward kernels.
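
For anyone skimming who hasn’t met the term, here is roughly what “tape-based autograd” means. This is a minimal sketch in Python rather than Mojo, purely for brevity; every name in it (`Tape`, `Var`, `backward`) is made up for illustration and is not a MAX or Mojo API. The point is that the forward pass records each operation on a plain list, and the backward pass replays that list in reverse, which needs no special language support.

```python
# Minimal tape-based reverse-mode autodiff on scalars (illustrative only).
# Tape, Var, and backward are hypothetical names, not part of MAX or Mojo.

class Tape:
    def __init__(self):
        # Each entry is (output Var, [(input Var, local gradient), ...]),
        # appended in the order the forward pass executes.
        self.entries = []


class Var:
    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # d(a + b)/da = 1 and d(a + b)/db = 1
        self.tape.entries.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(a * b)/da = b and d(a * b)/db = a
        self.tape.entries.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    # Seed the output gradient, then walk the tape in reverse,
    # accumulating chain-rule contributions into each input.
    output.grad = 1.0
    for out, parents in reversed(output.tape.entries):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


tape = Tape()
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # forward pass records two tape entries
backward(z)
print(z.value, x.grad, y.grad)
```

Here `x.grad` accumulates two contributions (one through the product, one through the sum), which is why the gradients are added rather than assigned. A compiler-level approach would instead derive the backward computation from the forward kernel itself, without recording anything at run time.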

Two questions:

1. Timeline / intent: Is first-class training/autodiff anywhere on the roadmap horizon, or is it strictly demand-gated behind inference adoption for now? I’m not asking for a commitment, just whether to plan around “community-owned for the foreseeable future” versus “first-party is coming.”

2. Supersede vs compose: If/when MAX gains compiler-level AD (the “auto-backward-from-forward” idea), is the intent for it to be exposed so a third-party pure-Mojo autograd frontend could lower to / delegate to it (compose on top), or would it more likely replace third-party autograd libraries? This is what decides whether the right move is to build on MAX from the start versus build a standalone engine.

For context, the community training-on-MAX efforts I’m aware of (Nabla, max_training) are Python frontends; I’m specifically interested in a pure-Mojo frontend, which appears unoccupied.

Either direction is useful to know. I’d just rather build with the platform than against it. Thanks for any pointers, and for the work on Mojo/MAX.

Ehsan · June 5, 2026, 2:23am

We definitely encourage a community project like this. As you said, MAX’s focus is on inference for now. We’ve made great progress on a unified, distributed-aware tensor for distributed inference, which lives under max.experimental.tensor. Once we’re happy with its status and it has graduated out of experimental, we’ll have plans for training support. But first things first: we need to make sure distributed inference is done correctly, because training will be built on top of it.
