Hi there.
I’m considering investing serious time in a pure-Mojo autograd / tensor-NN library, and before I commit I’d like to align with Modular’s direction rather than guess at it.
Here’s what I’ve found so far (happy to be corrected). MAX today is inference-focused, and the public signals on training/autodiff are:
- Chris Lattner, on the Latent Space “The Shape of Compute” episode (2025-06-13): “We’re not focused on solving training yet. Maybe we’ll get there.”
- In the 2023 autodiff discussion (“Automatic Differentiation in Mojo”, modular/modular Discussion #188 on GitHub), a Modular team response (+1’d by Lattner) noted that tape-based autograd “just works” without language support, and that the intended longer-term vehicle is metaprogramming, with the intuition that backward kernels could eventually be auto-generated from forward kernels.
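
To be concrete about what I mean by “tape-based autograd”, here is a minimal scalar sketch. It is my own illustration, written in Python rather than Mojo so the idea is easy to run, and it is not taken from the discussion or from any Modular code. The names `Tape`, `Var`, and `backward` are hypothetical.

```python
class Tape:
    """Records each operation in execution order (hypothetical helper)."""

    def __init__(self):
        # Each entry is (output, inputs, local_gradients).
        self.entries = []


class Var:
    """A scalar value that logs the operations applied to it on a tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # d(a + b)/da = 1, d(a + b)/db = 1
        self.tape.entries.append((out, (self, other), (1.0, 1.0)))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(a * b)/da = b, d(a * b)/db = a
        self.tape.entries.append((out, (self, other), (other.value, self.value)))
        return out


def backward(tape, output):
    """Replay the tape in reverse, accumulating gradients via the chain rule."""
    output.grad = 1.0
    for out, inputs, local_grads in reversed(tape.entries):
        for inp, local in zip(inputs, local_grads):
            inp.grad += local * out.grad


tape = Tape()
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x          # z = x*y + x
backward(tape, z)
print(z.value, x.grad, y.grad)  # dz/dx = y + 1, dz/dy = x
```

Nothing here needs compiler or language support: the forward pass records operations at runtime and the backward pass walks the record in reverse. That is the kind of engine I would be writing in pure Mojo.
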
Two questions:
1. Timeline / intent: Is first-class training/autodiff anywhere on the roadmap horizon, or is it strictly demand-gated behind inference adoption for now? I’m not asking for a commitment, just whether to plan around “community-owned for the foreseeable future” versus “first-party is coming.”
2. Supersede vs compose: If/when MAX gains compiler-level AD (the “auto-backward-from-forward” idea), is the intent to expose it so that a third-party pure-Mojo autograd frontend could lower to it or delegate to it (compose on top), or would it more likely replace third-party autograd libraries? The answer decides whether the right move is to build on MAX from the start or to build a standalone engine.
For context, the community training-on-MAX efforts I’m aware of (Nabla, max_training) are Python frontends; I’m specifically interested in a pure-Mojo frontend, a niche that appears to be unoccupied.
Either direction is useful to know. I’d just rather build with the platform than against it. Thanks for any pointers, and for the work on Mojo/MAX.