MAX Nightly 26.3.0.dev2026031822 Released

:astronaut: A new nightly version has been released! :astronaut:

See the Modular Quickstart guide for installation instructions.

MAX changelog updates:

  • Fixed slow axis=None reductions (mean, sum, prod, max, min) in
    max.experimental.functional. The previous implementation flattened the
    tensor before reducing, serializing the work onto a single GPU block.
    Reductions now iterate axis-by-axis to preserve parallelism.

  • Added the experimental max.experimental.distributed module, which provides
    DTensor, DeviceMesh, and placement types (Replicated, Sharded, Partial) for
    expressing how tensors are distributed across multiple devices. Op dispatch
    is not yet supported.

  • max/python/max/benchmark/benchmark_throughput.py has been deprecated and
    will be removed in a future MAX release.

  • Added GPU kernel examples from the Programming Massively Parallel Processors
    (PMPP) textbook covering reductions, scans, histograms, sorting, sparse
    matrix operations, graph algorithms, convolutions, FlashAttention, and more.
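The axis=None reduction fix above comes down to the observation that reducing one axis at a time gives the same answer as reducing the flattened tensor, while every intermediate step is a batch of independent reductions that can run in parallel. Below is a conceptual sketch of that idea in plain Python. It does not use the max.experimental.functional API and does not model GPU execution. All helper names (`reduce_axis`, `reduce_all_axis_by_axis`, and so on) are hypothetical, and reducing the innermost axis first is an assumption, not necessarily the order MAX uses.

```python
import operator
from functools import reduce

# Conceptual sketch only: plain Python, NOT the MAX API, no GPU execution.
# A "tensor" here is a non-empty, rectangular nested list of numbers.


def ndim(t):
    """Number of axes of a nested-list tensor."""
    return 1 + ndim(t[0]) if isinstance(t, list) else 0


def combine(a, b, op):
    """Apply a binary op elementwise to two same-shaped sub-tensors."""
    if isinstance(a, list):
        return [combine(x, y, op) for x, y in zip(a, b)]
    return op(a, b)


def reduce_axis(t, axis, op):
    """Reduce tensor t along one axis; the result has one fewer axis."""
    if axis == 0:
        return reduce(lambda a, b: combine(a, b, op), t)
    # Each sub-tensor is reduced independently of the others: this is the
    # kind of work a GPU can spread across many blocks.
    return [reduce_axis(sub, axis - 1, op) for sub in t]


def reduce_all_axis_by_axis(t, op):
    """axis=None reduction done as a sequence of single-axis reductions."""
    # Assumption: innermost axis first. For associative, commutative ops
    # (+, *, max, min) any order gives the same result, up to
    # floating-point rounding.
    while isinstance(t, list):
        t = reduce_axis(t, ndim(t) - 1, op)
    return t


def flatten(t):
    if isinstance(t, list):
        return [x for sub in t for x in flatten(sub)]
    return [t]


def reduce_all_flattened(t, op):
    """Flatten first, then run one long reduction over every element."""
    return reduce(op, flatten(t))


def mean_all(t):
    """Mean over all elements: the total sum divided by the element count."""
    return reduce_all_axis_by_axis(t, operator.add) / len(flatten(t))


x = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
for op in (operator.add, operator.mul, max, min):
    assert reduce_all_axis_by_axis(x, op) == reduce_all_flattened(x, op)
```

The loop at the end checks that both strategies agree for sum, prod, max, and min. The difference in the real fix is not the answer but how much of the work is independent at each step.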
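To make the placement types in the new max.experimental.distributed module more concrete, here is a small sketch of what each placement means for the data each device holds. It assumes Replicated, Sharded, and Partial follow the same semantics as the similarly named placements in PyTorch's DTensor (Replicate, Shard, Partial). The classes and the `distribute` and `to_global` helpers below are hypothetical stand-ins written in plain Python, not the module's actual API. They model only a 1-D tensor on a 1-D mesh, so the shard dimension is implicit.

```python
from dataclasses import dataclass

# Hypothetical stand-ins that mimic the idea of placements; these are NOT
# the classes exported by max.experimental.distributed.
# Model: a 1-D tensor (a list) placed on a 1-D mesh of devices.


@dataclass(frozen=True)
class Replicated:
    """Every device holds a full copy of the tensor."""


@dataclass(frozen=True)
class Sharded:
    """Each device holds one contiguous slice of the tensor."""


@dataclass(frozen=True)
class Partial:
    """Each device holds a partial value; the global tensor is their sum."""


def distribute(values, placement, num_devices):
    """Split a global tensor into per-device local tensors."""
    if isinstance(placement, Replicated):
        return [list(values) for _ in range(num_devices)]
    if isinstance(placement, Sharded):
        if len(values) % num_devices:
            raise ValueError("this sketch needs an evenly divisible length")
        size = len(values) // num_devices
        return [values[i * size:(i + 1) * size] for i in range(num_devices)]
    # Partial values usually come out of a computation, not a direct split.
    raise ValueError(f"cannot distribute directly as {placement!r}")


def to_global(local_values, placement):
    """Reconstruct the global tensor from per-device local tensors."""
    if isinstance(placement, Replicated):
        return list(local_values[0])
    if isinstance(placement, Sharded):
        return [x for shard in local_values for x in shard]
    if isinstance(placement, Partial):
        return [sum(parts) for parts in zip(*local_values)]
    raise TypeError(f"unknown placement {placement!r}")


# A dot product of two Sharded vectors yields a Partial result: each device
# can only sum the products of the elements it owns.
a_local = distribute([1, 2, 3, 4], Sharded(), 2)
b_local = distribute([5, 6, 7, 8], Sharded(), 2)
partial = [[sum(x * y for x, y in zip(a, b))] for a, b in zip(a_local, b_local)]
print(to_global(partial, Partial()))  # prints [70]
```

The dot product shows why a Partial placement is useful: the per-device results ([17] and [53]) only become the true value after a sum across devices, so a framework can defer that communication until the global value is actually needed.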

Mojo changelog updates:

  • [stdlib] Add IterableOwned trait
  • [Docs] Move rebased changelog entries from nightly to v0.26.2
  • [stdlib] Relax external_call return type to RegisterPassable (#80375)

Raw MAX diff: https://github.com/modular/modular/compare/9bca948a3d5ebebd41c9665c882810c2a8ac3215...9866abc4395c6b182110ddbe21b250eb7df78eec
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md