A new nightly version has been released!
See the quickstart guide for installation instructions: Quickstart | Modular
MAX changelog updates:
- Fixed slow `axis=None` reductions (`mean`, `sum`, `prod`, `max`, `min`) in `max.experimental.functional`. The previous implementation flattened the tensor before reducing, serializing the work onto a single GPU block. Reductions now iterate axis-by-axis to preserve parallelism.
- Added experimental `max.experimental.distributed` module with `DTensor`, `DeviceMesh`, and placement types (`Replicated`, `Sharded`, `Partial`) for expressing how tensors are distributed across multiple devices. Op dispatch is not yet supported.
- `max/python/max/benchmark/benchmark_throughput.py` has been deprecated and will be removed in a future MAX release.
- Added GPU kernel examples from the Programming Massively Parallel Processors (PMPP) textbook covering reductions, scans, histograms, sorting, sparse matrix operations, graph algorithms, convolutions, FlashAttention, and more.
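To illustrate the `axis=None` reduction fix, here is a minimal pure-Python sketch of the two strategies. It does not use the MAX API: the helper names (`reduce_last_axis`, `reduce_all_axis_by_axis`, `reduce_all_flattened`) are hypothetical and the "tensor" is just a nested list. It only shows that reducing one axis at a time gives the same result as flattening first, while every intermediate step consists of many independent row reductions that a GPU could run in parallel.

```python
from functools import reduce
import operator


def ndim(nested):
    """Number of nesting levels (axes) in a rectangular nested list."""
    depth = 0
    while isinstance(nested, list):
        depth += 1
        nested = nested[0]
    return depth


def reduce_last_axis(nested, op):
    """Reduce only the innermost axis; each row is an independent reduction."""
    if not isinstance(nested[0], list):
        return reduce(op, nested)
    return [reduce_last_axis(sub, op) for sub in nested]


def reduce_all_axis_by_axis(nested, op):
    """axis=None reduction performed one axis at a time, innermost first."""
    for _ in range(ndim(nested)):
        nested = reduce_last_axis(nested, op)
    return nested


def flatten(nested):
    """Collapse a nested list into a flat list of scalars."""
    if not isinstance(nested, list):
        return [nested]
    return [x for sub in nested for x in flatten(sub)]


def reduce_all_flattened(nested, op):
    """axis=None reduction as one long sequential pass over the flat data."""
    return reduce(op, flatten(nested))


# Hypothetical 2x2x3 input holding the values 1..12.
data = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

total_by_axis = reduce_all_axis_by_axis(data, operator.add)
total_flat = reduce_all_flattened(data, operator.add)
print(total_by_axis, total_flat)
```

The same pattern applies to `prod`, `max`, and `min` by swapping the binary operator. A `mean` can be obtained as the sum divided by the element count.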
Mojo changelog updates:
- [stdlib] Add `IterableOwned` trait.
- [Docs] Move rebased changelog entries from nightly to v0.26.2
- [stdlib] Relax `external_call` return type to `RegisterPassable` (#80375)
Raw MAX diff: https://github.com/modular/modular/compare/9bca948a3d5ebebd41c9665c882810c2a8ac3215...9866abc4395c6b182110ddbe21b250eb7df78eec
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md