MAX Nightly 26.3.0.dev2026042405 (Mojo 1.0.0b1.dev2026042405) Released

:astronaut: A new nightly version has been released! :astronaut:

See the quickstart guide for installation instructions: Quickstart | Modular

MAX changelog updates:

  • Fixed Wan 2.1 / 2.2 video diffusion pipelines silently running without
    classifier-free guidance. The tokenizer gated negative-prompt tokenization
    on true_cfg_scale > 1.0 (default 1.0), so negative tokens were never
    produced and the executor fell back to unguided generation even when
    guidance_scale > 1.0 and a negative prompt was supplied. Wan now
    enables classical CFG whenever guidance_scale > 1.0 and defaults an
    absent negative prompt to the empty string, matching the diffusers
    baseline.
  • max.experimental.Tensor is now distribution-aware: it carries a
    tuple of per-shard storages (driver.Buffers when realized, or graph
    values, TensorValue / BufferValue, when unrealized), paired with a
    DeviceMapping that maps those local shards onto the DeviceMesh.
  • Reworked max.experimental.functional from a single functional.py
    into a functional/ package: a new distribution- and mesh-aware
    dispatch layer on top of the graph-compiler Python API, split cleanly
    into three op categories: creation_ops (tensor factories), spmd_ops
    (rule-based per-op SPMD dispatch), and collective_ops
    (allreduce_sum, allgather, reduce_scatter, etc., now applied per
    device-group along a chosen mesh axis so they dispatch correctly on
    multi-dimensional meshes, plus a transfer_to convenience op that
    moves tensors between DeviceMappings).
  • Added max.experimental.sharding with the core types for distributed
    tensors (DeviceMesh; DeviceMapping with PlacementMapping and
    NamedMapping; placement primitives Replicated / Sharded /
    Partial; DistributedTensorType / DistributedBufferType;
    TensorLayout), plus a sharding.rules submodule of pure
    mapping-propagation rules (elementwise, matmul, reduction, shape,
    conv, pooling) that, for each op, either error out or reshard inputs
    to the proposed DeviceMappings and derive the resulting output
    DeviceMapping.
  • max.experimental.nn.Module.compile() now accepts
    DistributedTensorType symbolic inputs (not just TensorType), so
    distributed models can be built via the graph-compilation path in
    addition to running eagerly; gemma3_modulev3 is the first multi-GPU
    model wired up. DTensor support in MAX is still ongoing work and
    these APIs may evolve.
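The Wan CFG fix above restores classical classifier-free guidance. As a minimal sketch of that behavior (illustrative names only, not the MAX pipeline API):

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classical CFG: extrapolate from the unconditional prediction
    toward the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

def resolve_cfg(guidance_scale, negative_prompt):
    """The fixed gating: CFG is enabled whenever guidance_scale > 1.0,
    and an absent negative prompt defaults to the empty string so
    negative tokens are always produced when guidance is requested."""
    negative_prompt = negative_prompt if negative_prompt is not None else ""
    return guidance_scale > 1.0, negative_prompt
```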
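The distribution-aware Tensor described above pairs per-shard storages with a device mapping. A toy model of that shape (simplified stand-ins, not the real Tensor / DeviceMapping classes):

```python
from dataclasses import dataclass

@dataclass
class ToyDistributedTensor:
    """Toy model: a tuple of per-shard storages plus the mesh coordinate
    each local shard lives on (here shards are plain lists, standing in
    for realized buffers or unrealized graph values)."""
    shards: tuple          # one storage per shard, in order
    device_coords: tuple   # mesh coordinate for each shard, same order

    def shard_on(self, coord):
        # Look up the local shard placed at a given mesh coordinate.
        return self.shards[self.device_coords.index(coord)]
```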
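The collective_ops change above applies each collective per device-group along a chosen mesh axis. A self-contained sketch of that grouping on an N-D mesh (toy code, not the MAX implementation):

```python
import itertools

def axis_groups(mesh_shape, axis):
    """Partition an N-D mesh's devices into groups whose coordinates
    vary only along `axis`; a collective runs independently per group."""
    ranges = [range(n) for n in mesh_shape]
    fixed_axes = [r for i, r in enumerate(ranges) if i != axis]
    groups = []
    for fixed in itertools.product(*fixed_axes):
        group = []
        for k in range(mesh_shape[axis]):
            coord = list(fixed)
            coord.insert(axis, k)   # reinsert the varying coordinate
            group.append(tuple(coord))
        groups.append(group)
    return groups

def allreduce_sum(shards, mesh_shape, axis):
    """Sum per-device scalar shards within each group along `axis`."""
    out = dict(shards)
    for group in axis_groups(mesh_shape, axis):
        total = sum(shards[c] for c in group)
        for c in group:
            out[c] = total
    return out
```

On a 2x2 mesh, reducing along axis 1 sums within each row, which is why per-group dispatch is what makes collectives correct on multi-dimensional meshes.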
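The sharding.rules submodule described above is a set of pure mapping-propagation rules. A minimal sketch of one such rule for an elementwise binary op, with simplified placement types standing in for the real DeviceMapping / placement classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replicated:
    """Every device holds the full tensor."""

@dataclass(frozen=True)
class Sharded:
    """Tensor dimension `dim` is split across a mesh axis."""
    dim: int

def elementwise_rule(lhs, rhs):
    """Either error out, or propose (possibly resharded) input
    placements and derive the output placement."""
    if lhs == rhs:
        return lhs, rhs, lhs      # placements agree: output matches
    if isinstance(lhs, Replicated):
        return rhs, rhs, rhs      # reshard the replicated side
    if isinstance(rhs, Replicated):
        return lhs, lhs, lhs
    raise ValueError(f"incompatible placements: {lhs} vs {rhs}")
```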

Mojo changelog updates:

  • [stdlib] feat: Add forget_deinit() wrapper around lit.ownership.mark_destroyed
  • [mojo-lang] Make \xhh/\ooo escape sequences denote code points
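For reference, Python string literals already give \xhh (hex) and \ooo (octal) escapes this code-point meaning, which illustrates the behavior:

```python
# \xhh and \ooo escapes in a (text) string literal denote code points,
# not raw bytes.
a_hex = "\x41"   # code point 0x41
a_oct = "\101"   # code point 0o101 == 0x41
assert a_hex == a_oct == "A"
assert "\xe9" == "\u00e9"   # a code point, not a UTF-8 byte
```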

Raw MAX diff: https://github.com/modular/modular/compare/6c6ae42721413f0c5c33a2cdba7f2f5e23d5a846...9f74eec755adc00adb08e377e1dff8e3b0523482
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md