MAX Nightly 26.3.0.dev2026042405 (Mojo 1.0.0b1.dev2026042405) Released

:astronaut: A new nightly version has been released! :astronaut:

See the quickstart guide for installation instructions: Quickstart | Modular

MAX changelog updates:

  • Fixed Wan 2.1 / 2.2 video diffusion pipelines silently running without
    classifier-free guidance. The tokenizer gated negative-prompt tokenization
    on true_cfg_scale > 1.0 (default 1.0), so negative tokens were never
    produced and the executor fell back to unguided generation even when
    guidance_scale > 1.0 and a negative prompt was supplied. Wan now
    enables classical CFG whenever guidance_scale > 1.0 and defaults an
    absent negative prompt to the empty string, matching the diffusers
    baseline.
  • max.experimental.Tensor is now distribution-aware: it carries a
    tuple of per-shard storages (driver.Buffers when realized, or graph
    values, TensorValue / BufferValue, when unrealized), paired with a
    DeviceMapping that maps those local shards onto the DeviceMesh.
  • Reworked max.experimental.functional from a single functional.py
    into a functional/ package: a new distribution- and mesh-aware
    dispatch layer on top of the graph-compiler Python API, split cleanly
    into three op categories: creation_ops (tensor factories), spmd_ops
    (rule-based per-op SPMD dispatch), and collective_ops
    (allreduce_sum, allgather, reduce_scatter, etc., now applied per
    device-group along a chosen mesh axis so they dispatch correctly on
    multi-dimensional meshes, plus a transfer_to convenience op that
    moves tensors between DeviceMappings).
  • Added max.experimental.sharding with the core types for distributed
    tensors (DeviceMesh; DeviceMapping with PlacementMapping and
    NamedMapping; placement primitives Replicated / Sharded /
    Partial; DistributedTensorType / DistributedBufferType;
    TensorLayout), plus a sharding.rules submodule of pure
    mapping-propagation rules (elementwise, matmul, reduction, shape,
    conv, pooling) that, for each op, either error out or reshard inputs
    to the proposed DeviceMappings and derive the resulting output
    DeviceMapping.
  • max.experimental.nn.Module.compile() now accepts
    DistributedTensorType symbolic inputs (not just TensorType), so
    distributed models can be built via the graph-compilation path in
    addition to running eagerly; gemma3_modulev3 is the first multi-GPU
    model wired up. DTensor support in MAX is still ongoing work and
    these APIs may evolve.
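The Wan CFG fix above restores classical classifier-free guidance. As a minimal sketch of that behavior (illustrative names only, not the MAX pipeline API):

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classical CFG: extrapolate from the unconditional prediction
    toward the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

def resolve_cfg(guidance_scale, negative_prompt):
    """The fixed gating: CFG is enabled whenever guidance_scale > 1.0,
    and an absent negative prompt defaults to the empty string so
    negative tokens are always produced when guidance is requested."""
    negative_prompt = negative_prompt if negative_prompt is not None else ""
    return guidance_scale > 1.0, negative_prompt
```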
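The distribution-aware Tensor described above pairs per-shard storages with a device mapping. A toy model of that shape (simplified stand-ins, not the real Tensor / DeviceMapping classes):

```python
from dataclasses import dataclass

@dataclass
class ToyDistributedTensor:
    """Toy model: a tuple of per-shard storages plus the mesh coordinate
    each local shard lives on (here shards are plain lists, standing in
    for realized buffers or unrealized graph values)."""
    shards: tuple          # one storage per shard, in order
    device_coords: tuple   # mesh coordinate for each shard, same order

    def shard_on(self, coord):
        # Look up the local shard placed at a given mesh coordinate.
        return self.shards[self.device_coords.index(coord)]
```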
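The collective_ops change above applies each collective per device-group along a chosen mesh axis. A self-contained sketch of that grouping on an N-D mesh (toy code, not the MAX implementation):

```python
import itertools

def axis_groups(mesh_shape, axis):
    """Partition an N-D mesh's devices into groups whose coordinates
    vary only along `axis`; a collective runs independently per group."""
    ranges = [range(n) for n in mesh_shape]
    fixed_axes = [r for i, r in enumerate(ranges) if i != axis]
    groups = []
    for fixed in itertools.product(*fixed_axes):
        group = []
        for k in range(mesh_shape[axis]):
            coord = list(fixed)
            coord.insert(axis, k)   # reinsert the varying coordinate
            group.append(tuple(coord))
        groups.append(group)
    return groups

def allreduce_sum(shards, mesh_shape, axis):
    """Sum per-device scalar shards within each group along `axis`."""
    out = dict(shards)
    for group in axis_groups(mesh_shape, axis):
        total = sum(shards[c] for c in group)
        for c in group:
            out[c] = total
    return out
```

On a 2x2 mesh, reducing along axis 1 sums within each row, which is why per-group dispatch is what makes collectives correct on multi-dimensional meshes.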
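The sharding.rules submodule described above is a set of pure mapping-propagation rules. A minimal sketch of one such rule for an elementwise binary op, with simplified placement types standing in for the real DeviceMapping / placement classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replicated:
    """Every device holds the full tensor."""

@dataclass(frozen=True)
class Sharded:
    """Tensor dimension `dim` is split across a mesh axis."""
    dim: int

def elementwise_rule(lhs, rhs):
    """Either error out, or propose (possibly resharded) input
    placements and derive the output placement."""
    if lhs == rhs:
        return lhs, rhs, lhs      # placements agree: output matches
    if isinstance(lhs, Replicated):
        return rhs, rhs, rhs      # reshard the replicated side
    if isinstance(rhs, Replicated):
        return lhs, lhs, lhs
    raise ValueError(f"incompatible placements: {lhs} vs {rhs}")
```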

Mojo changelog updates:

  • [stdlib] feat: Add forget_deinit() wrapper around lit.ownership.mark_destroyed
  • [mojo-lang] Make \xhh/\ooo escape sequences denote code points
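For reference, Python string literals already give \xhh (hex) and \ooo (octal) escapes this code-point meaning, which illustrates the behavior:

```python
# \xhh and \ooo escapes in a (text) string literal denote code points,
# not raw bytes.
a_hex = "\x41"   # code point 0x41
a_oct = "\101"   # code point 0o101 == 0x41
assert a_hex == a_oct == "A"
assert "\xe9" == "\u00e9"   # a code point, not a UTF-8 byte
```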

Raw MAX diff: https://github.com/modular/modular/compare/6c6ae42721413f0c5c33a2cdba7f2f5e23d5a846...9f74eec755adc00adb08e377e1dff8e3b0523482
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md