A new nightly version has been released!
See the quickstart guide for installation instructions.
MAX changelog updates:
- Fixed Wan 2.1 / 2.2 video diffusion pipelines silently running without
  classifier-free guidance. The tokenizer gated negative-prompt tokenization
  on `true_cfg_scale > 1.0` (default `1.0`), so negative tokens were never
  produced and the executor fell back to unguided generation even when
  `guidance_scale > 1.0` and a negative prompt were supplied. Wan now enables
  classical CFG whenever `guidance_scale > 1.0` and defaults an absent
  negative prompt to the empty string, matching the diffusers baseline.
- `max.experimental.Tensor` is now distribution-aware: it carries a tuple
  of per-shard storages, either `driver.Buffer`s (realized) or graph values
  (`TensorValue`/`BufferValue`, unrealized), paired with a `DeviceMapping`
  that maps those local shards onto the `DeviceMesh`.
- Reworked
  `max.experimental.functional` from a single `functional.py`
  into a `functional/` package, a new distribution- and mesh-aware
  dispatch layer on top of the graph-compiler Python API, split cleanly
  into three op categories: `creation_ops` (tensor factories), `spmd_ops`
  (rule-based per-op SPMD dispatch), and `collective_ops`
  (`allreduce_sum`, `allgather`, `reduce_scatter`, etc., now applied per
  device group along a chosen mesh axis so they dispatch correctly on
  multi-dimensional meshes, plus a `transfer_to` convenience op
  between `DeviceMapping`s).
- Added
  `max.experimental.sharding` with the core types for distributed
  tensors (`DeviceMesh`; `DeviceMapping` with `PlacementMapping` and
  `NamedMapping`; placement primitives `Replicated`/`Sharded`/`Partial`;
  `DistributedTensorType`/`DistributedBufferType`; `TensorLayout`),
  plus a `sharding.rules` submodule of pure mapping-propagation rules
  (elementwise, matmul, reduction, shape, conv, pooling) that, for each
  op, either error out or reshard inputs to the proposed `DeviceMapping`s
  and derive the resulting output `DeviceMapping`.
- `max.experimental.nn.Module.compile()` now accepts
  `DistributedTensorType` symbolic inputs (not just `TensorType`), so
  distributed models can be built via the graph-compilation path in
  addition to running eagerly; `gemma3_modulev3` is the first multi-GPU
  model wired up. DTensor support in MAX is still ongoing work and
  these APIs may evolve.
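For context on the Wan fix above: classical classifier-free guidance runs the denoiser on both the conditional and the unconditional (negative-prompt) embeddings and extrapolates away from the unconditional prediction. A minimal sketch of the corrected gating, in plain Python; this is illustrative only, not the MAX pipeline code, and `denoise` and the scalar embeddings are hypothetical stand-ins:

```python
def cfg_step(denoise, latents, cond_emb, uncond_emb, guidance_scale):
    """One classical classifier-free guidance (CFG) step.

    `denoise(latents, emb)` stands in for the diffusion model;
    embeddings are scalars here for illustration.
    """
    eps_cond = denoise(latents, cond_emb)
    if guidance_scale <= 1.0:
        # No guidance requested: plain conditional generation.
        return eps_cond
    # The fix described above: gate CFG on guidance_scale alone, and
    # treat an absent negative prompt as the empty prompt (0.0 stands
    # in for its embedding here).
    if uncond_emb is None:
        uncond_emb = 0.0
    eps_uncond = denoise(latents, uncond_emb)
    # Extrapolate away from the unconditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The bug was equivalent to gating the `uncond_emb` branch on a separate `true_cfg_scale` flag that defaulted off, so the function always returned `eps_cond` regardless of `guidance_scale`.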
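The per-axis collective dispatch mentioned above can be pictured with a small helper (a hedged sketch of the concept, not the MAX API): a collective applied along one axis of an N-dimensional device mesh runs independently within each group of devices whose coordinates vary only along that axis.

```python
from itertools import product

def device_groups(mesh_shape, axis):
    """Enumerate the device groups a per-axis collective operates over.

    Devices are identified by their mesh coordinates; each returned
    group holds the devices that differ only along `axis`.
    """
    # Ranges for every mesh axis except the collective axis.
    fixed_axes = [range(n) for i, n in enumerate(mesh_shape) if i != axis]
    groups = []
    for fixed in product(*fixed_axes):
        group = []
        for k in range(mesh_shape[axis]):
            coord = list(fixed)
            coord.insert(axis, k)  # re-insert the varying coordinate
            group.append(tuple(coord))
        groups.append(group)
    return groups
```

On a 2x3 mesh, an allreduce along axis 1 runs within each row of three devices, while along axis 0 it runs within each column of two; dispatching per group is what makes collectives behave correctly on multi-dimensional meshes.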
Mojo changelog updates:
- [stdlib] feat: Add `forget_deinit()` wrapper around
  `lit.ownership.mark_destroyed`
- [mojo-lang] Make `\xhh`/`\ooo` denote code points
Raw MAX diff: https://github.com/modular/modular/compare/6c6ae42721413f0c5c33a2cdba7f2f5e23d5a846...9f74eec755adc00adb08e377e1dff8e3b0523482
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md