MAX Nightly 26.3.0.dev2026040205 Released

:astronaut: A new nightly version has been released! :astronaut:

See the quickstart guide for installation instructions: Quickstart | Modular

MAX changelog updates:

  • Added DevicePlacementPolicy (Ignore, Warn, Error) to Graph to
    control behavior when CPU-only ops (ops.scatter, ops.cumsum,
    ops.nonzero, ops.tile) receive GPU tensors. The default (Warn) emits a
    UserWarning and falls back to CPU; Error raises ValueError instead.
    ops.cond and ops.while_loop always raise ValueError for GPU predicates.
  • Added scatter_nd op handler to the experimental eager interpreter (CPU
    and GPU), scattering slices from updates into input at N-dimensional index
    positions via max.experimental.functional.scatter_nd.
  • Added scatter_nd_add op handler to the experimental eager interpreter
    (CPU), accumulating slices from updates into input at N-dimensional index
    positions and summing duplicate indices via
    max.experimental.functional.scatter_nd_add.
  • Added bottom_k op handler to the experimental eager interpreter with CPU
    and GPU support, returning the k smallest values and their original indices
    along a specified axis via max.experimental.functional.bottom_k.
  • Added nonzero op handler to the experimental eager interpreter (CPU),
    returning the row-major coordinates of all nonzero elements as a
    [nnz, rank] int64 tensor via max.experimental.functional.nonzero.
  • Added scatter_add op handler to the experimental eager interpreter (CPU),
    accumulating updates into a copy of input at indices along axis
    and summing duplicate indices via max.experimental.functional.scatter_add.
  • max.graph.ops.pad (and max.graph.experimental.functional.pad) now
    accepts mode=reflect and mode=edge in addition to
    mode=constant.
  • Added pad op handlers (pad.constant, pad.reflect, pad.repeat) to
    the experimental eager interpreter. pad.constant supports CPU and GPU;
    pad.reflect and pad.repeat (edge padding) run on CPU.
  • Added max.graph.ops.resize_linear for linear (bilinear) interpolation
    resizing with configurable coordinate_transform_mode (half_pixel,
    align_corners, asymmetric, half_pixel_1D) and optional antialias
    downscaling support; max.graph.ops.resize now supports
    InterpolationMode.BILINEAR by delegating to resize_linear.
  • Added resize_linear op handler to the experimental eager interpreter
    (CPU) via max.experimental.functional.resize_linear.
  • Optimized GPU topk stage-1 kernel with a per-thread register heap that
    caches the top-8 elements during a single scan pass, eliminating redundant
    global memory re-reads for the first 8 extraction iterations.
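
For readers unfamiliar with the new interpreter ops, the semantics described in the scatter_nd, scatter_nd_add, bottom_k, and nonzero entries can be sketched in plain NumPy. This is only a reference model of the behavior as worded in the changelog, not MAX code: the function names (scatter_nd_ref, bottom_k_ref, nonzero_ref) are hypothetical, and the actual signatures under max.experimental.functional may differ.

```python
import numpy as np


def scatter_nd_ref(inp, updates, indices, accumulate=False):
    """Scatter slices of `updates` into a copy of `inp` at N-d index positions.

    Hypothetical reference helper; not the MAX implementation.
    """
    out = np.array(inp, copy=True)
    indices = np.asarray(indices)
    updates = np.asarray(updates)
    k = indices.shape[-1]  # each index addresses the first k dims of `inp`
    idx = tuple(indices.reshape(-1, k).T)
    upd = updates.reshape((-1,) + out.shape[k:])
    if accumulate:
        # scatter_nd_add behavior: duplicate indices are summed
        np.add.at(out, idx, upd)
    else:
        # plain scatter_nd; duplicate indices are not exercised in this sketch
        out[idx] = upd
    return out


def bottom_k_ref(x, k, axis=-1):
    """Return the k smallest values along `axis` and their original indices."""
    x = np.asarray(x)
    order = np.argsort(x, axis=axis, kind="stable")
    idx = np.take(order, np.arange(k), axis=axis)
    vals = np.take_along_axis(x, idx, axis=axis)
    return vals, idx


def nonzero_ref(x):
    """Row-major coordinates of nonzero elements as an [nnz, rank] int64 array."""
    return np.argwhere(np.asarray(x)).astype(np.int64)


scattered = scatter_nd_ref(np.zeros((4, 2)), [[1.0, 2.0], [3.0, 4.0]], [[1], [3]])
summed = scatter_nd_ref(np.zeros(3), [1.0, 2.0, 5.0], [[0], [0], [2]], accumulate=True)
small_vals, small_idx = bottom_k_ref([[3, 1, 2], [0, 5, 4]], k=2)
coords = nonzero_ref([[0, 7], [8, 0]])
```

The accumulate path uses np.add.at because an ordinary fancy-indexed `+=` applies only one update per repeated index, which is the distinction the scatter_nd_add entry calls out.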

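The padding modes and coordinate transform modes named above can also be illustrated with NumPy. This is a sketch under two assumptions: that mode=reflect and mode=edge follow the same conventions as np.pad, and that half_pixel, align_corners, and asymmetric follow the formulas used by the ONNX Resize operator. Neither is confirmed by the changelog. The helper src_coord is hypothetical, and half_pixel_1D is left out because its definition is not given here.

```python
import numpy as np

x = np.array([1, 2, 3])

# The three padding modes, two elements on each side.
padded_constant = np.pad(x, (2, 2), mode="constant", constant_values=0)
padded_reflect = np.pad(x, (2, 2), mode="reflect")  # mirrors without repeating the edge
padded_edge = np.pad(x, (2, 2), mode="edge")        # repeats the edge value


def src_coord(mode, x_out, in_len, out_len):
    """Map an output index to a fractional input coordinate (assumed ONNX-style)."""
    scale = out_len / in_len
    if mode == "half_pixel":
        return (x_out + 0.5) / scale - 0.5
    if mode == "align_corners":
        return x_out * (in_len - 1) / (out_len - 1)
    if mode == "asymmetric":
        return x_out / scale
    raise ValueError(f"unsupported mode in this sketch: {mode}")


# Upscaling a length-4 axis to length 8.
first_half_pixel = src_coord("half_pixel", 0, 4, 8)
last_align_corners = src_coord("align_corners", 7, 4, 8)
mid_asymmetric = src_coord("asymmetric", 3, 4, 8)
```

Linear interpolation then blends the two input samples on either side of the returned coordinate, so the choice of mode mainly changes how the borders line up between input and output.
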
Mojo changelog updates:

Raw MAX diff: https://github.com/modular/modular/compare/cc314486029d15bda9df356307f770785648c523...5de6438bf5aae931acfafb86bffe70c8ba1bd5f3
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/nightly-changelog.md