MAX Nightly 26.1.0.dev2026012217 Released

:astronaut: A new nightly version has been released! :astronaut:

See the quickstart guide for installation instructions: Quickstart | Modular

MAX changelog updates:

  • Gemma3 now supports vision input (multimodal) in the 12B and 27B variants.

  • All Python wheels are now served from
    https://whl.modular.com/nightly/simple/. If you use uv, replace
    --index-url with --index; if you use pip, replace it with
    --extra-index-url. For the exact commands, see the install guide.

  • Improved scheduling to achieve higher KVCache utilization and batch sizes. By
    default, MAX now schedules a context encoding (CE) request only if KVCache
    memory is less than 95% full after allocating blocks for that request or if
    no active requests exist. You can adjust this watermark value (0.95) with
    --kvcache-ce-watermark. Beware that increasing it causes more preemptions.

  • Server stats collection (collect_server_stats) is now enabled by default for
    serving benchmarks.

  • Added tracing flags to benchmark_serving.py for nsys profiling:

    • --trace: Enable tracing of the benchmark run (currently NVIDIA GPUs only)
    • --trace-file: Path to save the trace file
    • --trace-session: Optional session name for tracing

  • MAX tensor operations are now eager by default.

  • accelerator_count() now returns a non-zero value when called on an Apple
    silicon system. This means that the common pattern of

    device = CPU() if accelerator_count() == 0 else Accelerator()
    

    will default to using the available Apple silicon GPU, and as a consequence
    MAX graphs should in most cases be dispatched to run on Apple silicon GPUs.
    Note that most MAX models do not yet work on Apple silicon GPUs due to
    missing hardware-specific kernel pathways and other support, but this is an
    important step towards enabling MAX more broadly on Apple silicon GPUs.

  • Added max.nn.module_v3.rope containing rotary embedding implementations

  • Added ops.complex.mul for multiplying complex-valued tensors

  • Renamed prefill_chunk_size to max_batch_input_tokens and
    max_batch_context_length to max_batch_total_tokens in the PipelineConfig
    and TTSConfig classes to better reflect their purpose in batch memory
    management. The corresponding CLI flags have also been renamed:
    --prefill-chunk-size is now --max-batch-input-tokens and
    --max-batch-context-length is now --max-batch-total-tokens.

  • Fixed max.driver.Buffer.to(stream) to not copy (it returns a reference to
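
To illustrate the KVCache watermark rule from the scheduling entry above, here is a minimal Python sketch of the admission check as the changelog words it. This is not MAX's scheduler code: the function name, its parameters, and the idea of counting usage in blocks are assumptions made for illustration. Only the 0.95 default and the two conditions (below the watermark after allocation, or no active requests) come from the changelog.

```python
def should_schedule_ce(
    used_blocks: int,
    request_blocks: int,
    total_blocks: int,
    num_active_requests: int,
    watermark: float = 0.95,
) -> bool:
    """Hypothetical admission check for a context encoding (CE) request.

    Mirrors the rule stated in the changelog; all names here are
    illustrative assumptions, not part of the MAX API.
    """
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")

    # Condition 2 from the changelog: schedule if no active requests exist.
    if num_active_requests == 0:
        return True

    # Condition 1: the cache must be less than `watermark` full *after*
    # allocating the blocks this request needs.
    projected_utilization = (used_blocks + request_blocks) / total_blocks
    return projected_utilization < watermark
```

With 100 blocks in total and 90 in use, a request needing 4 blocks would be admitted (94% projected), while one needing 5 blocks would wait (95% is not below the watermark) unless no other requests are active.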

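For anyone with launch scripts that still pass the old batch flags, the rename above can be handled mechanically. The following is a rough sketch of a helper that rewrites an argument list; the helper name and the approach are my own assumptions and not something MAX ships. Only the two old/new flag pairs are taken from the changelog.

```python
# Old flag -> new flag, exactly as listed in the changelog entry.
RENAMED_FLAGS = {
    "--prefill-chunk-size": "--max-batch-input-tokens",
    "--max-batch-context-length": "--max-batch-total-tokens",
}


def migrate_args(argv: list[str]) -> list[str]:
    """Return a copy of argv with renamed flags replaced (hypothetical helper).

    Handles both "--flag value" and "--flag=value" spellings; every other
    argument is passed through unchanged.
    """
    migrated = []
    for arg in argv:
        # Split off an optional "=value" suffix so the flag name can be
        # looked up on its own.
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return migrated
```

For example, `migrate_args(["--prefill-chunk-size", "8192"])` returns `["--max-batch-input-tokens", "8192"]`. The same mapping applies to the prefill_chunk_size and max_batch_context_length fields when updating PipelineConfig or TTSConfig usage by hand.
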
Mojo changelog updates:

  • [stdlib] Add offset_of to Mojo reflection API
  • [kernels] Remove implicit copyability from Layout (#75841)
  • [docs] Consolidate reflection changelog entries
  • [stdlib] Replace UInt with Scalar[DType.uint]
  • [KGEN] Fix rebind of downcasted reflected field types
  • [Stdlib] support stripping multi-byte characters
  • [stdlib] Add black_box to benchmarking utilities.

Raw MAX diff: https://github.com/modular/modular/compare/4e22d270976d0e7e92b7cd6764cbdebe828c5a35...6699f1a52bd5b01aa16aee5a087f9f7be0d734d0
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/changelog.md

https://forum.modular.com/max/packages#install - link doesn’t exist or is private - thanks :slight_smile: