A new nightly version has been released!
See the quickstart guide for installation instructions.
MAX changelog updates:
- Gemma3 now supports vision input (multimodal) in the 12B and 27B variants.
- All Python wheel URLs are now https://whl.modular.com/nightly/simple/. If using `uv`, change `--index-url` to `--index`; if using `pip`, change it to `--extra-index-url`. For precise commands, see the install guide.
- Improved scheduling to achieve higher KVCache utilization and batch sizes. By default, MAX now schedules a context encoding (CE) request only if KVCache memory is less than 95% full after allocating blocks for that request, or if no active requests exist. You can adjust this watermark value (0.95) with `--kvcache-ce-watermark`. Beware that increasing it causes more preemptions.
- Server stats collection (`collect_server_stats`) is now enabled by default for serving benchmarks.
- Added tracing flags to `benchmark_serving.py` for nsys profiling:
  - `--trace`: Enable tracing of the benchmark run (currently NVIDIA GPUs only)
  - `--trace-file`: Path to save the trace file
  - `--trace-session`: Optional session name for tracing
- MAX tensor operations are now eager by default.
- `accelerator_count()` now returns a non-zero value when called on an Apple silicon system. This means that the common pattern of `device = CPU() if accelerator_count() == 0 else Accelerator()` will default to using the available Apple silicon GPU, and as a consequence MAX graphs should in most cases be dispatched to run on Apple silicon GPUs. Note that most MAX models do not yet work on Apple silicon GPUs due to missing hardware-specific kernel pathways and other support, but this is an important step toward enabling MAX more broadly on Apple silicon GPUs.
- Added `max.nn.module_v3.rope`, containing rotary embedding implementations.
- Added `ops.complex.mul` for multiplying complex-valued tensors.
- Renamed `prefill_chunk_size` to `max_batch_input_tokens` and `max_batch_context_length` to `max_batch_total_tokens` in the `PipelineConfig` and `TTSConfig` classes to better reflect their purpose in batch memory management. The corresponding CLI flags have also been renamed: `--prefill-chunk-size` is now `--max-batch-input-tokens`, and `--max-batch-context-length` is now `--max-batch-total-tokens`.
- Fixed `max.driver.Buffer.to(stream)` to not copy; it now returns a reference to the original buffer.
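Given the wheel-index change above, the updated install commands look roughly like this. The package name `modular` is an assumption for illustration — see the install guide for the exact packages and versions:

```shell
# uv: the nightly index is now passed with --index (previously --index-url)
uv pip install modular --index https://whl.modular.com/nightly/simple/

# pip: add the nightly index alongside PyPI with --extra-index-url
pip install modular --extra-index-url https://whl.modular.com/nightly/simple/
```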
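The CE-scheduling watermark is adjusted at serve time. A sketch, assuming the `max serve` entry point; the model path and watermark value are illustrative:

```shell
# Schedule CE requests only while KVCache is below 90% full
# (default watermark is 0.95; raising it risks more preemptions)
max serve --model-path <your-model> --kvcache-ce-watermark 0.9
```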
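The tracing flags above combine like so for an nsys capture. The script location, file name, and session name here are illustrative:

```shell
# Capture a trace of the benchmark run (currently NVIDIA GPUs only)
python benchmark_serving.py \
  --trace \
  --trace-file ./benchmark-trace \
  --trace-session nightly-bench
```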
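Migrating to the renamed batch flags is a direct substitution. A sketch, assuming the `max serve` entry point; the token counts are illustrative:

```shell
# Before: --prefill-chunk-size 8192 --max-batch-context-length 32768
# After (renamed flags, same semantics):
max serve --model-path <your-model> \
  --max-batch-input-tokens 8192 \
  --max-batch-total-tokens 32768
```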
Mojo changelog updates:
- [stdlib] Add `offset_of` to Mojo reflection API
- [kernels] Remove implicit copyability from Layout (#75841)
- [docs] Consolidate reflection changelog entries
- [stdlib] Replace `UInt` with `Scalar[DType.uint]`
- [KGEN] Fix `rebind` of downcasted reflected field types
- [Stdlib] Support stripping multi-byte characters
- [stdlib] Add `black_box` to benchmarking utilities.
Raw MAX diff: https://github.com/modular/modular/compare/4e22d270976d0e7e92b7cd6764cbdebe828c5a35...6699f1a52bd5b01aa16aee5a087f9f7be0d734d0
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/changelog.md