A new nightly version has been released!
See the quickstart guide for installation instructions.
MAX changelog updates:
- Gemma3 now supports vision input (multimodal) in the 12B and 27B variants.
- All Python wheel URLs are now https://whl.modular.com/nightly/simple/. If using `uv`, change `--index-url` to `--index`; if using `pip`, change it to `--extra-index-url`. For precise commands, see the install guide.
- Improved scheduling to achieve higher KVCache utilization and batch sizes. By default, MAX now schedules a context encoding (CE) request only if KVCache memory is less than 95% full after allocating blocks for that request, or if no active requests exist. You can adjust this watermark value (0.95) with `--kvcache-ce-watermark`. Beware that increasing it causes more preemptions.
- Server stats collection (`collect_server_stats`) is now enabled by default for serving benchmarks.
- Added tracing flags to `benchmark_serving.py` for nsys profiling:
  - `--trace`: Enable tracing of the benchmark run (currently NVIDIA GPUs only)
  - `--trace-file`: Path to save the trace file
  - `--trace-session`: Optional session name for tracing
- MAX tensor operations are now eager by default.
- `accelerator_count()` now returns a non-zero value when called on an Apple silicon system. This means that the common pattern of `device = CPU() if accelerator_count() == 0 else Accelerator()` will default to using the available Apple silicon GPU, and as a consequence MAX graphs should in most cases be dispatched to run on Apple silicon GPUs. Note that most MAX models do not yet work on Apple silicon GPUs due to missing hardware-specific kernel pathways and other support, but this is an important step toward enabling MAX more broadly on Apple silicon GPUs.
- Added `max.nn.module_v3.rope`, containing rotary embedding implementations.
- Added `ops.complex.mul` for multiplying complex-valued tensors.
- Renamed `prefill_chunk_size` to `max_batch_input_tokens` and `max_batch_context_length` to `max_batch_total_tokens` in the `PipelineConfig` and `TTSConfig` classes to better reflect their purpose in batch memory management. The corresponding CLI flags have also been renamed: `--prefill-chunk-size` is now `--max-batch-input-tokens`, and `--max-batch-context-length` is now `--max-batch-total-tokens`.
- Fixed `max.driver.Buffer.to(stream)` to not copy; it now returns a reference to the original buffer.
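Given the wheel-index change above, the updated install commands look roughly like this. The package name `modular` is an assumption for illustration — see the install guide for the exact packages and versions:

```shell
# uv: the nightly index is now passed with --index (previously --index-url)
uv pip install modular --index https://whl.modular.com/nightly/simple/

# pip: add the nightly index alongside PyPI with --extra-index-url
pip install modular --extra-index-url https://whl.modular.com/nightly/simple/
```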
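The CE-scheduling watermark is adjusted at serve time. A sketch, assuming the `max serve` entry point; the model path and watermark value are illustrative:

```shell
# Schedule CE requests only while KVCache is below 90% full
# (default watermark is 0.95; raising it risks more preemptions)
max serve --model-path <your-model> --kvcache-ce-watermark 0.9
```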
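The tracing flags above combine like so for an nsys capture. The script location, file name, and session name here are illustrative:

```shell
# Capture a trace of the benchmark run (currently NVIDIA GPUs only)
python benchmark_serving.py \
  --trace \
  --trace-file ./benchmark-trace \
  --trace-session nightly-bench
```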
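Migrating to the renamed batch flags is a direct substitution. A sketch, assuming the `max serve` entry point; the token counts are illustrative:

```shell
# Before: --prefill-chunk-size 8192 --max-batch-context-length 32768
# After (renamed flags, same semantics):
max serve --model-path <your-model> \
  --max-batch-input-tokens 8192 \
  --max-batch-total-tokens 32768
```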
Mojo changelog updates:
- [stdlib] Add `offset_of` to Mojo reflection API
- [kernels] Remove implicit copyability from Layout (#75841)
- [docs] Consolidate reflection changelog entries
- [stdlib] Replace `UInt` with `Scalar[DType.uint]`
- [KGEN] Fix `rebind` of downcasted reflected field types
- [Stdlib] Support stripping multi-byte characters
- [stdlib] Add `black_box` to benchmarking utilities.
Raw MAX diff: https://github.com/modular/modular/compare/4e22d270976d0e7e92b7cd6764cbdebe828c5a35...6699f1a52bd5b01aa16aee5a087f9f7be0d734d0
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/changelog.md