A new nightly version has been released!
See the quickstart guide for installation instructions: Quickstart | Modular
MAX changelog updates:
- When running models with data parallelism (DP), the semantics of
  `--max-batch-size` have changed. For example, specifying
  `--data-parallel-degree 8` and `--max-batch-size 32` used to mean that each
  data-parallel replica could have at most 4 requests, for an aggregate max
  batch size of 32. The CLI flag now specifies the max batch size per replica,
  so the aggregate max batch size of the above invocation is 8*32=256 requests.
  This change aligns with vLLM and other inference engines.
- Server stats collection (`collect_server_stats`) is now enabled by default
  for serving benchmarks.
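To make the batch-size arithmetic above concrete, here is a minimal sketch that compares the old and new interpretation of the flag. The `BatchLimits` class and both helper functions are hypothetical names invented for illustration and are not part of MAX. The floor division in the old case is also an assumption, since the changelog only shows an evenly divisible example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchLimits:
    per_replica: int  # max requests a single data-parallel replica may hold
    aggregate: int    # max requests across all replicas combined


def old_semantics(max_batch_size: int, dp_degree: int) -> BatchLimits:
    """Previous behavior: the flag value was the aggregate limit."""
    # Assumption: floor division. The changelog only covers 32 / 8 = 4.
    per_replica = max_batch_size // dp_degree
    return BatchLimits(per_replica=per_replica, aggregate=per_replica * dp_degree)


def new_semantics(max_batch_size: int, dp_degree: int) -> BatchLimits:
    """Current behavior: the flag value is the per-replica limit."""
    return BatchLimits(per_replica=max_batch_size, aggregate=max_batch_size * dp_degree)


if __name__ == "__main__":
    # --data-parallel-degree 8 --max-batch-size 32
    print(old_semantics(32, 8))  # 4 per replica, 32 in aggregate
    print(new_semantics(32, 8))  # 32 per replica, 256 in aggregate
    # Passing 4 under the new semantics keeps the old aggregate of 32.
    print(new_semantics(4, 8))
```

If you want to keep the aggregate limit you had before this change, divide your previous `--max-batch-size` value by the data-parallel degree, as in the last call.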
Mojo changelog updates:
- [Doc][KGEN] Update changelog (NFC) (#73013)
Raw MAX diff: https://github.com/modular/modular/compare/6008a39fe2584e08aa41d9e83fdb8d244e68114c...7351fe3df6549b6b16d5e6b3959aac703fcb03d7
Current Mojo changelog: https://github.com/modular/modular/blob/main/mojo/docs/changelog.md