MAX Local Server Hanging with Message: instrument is None for maxserve.pipeline_load

Hi, I am trying to run the MAX quickstart tutorial, Run Inference with an Endpoint, on my laptop, but I am stuck at launching the endpoint:

(quickstart) $ max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
09:57:22.801 INFO: 54276 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
09:57:22.801 INFO: 54276 MainThread: max.serve: Unsupported recording method. Metrics unavailable in model worker
09:57:22.807 INFO: 54276 MainThread: max.pipelines: Starting download of model: modularai/Llama-3.1-8B-Instruct-GGUF
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00, 13.21it/s]
09:57:22.884 INFO: 54276 MainThread: max.pipelines: Finished download of model: modularai/Llama-3.1-8B-Instruct-GGUF in 0.076673 seconds.
09:57:22.950 WARNING: 54276 MainThread: max.pipelines: Insufficient cache memory to support a batch containing one request at the max sequence length of 131072 tokens. Need to allocate at least 1024 pages (32.00 GiB), but only have enough memory for 303 pages (9.47 GiB).
09:57:23.127 INFO: 54276 MainThread: max.pipelines: Paged KVCache Manager allocated 303 device pages using 32.00 MiB per page.
09:57:23.128 INFO: 54276 MainThread: max.pipelines: Building and compiling model...
09:57:34.491 INFO: 54276 MainThread: max.pipelines: Building and compiling model took 11.362761 seconds
instrument is None for maxserve.pipeline_load

Specs:

Apple M1 Pro (2021)
16 GB memory
macOS Sequoia 15.5
modular==25.5.0.dev2025070105

Thank you!

Try setting a max length to avoid the cache memory error.

--max-length 1024
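
For example, combined with the command from the original post:

max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF --max-length 1024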

You can see more in the max CLI docs: max CLI | Modular

@timdavis Thx. I reduced --max-length all the way down to 32, 8, etc., but still got stuck here:

max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF --max-length 8
12:38:17.370 INFO: 55253 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
12:38:17.371 INFO: 55253 MainThread: max.serve: Unsupported recording method. Metrics unavailable in model worker
12:38:17.376 INFO: 55253 MainThread: max.pipelines: Starting download of model: modularai/Llama-3.1-8B-Instruct-GGUF
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  8.69it/s]
12:38:17.492 INFO: 55253 MainThread: max.pipelines: Finished download of model: modularai/Llama-3.1-8B-Instruct-GGUF in 0.116008 seconds.
12:38:17.728 INFO: 55253 MainThread: max.pipelines: Paged KVCache Manager allocated 1 device pages using 32.00 MiB per page.
12:38:17.728 INFO: 55253 MainThread: max.pipelines: Building and compiling model...
12:38:35.413 INFO: 55253 MainThread: max.pipelines: Building and compiling model took 17.684945 seconds
instrument is None for maxserve.pipeline_load

Thanks @coffeegriz - it's a bug we're fixing. The server is actually live once you see that instrument is None for maxserve.pipeline_load message, and you can curl to it - it's just a logging issue. We should have a fix out in the next nightly, hopefully.
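
For example, something like this should work even while that line is printed (assuming the default port 8000 and the OpenAI-compatible chat completions route that max serve exposes; the prompt here is just illustrative):

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/Llama-3.1-8B-Instruct-GGUF",
        "messages": [{"role": "user", "content": "Who won the World Series in 2020?"}]
      }'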


Hey @coffeegriz, thanks for pointing this out 🙏

The logging issue has now been fixed as of MAX 25.5.0.dev2025070207.

Hi @tboerstad Thank you! Resolved.

$ curl http://0.0.0.0:8000
{"detail":"Not Found"}
$ python generate-text.py
The Los Angeles Dodgers won the 2020 World Series. They defeated the Tampa Bay Rays in the series 4 games to 2.
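
For reference, a minimal sketch of what a generate-text.py along the lines of the quickstart could look like, assuming the openai Python client and the server's OpenAI-compatible /v1 route (the exact script in the tutorial may differ):

from openai import OpenAI

# Point the client at the local max serve endpoint; no real API key is needed.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Who won the World Series in 2020?"}],
)
print(response.choices[0].message.content)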