Hi, I'm trying to run the MAX quickstart tutorial "Run Inference with an Endpoint" on my laptop, but I'm stuck at launching the endpoint:
(quickstart) $ max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
09:57:22.801 INFO: 54276 MainThread: root: Logging initialized: Console: INFO, File: None, Telemetry: None
09:57:22.801 INFO: 54276 MainThread: max.serve: Unsupported recording method. Metrics unavailable in model worker
09:57:22.807 INFO: 54276 MainThread: max.pipelines: Starting download of model: modularai/Llama-3.1-8B-Instruct-GGUF
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13.21it/s]
09:57:22.884 INFO: 54276 MainThread: max.pipelines: Finished download of model: modularai/Llama-3.1-8B-Instruct-GGUF in 0.076673 seconds.
09:57:22.950 WARNING: 54276 MainThread: max.pipelines: Insufficient cache memory to support a batch containing one request at the max sequence length of 131072 tokens. Need to allocate at least 1024 pages (32.00 GiB), but only have enough memory for 303 pages (9.47 GiB).
09:57:23.127 INFO: 54276 MainThread: max.pipelines: Paged KVCache Manager allocated 303 device pages using 32.00 MiB per page.
09:57:23.128 INFO: 54276 MainThread: max.pipelines: Building and compiling model...
09:57:34.491 INFO: 54276 MainThread: max.pipelines: Building and compiling model took 11.362761 seconds
instrument is None for maxserve.pipeline_load
Specs:
Apple M1 Pro 2021
16GB Mem
Sequoia 15.5
modular==25.5.0.dev2025070105
Thanks @coffeegriz - it's a bug we're fixing. The server is actually live after that `instrument is None for maxserve.pipeline_load` line, and you can curl to it - it's just a logging issue. We should have a fix out in the next nightly, hopefully.
$ curl http://0.0.0.0:8000
{"detail":"Not Found"}
$ python generate-text.py
The Los Angeles Dodgers won the 2020 World Series. They defeated the Tampa Bay Rays in the series 4 games to 2.
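For anyone else following along: the `{"detail":"Not Found"}` from curling the root path is expected, since the server exposes its OpenAI-compatible API under `/v1` rather than at `/`. Here's a minimal stdlib-only sketch of what a `generate-text.py` like the one above might look like - the endpoint path, prompt, and helper names here are my assumptions, not the quickstart's exact script:

```python
import json
import urllib.request

SERVER = "http://0.0.0.0:8000"
MODEL = "modularai/Llama-3.1-8B-Instruct-GGUF"


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload for the endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def generate(prompt: str) -> str:
    # POST to the OpenAI-compatible chat completions route served by `max serve`
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(generate("Who won the World Series in 2020?"))
```

The network call only runs under `__main__`, so you can import `build_request` separately to inspect the payload before the server is up.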