How do I serve multiple LLMs? I tried using the --port option with the latest MAX version, but the second serve command fails with an error saying the port is already in use, even though a different port is specified.
Could you elaborate on what you mean by multiple LLMs, and what your use case is? Are you trying to serve multiple LLMs on a single GPU or across multiple GPUs? If you have enough resources, what errors do you get? Please include as much info as possible.
Thanks. I was able to run "max serve --port xxxx" for multiple LLMs, using one port for Gemma and another for DeepSeek. If we set --device-memory-utilization=0.4 or 0.5, depending on available memory, multiple serves work. If we do not set it, the first serve uses the default of 0.9, i.e. it claims 90% of available GPU memory, leaving too little for a second server.
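For reference, a sketch of what the two commands could look like. Only --port and --device-memory-utilization are confirmed in this thread; the ports, memory fractions, model identifiers, and the --model-path flag are placeholders taken as assumptions, so check the MAX docs for the exact flags on your version:

```sh
# First server: a Gemma model on port 8000, capped at 40% of GPU memory
# (model name and --model-path flag are assumptions, not confirmed above)
max serve --model-path=google/gemma-3-1b-it \
  --port 8000 \
  --device-memory-utilization=0.4

# Second server: a DeepSeek model on a different port, also capped at 40%
max serve --model-path=deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --port 8001 \
  --device-memory-utilization=0.4
```

With both servers running, each model should be reachable on its own port. The key point is that the two memory fractions must together leave headroom on the GPU; with the 0.9 default, the first server's allocation starves the second.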