Question about the benchmarking tutorial

I was reading and following the great tutorial at Deploy a PyTorch model from Hugging Face | Modular Docs, but there's one thing I don't understand about it.

When you run the server, you can clearly see that MAX Engine is running and compiling the graph (as shown in stdout). Yet at the end of the tutorial it says you can change the backend to vLLM or TensorRT-LLM just by changing the parameters on the benchmarking script, and that's the part I don't really understand.

How is it possible to change the backend from the benchmarking script if the server is already running and using MAX Engine? Doesn't changing the backend imply that the graph has to be compiled again with the newly selected backend, and so on?

Probably I'm missing some key detail here, so any help is highly appreciated. Thanks in advance.


The benchmarking script itself will run against various backends (docs and script here), but that assumes that instances of them are up and running.

You're right that the example there takes you through starting up MAX Serve and leaves an instance of that running. To benchmark against TensorRT-LLM or vLLM, it's assumed that you'd separately spin up a vLLM (or TensorRT-LLM) instance on equivalent hardware and run the same script against it.
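As a rough sketch of that workflow (the script name, port, model name, and every flag except --backend here are my assumptions, not something the tutorial confirms), you'd bring each server up on its own and then point the same benchmarking script at whichever one is listening:

```python
# Hypothetical sketch only: the script name, port, model, and flags other than
# --backend are assumptions, not taken from the tutorial.
import subprocess

common = [
    "python", "benchmark_serving.py",
    "--model", "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model
    "--base-url", "http://localhost:8000",          # wherever the server listens
]

# 1) With a MAX Serve instance already running on that port:
subprocess.run(common + ["--backend", "modular"], check=True)

# 2) Later, after shutting it down and starting vLLM on the same hardware:
subprocess.run(common + ["--backend", "vllm"], check=True)
```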

The benchmark script itself won’t replace the serving backend that is running. The --backend parameter on that script ensures that the output is compatible with the various backends. The server and the benchmark are independent processes. Hopefully, that clears things up a bit.
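To make the "independent processes" point concrete, here's a minimal sketch of what a benchmark client essentially does (the URL, endpoint path, and model name are assumptions for illustration): it just times OpenAI-compatible HTTP requests against whatever server happens to be listening, and it neither knows nor cares which engine sits behind that URL.

```python
# Minimal sketch of a benchmark-style client. It only speaks HTTP to an
# OpenAI-compatible endpoint; whether MAX, vLLM, or TensorRT-LLM is serving
# behind that URL is invisible to this process. The URL, path, and model
# name below are assumptions for illustration.
import time
import requests

BASE_URL = "http://localhost:8000"  # point this at whichever server is running

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "Hello, world!",
    "max_tokens": 32,
}

start = time.perf_counter()
resp = requests.post(f"{BASE_URL}/v1/completions", json=payload, timeout=60)
latency = time.perf_counter() - start

resp.raise_for_status()
print(f"Completion latency: {latency:.3f}s")
```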


Yes, that clears everything up, thanks!

