MAX distributed graph execution

How does MAX handle distributed inference across multiple devices? I’m interested in which parallelization strategies are built into the framework (e.g., pipeline parallelism or tensor parallelism) versus which require manual implementation, and in how the graph compiler manages cross-device communication during execution. Are there any code references or examples that demonstrate MAX’s distributed execution capabilities?
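For concreteness, here is the kind of pattern I mean by tensor parallelism, sketched in plain NumPy rather than with MAX’s API (I haven’t found public MAX examples for this). The names `NUM_DEVICES` and `column_parallel_linear` are hypothetical, and NumPy arrays stand in for real per-device buffers; this only illustrates the communication pattern, not how MAX implements it:

```python
# Conceptual sketch of tensor (intra-layer) parallelism using NumPy.
# NOT MAX's API -- it only illustrates the pattern the question asks about.
import numpy as np

NUM_DEVICES = 4  # hypothetical device count

def column_parallel_linear(x, weight, num_devices):
    """Shard a linear layer's weight matrix column-wise across devices.

    Each 'device' computes a partial matmul on its shard; an all-gather
    (modeled here as a concatenate) reassembles the full output.
    """
    shards = np.split(weight, num_devices, axis=1)  # scatter weight shards
    partials = [x @ w_shard for w_shard in shards]  # per-device compute
    return np.concatenate(partials, axis=-1)        # all-gather of results

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))        # batch of activations
weight = rng.standard_normal((8, 16))  # full weight matrix

out_parallel = column_parallel_linear(x, weight, NUM_DEVICES)
assert np.allclose(out_parallel, x @ weight)  # matches the unsharded result
```

In a real multi-device setup, that final concatenate would be a collective (an all-gather) over the device interconnect, and it’s exactly that cross-device communication I’d like to understand how the graph compiler schedules.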


I don’t think we are quite ready for this yet. There is a lot of internal development going on in this area, but we don’t have much that’s public at the moment. We’re working to firm up the internal APIs, and once those settle we will likely expose them as we have with other APIs.

All right, thank you for the reply! That helps a lot.
