MAX distributed graph execution

How does MAX handle distributed inference across multiple devices? I’m interested in which parallelization strategies are built into the framework (e.g., pipeline parallelism or tensor parallelism) versus which require manual implementation, and in how the graph compiler manages cross-device communication during execution. Are there any code references or examples that demonstrate MAX’s distributed execution capabilities?
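For concreteness, here is the kind of pattern I mean by tensor parallelism, sketched in plain NumPy rather than with MAX’s API (I haven’t found public MAX examples for this). The names `NUM_DEVICES` and `column_parallel_linear` are hypothetical, and NumPy arrays stand in for real per-device buffers; this only illustrates the communication pattern, not how MAX implements it:

```python
# Conceptual sketch of tensor (intra-layer) parallelism using NumPy.
# NOT MAX's API -- it only illustrates the pattern the question asks about.
import numpy as np

NUM_DEVICES = 4  # hypothetical device count

def column_parallel_linear(x, weight, num_devices):
    """Shard a linear layer's weight matrix column-wise across devices.

    Each 'device' computes a partial matmul on its shard; an all-gather
    (modeled here as a concatenate) reassembles the full output.
    """
    shards = np.split(weight, num_devices, axis=1)  # scatter weight shards
    partials = [x @ w_shard for w_shard in shards]  # per-device compute
    return np.concatenate(partials, axis=-1)        # all-gather of results

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))        # batch of activations
weight = rng.standard_normal((8, 16))  # full weight matrix

out_parallel = column_parallel_linear(x, weight, NUM_DEVICES)
assert np.allclose(out_parallel, x @ weight)  # matches the unsharded result
```

In a real multi-device setup, that final concatenate would be a collective (an all-gather) over the device interconnect, and it’s exactly that cross-device communication I’d like to understand how the graph compiler schedules.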


I don’t think we are quite ready for this yet. There is a lot of internal development going on in this area, but we don’t have much that’s public at the moment. We’re working to firm up the internal APIs, and once those settle we will likely expose them as we have with other APIs.

All right, thank you for the reply! That helps a lot.
