Looking for examples of multi-GPU usage with Mojo

I noticed that there is already some sophisticated internal logic for broadcasting/scattering tensor data across different devices: `modular/max/kernels/src/comm/allreduce.mojo` (at commit e4d5f27d9f2cec5a4f0e6831a77003174a957d12 in the modular/modular repo on GitHub).
Would it be possible to expose this functionality through the max.ops API? @stef