Currently, from a programming perspective, PyTorch makes tensor operations very concise and powerful. For comparison, here is what a kernel for a broadcast add looks like in Mojo on the GPU (without MAX): mojo-gpu-puzzles/problems/p05/p05_layout_tensor.mojo at 763da1d193680afb396e13582397cd9c6673e5e1 · winding-lines/mojo-gpu-puzzles · GitHub
In PyTorch, the equivalent code is very concise:
import torch

a = torch.tensor([1, 2, 3], device='cuda')
b = torch.tensor([10, 20, 30, 40], device='cuda')
# Broadcast addition: out[i][j] = a[i] + b[j]
out = a[:, None] + b[None, :] # shape: [3, 4]
The MAX Python API is a great, concise library for working with MAX, and even though I haven't tried it in depth (I'm more focused on Mojo GPU right now), it seems it can accomplish the same as PyTorch.
But if we want to use Mojo for simple tensor operations, since the Modular team removed the MAX Tensor API (e.g. see this commit), AFAIK we have no option other than writing a GPU kernel.
However, thanks to Modular's incredible work on the Mojo language and the recent usability improvements, I wonder why we couldn't create a Tensor struct that lets us play with tensors directly in Mojo. In fact, if Mojo is an AI-first language, I think the stdlib could include it, and that would be a differentiating factor for Mojo compared to other general-purpose languages. A rough sketch of what I have in mind is below.
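To make the idea concrete, here is a minimal CPU-only sketch of the kind of ergonomics I mean. This is not an existing Modular API: the Tensor2D struct and the broadcast_add helper are hypothetical names I made up for illustration, it only handles Float32 and 2D shapes, and the syntax targets recent Mojo releases, so it may need small adjustments to compile on any given version.

# Hypothetical sketch, not an existing Modular API: Tensor2D and
# broadcast_add are made-up names, CPU-only, Float32-only, and the
# syntax targets recent Mojo releases, so it may need small tweaks.
struct Tensor2D(Copyable, Movable):
    var data: List[Float32]
    var rows: Int
    var cols: Int

    fn __init__(out self, rows: Int, cols: Int):
        # Allocate a rows x cols tensor filled with zeros.
        self.rows = rows
        self.cols = cols
        self.data = List[Float32](capacity=rows * cols)
        for _ in range(rows * cols):
            self.data.append(0)

    fn __getitem__(self, i: Int, j: Int) -> Float32:
        return self.data[i * self.cols + j]

    fn __setitem__(mut self, i: Int, j: Int, value: Float32):
        self.data[i * self.cols + j] = value


fn broadcast_add(a: List[Float32], b: List[Float32]) -> Tensor2D:
    # out[i, j] = a[i] + b[j], mirroring the PyTorch snippet above.
    var out = Tensor2D(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            out[i, j] = a[i] + b[j]
    return out


def main():
    var a = List[Float32](1, 2, 3)
    var b = List[Float32](10, 20, 30, 40)
    var out = broadcast_add(a, b)
    print(out[0, 0], out[2, 3])  # 11.0 43.0

Even a simple CPU-backed struct along these lines would cover a lot of notebook-style exploration, and a GPU-backed implementation could come later.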
I believe some of the stdlib contributors (myself included) would be happy to maintain this part of the library, so we wouldn't have to switch to Python to do cool things concisely, maybe even in a Jupyter notebook.
What do you think?