When you’re talking about abstractions over tensor operations, there are two different ways this can be approached: treating these operations as computational graphs (like we do in the Graph API), or performing calculations in Mojo code to be run directly as a function on the CPU or GPU.
For the former, I talk in a separate post about why we made the difficult decision to open-source and then wind down the Modular distribution of the Mojo Graph and Driver APIs, so I won’t rehash that here. I will say that if you decide to invest in modernizing that API, I would not use the old `max.tensor` `Tensor` as a building block. It is fundamentally incompatible with use on GPUs, and we found that its design posed a lot of problems. Its interface may have elements that could be ported to other tensor-like types to make them easier to work with, but I wouldn’t recommend carrying its core forward.
Now, if what you want is a better abstraction for tensor operations performed directly in Mojo, there may be an opportunity to enhance the API around `LayoutTensor`, which we use as our first-class tensor representation in CPU / GPU functions. We’re migrating off of `NDBuffer` and other data types to focus on `LayoutTensor` as the primary way to represent tensors inside Mojo GPU functions like the example you point to. Many of these examples are shaped to be familiar to CUDA programmers, but it’s possible that better abstractions for elementwise calculations at the `LayoutTensor` level could make writing Mojo kernel functions even cleaner.
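To make that concrete, here is a rough sketch of the CUDA-style shape those examples take today: an elementwise add written against `LayoutTensor`, with each GPU thread computing one element. This is illustrative only — the exact parameter spellings (`mut`, `Layout.row_major`, the `gpu` module names) have shifted between releases, so treat the signatures as assumptions rather than a copy-paste recipe:

```mojo
from gpu import thread_idx, block_idx, block_dim
from layout import Layout, LayoutTensor

alias SIZE = 1024
alias dtype = DType.float32
alias layout = Layout.row_major(SIZE)

# One thread per element, CUDA-style index math done by hand.
fn vector_add(
    output: LayoutTensor[mut=True, dtype, layout],
    a: LayoutTensor[mut=False, dtype, layout],
    b: LayoutTensor[mut=False, dtype, layout],
):
    var i = block_idx.x * block_dim.x + thread_idx.x
    if i < SIZE:
        output[i] = a[i] + b[i]
```

The explicit `block_idx` / `thread_idx` arithmetic and the bounds check are exactly the boilerplate that a higher-level elementwise abstraction over `LayoutTensor` could absorb, which is the opportunity described above.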