Hello Modular forumites,
Out of sheer curiosity, is there anything in the Mojo/MAX stack, or even in MLIR, that would prevent the Modular stack from supporting NPUs (Q.ANT), which already support C++, Python, etc.?
Q.ANT seem to be a bit vague about their stack and programming model, so I have no idea. Mojo is designed to be capable of being hammered into a “technically functions” shape for anything with a program counter (FPGAs, CGRAs, etc. are just really weird targets to make a programming language for). NPUs are a bit of a mixed bag, and at this point they’re kind of the “everything else” category. For instance, if I took something with the programming model of the Cray-1 supercomputer and put it on a PCIe card, I could probably call it an NPU and nobody would blink an eye. Similarly, a lot of people have called the “Taalas HC1 Technology Demonstrator” an NPU, and it bakes a model directly into the silicon, so I don’t think Mojo really has a chance of programming that.
If you want to look at a slightly better documented NPU, Tenstorrent has done an excellent job of documenting their hardware. Mojo would have absolutely zero issues talking to that, and you could probably make it happen within a day or two of the compiler going open source and the RISC-V LLVM backend being turned on (since Tenstorrent uses a bunch of RISC-V cores).
NPUs in general don’t really interest me because, from what I have seen, they are highly specialized copper silicon chips, as opposed to the new photonic paradigm that Q.ANT is pushing forward. It seems obvious to me that old-school copper silicon will not be the future of computing any more than spinning disks or tape are the future of storage.
The Tenstorrent architecture is very distinct from NPUs.
It is basically a RISC-V GPU running on a Network-on-Chip (NoC).
One thing to note is that the tt-kernel source code is closed source; additionally, they have the tt-lang DSL, which by comparison is very different from Mojo.
They use the TT-Forge compiler and the TT-Metalium SDK, which is tied into a larger ecosystem that basically has to be manually overridden to run open-source LLMs.
The major problem was that the TT-NN framework required converting tensors back to PyTorch just to run models. If you buy a Blackhole or Wormhole card, you have to manually override everything.
Although both Mojo and TT-Forge are built on top of MLIR, Tenstorrent hardware still relies on a closed ecosystem in which TT-Metalium is everything, along with middleman systems like Apache TVM.
To back this up, here is one of the discussions I had with the Tenstorrent folks on the Tenstorrent Discord server:
To run NPU kernels in Mojo, you’d use `__mlir_op` and link it against their primary ISA, which is VLIW.
That said, this is very experimental.
You can try this if you have an M-series Mac with the Apple Neural Engine (38+ TOPS), or a Linux laptop such as a System76 Oryx or a Tuxedo with a built-in NPU like AMD XDNA.
Different NPUs use different toolchains: AMD’s XDNA uses the MLIR-AIE (AI Engine) compiler, and there are others from Qualcomm TurboX and EdgeCortix.
Counterintuitive as it may sound, NPUs are designed for the future using silicon, but they’re meant to prove that you can run heavier models with fewer TOPS or lower power.
If you look at the NPU : GPU ratio of TOPS to performance output, it is often 1 : 2.5, meaning a 60 TOPS NPU gives you the edge necessary to run something like a 150 TOPS GPU would.
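Taking the 1 : 2.5 rule of thumb at face value, the comparison is just a multiplication. A minimal sketch, assuming the ratio is a fixed constant (it is the poster's estimate, not a measured figure); the names `NPU_TO_GPU_RATIO` and `gpu_equivalent_tops` are hypothetical:

```python
# Hypothetical constant: the 1 : 2.5 NPU:GPU rule of thumb from the post,
# treated as fixed for illustration only (real ratios vary by workload).
NPU_TO_GPU_RATIO = 2.5


def gpu_equivalent_tops(npu_tops: float, ratio: float = NPU_TO_GPU_RATIO) -> float:
    """Return the GPU TOPS an NPU is roughly comparable to under the given ratio."""
    if npu_tops < 0 or ratio <= 0:
        raise ValueError("npu_tops must be non-negative and ratio must be positive")
    return npu_tops * ratio


if __name__ == "__main__":
    # 38 TOPS (Apple Neural Engine figure quoted above) and the 60 TOPS example.
    for npu in (38, 60):
        print(f"{npu} TOPS NPU ~ {gpu_equivalent_tops(npu):.0f} TOPS GPU")
```

Under this assumption, a 60 TOPS NPU lines up with roughly 150 TOPS of GPU throughput; any real comparison would depend on the model, precision, and memory bandwidth.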