Hi, I’m curious about the implementation details of the MAX engine in Mojo. In your compilation pipeline, do you rely on pre-written libraries or hand-written kernels (like Torch Inductor or TensorRT), or is the approach entirely focused on code generation and transformation, using MLIR passes to ensure performance and portability across hardware?
Hi Sarthak,
If you’re interested in implementation details, I’d check out some of the technical talks on the Modular blog, e.g. the LLVM DevMtg talk from a year ago.
To answer the question briefly: our approach is based on hand-written kernels (more similar to the Triton language) combined with a very fancy set of compiler and runtime technologies, including automatic fusion, memory planning, etc., which benefit from being able to see into the IR representation of those kernels. This is a novel approach that moves beyond both the “put everything into the compiler” and the “use vendor libraries” approaches, and it is uniquely extensible.
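For readers less familiar with what automatic fusion over kernel IR buys you, here is a minimal, purely conceptual sketch in plain Python. None of these names or functions come from MAX or Mojo; it only illustrates why a compiler that can see inside kernel bodies can merge them, instead of treating each one as an opaque library call.

```python
# Conceptual illustration only -- not MAX/Mojo APIs. The kernel names below
# (mul_kernel, add_kernel) are made up for this example.
import numpy as np

def mul_kernel(a, b):
    return a * b          # elementwise multiply

def add_kernel(a, b):
    return a + b          # elementwise add

# Opaque-library style: each kernel runs separately, so a full intermediate
# array is written to and read back from memory between the two calls.
def unfused(x, y, z):
    tmp = mul_kernel(x, y)      # materialized intermediate buffer
    return add_kernel(tmp, z)

# Fusion style: a compiler that can see both kernel bodies can emit a single
# loop, keeping the intermediate value in a register instead of memory.
def fused(x, y, z):
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = x[i] * y[i] + z[i]   # one pass, no intermediate buffer
    return out
```

The point of the sketch is the trade-off Chris describes: a pure vendor-library approach can't perform the second transformation because it can't look inside the kernels, while a pure “everything in the compiler” approach has to generate the kernels themselves from scratch.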
We are working on releasing our GPU support soon, and that does NOT use NVIDIA libraries like cuDNN and cuBLAS. We (intentionally) haven’t explained all of this yet, but will be sharing more as it becomes generally available.
-Chris
I’m rooting for y’all; we can’t wait!