To date, the way we’ve had people interface with accelerated matrix multiplication has been at the operation level in the graph compiler. The graph compiler could then target specific hardware and dispatch to the right matmul implementation. We recently open-sourced the definitions for every MAX graph operation, and they can all be found in one giant file here.
The top-level matmul function called in the mo.matmul operation there has a lot of specializations, but let's start by testing just that entry point. For that, I looked to the linalg unit tests for matmul and cribbed together a pretty crude port of that test into an isolated Mojo file here:
from buffer import NDBuffer
from buffer.dimlist import DimList
from linalg.matmul import matmul
from linalg.packing import pack_matmul_b_shape_func
from memory import UnsafePointer
from testing import assert_almost_equal, assert_equal
from utils.index import Index, IndexList

# Naive three-loop matmul used to produce the golden reference result.
fn gemm_naive[](
    a: NDBuffer,
    b: NDBuffer,
    c: NDBuffer[mut=True, *_],
    m: Int,
    n: Int,
    k: Int,
):
    for i in range(m):
        for p in range(k):
            for j in range(n):
                var a_val = a[i, p].cast[c.type]()
                var b_val = b[p, j].cast[c.type]()
                c[i, j] += a_val * b_val

alias alignment = 64

def test_matmul[
    a_type: DType,
    a_shape: DimList,
    b_type: DType,
    b_shape: DimList,
    c_type: DType,
    c_shape: DimList,
    transpose_b: Bool,
    b_packed: Bool,
    saturated: Bool,
](m: Int, n: Int, k: Int):
    var a_ptr = UnsafePointer[Scalar[a_type], alignment=alignment].alloc(m * k)
    var b_ptr = UnsafePointer[Scalar[b_type], alignment=alignment].alloc(k * n)
    var b = NDBuffer[b_type, 2, _, b_shape](b_ptr, Index(k, n))

    # Compute the padded shape B needs if it is to be pre-packed.
    var padded_n_k = IndexList[2]()
    padded_n_k = pack_matmul_b_shape_func[
        a_type,
        a_shape,
        b_type,
        b_shape,
        c_type,
        c_shape,
        transpose_b,
        True,
    ](b)

    var padded_n = padded_n_k[1] if b_packed else n
    var padded_k = padded_n_k[0] if b_packed else k

    var bp_ptr = UnsafePointer[Scalar[b_type], alignment=alignment].alloc(
        padded_k * padded_n
    )
    var c0_ptr = UnsafePointer[Scalar[c_type], alignment=alignment].alloc(m * n)
    var c1_ptr = UnsafePointer[Scalar[c_type], alignment=alignment].alloc(m * n)

    var a = NDBuffer[a_type, 2, _, a_shape](a_ptr, Index(m, k))
    var bp = NDBuffer[b_type, 2, _, DimList.create_unknown[2]()](
        bp_ptr, Index(padded_k, padded_n)
    )
    var c = NDBuffer[c_type, 2, _, c_shape](c0_ptr, Index(m, n))
    var golden = NDBuffer[c_type, 2, _, c_shape](c1_ptr, Index(m, n))

    # saturated VNNI only has a range [0,127] for the input a
    var vnni_range: Int = 128 if saturated else 256

    var cnt: Int = 0
    for i in range(m):
        for p in range(k):
            # uint8 but limited to [0,127]
            a[IndexList[2]((i, p))] = cnt % vnni_range
            cnt += 1

    cnt = 0
    for p in range(k):
        for j in range(n):
            # int8 [-128, 127]
            b[IndexList[2]((p, j))] = cnt % 256 - 128
            bp[IndexList[2]((p, j))] = b[IndexList[2]((p, j))]
            cnt += 1

    for i in range(m):
        for j in range(n):
            c[IndexList[2]((i, j))] = 0
            golden[IndexList[2]((i, j))] = c[IndexList[2]((i, j))]

    # Run the library matmul into c, then the naive reference into golden.
    matmul[
        transpose_b=transpose_b,
        b_packed=b_packed,
        saturated_vnni=saturated,
    ](c, a, rebind[NDBuffer[b_type, 2, bp.origin, b_shape]](bp))

    gemm_naive(a, b, golden, m, n, k)

    for i in range(m):
        for j in range(n):
            var msg = String(
                "values do not agree for ",
                m,
                "x",
                n,
                "x",
                k,
                " using the dtype=",
                a_type,
                ",",
                b_type,
                ",",
                c_type,
            )

            @parameter
            if c_type.is_floating_point():
                assert_almost_equal(c[i, j], golden[i, j], msg)
            else:
                assert_equal(c[i, j], golden[i, j], msg)

    a_ptr.free()
    b_ptr.free()
    bp_ptr.free()
    c0_ptr.free()
    c1_ptr.free()

def main():
    alias a_shape = DimList.create_unknown[2]()
    alias b_shape = DimList.create_unknown[2]()
    alias c_shape = DimList.create_unknown[2]()

    test_matmul[
        DType.float32,
        a_shape,
        DType.float32,
        b_shape,
        DType.float32,
        c_shape,
        transpose_b=False,
        b_packed=False,
        saturated=False,
    ](256, 256, 256)
If you clone the modular repository at a directory level parallel to this project, you should be able to run:

mojo -I ../modular/max/kernels/src/ matmul_test.mojo
This ran locally on my machine, and the library matmul passed the test against the naive matmul in the same file. There are a lot of specializations below this top-level entry point that you might be able to pull out and test in a similar manner, but this does seem to work against the modular repository if you manually point the imports at the kernels directory inside it.
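If you want to poke at some of those other paths through the same harness, here is a hedged sketch of extra cases you could append to main() above. The parameter combinations are my own guesses, not something I've verified: a pre-packed B with the same float32 shapes, and the uint8/int8 saturated-VNNI path that the comments inside the test describe, accumulating into int32. Whether the second case actually dispatches to VNNI instructions depends on your CPU.

    # Illustrative extra cases for main() -- assumptions, not verified locally.

    # Exercise the pre-packed-B path with the same float32 shapes.
    test_matmul[
        DType.float32,
        a_shape,
        DType.float32,
        b_shape,
        DType.float32,
        c_shape,
        transpose_b=False,
        b_packed=True,
        saturated=False,
    ](256, 256, 256)

    # Exercise the integer path hinted at by the test's comments: uint8 A
    # limited to [0, 127], int8 B, int32 accumulation, saturated VNNI on.
    test_matmul[
        DType.uint8,
        a_shape,
        DType.int8,
        b_shape,
        DType.int32,
        c_shape,
        transpose_b=False,
        b_packed=True,
        saturated=True,
    ](256, 256, 256)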