Defining GPU Thread-Local Variables in Mojo

Hi, I want to check whether doing var local_sum: output.element_type = 0 creates a thread-local variable, and whether, in general, all variables are thread-local unless declared otherwise in the Mojo GPU programming model.

A search with Claude Code suggested that all variables are thread-local on the GPU unless explicitly declared otherwise (for example, as shared memory), which makes sense and would be efficient, but I couldn't find this stated in the documentation for var (Variables | Modular).
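
To make the distinction I'm asking about concrete, here is a stripped-down sketch of what I understand to be a per-thread variable versus an explicitly shared buffer. This is my own example, not from the puzzle; it assumes the same imports and aliases as the puzzle file (thread_idx, barrier, tb, dtype), plus a hypothetical block-size alias TPB:

fn local_vs_shared_sketch():
    # My assumption: each thread gets its own private copy of `per_thread`
    # (typically held in registers), just like `local_sum` in the solution below.
    var per_thread: Scalar[dtype] = 0

    # Explicitly shared: one buffer per thread block, visible to every
    # thread in that block, so access is coordinated with barrier().
    shared_buf = tb[dtype]().row_major[TPB]().shared().alloc()

    per_thread += 1
    shared_buf[thread_idx.x] = per_thread
    barrier()

If that reading is right, per_thread lives in each thread's own registers, while shared_buf is a single allocation shared by all threads in the block.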

This question came up while I was working through GPU Puzzles no. 11. Below is the given solution, which includes the var local_sum: output.element_type = 0 line.

fn conv_1d_simple[
    in_layout: Layout, out_layout: Layout, conv_layout: Layout
](
    output: LayoutTensor[mut=False, dtype, out_layout],
    a: LayoutTensor[mut=False, dtype, in_layout],
    b: LayoutTensor[mut=False, dtype, conv_layout],
):
    global_i = block_dim.x * block_idx.x + thread_idx.x
    local_i = thread_idx.x
    # Explicitly block-shared buffers, as opposed to per-thread variables
    shared_a = tb[dtype]().row_major[SIZE]().shared().alloc()
    shared_b = tb[dtype]().row_major[CONV]().shared().alloc()
    if global_i < SIZE:
        shared_a[local_i] = a[global_i]
    else:
        shared_a[local_i] = 0

    if global_i < CONV:
        shared_b[local_i] = b[global_i]

    barrier()

    if global_i < SIZE:
        # Note: using `var` lets us annotate the type explicitly rather than rely on inference
        # `output.element_type` is available on LayoutTensor
        var local_sum: output.element_type = 0

        # Note: `@parameter` decorator unrolls the loop at compile time given `CONV` is a compile-time constant
        # See: https://docs.modular.com/mojo/manual/decorators/parameter/#parametric-for-statement
        @parameter
        for j in range(CONV):
            # Bonus: do we need this check for this specific example with fixed SIZE, CONV
            if local_i + j < SIZE:
                local_sum += shared_a[local_i + j] * shared_b[j]

        output[global_i] = local_sum

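As an aside on the @parameter comment in the solution: my understanding is that the parametric for statement simply unrolls the loop at compile time whenever the trip count is a compile-time constant. A tiny CPU-side sketch of that idea (my own names, nothing GPU-specific):

fn sum_first_n() -> Int:
    alias N = 4
    var total = 0

    # Because N is a compile-time alias, this loop is unrolled into
    # N straight-line copies of the body at compile time.
    @parameter
    for i in range(N):
        total += i

    return total


def main():
    print(sum_first_n())  # 0 + 1 + 2 + 3 = 6
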
A big thank you in advance!
