GPU Puzzles P09 Shared memory indexing issue

Augment · June 20, 2025, 9:58pm

I am looking at GPU problem p09 with LayoutTensor, and I have this solution:

fn pooling[
    layout: Layout
](
    output: LayoutTensor[mut=True, dtype, layout],
    a: LayoutTensor[mut=True, dtype, layout],
    size: Int,
):
    # Allocate shared memory using tensor builder
    shared = tb[dtype]().row_major[TPB]().shared().alloc()

    global_i = block_dim.x * block_idx.x + thread_idx.x
    local_i = thread_idx.x

    if global_i < size:
        shared[local_i] = a[global_i]

    barrier()

    if global_i < size:
        acc = Float32(0)
        for j in range(max(local_i-2, 0), local_i+1):
            acc += shared[j][0]

        output[global_i] = acc

For some reason, it seems to fail for thread_idx.x = 1:

out: HostBuffer([0.0, 0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0])
expected: HostBuffer([0.0, 1.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0]) 
Unhandled exception caught during execution: At /home/arseni/repositories/mojo-gpu-puzzles/problems/p09/p09_layout_tensor.mojo:80:29: AssertionError: left == right comparison failed: left: 0.0

I can see the suggested solution in the docs, but I want to understand why this does not work, or how you would debug something like this in Mojo/lldb. Is there some crux behind indexing into shared/SIMD memory? Or is it just some indexing issue?

steepcurve · June 22, 2025, 12:43pm

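local_i is a UInt, so local_i-2 underflows when local_i < 2: it wraps around to a very large positive value instead of going negative. max(local_i-2, 0) then returns that wrapped value, the range is empty, and acc stays 0. Thread 0 hits the same problem, but its expected output happens to be 0.0, so only thread 1 shows up as a failure.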
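
A minimal CPU-side sketch of the underflow, assuming a Mojo version close to the one used in the thread. Here local_i is a plain UInt variable standing in for thread_idx.x (that stand-in is an assumption for illustration, not part of the puzzle code). The loop should run zero times, which matches the 0.0 seen at index 1.

```mojo
fn main():
    # Stand-in for thread_idx.x on the failing thread (hypothetical CPU repro).
    var local_i: UInt = 1

    # UInt subtraction wraps around, so this is a very large positive
    # value rather than -1.
    var wrapped = local_i - 2
    print(wrapped)

    # max() keeps the wrapped value instead of clamping to 0, so the
    # range starts far past its end and the body never executes.
    var iterations = 0
    for _ in range(max(local_i - 2, 0), local_i + 1):
        iterations += 1
    print(iterations)
```

Printing the loop bounds like this (or from inside the kernel) is a cheap first debugging step before reaching for a debugger.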

Augment · June 27, 2025, 5:57pm

You're right, wrapping the local_i-2 in Int solves the problem, thank you.
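
For reference, a hedged sketch of the fixed window bounds, run on the CPU rather than as a kernel. It assumes the test input is a[j] = j for j in 0..7 (inferred from the expected buffer above), and it converts local_i to Int before subtracting, which is one way to apply the fix; the original kernel is otherwise unchanged.

```mojo
fn main():
    # Simulate the 8 threads of the puzzle on the CPU (illustrative only).
    for i in range(8):
        # Stand-in for thread_idx.x, kept as UInt to mirror the kernel.
        var local_i = UInt(i)
        var acc = Float32(0)

        # Convert to Int before subtracting so the lower bound can go
        # negative and then be clamped to 0 by max().
        for j in range(max(Int(local_i) - 2, 0), Int(local_i) + 1):
            # Assumption: a[j] == j, inferred from the expected output.
            acc += Float32(j)

        print(acc)
```

Under that assumption, the printed values should line up with the expected buffer from the failing test.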
