Spoiler alert: I’m asking a question that references the solution to Puzzle 4: 2D Map from the Modular puzzle exercises.
Modular’s puzzle series is my first time working with any sort of GPU programming. I’ve been reading through the puzzle docs along with doing the puzzles. I’m now trying to understand the “Raw Memory Approach” solution to Puzzle 4.
When talking about GPU threads, I think I understand the basic concept of (i, j) coordinates. Perhaps not identical to, but similar to, x, y coordinates on a map or other “chart-like” object: rows and columns. However, two things about the listed solution confuse me:
1. row * size + col seems like it would evaluate to a single value, not a pair of “coordinate values” I was expecting.
2. I’m also not sure I understand what size is doing here (I mean functionally, not the basic multiplication) and why we need it.
Taken together, these make me think my mental model of GPU threads is at least off the mark, if not completely wrong. I was wondering if anyone might be able to help or point me towards other useful resources.
Thanks!
And also thanks to those on the Modular team for making these puzzles!
Someone else can probably give a more detailed answer. I haven’t worked deeply with the Mojo puzzles, but I have been working through a CUDA C programming textbook and writing the Mojo equivalents of its CUDA C code.
The 2D/3D coordinates are purely logical; the actual memory is one-dimensional. At the end of the day you’re working with 1D global memory, and x, y, z is a mental model for humans. That model matters because of GPUs’ original purpose: addressing x, y pixels in an image. When you’re writing an image-processing algorithm, x, y is more intuitive than raw 1D offsets, and in ML, convolutional neural networks likewise operate over multiple dimensions.
For example, below is CUDA C that sums two “2D” matrices together, where nx and ny are the width and height.
__global__ void sumMatrixOnGPU2D(float *MatA, float *MatB, float *MatC,
                                 int nx, int ny) {
    // Global 2D coordinates of this thread.
    unsigned int ix = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int iy = threadIdx.y + blockIdx.y * blockDim.y;
    // idx is the global linear memory index: row iy, column ix,
    // in a row-major array of width nx.
    unsigned int idx = iy * nx + ix;
    if (ix < nx && iy < ny) {
        MatC[idx] = MatA[idx] + MatB[idx];
    }
}
Additionally, I think in later puzzles they present LayoutTensor, which probably does what you’re looking for with x/y indexing; however, its underlying storage is still a linear layout.
If you look at the explicit declarations of out and a:
var a: DeviceBuffer[dtype] = ctx.enqueue_create_buffer[dtype](SIZE * SIZE).enqueue_fill(0)
var out: DeviceBuffer[dtype] = ctx.enqueue_create_buffer[dtype](SIZE * SIZE).enqueue_fill(0)
it’s clear that both are contiguous blocks of GPU memory, each large enough to hold SIZE × SIZE elements.
The kernel uses two-dimensional thread indices (thread_idx.x, thread_idx.y), so you need to map those 2D coordinates into a single linear memory address. That’s what size is for: it’s the row stride, the number of elements in one full row, i.e. how far you have to jump in the flat buffer to move down one row.
var row: UInt = thread_idx.y  # logical row of this thread
var col: UInt = thread_idx.x  # logical column
if (row < size) and (col < size):
    # Collapse (row, col) into one flat index: skip `row` full rows
    # of `size` elements, then move `col` along the current row.
    var i: UInt = row * size + col
    output[i] = a[i] + 10