Is there a way to make numpy access gpu memory?

init_bug · April 9, 2026, 12:19am

Hello everyone!
Here is my case:

I have a 512×512 image that I render on the GPU with this code:

def main() raises:
    with DeviceContext() as ctx:
        var out_r_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)
        var out_g_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)
        var out_b_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)

        out_r_buf.enqueue_fill(0)
        out_g_buf.enqueue_fill(0)
        out_b_buf.enqueue_fill(0)

        var out_r = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_r_buf)
        var out_g = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_g_buf)
        var out_b = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_b_buf)

        comptime kernel = render_rgb_blocks_2d[layout_2d, layout_2d, layout_2d]

        ctx.enqueue_function[kernel, kernel](
            out_r,
            out_g,
            out_b,
            SIZE,
            grid_dim=BLOCKS_PER_GRID,
            block_dim=THREADS_PER_BLOCK,
        )

        ctx.synchronize()

And I am doing this to copy the result back to the CPU after rendering/modifying the pixels:

with out_r_buf.map_to_host() as r:
    with out_g_buf.map_to_host() as g:
        with out_b_buf.map_to_host() as b:
            for j in range(SIZE):
                var row_list = Python.list()

                for i in range(SIZE):
                    var k = j * SIZE + i

                    var rv = r[k]
                    if rv < 0.0:
                        rv = 0.0
                    elif rv > 255.0:
                        rv = 255.0

                    var gv = g[k]
                    if gv < 0.0:
                        gv = 0.0
                    elif gv > 255.0:
                        gv = 255.0

                    var bv = b[k]
                    if bv < 0.0:
                        bv = 0.0
                    elif bv > 255.0:
                        bv = 255.0

                    row_list.append(
                        Python.list(
                            Int(rv),
                            Int(gv),
                            Int(bv),
                        )
                    )

                py_rows.append(row_list)

Is there a way to make that buffer available to NumPy directly, with something like this?

with out_r_buf.map_to_host() as r:
    with out_g_buf.map_to_host() as g:
        with out_b_buf.map_to_host() as b:
            np_r = np.frombuffer(r, dtype="float32").reshape((SIZE, SIZE))
            np_g = np.frombuffer(g, dtype="float32").reshape((SIZE, SIZE))
            np_b = np.frombuffer(b, dtype="float32").reshape((SIZE, SIZE))

            # Stack into RGB image
            img = np.stack([np_r, np_g, np_b], axis=-1)

            # Clamp and convert
            img = np.clip(img, 0, 255).astype("uint8")

I am getting this error:

error: invalid call to '__call__': could not convert element of 'args' with type 'HostBuffer[DType.float32]' to expected type 'ConvertibleToPython & Copyable'
np_b = np.frombuffer(b, dtype="float32").reshape((SIZE, SIZE))
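
Reading the message literally, the call is rejected on the Mojo side: `HostBuffer` cannot be converted to a Python object at all, so NumPy never sees it. Even if it could be converted, `np.frombuffer` only accepts objects that expose Python's buffer protocol. The small Python sketch below shows that second requirement using only the standard library; `OpaqueHandle` and `exposes_buffer` are hypothetical names made up for the illustration, not part of Mojo or NumPy.

```python
import array


class OpaqueHandle:
    """Hypothetical stand-in for an object with no buffer protocol support."""

    def __init__(self, values):
        self._values = list(values)


def exposes_buffer(obj) -> bool:
    """Return True if `obj` can be wrapped in a memoryview (buffer protocol)."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False


# array.array stores raw float32 values contiguously and exposes them as a buffer.
floats = array.array("f", [0.0, 127.5, 255.0])

print(exposes_buffer(floats))               # buffer-capable object
print(exposes_buffer(OpaqueHandle([1.0])))  # plain Python object, no buffer
```

Anything for which `exposes_buffer` returns True is the kind of object `np.frombuffer` can read without copying element by element.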

From the puzzles and the docs, I understand what the builders intend: copy from GPU to CPU, then read it. But it felt like I could hand that GPU address to a Python object like NumPy, and then I realized the format is different from what NumPy can understand.

So my question is: is there a way to bridge it without having to copy back to the CPU?
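
One possible direction, sketched on the Python side only: NumPy reads CPU-addressable memory, so the realistic target is to skip the per-element loop rather than the device-to-host mapping itself. The sketch assumes the address of the mapped host buffer can be passed to Python as a plain integer, which nothing in this thread confirms. Here a `ctypes` array stands in for that host memory, and `view_floats_at` is a hypothetical helper name.

```python
import ctypes


def view_floats_at(address: int, count: int):
    """Return a zero-copy float32 view over `count` values starting at `address`.

    The caller must keep the underlying memory alive while the view is used.
    """
    array_type = ctypes.c_float * count
    return array_type.from_address(address)


# Stand-in for host-visible memory (ASSUMPTION: a real address would come
# from the mapped host buffer, passed to Python as an integer).
backing = (ctypes.c_float * 4)(0.0, 64.0, 128.0, 255.0)
address = ctypes.addressof(backing)

view = view_floats_at(address, 4)

# Writing through the view changes the original memory, so nothing was copied.
view[1] = 99.0
print(list(backing))
```

A ctypes array exposes the buffer protocol, so an object like `view` can be handed to `np.frombuffer(view, dtype="float32")`. The view would only be valid while the host mapping is alive, i.e. inside the `map_to_host()` block.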

init_bug · April 9, 2026, 12:41am

I made a big optimization, though it feels like cheating: I got rid of the useless nested Python loops.

with out_r_buf.map_to_host() as r:
    with out_g_buf.map_to_host() as g:
        with out_b_buf.map_to_host() as b:
            # Improvement found after some AI-assisted testing: build one flat list
            flat = Python.list()

            for k in range(SIZE * SIZE):

                var rv = r[k]
                if rv < 0.0:
                    rv = 0.0
                elif rv > 255.0:
                    rv = 255.0

                var gv = g[k]
                if gv < 0.0:
                    gv = 0.0
                elif gv > 255.0:
                    gv = 255.0

                var bv = b[k]
                if bv < 0.0:
                    bv = 0.0
                elif bv > 255.0:
                    bv = 255.0

                # append as flat RGB stream
                flat.append(Int(rv))
                flat.append(Int(gv))
                flat.append(Int(bv))

            # Convert to NumPy
            img = np.array(flat, dtype="uint8")

            # reshape into (H, W, 3)
            img = img.reshape(SIZE, SIZE, 3)

            # save
            pil_image.fromarray(img).save("output.png")

print("Saved output.png")

This way I stream the values into a flat list and then convert it to a NumPy array.
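
For reference, here is the same flat layout written as a small standard-library Python sketch, to show why the reshape to `(SIZE, SIZE, 3)` works: pixel `k` occupies bytes `3*k`, `3*k + 1` and `3*k + 2`. The function names and the 2x2 sample values are made up for the illustration, and a `bytearray` is used in place of a list of Python ints.

```python
def clamp_to_byte(value: float) -> int:
    """Clamp to [0, 255] and truncate toward zero, like the loop above."""
    return int(min(255.0, max(0.0, value)))


def interleave_rgb(r, g, b) -> bytearray:
    """Pack three equal-length channels as R, G, B, R, G, B, ..."""
    out = bytearray(3 * len(r))
    out[0::3] = bytes(clamp_to_byte(v) for v in r)
    out[1::3] = bytes(clamp_to_byte(v) for v in g)
    out[2::3] = bytes(clamp_to_byte(v) for v in b)
    return out


# Tiny 2x2 image with hypothetical values, stored row-major (k = j * WIDTH + i).
HEIGHT, WIDTH = 2, 2
red = [-5.0, 0.0, 128.7, 300.0]
green = [10.0, 20.0, 30.0, 40.0]
blue = [255.0, 254.9, 1.2, 0.0]

packed = interleave_rgb(red, green, blue)

# View the flat byte stream as (height, width, channel) without copying.
pixels = memoryview(packed).cast("B", (HEIGHT, WIDTH, 3))
print(pixels.tolist())
```

A `bytearray` also exposes the buffer protocol, so `np.frombuffer(packed, dtype="uint8").reshape(HEIGHT, WIDTH, 3)` would give the same image array without building a list of Python ints first.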
