Hello everyone!
Here is my case: I have a 512×512 image, with this setup:
```mojo
def main() raises:
    with DeviceContext() as ctx:
        var out_r_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)
        var out_g_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)
        var out_b_buf = ctx.enqueue_create_buffer[dtype](SIZE * SIZE)

        out_r_buf.enqueue_fill(0)
        out_g_buf.enqueue_fill(0)
        out_b_buf.enqueue_fill(0)

        var out_r = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_r_buf)
        var out_g = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_g_buf)
        var out_b = LayoutTensor[dtype, layout_2d, MutAnyOrigin](out_b_buf)

        comptime kernel = render_rgb_blocks_2d[layout_2d, layout_2d, layout_2d]
        ctx.enqueue_function[kernel, kernel](
            out_r,
            out_g,
            out_b,
            SIZE,
            grid_dim=BLOCKS_PER_GRID,
            block_dim=THREADS_PER_BLOCK,
        )
        ctx.synchronize()
```
And this is what I am doing to copy back to the CPU after rendering/modifying the pixels:
```mojo
with out_r_buf.map_to_host() as r:
    with out_g_buf.map_to_host() as g:
        with out_b_buf.map_to_host() as b:
            for j in range(SIZE):
                var row_list = Python.list()
                for i in range(SIZE):
                    var k = j * SIZE + i
                    var rv = r[k]
                    if rv < 0.0:
                        rv = 0.0
                    elif rv > 255.0:
                        rv = 255.0
                    var gv = g[k]
                    if gv < 0.0:
                        gv = 0.0
                    elif gv > 255.0:
                        gv = 255.0
                    var bv = b[k]
                    if bv < 0.0:
                        bv = 0.0
                    elif bv > 255.0:
                        bv = 255.0
                    row_list.append(
                        Python.list(
                            Int(rv),
                            Int(gv),
                            Int(bv),
                        )
                    )
                py_rows.append(row_list)
```
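As a side note, the per-pixel clamping and stacking in that loop maps directly onto NumPy operations once the channel data is on the CPU. A minimal pure-Python sketch (the small `SIZE` and the synthetic channel arrays are stand-ins I made up, not the real mapped buffers):

```python
import numpy as np

SIZE = 4  # small stand-in for the real 512

# Synthetic flat per-channel float data standing in for the mapped host buffers.
r = np.linspace(-10.0, 300.0, SIZE * SIZE, dtype=np.float32)
g = np.zeros(SIZE * SIZE, dtype=np.float32)
b = np.full(SIZE * SIZE, 128.0, dtype=np.float32)

# Reshape each channel to 2-D, stack into an RGB image, then clamp and
# convert -- replacing the nested per-pixel loops.
img = np.stack([c.reshape(SIZE, SIZE) for c in (r, g, b)], axis=-1)
img = np.clip(img, 0, 255).astype(np.uint8)

print(img.shape, img.dtype)  # (4, 4, 3) uint8
```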
Is there a way to make that buffer available to numpy directly, with something like this?
```mojo
with out_r_buf.map_to_host() as r:
    with out_g_buf.map_to_host() as g:
        with out_b_buf.map_to_host() as b:
            np_r = np.frombuffer(r, dtype="float32").reshape((SIZE, SIZE))
            np_g = np.frombuffer(g, dtype="float32").reshape((SIZE, SIZE))
            np_b = np.frombuffer(b, dtype="float32").reshape((SIZE, SIZE))
            # Stack into RGB image
            img = np.stack([np_r, np_g, np_b], axis=-1)
            # Clamp and convert
            img = np.clip(img, 0, 255).astype("uint8")
```
But I am getting this error:
```
error: invalid call to 'call': could not convert element of 'args' with type 'HostBuffer[DType.float32]' to expected type 'ConvertibleToPython & Copyable'
    np_b = np.frombuffer(b, dtype="float32").reshape((SIZE, SIZE))
```
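If I read the error right, `np.frombuffer` needs an argument it can treat as a Python buffer, which `HostBuffer` apparently is not. A pure-Python sanity check of what `frombuffer` expects (the `bytes` object here is a stand-in I made up for the host-side float data, not the Mojo buffer):

```python
import numpy as np

SIZE = 2

# Any object supporting Python's buffer protocol works: bytes, bytearray,
# memoryview, ... This stands in for the float32 host data.
raw = np.arange(SIZE * SIZE, dtype=np.float32).tobytes()

# frombuffer creates a zero-copy view over those bytes.
np_r = np.frombuffer(raw, dtype="float32").reshape((SIZE, SIZE))
```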
From the puzzles and the docs, I understand what the builders are trying to do: copy from GPU to CPU, then read it. But it felt like I could hand that GPU address straight to a Python object like numpy, and then I understood that the format is different from what numpy can understand.

So my question is: is there a way to bridge it without having to copy back to the CPU?