Purpose of num_threads in copy_dram_to_sram_async

cudawarped · July 22, 2025, 7:19am

I’m confused about the purpose of num_threads in copy_dram_to_sram_async. The documentation states:

num_threads (Int): Total number of threads participating in the copy operation. Defaults to the size of src_thread_layout.

However, the implementation

alias num_busy_threads = src_thread_layout.size()

# We know at compile time that only partial threads copy based on the size
# of input tensors. Return if current thread doesn't have work.
@parameter
if num_threads > num_busy_threads:
    if thread_idx.x >= num_busy_threads:
        return

implies it is there to disable threads that are not part of the copy operation. For example, given a 1D array of 1024 elements, a 1D block of 1024 threads, and a thread_layout for the copy operation of Layout.row_major(1, 32):

alias layout = Layout.row_major(1, 1024)
input = LayoutTensor[mut=False, dtype, layout](inp.unsafe_ptr())
...
shared = tb[dtype]().row_major[1, 1024]().shared().alloc()
alias load_layout = Layout.row_major(1, 32)
copy_dram_to_sram_async[thread_layout=load_layout, num_threads=1024](shared, input)

Here num_threads=1024 disables threads 32, ..., 1023, so they do not issue extra copy operations.
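To make the effect of that guard concrete, here is a minimal plain-Python model of the early-return check quoted above. It only mimics the gating logic and is not Mojo or GPU code; the helper name participates and the variable names are hypothetical, chosen for illustration.

```python
def participates(thread_idx_x: int, num_threads: int, num_busy_threads: int) -> bool:
    """Model of the early-return guard: True if the thread goes on to copy."""
    # In the Mojo source the outer comparison sits under @parameter,
    # so it is decided at compile time; here it is an ordinary runtime check.
    if num_threads > num_busy_threads:
        if thread_idx_x >= num_busy_threads:
            return False  # thread returns early without copying
    return True


block_size = 1024        # threads launched per block in the example
num_busy_threads = 32    # size of Layout.row_major(1, 32)

# num_threads=1024: only the first 32 threads get past the guard.
active = [t for t in range(block_size) if participates(t, 1024, num_busy_threads)]
print(len(active), active[0], active[-1])  # prints: 32 0 31

# num_threads left at its documented default (the thread layout size, 32):
# the guard is skipped, so every one of the 1024 launched threads proceeds.
unguarded = [t for t in range(block_size) if participates(t, num_busy_threads, num_busy_threads)]
print(len(unguarded))  # prints: 1024
```

In this model, passing the block size as num_threads is what keeps threads 32 through 1023 out of the copy.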

In this case, num_threads would be the total number of threads in the block, not the number participating in the copy operation. Is this correct?

Ehsan · July 22, 2025, 5:42pm

That’s correct! The documentation needs to be fixed. It should be:

num_threads (Int): Total number of threads in the thread block. Threads beyond src_thread_layout.size() will be disabled and not participate in the copy operation.

cc @arthur
