Does Mojo support Parent/Child Grids (nested kernels)?

jklaivins · November 15, 2025, 9:31pm

I’m looking for a Mojo equivalent of this CUDA kernel:

#include <cstdio>

__global__ void nestedHelloWorld(int const iSize, int iDepth) {
    int tid = threadIdx.x;
    printf("Recursion=%d :Hello World from thread %d block %d\n",
           iDepth, tid, blockIdx.x);
    if (iSize == 1) return;
    // Halve the number of threads at each level (right shift by one)
    int nthreads = iSize >> 1;
    if (tid == 0 && nthreads > 0) {
        // Dynamic parallelism: requires -rdc=true when compiling with nvcc.
        // clangd doesn't support CUDA dynamic parallelism checking.
        nestedHelloWorld<<<1, nthreads>>>(nthreads, ++iDepth); // clangd-ignore
        printf("--------> nested execution depth: %d\n", iDepth);
    }
}

It’s unclear to me whether Mojo supports this directly via the gpu module or whether it is only possible by jumping to MAX.

Caroline · February 3, 2026, 4:25pm

Hi @jklaivins, sorry for the delayed response on this one! Mojo doesn’t currently support nested kernels/dynamic parallelism.

