I was working through the Mojo GPU basics tutorial, but when I got to the exercise, the provided solution does not compile and the description does not quite match the solution implementation.
Description mismatch:
Create a host and device buffer for the output of DType Int64, with 8 elements, don't forget to zero the values with enqueue_memset()
But the implementation still uses DType.float32 for the output buffers.
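For reference, this is how I read that description (my own sketch, not code from the tutorial), using DType.int64 for the 8-element output buffers and zeroing them with enqueue_memset(), reusing the same DeviceContext calls the solution already makes:

    # My reading of the description (sketch only; dtype_out and blocks are my own names)
    alias dtype_out = DType.int64
    alias blocks = 8

    var out_buffer_host = ctx.enqueue_create_host_buffer[dtype_out](blocks)
    var out_buffer_device = ctx.enqueue_create_buffer[dtype_out](blocks)

    # Zero the device-side output values, as the description asks
    ctx.enqueue_memset(out_buffer_device, 0)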
Exercise solution does not compile:
When trying to build the example, it errors with:
error: 'DeviceBuffer[float32]' is not subscriptable, it does not implement the __getitem__/__setitem__ methods
        output_buffer[block_idx.x] = value
        ~~~~~~~~~~~~~^
mojo: error: failed to parse the provided Mojo source module
Exercise solution implementation:
from gpu import thread_idx, block_idx, warp, barrier
from gpu.host import DeviceContext, DeviceBuffer
from gpu.memory import AddressSpace
from memory import stack_allocation
from layout import Layout, LayoutTensor
from math import iota
from sys import sizeof

def main():
    ctx = DeviceContext()

    alias dtype_f32 = DType.float32
    alias elements_f32 = 32
    alias blocks_f32 = 8
    alias threads_f32 = elements_f32 // blocks_f32

    # Create buffers
    var in_buffer_host = ctx.enqueue_create_host_buffer[dtype_f32](elements_f32)
    var in_buffer_device = ctx.enqueue_create_buffer[dtype_f32](elements_f32)
    var out_buffer_host = ctx.enqueue_create_host_buffer[dtype_f32](blocks_f32)
    var out_buffer_device = ctx.enqueue_create_buffer[dtype_f32](blocks_f32)

    # Zero output buffer values
    ctx.enqueue_memset(out_buffer_device, 0)
    ctx.synchronize()

    # Fill in input values sequentially and copy to device
    iota(in_buffer_host.unsafe_ptr(), elements_f32)
    in_buffer_host.enqueue_copy_to(in_buffer_device)

    # Create the LayoutTensor
    alias LayoutF32 = Layout.row_major(blocks_f32, threads_f32)
    alias InputTensorF32 = LayoutTensor[dtype_f32, LayoutF32, MutableAnyOrigin]
    var float_tensor = InputTensorF32(in_buffer_device)

    fn reduce_sum_f32(
        in_tensor: InputTensorF32, output_buffer: DeviceBuffer[dtype_f32]
    ):
        var value = in_tensor.load[1](block_idx.x, thread_idx.x)
        value = warp.sum(value)
        if thread_idx.x == 0:
            output_buffer[block_idx.x] = value

    ctx.enqueue_function[reduce_sum_f32](
        float_tensor, out_buffer_device, grid_dim=8, block_dim=4
    )

    out_buffer_device.enqueue_copy_to(out_buffer_host)
    ctx.synchronize()
    print(out_buffer_host)
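One workaround that seems plausible to me, though I have not confirmed it is the intended solution: wrap the output buffer in a 1-D LayoutTensor as well and index that inside the kernel, since LayoutTensor supports element assignment and the input already goes through LayoutTensor. A minimal sketch (OutLayoutF32, OutputTensorF32, and out_tensor are my own names):

    # Sketch: index a 1-D LayoutTensor over the output buffer instead of the DeviceBuffer itself
    alias OutLayoutF32 = Layout.row_major(blocks_f32)
    alias OutputTensorF32 = LayoutTensor[dtype_f32, OutLayoutF32, MutableAnyOrigin]
    var out_tensor = OutputTensorF32(out_buffer_device)

    fn reduce_sum_f32(in_tensor: InputTensorF32, output_tensor: OutputTensorF32):
        var value = in_tensor.load[1](block_idx.x, thread_idx.x)
        value = warp.sum(value)
        if thread_idx.x == 0:
            # LayoutTensor implements __setitem__, unlike DeviceBuffer
            output_tensor[block_idx.x] = value

    ctx.enqueue_function[reduce_sum_f32](
        float_tensor, out_tensor, grid_dim=blocks_f32, block_dim=threads_f32
    )

Is something like this what the exercise expects, or is DeviceBuffer supposed to be writable from a kernel directly?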