LayoutTensor - Type Conversion Issue

I ran into a type conversion issue in the code below and am having trouble resolving it. I’m using p04_layout_tensor.mojo as a reference and can’t see any significant difference between my code and the puzzle’s. It seems straightforward and like it should work, but… no.
This code snippet is from my hackathon project (pendulum). Any help/guidance appreciated!
Mojo 25.5.0.dev2025063020

Error: cannot implicitly convert ‘SIMD[float32, __init__[::Origin[::Bool(IntTuple(1), IntTuple(1)).size()]’ value to ‘SIMD[float32, 1]’
Code: sum = sum + input_buffer[0, j] * weight

from gpu import thread_idx, block_dim, block_idx
from layout import Layout, LayoutTensor

# GPU kernel with tensor indexing issues (for demonstration)
fn gpu_neural_network_kernel_with_indexing_issues(
    output_buffer: LayoutTensor[
        mut=True, DType.float32, Layout.row_major(1, 3)
    ],
    input_buffer: LayoutTensor[mut=True, DType.float32, Layout.row_major(1, 4)],
):
    """GPU neural network kernel that demonstrates tensor indexing type conversion issues.
    """
    idx = thread_idx.x + block_idx.x * block_dim.x

    if idx < 3:  # 3 outputs
        # Compute neural network output exactly like CPU version
        # Use identical weight formula: (i + j + 1) * 0.1
        var sum: Float32 = 0.0

        # This will cause tensor indexing type conversion issues
        for j in range(4):  # 4 inputs
            weight = Float32(idx + j + 1) * 0.1

            # SIMD[float32, __init__[::Origin[::Bool(IntTuple(1), IntTuple(1)).size()]
            value = input_buffer[0, j]

            sum = (
                sum + input_buffer[0, j] * weight
            )  # Type conversion issue here

        # Apply real tanh activation
        var tanh_result: Float32
        if sum > 5.0:
            tanh_result = 1.0
        elif sum < -5.0:
            tanh_result = -1.0
        else:
            # High-quality tanh approximation
            abs_sum = sum if sum >= 0.0 else -sum
            sum_squared = sum * sum
            denominator = 1.0 + abs_sum / 3.0 + sum_squared / 15.0
            tanh_result = sum / denominator

        # Store result - this may also have indexing issues
        output_buffer[0, idx] = tanh_result

To answer my own question: Mojo data types derived from DType are SIMD, meaning they are vectors, and indexing a LayoutTensor returns one of these SIMD values rather than a scalar. To convert the value to a scalar Float32, extract lane 0, i.e. change
value = input_buffer[0, j]
to value = input_buffer[0, j][0]
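
A standalone illustration of that point (my own example, not from the project code): Float32 is just SIMD[DType.float32, 1], so indexing a lane of a wider SIMD value gives you a plain scalar back, which is the same idea as the [0] extraction above.

def main():
    # Float32 is an alias for SIMD[DType.float32, 1], so extracting a lane
    # from a wider vector yields a plain scalar.
    var v = SIMD[DType.float32, 4](0.1, 0.2, 0.3, 0.4)
    var first: Float32 = v[0]  # lane extraction, same idea as input_buffer[0, j][0]
    print(first)           # lane 0
    print(v.reduce_add())  # sum of all four lanes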

Both my AI intern (Augment Code with Claude Sonnet 4) and I are learning (slowly). 😎

# GPU kernel with proper tensor indexing and SIMD vector extraction
fn gpu_neural_network_kernel(
    output_buffer: LayoutTensor[
        mut=True, DType.float32, Layout.row_major(1, 3)
    ],
    input_buffer: LayoutTensor[mut=True, DType.float32, Layout.row_major(1, 4)],
):
    """GPU neural network kernel - functionally equivalent to CPU with proper tensor indexing.
    Uses SIMD vector extraction ([0]) and compile-time loop optimization.
    """
    idx = thread_idx.x + block_idx.x * block_dim.x

    if idx < 3:  # 3 outputs
        # Compute neural network output exactly like CPU version
        # Use identical weight formula: (i + j + 1) * 0.1
        var sum: Float32 = 0.0

        # Compile-time loop unrolling for GPU performance optimization
        @parameter
        for j in range(4):  # 4 inputs
            weight = Float32(idx + j + 1) * 0.1
            # Proper tensor indexing with SIMD vector extraction
            sum = sum + input_buffer[0, j][0] * weight  # <-- Fixed here

        # Apply real tanh activation
        var tanh_result: Float32
        if sum > 5.0:
            tanh_result = 1.0
        elif sum < -5.0:
            tanh_result = -1.0
        else:
            # High-quality tanh approximation
            abs_sum = sum if sum >= 0.0 else -sum
            sum_squared = sum * sum
            denominator = 1.0 + abs_sum / 3.0 + sum_squared / 15.0
            tanh_result = sum / denominator

        # Store result
        output_buffer[0, idx] = tanh_result
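
For completeness, here is a rough host-side launch sketch in the style of p04_layout_tensor.mojo. This is not from the original post; the DeviceContext/LayoutTensor wiring, placeholder input values, and grid/block dimensions are my assumptions and may need adjusting for a given nightly.

from gpu.host import DeviceContext
from layout import Layout, LayoutTensor

alias dtype = DType.float32
alias in_layout = Layout.row_major(1, 4)
alias out_layout = Layout.row_major(1, 3)


def main():
    with DeviceContext() as ctx:
        # Device buffers backing the 1x4 input and 1x3 output tensors.
        var in_buf = ctx.enqueue_create_buffer[dtype](4).enqueue_fill(0)
        var out_buf = ctx.enqueue_create_buffer[dtype](3).enqueue_fill(0)

        # Fill the input on the host with placeholder values.
        with in_buf.map_to_host() as host_in:
            for i in range(4):
                host_in[i] = i + 1

        # Wrap the buffers in LayoutTensors matching the kernel signature.
        var in_tensor = LayoutTensor[mut=True, dtype, in_layout](in_buf.unsafe_ptr())
        var out_tensor = LayoutTensor[mut=True, dtype, out_layout](out_buf.unsafe_ptr())

        # One block of 3 threads covers the 3 outputs.
        ctx.enqueue_function[gpu_neural_network_kernel](
            out_tensor, in_tensor, grid_dim=1, block_dim=3
        )
        ctx.synchronize()

        with out_buf.map_to_host() as host_out:
            print("outputs:", host_out[0], host_out[1], host_out[2])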

This is a very common point of friction that we’d like to improve in LayoutTensor. That said, yes, you found the right fix!
