How to iterate over a `List` using `SIMD` in Mojo

martinvuyk · December 9, 2024, 7:26pm

from sys.info import simdwidthof


fn main():
    values = List[UInt8](1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    length = len(values)  # 10

    for element in values:
        _ = element  # iterates per element

    ptr = values.unsafe_ptr()

    for i in range(length // 4):  # 2
        var vec = (ptr + i * 4).load[width=4]()
        _ = vec  # it's a SIMD[DType.uint8, 4]

    alias width = simdwidthof[UInt8]()

    for i in range(length // width):  # depends on the CPU
        var vec = (ptr + i * width).load[width=width]()
        _ = vec  # it's a SIMD[DType.uint8, width]

    # the rest of the data i.e. length % width can be processed sequentially
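
A minimal sketch of that sequential tail, assuming the same Mojo version and the same `List[UInt8]` as the snippet above. Summing the elements is only an illustrative workload, and `total` and `vector_end` are names introduced here, not part of any API:

```mojo
from sys.info import simdwidthof


fn main():
    values = List[UInt8](1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    length = len(values)
    ptr = values.unsafe_ptr()
    alias width = simdwidthof[UInt8]()

    # Accumulate in 32 bits so the UInt8 lanes cannot overflow.
    var total: UInt32 = 0

    # Vector part: only full chunks of `width` elements.
    # On CPUs where `width` > 10 this loop does not run at all.
    vector_end = (length // width) * width
    for i in range(0, vector_end, width):
        var vec = (ptr + i).load[width=width]()
        total += vec.cast[DType.uint32]().reduce_add()

    # Drain loop: the remaining `length % width` elements, one at a time.
    # Indexing `values` here also keeps the list alive while `ptr` is in use.
    for i in range(vector_end, length):
        total += values[i].cast[DType.uint32]()

    print(total)  # 55, i.e. 1 + 2 + ... + 10, whatever `width` is
```

The result does not depend on the CPU's SIMD width: a wider vector only moves more elements from the drain loop into the vector loop.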

owenhilyard · December 9, 2024, 8:29pm

Does it make sense to try to make an API for this which produces a SIMD list iterator, leaving the drain loop to the user? I’m not sure if we want to keep using unsafe for this.

martinvuyk · December 9, 2024, 8:36pm

I think this will be better handled by adding map, filter, sum, reduce, etc. It feels weird to offer an incomplete iterator, but if some use cases aren’t achievable with those APIs, we could offer a helper function that does it (I wouldn’t want to add it to the type’s public API, though).

lesoup-mxd · December 9, 2024, 8:39pm

You’re right) Less room for mistakes this way)

owenhilyard · December 9, 2024, 8:43pm

For iterators, would we want a SIMDIterator which takes a tuple of functions and implement something which takes a width parameter on top of that? I’m thinking of masked processing (like SVE), letting you process a variable number of elements and do the drain loop in a single iteration with different intrinsics in some cases.

martinvuyk · December 9, 2024, 8:55pm

That feels like a very powerful abstraction, many types can yield SIMD vectors. I’m thinking we could simply make SpanIterator do that (for now, or maybe permanently).

iterator = iter(span)
for v in iterator.vectors():
    _ = v # SIMD[DType.?, simdwidthof[DType.?]()]
for s in iterator.scalars():
    _ = s # Scalar[DType.?]

owenhilyard · December 9, 2024, 9:10pm

The reason I’m leaning towards asking for multiple functions is to handle drain loops inline, or deal with platforms that natively have masks and will let you write one function for all of it like SVE.

martinvuyk · December 9, 2024, 9:21pm

Wouldn’t that be a low-level optimization that we could bake into Span.map()’s implementation?

owenhilyard · December 9, 2024, 9:30pm

We can have a version which takes a scalar and drops it into algorithm.functional.vectorize, but part of Mojo’s appeal is letting people use those low-level optimizations. If we don’t expose it in some way, we violate the zero-overhead principle because someone could write a better one themselves. From the “tuple/comptime list of functions” version, we can implement more friendly and easier to use ones, like handling one which is generic over width or taking a single vector and a scalar, or just a scalar.
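
As a rough illustration of the `algorithm.functional.vectorize` route mentioned above, here is a sketch that assumes the Mojo release current at the time of this thread (names such as `simdwidthof` may differ in later releases). The closure `add_chunk` and the accumulator `total` are hypothetical names chosen for the example. The closure is generic over the width, so `vectorize` can reuse it for the full-width chunks and, as far as I know, with a smaller width for the leftover elements:

```mojo
from algorithm.functional import vectorize
from sys.info import simdwidthof


fn main():
    values = List[UInt8](1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    ptr = values.unsafe_ptr()
    alias width = simdwidthof[UInt8]()

    # Accumulate in 32 bits so the UInt8 lanes cannot overflow.
    var total: UInt32 = 0

    # One closure, generic over the chunk width `w`.
    # `i` is the index of the first element of the chunk.
    @parameter
    fn add_chunk[w: Int](i: Int):
        var vec = (ptr + i).load[width=w]()
        total += vec.cast[DType.uint32]().reduce_add()

    # vectorize walks the index range [0, len(values)) and picks the
    # width for each call, so no hand-written drain loop is needed.
    vectorize[add_chunk, width](len(values))

    # Using `values` after the call keeps the list alive while `ptr` is read.
    print("sum of", len(values), "elements:", total)
```

This is the "friendly" end of the design space: the user writes one width-generic function and gives up control over how the tail is processed, which is exactly the control a lower-level "tuple of functions" API would hand back.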

system · June 7, 2025, 9:31pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
