Is this the fastest way? SIMD: checking whether "True" occurred within a given range

The code runs about a billion iterations (2**30). In each iteration it performs SIMD calculations (element-wise subtraction and modulo) on vectors of four 64-bit unsigned integers. It then compares the scalar value to_find with the fourth element (index 3) of the resulting SIMD vector and adds the result (True as 1, False as 0) to FF.


Is there a faster way to do this? I’m only interested in whether True occurred within a given range. For large range values, e.g., 2**50, the time taken is terrible.

fn main() raises:
    alias F = SIMD[DType.uint64, 4]
    var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
    var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)

    var to_find = UInt64(4624529904179460815)  
    var FF = Int()    

    for i in range(2**30):
        FF = FF.__add__((to_find in ((p - i).__mod__(n))[3]))

    print("founded result :", FF, "  --- 1-yes 0-no")

You probably want SIMD.reduce_bit_count() > 0

Can you show me how to do that? I’m still learning 🙂

fn main() raises:
    alias F = SIMD[DType.uint64, 4]
    var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
    var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)

    var to_find = UInt64(4624529904179460815)
    var FF = Int()
    alias B = SIMD[DType.bool, 1]
    var FB = B()

    for i in range(2**30):
        # FF = FF.__add__((to_find in ((p - i).__mod__(n))[3]))
        FB = B(to_find in ((p - i).__mod__(n))[3]).reduce_bit_count() > False

    print("founded result :", FB, "  --- 1-yes 0-no")

Unfortunately, this doesn't work.

alias F = SIMD[DType.uint64, 4]
var n = F(18446744073709551615, 18446744073709551614,13451932020343611451,13822214165235122497)
var p = F(18446744073709551615,18446744073709551615,18446744073709551615,18446744069414583343)

var to_find = UInt64(4624529904179460815)  
var FF = UInt64()


for i in range(2**30):
    var modified = (p - i).__mod__(n)
    FF += (to_find in modified).cast[DType.uint64]()
    
         
print("founded result :",FF,"  --- 1-yes 0-no")

I thought you were actually doing a vectored comparison, but since you only consider the last entry in the vector this works fine.
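Since only lane 3 is ever read, the same check can be sketched with plain scalars and an early exit, which fits the stated goal of only knowing whether a match occurred at all. This is a minimal sketch rather than a tuned implementation: the found flag is a name introduced here for illustration, and the constants are lane 3 of n and p from the original post.

```mojo
fn main():
    # Lane 3 of n and p from the original post; the other lanes are never read.
    var n = UInt64(13822214165235122497)
    var p = UInt64(18446744069414583343)
    var to_find = UInt64(4624529904179460815)
    var found = False  # hypothetical flag, replaces the FF counter

    for i in range(2**30):
        # Same arithmetic as ((p - i).__mod__(n))[3], done on one scalar.
        if (p - UInt64(i)) % n == to_find:
            found = True
            break  # one hit answers the question, so stop scanning

    print("found:", found)
```

The early exit only helps when a match exists. If there is no match, the loop still runs to the end of the range.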

I got this error:

/home/ubuntu/example-project/test.mojo:12:36: error: 'Bool' value has no attribute 'cast'
FF += (to_find in modified).cast[DType.uint64]()

Which version do you have? I'm on Nightly 2025.07.3007.

Whoops, I’m on a weird version. Use this instead of that line:

FF += 1 if to_find in modified else 0

The benchmark for 2**45 takes the same time as my version. I think it should be written directly into the interpreter as a new instruction.

What you’re benchmarking is a serial dependency chain. SIMD won’t help you here because your code ignores the first 3 of its 4 lanes. My version does actually do a vectored compare, since I tossed out the [3] index, but you won’t make this much faster because the CPU is stuck running a fairly tight loop.
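If only lane 3 matters, it may be possible to skip the loop entirely. As long as nothing wraps around, (p - i) % n == to_find holds exactly when i % n == (p - to_find) % n, so the smallest matching i can be computed directly and compared against the range limit. The sketch below assumes to_find < n, to_find <= p, and a range limit no larger than p. The names first_hit and limit are introduced here for illustration and are not from the original code.

```mojo
fn main():
    # Lane 3 of p and n, and the target, copied from the original post.
    var p = UInt64(18446744069414583343)
    var n = UInt64(13822214165235122497)
    var to_find = UInt64(4624529904179460815)
    var limit = UInt64(2**30)  # same bound as range(2**30)

    # Assumptions: to_find < n, to_find <= p, and limit <= p, so neither
    # p - to_find nor p - i wraps around in unsigned arithmetic.
    # Smallest i with (p - i) % n == to_find:
    var first_hit = (p - to_find) % n

    # A match occurs in [0, limit) exactly when that smallest i is in range.
    var found = first_hit < limit

    print("first matching i:", first_hit)
    print("found in range:", found)
```

By my arithmetic, p - to_find for these values is n + 31, so first_hit comes out as 31 and the match falls well inside 2**30. This costs the same for a limit of 2**50 as for 2**30, because it never iterates. It only covers the single-lane case and does not replace a real vectored compare across all four lanes.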