The code performs about a billion iterations (2**30). In each iteration, it carries out SIMD calculations (element-wise subtraction and modulo) on vectors of four 64-bit unsigned integers. It then compares the scalar value to_find with the fourth element of the resulting SIMD vector and adds the result (True as 1, False as 0) to FF.
Is there a faster way to do this? I’m only interested in whether True occurred at least once within a given range. For large ranges, e.g., 2**50, the running time is terrible.
fn main() raises:
    alias F = SIMD[DType.uint64, 4]
    var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
    var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)
    var to_find = UInt64(4624529904179460815)
    var FF = Int()
    for i in range(2**30):
        FF = FF.__add__((to_find in ((p - i).__mod__(n))[3]))
    print("founded result :", FF, " --- 1-yes 0-no")
You probably want SIMD.reduce_bit_count() > 0
Can you show how to do that? I’m still learning.
fn main() raises:
    alias F = SIMD[DType.uint64, 4]
    var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
    var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)
    var to_find = UInt64(4624529904179460815)
    var FF = Int()
    alias B = SIMD[DType.bool, 1]
    var FB = B()
    for i in range(2**30):
        # FF = FF.__add__((to_find in ((p - i).__mod__(n))[3]))
        FB = B(to_find in ((p - i).__mod__(n))[3]).reduce_bit_count() > False
    print("founded result :", FB, " --- 1-yes 0-no")
Unfortunately, that doesn’t work.
alias F = SIMD[DType.uint64, 4]
var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)
var to_find = UInt64(4624529904179460815)
var FF = UInt64()
for i in range(2**30):
    var modified = (p - i).__mod__(n)
    FF += (to_find in modified).cast[DType.uint64]()
print("founded result :", FF, " --- 1-yes 0-no")
I thought you were actually doing a vectorized comparison, but since you only consider the last entry in the vector, this works fine.
I got this error:
/home/ubuntu/example-project/test.mojo:12:36: error: 'Bool' value has no attribute 'cast'
    FF += (to_find in modified).cast[DType.uint64]()
Which version do you have? I have Nightly 2025.07.3007.
Whoops, I’m on a weird version. Use this instead of that line:
FF += 1 if to_find in modified else 0
The benchmark for 2**45 takes the same time as my version. I think this would have to be written directly into the interpreter as a new instruction.
What you’re benchmarking is a serial dependency chain. SIMD won’t help you here because you’re ignoring the first 3 of the 4 lanes in your code. My version does perform a vectorized compare, since I dropped the lane index, but you won’t make this much faster because the CPU is stuck running a fairly tight loop.
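Since the question is only whether True occurs at least once, one small change is to stop at the first hit instead of summing over the whole range. Here is a minimal sketch of that, assuming the same nightly as above and reusing the values from the original post; the variable name found is my own and not from the thread. It does not make each iteration cheaper, so a range with no hit still takes the full time.

```mojo
fn main() raises:
    alias F = SIMD[DType.uint64, 4]
    var n = F(18446744073709551615, 18446744073709551614, 13451932020343611451, 13822214165235122497)
    var p = F(18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744069414583343)
    var to_find = UInt64(4624529904179460815)
    # `found` is a hypothetical name; it replaces the FF counter.
    var found = False
    for i in range(2**30):
        # Same element-wise subtraction and modulo as the original loop.
        var modified = (p - i) % n
        # Compare only the fourth lane, as in the original code.
        if modified[3] == to_find:
            found = True
            break  # one hit answers the yes/no question, so stop here
    print("found:", found)
```

If a match in any of the four lanes should count, the condition can be swapped for `to_find in modified`, as in the vectorized version earlier in the thread.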