SIMD-Optimized Bloom Filters in Mojo for Large-Scale Systems

seiflotfy · August 21, 2025, 11:04pm

As a way to dive deeper into Mojo, I implemented two variants of Bloom filters:

Standard Bloom Filter (with enhanced double hashing)
Split Block Bloom Filter (SIMD/cache-optimized)

Both share a clean API for create/add/contains/merge/serialize/etc. and come with tests + benchmarks.
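
For readers unfamiliar with the first variant, here is a minimal sketch of a standard Bloom filter that uses enhanced double hashing. It is written in Python for illustration and is not the repository's Mojo code. The names `enhanced_double_hash_positions` and `SketchBloomFilter`, the choice of BLAKE2b as the base hash, and the byte-array layout are all assumptions made for this example.

```python
import hashlib


def enhanced_double_hash_positions(item: bytes, num_bits: int, num_hashes: int) -> list[int]:
    """Derive num_hashes bit positions from two base hashes.

    Enhanced double hashing computes
    g_i = h1 + i*h2 + (i^3 - i)/6 (mod num_bits)
    incrementally, so only one real hash call is needed per item.
    """
    # Assumption: one 128-bit BLAKE2b digest is split into two 64-bit halves.
    digest = hashlib.blake2b(item, digest_size=16).digest()
    x = int.from_bytes(digest[:8], "little") % num_bits
    y = int.from_bytes(digest[8:], "little") % num_bits

    positions = []
    for i in range(num_hashes):
        positions.append(x)
        x = (x + y) % num_bits      # advance by the current step
        y = (y + i + 1) % num_bits  # grow the step: this adds the cubic term
    return positions


class SketchBloomFilter:
    """Hypothetical minimal filter: add and contains only."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def add(self, item: bytes) -> None:
        for pos in enhanced_double_hash_positions(item, self.num_bits, self.num_hashes):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def contains(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in enhanced_double_hash_positions(item, self.num_bits, self.num_hashes)
        )


bloom = SketchBloomFilter(num_bits=1024, num_hashes=7)
bloom.add(b"mojo")
print(bloom.contains(b"mojo"))
```

The split block variant differs mainly in layout: each item maps to one small, cache-line-sized block and sets bits only inside it, which is what makes it amenable to SIMD.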

Repo here: https://github.com/axiomhq/mojo-bloomfilter

I’d love feedback from the community - especially around performance, API design, and Mojo best practices. Anything I could do more idiomatically, or optimizations I should try next?

ephemer · August 30, 2025, 1:49pm

Cool stuff! Thanks for sharing.

Since you asked for feedback: I’d appreciate more explanation in the docs about when to use fpr vs bpv.

By the same token, I found those two initializers pretty off-putting as the entry points to this library: "create_for_fpv" → "huh??". This is subjective, but IMO one should simply be picked as the default, unnamed initializer (probably fpr). I would then write out fpr as false_positive_rate in the argument name.
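
To make the fpr vs bpv trade-off concrete, here is a small Python sketch of how the two quantities relate for a standard Bloom filter with an optimal number of hash functions. It assumes that bpv stands for bits per value; the function names are hypothetical and are not the library's API. Split block filters generally need somewhat more bits for the same false positive rate, so treat these numbers as a lower bound for that variant.

```python
import math


def bits_per_value_for_fpr(false_positive_rate: float) -> float:
    """Bits per stored value needed for a target false positive rate.

    Uses the textbook result m/n = -ln(p) / (ln 2)^2 for a standard filter.
    """
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def optimal_num_hashes(bits_per_value: float) -> int:
    """Number of hash functions that minimizes the rate: k = (m/n) * ln 2."""
    return max(1, round(bits_per_value * math.log(2)))


def fpr_for_bits_per_value(bits_per_value: float, num_hashes: int) -> float:
    """Approximate false positive rate: (1 - e^(-k / (m/n)))^k."""
    return (1.0 - math.exp(-num_hashes / bits_per_value)) ** num_hashes


bpv = bits_per_value_for_fpr(0.01)
k = optimal_num_hashes(bpv)
print(f"bits per value: {bpv:.2f}, hashes: {k}, rate: {fpr_for_bits_per_value(bpv, k):.4f}")
```

Read this way, specifying fpr suits callers who care about accuracy and let the memory follow, while specifying bpv suits callers with a fixed memory budget who accept whatever rate results.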
