SIMD-Optimized Bloom Filters in Mojo for Large-Scale Systems

As a way to dive deeper into Mojo, I implemented two variants of Bloom filters:

  • Standard Bloom Filter (with enhanced double hashing)

  • Split Block Bloom Filter (SIMD/cache-optimized)
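For readers who have not met enhanced double hashing before, here is a rough sketch of the general technique. It is written in Python rather than Mojo and is not taken from the repo. The function names, the choice of BLAKE2b, and the way the digest is split into two base hashes are all assumptions made for illustration. The idea is to derive all k bit positions from two base hashes, with an extra growing increment that reduces the collisions plain double hashing can suffer.

```python
import hashlib


def enhanced_double_hash(h1: int, h2: int, k: int, m: int) -> list[int]:
    """Return k bit positions in [0, m) derived from two base hashes."""
    x, y = h1 % m, h2 % m
    positions = [x]
    for i in range(1, k):
        x = (x + y) % m  # plain double-hashing step
        y = (y + i) % m  # growing increment; contributes the (i^3 - i) / 6 term
        positions.append(x)
    return positions


def positions_for(item: bytes, k: int, m: int) -> list[int]:
    """Hypothetical helper: hash once with BLAKE2b, split the digest in two."""
    digest = hashlib.blake2b(item, digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:], "little")
    return enhanced_double_hash(h1, h2, k, m)
```

With these definitions, enhanced_double_hash(5, 7, 4, 1000) yields [5, 12, 20, 30], which matches the closed form h1 + i*h2 + (i^3 - i)/6 mod m for i = 0..3.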

Both share a clean API for create/add/contains/merge/serialize/etc. and come with tests + benchmarks.
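As an illustration of what the split block variant and an add/contains/merge surface can look like, here is a minimal Python sketch (not Mojo, and not the repo's code). It assumes a Parquet-style layout: 256-bit blocks made of eight 32-bit words, with one bit set per word. The class name, the method names, and the salt constants are assumptions; the actual implementation may differ on all of them. The caller is assumed to supply a 64-bit hash of the item.

```python
# Odd 32-bit multipliers, as used by Parquet-style split block filters (assumed here).
SALTS = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
         0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)


class SplitBlockBloom:
    """Hypothetical split block Bloom filter over pre-computed 64-bit hashes."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        # Each block is eight 32-bit words (256 bits, one AVX2 register wide).
        self.blocks = [[0] * 8 for _ in range(num_blocks)]

    def _locate(self, h64: int):
        h64 &= 0xFFFFFFFFFFFFFFFF
        # Upper 32 bits pick the block via multiply-shift (no modulo needed).
        block = ((h64 >> 32) * self.num_blocks) >> 32
        low = h64 & 0xFFFFFFFF
        # Lower 32 bits pick one bit per word: top 5 bits of (low * salt) mod 2^32.
        mask = [1 << (((low * s) & 0xFFFFFFFF) >> 27) for s in SALTS]
        return self.blocks[block], mask

    def add(self, h64: int) -> None:
        words, mask = self._locate(h64)
        for i in range(8):
            words[i] |= mask[i]

    def contains(self, h64: int) -> bool:
        words, mask = self._locate(h64)
        return all(words[i] & mask[i] for i in range(8))

    def merge(self, other: "SplitBlockBloom") -> None:
        if other.num_blocks != self.num_blocks:
            raise ValueError("filters must have the same number of blocks")
        for mine, theirs in zip(self.blocks, other.blocks):
            for i in range(8):
                mine[i] |= theirs[i]
```

Every lookup touches a single block, so it costs at most one cache miss, and the eight multiply/shift/test steps in the loops above are the part a SIMD implementation would perform as one vector operation.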

Repo here: https://github.com/axiomhq/mojo-bloomfilter

I’d love feedback from the community - especially around performance, API design, and Mojo best practices. Anything I could do more idiomatically, or optimizations I should try next?

Cool stuff! Thanks for sharing.

Since you asked for feedback: I’d appreciate more explanation in the docs about when to use fpr vs bpv.
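For context, the two knobs are two views of the same trade-off, assuming bpv means bits per value. For a standard Bloom filter with an optimal number of hash functions, the textbook relations are bpv = -ln(fpr) / (ln 2)^2 and k = bpv * ln 2. A small Python sketch of the conversion follows; the function names are made up for illustration and are not the library's API, and a split block filter will typically have a somewhat higher false positive rate than these formulas predict.

```python
import math


def bits_per_value_for_fpr(false_positive_rate: float) -> float:
    """Bits per stored value needed to reach a target false positive rate."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def fpr_for_bits_per_value(bits_per_value: float) -> float:
    """Expected false positive rate for a given bits-per-value budget.

    Assumes the ideal (real-valued) number of hash functions is used.
    """
    return math.exp(-bits_per_value * math.log(2) ** 2)


def optimal_hash_count(bits_per_value: float) -> int:
    """Number of hash functions that minimizes the false positive rate."""
    return max(1, round(bits_per_value * math.log(2)))
```

Under these formulas, a 1% target works out to roughly 9.6 bits per value and 7 hash functions. So fpr is the natural input when you know the accuracy you need, and bpv is the natural input when you have a fixed memory budget.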

By the same token, I found those two initializers pretty off-putting as the entry points to this library. “create_for_fpv” → “huh??”. This is subjective, but IMO one should simply be picked as the default, unnamed initializer (probably fpr). I would then write out fpr as false_positive_rate in the argument name.
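To make the suggestion concrete, here is roughly the shape I have in mind, sketched in Python rather than Mojo. Everything here is hypothetical: the class name, the expected_items parameter, the 1% default, and the with_bits_per_value alternative are my own placeholders, not the library's current API, and the sizing math is the textbook formula for a standard Bloom filter.

```python
import math


class BloomFilterConfig:
    """Hypothetical sizing wrapper; not the mojo-bloomfilter API."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Default entry point: the caller states the accuracy they want.
        if not 0.0 < false_positive_rate < 1.0:
            raise ValueError("false_positive_rate must be between 0 and 1")
        bits_per_value = -math.log(false_positive_rate) / math.log(2) ** 2
        self._size(expected_items, bits_per_value)

    @classmethod
    def with_bits_per_value(cls, expected_items: int, bits_per_value: float):
        # Named alternative for callers who start from a memory budget.
        config = cls.__new__(cls)
        config._size(expected_items, bits_per_value)
        return config

    def _size(self, expected_items: int, bits_per_value: float) -> None:
        self.num_bits = math.ceil(expected_items * bits_per_value)
        self.num_hashes = max(1, round(bits_per_value * math.log(2)))
```

A call site then reads BloomFilterConfig(1000, false_positive_rate=0.01) for the common case, and the less common memory-driven path stays available under an explicit name.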