Ish: a grep-like CLI tool for filtering using alignment

This tool uses the best currently available local and semi-global SIMD alignment algorithms to compute optimal alignments between the input query and the records being searched. I’ve been working on it as a showcase for Mojo in bioinformatics. There is also a GPU implementation that is respectably fast, assuming large enough inputs. Check it out and feel free to leave any feedback!
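For anyone unfamiliar with the two alignment modes, here is a minimal scalar sketch in Python of what local and semi-global scoring compute. It is not ish's implementation (which is SIMD and written in Mojo); the function names, the linear gap penalty, and the scoring values are assumptions chosen for illustration only.

```python
def semi_global_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `query` against any substring of `target`.

    Hypothetical helper for illustration; scoring values are assumptions.
    """
    # Row 0 is all zeros: the alignment may start anywhere in the target for free.
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        # Column 0: query characters with no target to align to are penalised.
        curr = [prev[0] + gap]
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # match / mismatch
                            prev[j] + gap,       # gap in target
                            curr[j - 1] + gap))  # gap in query
        prev = curr
    # The alignment may end anywhere in the target for free.
    return max(prev)


def local_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best score over any substring of `query` vs any substring of `target`."""
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        curr = [0]
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # Clamping at 0 lets the alignment restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The practical difference: local alignment can ignore the ends of the query, while semi-global alignment has to account for every query character, so a query with mismatching flanks scores lower in semi-global mode.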

Try it out locally with:

pixi global install -c conda-forge -c https://repo.prefix.dev/modular-community -c https://conda.modular.com/max ish

Repo: GitHub - BioRadOpenSource/ish: Alignment-based filtering CLI tool
Preprint: https://www.biorxiv.org/content/10.1101/2025.06.04.657890v1

Thanks for sharing the presentation today.

I was curious what SIMD width and query length are typical in bioinformatics. I saw widths of 16 and 32, and lengths from the hundreds to the thousands.

It can vary greatly!

For SIMD widths, it's generally whatever the hardware can offer. For what I was presenting, though, there is a constraint: I want to make sure the SIMD vector width isn’t much larger than the query length. AVX-512 can sometimes be too wide in that case.
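A rough way to see why an overly wide vector hurts short queries: if the query is split evenly across the SIMD lanes and padded up to a whole number of vectors, the padding is wasted work. The sketch below assumes that kind of padded layout and 8-bit scores (so 16, 32, and 64 lanes for 128-, 256-, and 512-bit vectors); the helper name and the layout are assumptions for illustration, not a description of ish's internals.

```python
import math


def lane_utilization(query_len, lanes):
    """Fraction of SIMD lane slots holding real query positions.

    Hypothetical helper: assumes the query is padded up to a whole
    number of vectors, each `lanes` elements wide.
    """
    vectors_needed = math.ceil(query_len / lanes)
    padded_cells = vectors_needed * lanes
    return query_len / padded_cells


# Compare a short grep-like query with a typical read-length query.
for query_len in (10, 150):
    for lanes in (16, 32, 64):
        util = lane_utilization(query_len, lanes)
        print(f"query_len={query_len:4d} lanes={lanes:2d} utilization={util:.2f}")
```

Under these assumptions, a 10-character query fills under a sixth of a 64-lane vector, while a 150-character query keeps most lanes busy at any of the three widths.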

For query lengths in bioinformatics, they cluster around 100-200, and then there are rarer, but still common, lengths in the thousands. So the speedups shown for SIMD strip mining / AVX-512 have some very real use cases.

But ish also wants to support normal grep-like searches, where queries are probably shorter than 50 characters.

So, it really covers the whole spectrum of SIMD widths and query lengths!
