Bison – A pandas-compatible DataFrame library for Mojo 🔥

Hey everyone! I’ve been building bison, a Mojo DataFrame library with a pandas-compatible API.

GitHub: https://github.com/JRedrupp/bison

What is it?

The goal is simple: swap import pandas as pd for import bison as bs with minimal changes to your calling code. Bison wraps pandas DataFrames and exposes the same API, with methods being progressively ported to native Mojo implementations.

Current status

  • 299 methods implemented across DataFrame, Series, GroupBy, and accessors
  • Core aggregation, statistics, and interop methods run natively in Mojo
  • from_pandas() and to_pandas() for seamless interop
  • Methods not yet ported raise a clear bison.<method>: not implemented error

Category             Implemented  Stubs
DataFrame            102          38
Series               95           3
GroupBy (DataFrame)  21           0
GroupBy (Series)     17           0
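
To make the "implemented vs. stub" split concrete, here is a minimal Python sketch of the pattern. It is not bison's code: the `Frame` class, its dict-of-lists storage, and the choice of `pivot` as the stubbed method are all assumptions made for illustration. Only the error message format is taken from the post above.

```python
# Hypothetical sketch: none of these names come from bison itself.
class Frame:
    """Toy column store standing in for a DataFrame."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of numbers
        self._columns = {name: list(values) for name, values in columns.items()}

    def shape(self):
        # (rows, columns), the same tuple layout pandas reports
        n_rows = len(next(iter(self._columns.values()), []))
        return (n_rows, len(self._columns))

    def sum(self):
        # "Implemented" method: computed directly on the stored columns.
        return {name: float(sum(values)) for name, values in self._columns.items()}

    def pivot(self, *args, **kwargs):
        # "Stub" method: part of the API surface, but not ported yet.
        # Picking pivot here is an assumption, not a statement about bison.
        raise NotImplementedError("bison.pivot: not implemented")


df = Frame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(df.shape())  # (3, 2)
print(df.sum())    # {'a': 6.0, 'b': 15.0}
```

Calling `df.pivot()` in this sketch fails immediately with a message naming the method, so a caller can tell a missing port apart from a genuine bug.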

Quick example

import bison as bs
from python import Python

def main() raises:
    var pd = Python.import_module("pandas")
    var pd_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    var df = bs.DataFrame.from_pandas(pd_df)
    print(df.shape())    # (3, 2)
    var totals = df.sum()  # Series: a=6.0, b=15.0

Install

curl -fsSL https://pixi.sh/install.sh | sh
git clone https://github.com/JRedrupp/bison.git
cd bison && pixi install

Would love feedback from the community - especially on which methods to prioritize next! Contributions are very welcome (the compatibility table in the README shows exactly what’s left to implement).

You might want to reconsider the name. Bison is on the short list of tools required to build the Linux kernel; see "Minimal requirements to compile the Kernel" in the Linux kernel documentation.

This looks pretty interesting so far, any benchmarks vs Pandas and Polars?

Okay yes - I see that could get confusing…
Let me reconsider alternative names.

I've done some comparisons so far. There's obviously a long way to go to match pandas in some areas, and we aren't using any SIMD yet.
You can see the live benchmarks here:

A few of the more complex methods currently fall back to pandas, e.g. .query.
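
As a rough illustration of "native where possible, fall back otherwise", here is a small Python sketch. Everything in it is hypothetical: `Backend` is a stand-in for the wrapped pandas object rather than real pandas, and its `query` takes a predicate function instead of an expression string. How bison actually dispatches is not shown in this thread.

```python
# Hypothetical sketch of a native-first wrapper with a fallback path.
class Backend:
    """Stand-in for the wrapped object (not real pandas)."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(self.values)

    def query(self, predicate):
        # Real pandas .query takes an expression string; a predicate
        # keeps this stand-in small.
        return [v for v in self.values if predicate(v)]


class Wrapper:
    def __init__(self, backend):
        self._backend = backend

    def total(self):
        # "Native" path: implemented by the wrapper itself.
        acc = 0
        for v in self._backend.values:
            acc += v
        return acc

    def __getattr__(self, name):
        # Runs only when normal lookup fails, i.e. no native method exists.
        if name == "_backend":
            raise AttributeError(name)  # avoid recursion before __init__ ran
        return getattr(self._backend, name)


w = Wrapper(Backend([1, 2, 3, 4]))
print(w.total())                   # 10, via the native method
print(w.query(lambda v: v > 2))    # [3, 4], via the fallback
```

The design point is that callers see one API either way, and a method moves from the fallback path to the native path simply by being defined on the wrapper.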

I’m interested to see this improve over time!

Another (obviously more important) reason to change the name is to stick with the bear theme. Pandas, Polars, etc.

I wonder how much the performance could improve if it were built on top of Marrow?
At the very least, it would allow zero-copy interop between Mojo, Python, and Rust.

Also bs as an alias is not the best :wink:

Is it “styled” on Pandas 3+ or 2.x.y ?

Does it make sense to adopt a more general https://ibis-project.org or Narwhals approach? (See also the Marrow comment in the thread.)

Nice work!

I guess I’ve been wondering about how to decide between using native Python packages via Python interop vs pure Mojo.

& @owenhilyard
Thanks for the heads up on the naming!
The GNU Bison conflict is a legitimate concern - it would probably cause confusion and make the project harder to search for.

I’m open to a rename and - given the pandas/polars bear theme - it feels right to stay in that family. A few options I’ve come up with:

  • Grizzly — the obvious heavy hitter
  • Kodiak — Kodiak bear; has a clean, modern feel
  • Bjorn — Old Norse for “bear”; short and distinctive
  • Bruin — archaic English/Dutch for brown bear
  • Ursa — Latin for bear; elegant and brief
  • Kermode — the rare spirit bear of British Columbia; unique enough that it’s unlikely to clash with anything

Would love to hear what you gravitate towards or if you have any other suggestions.

Personally I’m thinking import bjorn as bj

Thanks for flagging Marrow - zero-copy interop with the broader Arrow ecosystem is definitely on the roadmap. We’re currently finishing up native expression parsing for query / eval, but Marrow is high on the list after that. Arrow’s columnar memory layout would also give us a much cleaner foundation for vectorized operations down the line, so it feels like the right architectural move before we invest heavily in SIMD. Will keep the thread updated as that progresses.
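
For anyone unfamiliar with what "zero-copy" over a columnar layout means in practice, here is a small Python sketch using only the standard library. It does not use Arrow or Marrow and says nothing about their APIs. It only shows the underlying idea: a column is one contiguous buffer, and a view onto that buffer shares memory instead of duplicating it.

```python
from array import array

# One contiguous, typed buffer per column: the core idea of a columnar layout.
col_a = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the same bytes without copying them.
view = memoryview(col_a)
window = view[1:3]  # slicing a memoryview is also zero-copy

col_a[1] = 20.0     # write through the original column
print(window.tolist())  # [20.0, 3.0]: the view sees the change

window[1] = 30.0    # write through the view
print(col_a.tolist())   # [1.0, 20.0, 30.0, 4.0]: the column sees it too
```

Because both names refer to the same memory, handing a column to another consumer costs nothing beyond creating the view, which is the property zero-copy interop relies on.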

On the pandas version: bison targets pandas 2.x compatibility.

On Narwhals - yes, that’s something we want to support. The plugin approach means it wouldn’t require core changes to bison, just a small companion package. Ibis is a different story; it’s a query-compilation layer rather than a drop-in replacement target, so that’s probably out of scope for what we’re building here.