Apache Arrow is a universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics — a language-independent, column-oriented memory format organized for efficient analytic operations on modern hardware. It powers Pandas 2.0, Polars, Spark, DataFusion, and virtually every other modern data tool, and underpins formats like Apache Parquet. PyArrow alone — just one implementation out of a dozen, just one distribution channel — is downloaded 300 million times a month. Arrow isn’t a niche format; it’s infrastructure. Mojo needs it as a first-class citizen.
That’s why I started Marrow: a native Apache Arrow implementation in Mojo.
Where it is today
The core abstractions are in place — arrays, builders, compute kernels, Python bindings, and zero-copy interop with PyArrow via the C Data Interface. The implementation is actively growing toward feature parity with other Arrow implementations, with new types and kernels added regularly. There is also early experimental GPU support via Mojo’s DeviceContext.
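To make the zero-copy interop concrete: the C Data Interface works because Arrow arrays are just a handful of flat buffers that can be handed across language boundaries as raw pointers. Here is an illustrative sketch in Python (not Marrow's actual API) of how a nullable int32 column is laid out per the Arrow spec — a validity bitmap plus a contiguous values buffer:

```python
import struct

def to_arrow_buffers(values):
    """values: list of int or None -> (validity_bitmap, values_buffer)."""
    validity = bytearray((len(values) + 7) // 8)
    buf = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)  # LSB-first bit order, per the Arrow spec
        # Null slots still occupy a full fixed-width slot (contents
        # undefined by the spec; zeroed here).
        buf += struct.pack("<i", v if v is not None else 0)
    return bytes(validity), bytes(buf)

bitmap, buf = to_arrow_buffers([7, None, 42])
print(bitmap.hex())  # '05' -> bits 0 and 2 set (slots 0 and 2 are valid)
print(len(buf))      # 12 -> three 4-byte slots, including the null one
```

Because both sides of an exchange agree on this layout, a consumer like PyArrow can wrap Marrow's buffers (or vice versa) without copying a single byte.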
Performance
Early benchmarks are promising — take them with a grain of salt since PyArrow backed by Arrow C++ is a heavily optimized library and benchmarking is still ongoing. On Python-to-Arrow conversions, Marrow is already 1.3–3.9x faster than PyArrow for numeric, string, and nested list types. Bitmap operations (popcount, AND, OR, invert) benefit nicely from Mojo’s clean SIMD abstractions. Even at this experimental stage, computations with pre-loaded Arrow arrays on GPU show promising numbers.
Come contribute
Lots of room to grow — datetime/decimal/dictionary types, C Data Interface completion, more kernels. If you’re into systems programming, data formats, or GPU compute in Mojo, jump in.
I’ve had the same idea in mind for months now of building an Arrow lib in Mojo, as it is fundamental in today’s data stack.
Initially I was thinking (very ambitious / unrealistic) of rebuilding (basic functionality of) data processing frameworks like Polars / Spark / DataFusion, which is of course years++ of work, but depending on how much AI/agents evolve it becomes slightly more realistic.
Do you see Marrow as simply an implementation of the Arrow standard in Mojo, or do you see it becoming more tightly coupled to higher-level abstractions, like the relationship between arrow-rs and `polars`?
I would love to see, and collaborate with someone on, geospatial observation data ingest, QC, and even post-processing.
I originally planned to implement a low-level-only Arrow library suitable for the memory representation and interoperability, but I was too curious about what compute performance it could achieve, so I went ahead and implemented a basic expression system (in the long run it could become somewhat similar to Ibis) and a compute layer. After some tuning I managed to reach e.g. 2x better single-threaded join performance than Polars (even though Polars is highly optimized) and several times better performance than Arrow C++’s compute layer. Essentially 80–90 percent of my existing benchmarks show better performance than the other two libraries, which makes me pretty optimistic.
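For context on what a single-threaded join benchmark exercises, here is the classic build/probe hash-join pattern that compute layers like these typically implement. This is a minimal generic sketch, not Marrow's implementation:

```python
from collections import defaultdict

def hash_join(left_keys, right_keys):
    """Inner join on keys; returns matching (left_index, right_index) pairs."""
    # Build phase: hash table over one side (ideally the smaller one).
    table = defaultdict(list)
    for j, k in enumerate(right_keys):
        table[k].append(j)
    # Probe phase: stream the other side against the table.
    out = []
    for i, k in enumerate(left_keys):
        for j in table.get(k, ()):
            out.append((i, j))
    return out

print(hash_join([1, 2, 2, 5], [2, 3, 1]))  # [(0, 2), (1, 0), (2, 0)]
```

Most of the tuning headroom in a real columnar engine is in exactly these two phases: hashing whole key columns at once, keeping the table cache-friendly, and gathering output rows with vectorized take kernels rather than per-row appends.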
So it will probably (in fact, already does) include an execution layer. I’m generally open to including relevant features.