Marrow — Apache Arrow in Mojo

kszucs · March 13, 2026, 7:49am

Apache Arrow is a language-independent, column-oriented memory format and multi-language toolbox for fast data interchange and in-memory analytics, organized for efficient analytic operations on modern hardware. It powers Pandas 2.0, Polars, Spark, DataFusion, and virtually every modern data tool, and it is closely integrated with file formats like Apache Parquet. PyArrow alone (just one implementation out of a dozen, and just one distribution channel) is downloaded 300 million times a month. Arrow isn’t a niche format; it’s infrastructure. Mojo needs it as a first-class citizen.
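To make “column-oriented memory format” concrete, here is a minimal stdlib-only Python sketch of how a nullable 64-bit integer column is laid out in Arrow style: one contiguous values buffer plus a validity bitmap in which bit i (least-significant bit first, as the Arrow columnar spec prescribes) is 1 when slot i is non-null. The helpers `build_int64_column` and `get` are hypothetical and are not Marrow’s or PyArrow’s API; the sketch also assumes little-endian values and skips the buffer padding and alignment that real implementations add.

```python
import struct


def build_int64_column(values):
    """Pack a list of ints (or None) into Arrow-style buffers.

    Hypothetical helper for illustration only. Returns
    (validity_bitmap, values_buffer, null_count). Real implementations
    also pad buffers (Arrow recommends 64-byte padding), omitted here.
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)  # one bit per slot
    data = bytearray(8 * n)             # 8 bytes per int64 slot
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            # Validity bit stays 0; the value slot is simply left as zero.
            null_count += 1
        else:
            # LSB bit numbering: slot i lives in byte i // 8, bit i % 8.
            validity[i // 8] |= 1 << (i % 8)
            # Assumption: little-endian int64 values.
            struct.pack_into("<q", data, 8 * i, v)
    return bytes(validity), bytes(data), null_count


def get(validity, data, i):
    """Read slot i back, returning None when its validity bit is 0."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<q", data, 8 * i)[0]


validity, data, nulls = build_int64_column([10, None, 40])
print(validity.hex(), nulls)                       # 05 1
print([get(validity, data, i) for i in range(3)])  # [10, None, 40]
```

Because every value sits at a fixed offset (8 × i bytes), a kernel can scan the buffer sequentially or hand it straight to SIMD or GPU code without chasing pointers, which is the property the format is organized around.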

That’s why I started Marrow: a native Apache Arrow implementation in Mojo.

Where it is today

The core abstractions are in place — arrays, builders, compute kernels, Python bindings, and zero-copy interop with PyArrow via the C Data Interface. The implementation is actively growing toward feature parity with other Arrow implementations, with new types and kernels added regularly. There is also early experimental GPU support via Mojo’s DeviceContext.

Performance

Early benchmarks are promising, but take them with a grain of salt: PyArrow, backed by Arrow C++, is a heavily optimized library, and benchmarking is still ongoing. On Python-to-Arrow conversions, Marrow is already 1.3–3.9x faster than PyArrow for numeric, string, and nested list types. Bitmap operations (popcount, AND, OR, invert) benefit nicely from Mojo’s clean SIMD abstractions. Even at this experimental stage, computations on Arrow arrays pre-loaded onto the GPU show promising numbers.
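For readers unfamiliar with the bitmap operations mentioned above, here is a plain-Python sketch of their semantics on Arrow validity bitmaps. It is not Marrow’s implementation: a Mojo kernel would process many bits per SIMD instruction, whereas this sketch treats the whole bitmap as one arbitrary-precision integer so each operation works on all bits at once rather than bit by bit. The function names and the `length` parameter (number of logical slots) are hypothetical. Two details worth showing: invert must mask off padding bits past `length`, or popcount would count slots that don’t exist; and a binary kernel such as `a + b` typically computes its output validity as the AND of the two input bitmaps.

```python
def _to_int(bitmap):
    # Arrow bitmaps use LSB bit numbering, so reading the bytes as a
    # little-endian integer makes bit i of the int equal to slot i.
    return int.from_bytes(bitmap, "little")


def _to_bytes(x, length):
    return x.to_bytes((length + 7) // 8, "little")


def _mask(length):
    # Ones in the first `length` bit positions only.
    return (1 << length) - 1


def popcount(bitmap, length):
    """Number of set (valid) slots among the first `length` bits."""
    return bin(_to_int(bitmap) & _mask(length)).count("1")


def bitmap_and(a, b, length):
    return _to_bytes(_to_int(a) & _to_int(b) & _mask(length), length)


def bitmap_or(a, b, length):
    return _to_bytes((_to_int(a) | _to_int(b)) & _mask(length), length)


def bitmap_invert(a, length):
    # Masking keeps padding bits beyond `length` at zero.
    return _to_bytes(~_to_int(a) & _mask(length), length)


left = bytes([0b01011])   # slots 0, 1, 3 valid
right = bytes([0b00110])  # slots 1, 2 valid
both = bitmap_and(left, right, 5)          # validity of left + right
print(both.hex(), popcount(both, 5))       # 02 1
print(bitmap_or(left, right, 5).hex())     # 0f
print(bitmap_invert(left, 5).hex())        # 14
```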

Come contribute

Lots of room to grow — datetime/decimal/dictionary types, C Data Interface completion, more kernels. If you’re into systems programming, data formats, or GPU compute in Mojo, jump in.

https://github.com/kszucs/marrow

clattner · March 13, 2026, 9:24pm

This is awesome, I’ve heard of a lot of people interested in Mojo for data processing. I’m thrilled to see this!

melodyogonna · March 14, 2026, 10:46am

This is great. Have you considered taking advantage of Mojo’s support for custom literals to do something like:

```mojo
var x: Int64Array = [10, 20, 40]
```

ref: `mojo/proposals/collection-literal-design.md` (main branch of the modular/modular repository on GitHub)

kszucs · March 16, 2026, 7:17pm

Good idea, I wasn’t aware of that. Implemented in kszucs/marrow@04b68b7: “feat(arrays): add list literal support for PrimitiveArray and StringA…”

kszucs · March 16, 2026, 7:21pm

Glad to hear that! Mojo sits in the sweet spot for data processing, and Arrow is the bridge to that ecosystem.

JulianJS · April 2, 2026, 5:39am

amazing!

I’ve had the same idea in mind for months now: building an Arrow lib in Mojo, since it is fundamental to today’s data stack.

Initially I was thinking (very ambitious / unrealistic) of rebuilding (the basic functionality of) data processing frameworks like Polars / Spark / DataFusion, which ofc is yeeeeeears++ of work, but depending on how much AI/agents evolve, it becomes slightly more realistic.

Curious · April 3, 2026, 2:42am

Do you see Marrow as simply the implementation of the Arrow standard in Mojo, or do you see it becoming more tightly coupled to higher-level abstractions, like the relationship between arrow-rs and `polars`?

I would love to see, and collaborate with someone on, geospatial observation data ingestion, QC, and even post-processing.

kszucs · April 3, 2026, 6:15am

I originally planned a low-level-only Arrow library covering the memory representation and interoperability, but I was too curious about what compute performance it could achieve, so I went ahead and implemented a basic expression system (in the long run it may end up somewhat similar to Ibis) and a compute layer. After some tuning I reached, for example, 2x better single-threaded join performance than Polars (even though Polars is highly optimized) and several times better performance than Arrow C++’s compute layer. Overall, 80–90 percent of my existing benchmarks show better performance than the other two libraries, which makes me pretty optimistic.

So it will probably include an execution layer; in fact, it already has one. I’m generally open to including relevant features.
