HDF5 bindings in Mojo!

hdf5-mojo: High level HDF5 bindings for Mojo (first release) :tada:

I’ve been working on porting a lot of my scientific stack and well-known computational & physics libraries to Mojo (look forward to some more library spam from me xD), and HDF5 is a core dependency in that workflow, so this happened :slight_smile:

It’s honestly been really cool to see how much Mojo has evolved over the past ~2 years. Early on, porting scientific libraries was… painful :sweat_smile:. Now it’s almost smooth sailing, with a few hiccups here and there. Super excited for what’s coming next!!!

This is an early release of hdf5-mojo, a set of high-level bindings to the HDF5 C library with a Mojo-friendly API. It’s usable today, but still evolving.


What this is right now

A simple two-layer design:

  • Low-level (hdf5/bindings.mojo)
    Thin FFI wrapper over the HDF5 C API (HDF5Lib) for cases where you need full control.

  • High-level (hdf5/api.mojo)
    Ergonomic interface (H5File, NDArray) for most use cases.

In practice, you’ll almost always use the high-level API.


What works now

  • Read 1D / 2D datasets without knowing shapes ahead of time
  • Read scalar attributes from groups/datasets
  • Create files and write datasets (row-major)
  • Safe group creation via require_group
  • Automatic HDF5 library discovery via $CONDA_PREFIX (pixi-friendly)
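
For anyone curious what the $CONDA_PREFIX discovery amounts to, here is a rough sketch in Python (not Mojo, and not the actual hdf5-mojo code). It assumes the usual conda/pixi layout on Linux and macOS, where shared libraries live under `$CONDA_PREFIX/lib`; the function names are made up for illustration.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def library_name(platform: str = sys.platform) -> str:
    """File name of the HDF5 shared library on this platform (Linux/macOS only)."""
    return "libhdf5.dylib" if platform == "darwin" else "libhdf5.so"


def find_hdf5(env: Optional[dict] = None, platform: str = sys.platform) -> Optional[Path]:
    """Return the HDF5 library path under $CONDA_PREFIX/lib, or None if absent.

    Hypothetical helper: it only illustrates the lookup order, nothing more.
    """
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return None  # not inside a conda/pixi environment
    candidate = Path(prefix) / "lib" / library_name(platform)
    return candidate if candidate.is_file() else None
```

A real loader would probably also want a fallback (system paths or an explicit override) when `CONDA_PREFIX` is unset; that part is left out here.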

:sparkles: Example

Check out the examples for a more complete walkthrough with sample data.

from hdf5 import H5File

def main() raises:
    var f = H5File("data.h5")

    # Read scalar attributes
    var emin   = f.read_scalar_attr[DType.float64]("/group", "min_energy")
    var nnodes = f.read_scalar_attr[DType.int32]("/group", "number_energy_nodes")

    # Read 1-D and 2-D datasets (sizes discovered automatically)
    var xs  = f.read_1d[DType.float64]("/group/dataset")
    var mat = f.read_2d[DType.float64]("/group/matrix")

    print(xs[0], mat[0, 0])

    # Release the array buffers, then close the file handle
    xs.free()
    mat.free()
    f.close()

Wow, this is really cool! I’m thrilled to see integration into the HDF world. I suspect Mojo is really useful to many folks in the space. Do you have an idea of what a cool demo built on top of HDF support could be?

Thanks! Really appreciate it! Mojo has made this a lot more practical than it used to be.

I’m mainly using HDF5 right now to port some particle physics simulation libraries and other scientific libraries that depend heavily on it (I’ll be posting updates on this soon, including some cool GSL stuff!). On top of that, we just added some foundational layers for GPU support in NuMojo! So I think a pretty compelling demo would be:

  • loading HDF5 datasets directly in Mojo
  • reading them straight into a NuMojo NDArray buffer
  • running processing on CPU/GPU seamlessly, and writing results back to HDF5

HDF5 still has to materialize data from disk, but since it reads into a user-provided buffer, we can skip intermediate representations and write directly into our array memory. That keeps the pipeline pretty tight after I/O. Another direction I’m interested in is streaming large datasets (very common in particle physics) in chunks and applying GPU ops on the fly.
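
To make the chunked idea concrete, here is a minimal sketch in plain Python (not Mojo, and not tied to any real HDF5 call). The `read_into` callback is a hypothetical stand-in for a reader that fills a caller-provided buffer, and the running sum stands in for whatever CPU/GPU kernel would actually run on each chunk.

```python
from array import array
from typing import Callable, Iterator, Tuple


def chunk_ranges(total: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) ranges that cover [0, total)."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    for start in range(0, total, chunk):
        yield start, min(start + chunk, total)


def stream_sum(
    total: int,
    chunk: int,
    read_into: Callable[[array, int, int], None],
) -> float:
    """Sum `total` float64 values while holding only one chunk in memory.

    `read_into(buf, start, stop)` is a hypothetical reader that writes
    elements [start, stop) of the dataset into buf[0 : stop - start].
    """
    buf = array("d", [0.0]) * chunk  # allocated once, reused for every chunk
    acc = 0.0
    for start, stop in chunk_ranges(total, chunk):
        read_into(buf, start, stop)      # I/O lands directly in our buffer
        acc += sum(buf[: stop - start])  # stand-in for the real kernel
    return acc
```

The last chunk is usually shorter than the rest, which is why the reduction only looks at the first `stop - start` elements of the reused buffer.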

Still exploring, but definitely feels like there’s some fun stuff to build here!


Ideally, one day we would have an xarray-like API so that calculations like mine, written in Mojo, could take advantage of parallel I/O as well.