Mojo-bindgen: Automatic C Binding Generation for Mojo

mojo-bindgen is a tool for generating Mojo bindings directly from C headers using libclang. It is similar in spirit to rust-bindgen, but targets Mojo.
It parses header files, lowers them into an IR, and maps that IR to Mojo code that tries to preserve layout and ABI.

Why this exists

This project started out of a practical need to generate accurate bindings for Cairo, beyond what LLMs and bespoke bindgen scripts could offer. In the end, after going down the rabbit hole of C quirks, it evolved into its own project, which aims to provide reliable binding generation.

Usage

Install from PyPI:

  pip install mojo-bindgen

Note: make sure you also install libclang (see the README for details).

Generate bindings:

  mojo-bindgen <header.h> -o <file>.mojo --linkage <owned_dl_handle|external_call>

Examples

Quick example

// logger.h
typedef void (*log_callback_t)(const char *msg);

typedef struct logger_config {
	int level;
	log_callback_t callback;
} logger_config;

int init_logger(const logger_config *config);

# logger.mojo
from std.ffi import external_call, c_char, c_int

comptime log_callback_t = def (msg: UnsafePointer[c_char, ImmutExternalOrigin]) thin abi("C") -> None

@fieldwise_init
struct logger_config(TrivialRegisterPassable):
	var level: c_int
	var callback: log_callback_t

def init_logger(config: UnsafePointer[logger_config, ImmutExternalOrigin]) abi("C") -> c_int:
	return external_call["init_logger", c_int, UnsafePointer[logger_config, ImmutExternalOrigin]](config)
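
To try the generated binding end to end, something has to export init_logger. The source does not show the C side, so the following is only a minimal sketch of a hypothetical implementation: the return codes, the startup message, and the counting_callback helper are all assumptions made for illustration.

```c
#include <stddef.h>

/* Declarations repeated from logger.h so this file compiles on its own. */
typedef void (*log_callback_t)(const char *msg);

typedef struct logger_config {
    int level;
    log_callback_t callback;
} logger_config;

/* Hypothetical behavior (not from the original header): return -1 for a
 * NULL config or a missing callback, otherwise report startup through the
 * callback and return 0. */
int init_logger(const logger_config *config) {
    if (config == NULL || config->callback == NULL) {
        return -1;
    }
    config->callback("logger initialized");
    return 0;
}

/* Illustrative callback that only counts how often it was invoked. */
static int g_log_calls = 0;

static void counting_callback(const char *msg) {
    (void)msg;
    g_log_calls++;
}
```

Compiled into a shared library, a file like this would give either linkage mode a real symbol to resolve.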

Real-world examples

The repository also includes working examples and functional tests for moderately complex C headers:

  • SQLite
  • Cairo
  • libpng

What works today

mojo-bindgen is still alpha and evolves quickly, but it already supports a useful slice of real C headers and is practical today as a starting point for generating bindings.

Core pipeline & Types

  • libclang-based parsing (full C frontend)
  • configurable compile args
  • primitives
  • pointers with const-aware mutability
  • arrays (fixed and incomplete)
  • typedefs and typedef chains (preserved via comptime declarations)
  • complex numbers and vector types (mapped to ComplexSIMD and SIMD[dtype, n] respectively)
  • atomics (mapped to std.atomic.Atomic)
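
As a rough illustration of the C constructs this list refers to, the sketch below puts them into one hypothetical header-style snippet. All names (u32, handle_t, float4, sample_types) are invented here, the vector type relies on a GCC/Clang extension, and reading "incomplete arrays" as flexible array members is an assumption.

```c
#include <stddef.h>

/* Typedef chain: handle_t -> u32 -> unsigned int. */
typedef unsigned int u32;
typedef u32 handle_t;

/* Vector type via the GCC/Clang vector_size extension: 4 floats, 16 bytes. */
typedef float float4 __attribute__((vector_size(16)));

struct sample_types {
    const char   *name;      /* pointer to const data */
    char         *scratch;   /* pointer to mutable data */
    int           fixed[4];  /* fixed-size array */
    float _Complex z;        /* C99 complex number */
    float4        v;         /* vector type */
    _Atomic int   refcount;  /* C11 atomic */
    unsigned char tail[];    /* flexible (incomplete) array member */
};

/* Offset at which the flexible array member begins. */
size_t sample_header_size(void) {
    return offsetof(struct sample_types, tail);
}

/* A complex float is laid out like an array of two floats. */
_Static_assert(sizeof(float _Complex) == 2 * sizeof(float),
               "complex float is two floats");
```

Sizes and offsets like these are what a generated binding has to reproduce for calls across the FFI boundary to stay correct.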

Records

  • structs (including mixed layouts)
  • incomplete records are mapped to opaque structs
  • bitfields (backing storage + generated accessors)
  • unions → UnsafeUnion[...] or opaque fallback
  • layout-preserving opaque storage as a fallback
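
The record kinds above correspond to C declarations like the ones in this sketch. The type names are hypothetical, and the comments describe C semantics only, not the exact Mojo output of the tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Incomplete record: only ever used through pointers, so a binding can
 * treat it as opaque. */
struct parser;

/* Bitfields share backing storage; their exact placement is ABI-defined. */
struct flags {
    unsigned int ready : 1;
    unsigned int mode  : 3;
    unsigned int count : 12;
};

/* All union members start at offset 0; the size covers the largest one. */
union value {
    int32_t i;
    float   f;
    uint8_t bytes[8];
};

/* Mixed layout: the compiler typically pads after tag to align payload. */
struct mixed {
    char     tag;
    double   payload;
    uint16_t id;
};

/* Storing into a 3-bit unsigned field keeps only the low 3 bits. */
unsigned int store_mode(unsigned int v) {
    struct flags f = {0};
    f.mode = v;
    return f.mode;
}

size_t mixed_payload_offset(void) {
    return offsetof(struct mixed, payload);
}
```

Because padding and bitfield placement are decided by the target ABI, a generator is better off asking the compiler front end for offsets than computing them by hand.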

Functions & FFI

  • non-variadic functions
  • function pointers and callbacks
  • two linkage modes:
    • external_call
    • owned_dl_handle

Globals & Macros

  • globals via wrapper structs (with load() / store())
  • constants and foldable macros → comptime
  • literals (int/float/string/char)
  • foldable expressions and sizeof
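
To make the split between foldable and non-foldable macros concrete, here is a small hypothetical snippet. The macro and variable names are invented, and whether a given macro is folded by the tool is an assumption based on the list above.

```c
#include <stddef.h>

/* Object-like macros whose values are known at compile time. */
#define BUF_SHIFT   4
#define BUF_SIZE    (1u << BUF_SHIFT)         /* integer expression: 16 */
#define BUF_BYTES   (BUF_SIZE * sizeof(int))  /* expression using sizeof */
#define VERSION_STR "1.2.0"                   /* string literal */
#define SEPARATOR   ':'                       /* char literal */
#define PI_APPROX   3.14159                   /* float literal */

/* Function-like macro: it has no single value, so it cannot be folded
 * into a constant. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* A global variable. In a real header this would be declared as
 * `extern int error_count;` and defined in one .c file. */
int error_count = 0;
```

The first group carries the same information as a compile-time constant, which is why it can map to comptime values, while a global needs run-time load and store access.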

Current limitations

  • Macros: function-like macros, predefined macros, and more complex preprocessor behavior are preserved and emitted as comments for the end user to review.
  • Variadics: variadic C functions are not supported yet. Mojo itself has recently made progress on supporting variadic functions, so this may change soon.
  • Non-prototype / K&R-style functions: older C function declarations are only partially modeled, but honestly, don’t use those.
  • Records with hostile layouts: some packed, ABI-sensitive, or otherwise difficult record layouts cannot be emitted as fully typed Mojo structs and fall back to opaque storage; in some edge cases the emitted layout may be wrong.
  • Anonymous members: anonymous struct/union members are preserved, but they are not automatically promoted into a flattened parent record.
  • Linkage edge cases: inline and compiler-extension hints are stripped for now, which may result in linking against absent symbols and requires manual review.
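
Several of these limitations are easier to picture with the C that triggers them. The snippet below is a hypothetical collection of such declarations; the names are invented, and the packed attribute is a GCC/Clang extension.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Anonymous members: C code writes ev.x or ev.code directly, with no
 * intermediate field name. */
struct event {
    int kind;
    union {
        struct { int x, y; };
        int code;
    };
};

/* Packed record: length sits at offset 1 instead of an aligned offset. */
struct __attribute__((packed)) wire_header {
    uint8_t  type;
    uint32_t length;
};

/* static inline: usually no symbol is exported, so a binding that calls
 * it by name may fail to link. */
static inline int clamp_level(int level) {
    return level < 0 ? 0 : level;
}

/* Variadic function: the argument list is only known at each call site. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

Headers that lean on any of these are the ones where the generated output most needs manual review.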

Repository

https://github.com/MoSafi2/mojo_bindgen

This is fantastic! I’ve been poking at it a bit and I found a few quick issues:

  • It doesn’t handle libraries that import arch-specific headers well. Those headers are almost always compiler-specific, and because of the way cc is handled, it defaults to gcc’s compiler-specific headers on Linux, where gcc is the default compiler.
  • Could it be made to respect the CFLAGS and CC environment variables?
  • It looks like you don’t handle newlines in macros.
  • You might need some form of forward declaration handling. I got a LOT of duplicate definitions when dealing with libraries which heavily use PIMPL (many ABI stable C libraries).

I did throw it at a “use the whole language” library (DPDK), so that probably surfaces a lot of weird stuff.

Hello Owen, thanks for taking a look.

You might need some form of forward declaration handling. I got a LOT of duplicate definitions when dealing with libraries which heavily use PIMPL (many ABI stable C libraries).

I have now added a small canonicalization pass that deduplicates multiple forward declarations, so they should resolve to a single opaque Mojo struct. If you install the package again from source, this should be solved (it should also be available in the next release).
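
For readers unfamiliar with the pattern, this sketch shows the kind of input that repeats a forward declaration. It is a hypothetical PIMPL-style API squeezed into one file; the engine type and its functions are invented for illustration and are not taken from DPDK or from the tool's tests.

```c
#include <stdlib.h>

/* First public header: forward-declares the handle type. */
struct engine;
typedef struct engine engine;

/* Second public header: repeats the forward declaration, which is legal
 * C and common when headers must work independently of include order. */
struct engine;

engine *engine_create(int workers);
int engine_workers(const engine *e);
void engine_destroy(engine *e);

/* Private definition, normally hidden in a .c file. Callers, and any
 * generated binding, only ever see the incomplete type. */
struct engine {
    int workers;
};

engine *engine_create(int workers) {
    engine *e = malloc(sizeof *e);
    if (e != NULL) {
        e->workers = workers;
    }
    return e;
}

int engine_workers(const engine *e) {
    return e->workers;
}

void engine_destroy(engine *e) {
    free(e);
}
```

Every struct engine; line names the same type, so the expected outcome is one opaque struct on the Mojo side rather than one per declaration.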

It looks like you don’t handle newlines in macros.

Could you give me an example?

Also, it would be great if you could point me to a DPDK header file that can be used as a north star to benchmark progress. Currently, I think I will work on stabilizing the internals first, and then I can start focusing on supporting more use cases; it would be great to be guided by practical examples.

Could you give me an example?

Also, it would be great if you could point me to a DPDK header file that can be used as a north star to benchmark progress. Currently, I think I will work on stabilizing the internals first, and then I can start focusing on supporting more use cases; it would be great to be guided by practical examples.

Sadly, different parts of DPDK use different parts of C, so I don’t have one there. However, DPDK does use a very good subset of C, so I like to use it to test binding generators.