Very slow compile times with FFI functions

I’m working on generating Mojo bindings for the OpenGL Core API by parsing the official Khronos XML registry. This process generates around 700 functions, but I’m running into severe compile-time performance issues.
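Here is a rough illustration of the parsing step: a minimal Python sketch that pulls command signatures out of the Khronos `gl.xml` registry using only the standard library. It assumes the registry's usual `<command><proto>... <name>...</name></proto><param>...</param></command>` layout. The `Command` record and the `parse_commands` / `_type_text` helpers are hypothetical names. Rendering the records into actual Mojo source is left to whatever template the generator uses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Command:
    """One OpenGL entry point: C return type plus (C type, name) parameters."""
    name: str
    return_type: str
    params: list[tuple[str, str]] = field(default_factory=list)


def _type_text(elem: ET.Element) -> str:
    """Collect the C type text of a <proto> or <param>, stopping at its <name>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "name":
            break
        parts.append(child.text or "")
        parts.append(child.tail or "")
    # Collapse the registry's irregular whitespace, e.g. "const GLfloat *".
    return " ".join("".join(parts).split())


def parse_commands(root: ET.Element) -> dict[str, Command]:
    """Map command name -> Command for every full <command> definition."""
    commands = {}
    for cmd in root.iter("command"):
        proto = cmd.find("proto")
        if proto is None:
            # <require><command name="..."/> references carry no prototype.
            continue
        name = proto.findtext("name")
        params = [(_type_text(p), p.findtext("name")) for p in cmd.findall("param")]
        commands[name] = Command(name, _type_text(proto), params)
    return commands


if __name__ == "__main__":
    sample = """
    <registry>
      <commands namespace="GL">
        <command>
          <proto>void <name>glUniform4fv</name></proto>
          <param><ptype>GLint</ptype> <name>location</name></param>
          <param><ptype>GLsizei</ptype> <name>count</name></param>
          <param>const <ptype>GLfloat</ptype> *<name>value</name></param>
        </command>
      </commands>
    </registry>
    """
    for command in parse_commands(ET.fromstring(sample.strip())).values():
        print(command)
```

Running this over the full registry yields one record per entry point. The records can then be filtered to the core profile before any Mojo code is emitted.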

My goal is to create something similar to the files generated by GLAD, where the function pointers are loaded at runtime through the windowing library’s proc-address loader. I’ve tried two main approaches:

  1. Single Struct: First, I tried putting all the function pointers into a single, large GL struct (see here). The main drawback of this approach is that the API isn’t ergonomic, as this struct would need to be passed everywhere.
  2. Global Functions: Next, I tried using global variables with _Global to hold the function pointers for each of the ~700 OpenGL functions (see here). However, the compile time remained extremely slow.

I attempted a few optimizations, such as removing @always_inline decorators and directly generating the initialization code with a Python script (instead of using a generic load_proc function), but none of them helped.

Additionally, I’ve found that the generated LLVM IR contains all the imported symbols, even those my code never uses. I suspect this is not optimal, but I’m unsure how to get the compiler to eliminate these unused global variables, especially since they are all initialized at startup. You can see this by generating the LLVM IR for a simple script that only displays a single color (hello.mojo).
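One possible generator-side workaround avoids relying on the compiler at all: emit bindings only for the GL commands a project actually references, so unused globals never reach the IR. This is an untested assumption, not something the original post tried. The sketch assumes the generated bindings keep the C names (`glClear`, `glClearColor`, ...), so a regex scan of the user's `.mojo` sources can find them. `used_gl_commands` and `scan_project` are hypothetical helpers.

```python
import re
from pathlib import Path

# Matches C-style GL entry point names such as glClear or glBindVertexArray.
# Constants like GL_COLOR_BUFFER_BIT start with uppercase "GL" and are not matched.
GL_NAME = re.compile(r"\bgl[A-Z]\w*")


def used_gl_commands(sources: list[str], available: list[str]) -> list[str]:
    """Return the sorted subset of `available` command names that appear in `sources`."""
    found = set()
    for text in sources:
        found.update(GL_NAME.findall(text))
    # Drop matches that are not registry commands (e.g. a user helper named glMyHelper).
    return sorted(found & set(available))


def scan_project(root_dir: str, available: list[str]) -> list[str]:
    """Hypothetical convenience wrapper: scan every .mojo file under root_dir."""
    texts = [p.read_text() for p in Path(root_dir).rglob("*.mojo")]
    return used_gl_commands(texts, available)


if __name__ == "__main__":
    user_code = "glClearColor(0.2, 0.3, 0.3, 1.0)\nglClear(GL_COLOR_BUFFER_BIT)"
    print(used_gl_commands([user_code], ["glClear", "glClearColor", "glViewport"]))
```

The trade-off is that the bindings must be regenerated whenever the code starts using a new function. Commands reached only indirectly, for example through another library, would also be missed by a plain text scan.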

Does anyone have any ideas what might be causing this compiler slowdown? Would splitting the bindings across multiple files (e.g., by OpenGL version) be a sound strategy, or are there better patterns in Mojo for managing large C APIs like this?

I’d appreciate any advice or insights from the Modular team or the community.
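The registry can supply the grouping needed to split the bindings by OpenGL version. It records which version introduced each command in its `<feature api="gl" number="...">` blocks, and which commands the core profile drops in `<remove profile="core">` blocks. Below is a minimal Python sketch under that assumption. `commands_by_version` is a hypothetical helper, and it is untested here whether per-version modules actually reduce Mojo compile time.

```python
import xml.etree.ElementTree as ET


def _applies(elem: ET.Element, profile: str) -> bool:
    """A <require>/<remove> applies when it names no profile or the requested one."""
    return elem.get("profile") in (None, profile)


def commands_by_version(root: ET.Element, api: str = "gl", profile: str = "core",
                        max_version: tuple = (4, 6)) -> dict[str, list[str]]:
    """Group command names by the feature version that first requires them."""
    groups = {}
    seen = set()
    removed = set()
    for feature in root.findall("feature"):
        if feature.get("api") != api:
            continue  # skip gles1, gles2, glsc2 features
        number = feature.get("number")
        if tuple(int(x) for x in number.split(".")) > max_version:
            continue
        names = []
        for req in feature.findall("require"):
            if _applies(req, profile):
                for cmd in req.findall("command"):
                    name = cmd.get("name")
                    if name not in seen:
                        seen.add(name)
                        names.append(name)
        for rem in feature.findall("remove"):
            if _applies(rem, profile):
                removed.update(cmd.get("name") for cmd in rem.findall("command"))
        groups[number] = names
    # Core profile: drop commands that a later version (up to max_version) removed.
    return {v: [n for n in names if n not in removed] for v, names in groups.items()}
```

Each resulting group could then be emitted into its own file (for example one module per version). User code targeting GL 3.3 would then import only the groups up to 3.3.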


> I’m running into severe compile-time performance issues

Can you quantify what you are experiencing? Some concrete numbers would help as a data point.

I ran a simple build-time benchmark in Python; the numbers are below. I built the hello.mojo referenced above. For the [SDL ONLY] benchmark, I commented out every part that uses OpenGL, including the import.
Also, to make sure the compiler does not reuse a cached result, and to simulate how actual user code gets edited, I increment the second argument to sdl.create_window on every run (i.e. 1024 -> 1025 -> ...).
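The benchmark procedure described above can be sketched roughly as follows. This is a reconstruction, not the script actually used. It assumes the build is invoked as `mojo build hello.mojo` and that the window width is the second argument of the first `sdl.create_window(...)` call, with a first argument that contains no comma. `bump_window_width`, `summarize` and `run_benchmark` are hypothetical names. The reported standard deviations match the sample standard deviation (`statistics.stdev`) of the listed run times.

```python
import re
import statistics
import subprocess
import time
from pathlib import Path


def bump_window_width(source: str, width: int) -> str:
    """Rewrite the 2nd argument of the first sdl.create_window(...) call to `width`,
    so each build sees changed source and cannot reuse a cached result."""
    return re.sub(r"(sdl\.create_window\(\s*[^,]+,\s*)\d+", rf"\g<1>{width}", source, count=1)


def summarize(times: list[float]) -> dict[str, float]:
    """Average, sample standard deviation, min and max of the run times."""
    return {
        "average": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "min": min(times),
        "max": max(times),
    }


def run_benchmark(label: str, path: str, runs: int = 5, start_width: int = 1024) -> list[float]:
    """Time `runs` builds of `path`, editing the window width before each one."""
    src = Path(path)
    original = src.read_text()
    times = []
    print(f"Running benchmark [{label}] {runs} times...")
    try:
        for i in range(runs):
            # 1024 -> 1025 -> 1026 ... on successive runs.
            src.write_text(bump_window_width(original, start_width + i + 1))
            start = time.perf_counter()
            subprocess.run(["mojo", "build", str(src)], check=True, capture_output=True)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
            print(f"Run {i + 1}/{runs}... {elapsed:.4f}s")
    finally:
        src.write_text(original)  # leave the user's file as it was
    return times
```

Usage would be something like `summarize(run_benchmark("OPENGL", "hello.mojo"))`, run from the project directory.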

Running benchmark [SDL ONLY] 5 times...
Run 1/5... 3.8506s
Run 2/5... 3.7119s
Run 3/5... 3.5159s
Run 4/5... 3.3373s
Run 5/5... 3.1834s

--- RESULTS ---
Number of runs: 5
Times: ['3.8506s', '3.7119s', '3.5159s', '3.3373s', '3.1834s']
Average: 3.5198 seconds
Standard deviation: 0.2706 seconds
Min: 3.1834 seconds
Max: 3.8506 seconds

With OpenGL included, the build time jumped to over 30 seconds, roughly 9x the SDL-only build.

Running benchmark [OPENGL] 5 times...
Run 1/5... 33.4588s
Run 2/5... 33.0559s
Run 3/5... 31.8163s
Run 4/5... 31.1647s
Run 5/5... 33.4338s

--- RESULTS ---
Number of runs: 5
Times: ['33.4588s', '33.0559s', '31.8163s', '31.1647s', '33.4338s']
Average: 32.5859 seconds
Standard deviation: 1.0385 seconds
Min: 31.1647 seconds
Max: 33.4588 seconds