I’m working on generating Mojo bindings for the OpenGL Core API by parsing the official Khronos XML registry. This process generates around 700 functions, but I’m running into severe compile-time performance issues.
My goal is to create something similar to the files generated by GLAD, where functions are loaded via a window manager’s function pointer loader. I’ve tried two main approaches:
Single Struct: First, I tried putting all the function pointers into a single, large GL struct (see here). The main drawback of this approach is that the API isn’t ergonomic, as this struct would need to be passed everywhere.
Global Functions: Next, I tried using global variables with _Global to hold the function pointers for each of the ~700 OpenGL functions (see here). However, the compile time remained extremely slow.
I attempted a few optimizations, such as removing @always_inline decorators and directly generating the initialization code with a Python script (instead of using a generic load_proc function), but none of them helped.
Additionally, I’ve found that the generated LLVM IR contains all the imported symbols, even those that aren’t used in my code. I suspect this is not optimal, but I’m unsure how to encourage the compiler to optimize away these unused global variables, especially since they are all initialized at startup. (For example, you can see this by generating the LLVM IR for a simple script that only displays a single color: hello.mojo).
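One rough way to quantify this is to scan the emitted IR text for OpenGL entry-point names and subtract the ones the program actually calls. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the `gl*` names appear verbatim somewhere in the IR (for example as the string constants handed to the loader), which may not hold for every Mojo version or mangling scheme.

```python
import re

# Matches identifiers such as glClear or glDrawArrays that are not preceded
# by another letter or digit. Assumption: names show up unmangled in the IR.
_GL_NAME = re.compile(r"(?<![A-Za-z0-9])gl[A-Z][A-Za-z0-9]*")


def gl_names_in_ir(ir_text):
    """Return the set of OpenGL-looking names found anywhere in the IR text."""
    return set(_GL_NAME.findall(ir_text))


def unused_gl_names(ir_text, used_names):
    """Names present in the IR that the program does not actually call."""
    return sorted(gl_names_in_ir(ir_text) - set(used_names))
```

Feeding it the IR for hello.mojo together with the handful of functions that script calls would give a count of symbols that are emitted but never used.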
Does anyone have any ideas what might be causing this compiler slowdown? Would splitting the bindings across multiple files (e.g., by OpenGL version) be a sound strategy, or are there better patterns in Mojo for managing large C APIs like this?
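To make the "split by OpenGL version" idea concrete, here is a minimal sketch of how a generator script could group core-profile commands by the version that introduced them, so that each group could go into its own file. It relies on the `<feature>`, `<require>` and `<remove>` elements of the Khronos `gl.xml` registry; the function name is hypothetical, and the handling of profiles is simplified (it drops everything any later version removes and skips compatibility-only blocks).

```python
import xml.etree.ElementTree as ET


def commands_by_version(xml_text, api="gl"):
    """Map each version number (e.g. "3.3") to the command names it adds."""
    root = ET.fromstring(xml_text)
    features = [f for f in root.iter("feature") if f.get("api") == api]

    # Commands dropped from the core profile by a later <remove> block.
    removed = set()
    for feature in features:
        for remove in feature.findall("remove"):
            removed.update(c.get("name") for c in remove.findall("command"))

    groups = {}
    for feature in features:
        names = []
        for require in feature.findall("require"):
            # Skip blocks that only apply to the compatibility profile.
            if require.get("profile") == "compatibility":
                continue
            names.extend(
                c.get("name")
                for c in require.findall("command")
                if c.get("name") not in removed
            )
        if names:
            groups[feature.get("number")] = names
    return groups
```

Each resulting group could then be written out as a separate module, which would at least make it possible to measure whether compile time grows linearly with the number of bindings imported.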
I’d appreciate any advice or insights from the Modular team or the community.
I ran a simple build-time benchmark from Python; the numbers are below. I built the hello.mojo that I referenced above, but for the [SDL ONLY] benchmark I commented out all the parts that use OpenGL, including the import.
Also, to make sure that the compiler does not reuse a cached result, and to simulate how actual user code is edited, I increment the second argument to sdl.create_window on every run, i.e. 1024 -> 1025 -> ....
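For anyone who wants to repeat the measurement, the procedure above can be sketched roughly as follows. This is not the exact script used for the numbers in this post: the helper names are hypothetical, the regex assumes the first argument to create_window contains no comma, and the build command shown in the comment is an assumption.

```python
import re
import statistics
import subprocess
import time
from pathlib import Path

# Captures everything up to the second argument of create_window(...).
# Assumption: the first argument contains no comma.
_SECOND_ARG = re.compile(r"(create_window\([^,]*,\s*)(\d+)")


def bump_second_arg(source):
    """Increment the integer passed as the second argument to create_window."""
    return _SECOND_ARG.sub(
        lambda m: m.group(1) + str(int(m.group(2)) + 1), source, count=1
    )


def time_build(cmd, source_path, runs=5):
    """Edit the source before every run so a cached build cannot be reused."""
    times = []
    for _ in range(runs):
        source_path.write_text(bump_second_arg(source_path.read_text()))
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Summary statistics; stdev is the sample standard deviation."""
    return {
        "runs": len(times),
        "average": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "min": min(times),
        "max": max(times),
    }


# Example (the build command is an assumption):
# print(summarize(time_build(["mojo", "build", "hello.mojo"], Path("hello.mojo"))))
```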
Running benchmark [SDL ONLY] 5 times...
Run 1/5... 3.8506s
Run 2/5... 3.7119s
Run 3/5... 3.5159s
Run 4/5... 3.3373s
Run 5/5... 3.1834s
--- RESULTS ---
Number of runs: 5
Times: ['3.8506s', '3.7119s', '3.5159s', '3.3373s', '3.1834s']
Average: 3.5198 seconds
Standard deviation: 0.2706 seconds
Min: 3.1834 seconds
Max: 3.8506 seconds
With OpenGL, the build time jumped to over 30 seconds, roughly nine times the SDL-only average.
Running benchmark [OPENGL] 5 times...
Run 1/5... 33.4588s
Run 2/5... 33.0559s
Run 3/5... 31.8163s
Run 4/5... 31.1647s
Run 5/5... 33.4338s
--- RESULTS ---
Number of runs: 5
Times: ['33.4588s', '33.0559s', '31.8163s', '31.1647s', '33.4338s']
Average: 32.5859 seconds
Standard deviation: 1.0385 seconds
Min: 31.1647 seconds
Max: 33.4588 seconds
Hey @clattner! Do you happen to have any thoughts on this? Not sure who’s the best person to ask about Mojo compiler internals, but given your background, you might have some insight. Any ideas on what could be causing the slowdown or how to improve compile-time performance?
I don’t think there is any way for you to profile the Mojo compiler on your own. We don’t have public infrastructure for you to profile it or dump timings for each pass and see what the long pole is (yes, this is something we want to create). So for now, someone on our team would need to run the profiler, dump the IR, and do some analysis. Whether this is some non-linear explosion in IR size or one of the passes being O(N^2), I don’t know. I think @weiwei.chen would be interested, but she is out on vacation this week.
If you can tell me how to mechanically reproduce the slow compile time, i.e. which example one should run, I can at the very least create an issue in our issue tracker and make sure someone gets to it.
I cloned your repo and ran pixi run build and then mojo hello.mojo. The latter needs the sdl and opengl packages. The GitHub README says “… To reference sdl and opengl packages, you have to place them into your project like a submodule.” Maybe you could just post a shell script here, or add slightly more detailed instructions to your README; that would help.
If there was some previous discussion on the forum, or elsewhere, please feel free to link it here as well, so that we can avoid re-discovering the already discussed issues.
Hey Denis, thanks a lot for the response!
I’ve updated the instructions, so now you should be able to get everything running with just a couple of pixi commands. Hopefully that makes things easier to reproduce. Let me know if anything breaks.