Intro
I use Darktable quite a bit for my photo editing.
Even though it has GPU acceleration via OpenCL, the experience is still noticeably laggy, even on a Radeon RX 9070 (RDNA4).
After a little research, I learned that the performance of Darktable’s OpenCL kernels can be hit-or-miss, and that they tend to run faster on Nvidia hardware than on comparable AMD hardware.
This motivated me to investigate whether rewriting image processing kernels in Mojo would yield better performance.
About the test subject
Darktable is a modular editor. You are free to choose from several dozen modules to enable for your image processing pipeline, but a sensible default is available.
For my experiment, I picked the Sigmoid module. As of Q1 2026, it’s a great default tonemapper for a processing pipeline.
An ultra-short primer on tonemappers: the dynamic range of light captured by modern camera sensors is about 2x wider than what an output colorspace such as sRGB can represent. A tonemapper compresses the image into that narrower range while attempting to preserve as much detail in the highlights and shadows as possible.
An important note that will be relevant later: Sigmoid has two modes. Without diving into tonemapping theory and math:
- RGB-ratio: worse visual result, simpler computation.
- Per-channel: more visually appealing, more involved computation.
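To make the difference between the two modes concrete, here is a minimal Python sketch. It is not Darktable's code: `tone_curve` is a stand-in log-logistic curve with a made-up `contrast` parameter, and using the channel average as the norm in `rgb_ratio` is an assumption for illustration. The real per-channel mode does more work than shown here.

```python
def tone_curve(x, contrast=1.5):
    """Stand-in S-shaped curve mapping [0, inf) to [0, 1).

    Hypothetical formula and parameter, not Darktable's sigmoid.
    """
    x = max(x, 0.0)
    powered = x ** contrast
    return powered / (1.0 + powered)


def per_channel(rgb):
    """Per-channel mode: run the curve on every channel independently."""
    return tuple(tone_curve(c) for c in rgb)


def rgb_ratio(rgb):
    """RGB-ratio mode: run the curve once on a norm, then rescale the pixel.

    Assumption: the norm is the plain channel average.
    """
    norm = sum(rgb) / 3.0
    if norm <= 0.0:
        return (0.0, 0.0, 0.0)
    scale = tone_curve(norm) / norm
    # Every channel gets the same scale factor, so channel ratios survive.
    # In this simplified version a channel can still land above 1.0;
    # the sketch does not handle that case.
    return tuple(c * scale for c in rgb)


pixel = (4.0, 1.0, 0.25)  # a bright, saturated scene-referred pixel
print("per-channel:", per_channel(pixel))
print("rgb-ratio:  ", rgb_ratio(pixel))
```

In this sketch the RGB-ratio path evaluates the curve once per pixel and the per-channel path evaluates it three times, which is one reason the latter is the heavier computation.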
Darktable code is organized such that modules reside in src/iop.
Some modules call GPU-accelerated subroutines which are in data/kernels.
In my experiment, I focused on the sigmoid.cl OpenCL kernel specifically and did not touch the sigmoid.c module where the bulk of the code resides.
Testing and Validation
Before measuring speed, I focused on achieving numerical parity.
The test image is generated algorithmically: a 24-megapixel (typical for a modern sensor) color and brightness gradient. I apply the sigmoid transformation, sample several pixels at predefined coordinates in different regions of the image, and compare the OpenCL output against the Mojo output. validate_parity.py performs this step.
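The parity check described above can be sketched roughly as follows. This is not the contents of validate_parity.py: every function name, the tolerances, the gradient formula and the tiny image size are assumptions for illustration. The two "kernels" are simply the same curve written in two algebraically equivalent ways, standing in for the OpenCL and Mojo outputs.

```python
import math


def make_gradient(width, height, peak=4.0):
    """Build a synthetic RGBA image as rows of (r, g, b, a) tuples.

    Red ramps left to right, green ramps top to bottom, blue follows the
    diagonal. `peak` > 1 pushes values beyond display white.
    """
    image = []
    for y in range(height):
        v = y / (height - 1)
        row = []
        for x in range(width):
            u = x / (width - 1)
            row.append((peak * u, peak * v, peak * (u + v) / 2.0, 1.0))
        image.append(row)
    return image


def reference_kernel(pixel):
    """Stand-in for the OpenCL result (hypothetical curve)."""
    r, g, b, a = pixel
    return (r / (1.0 + r), g / (1.0 + g), b / (1.0 + b), a)


def candidate_kernel(pixel):
    """Stand-in for the Mojo result: same curve, different arithmetic."""
    r, g, b, a = pixel
    return (1.0 - 1.0 / (1.0 + r), 1.0 - 1.0 / (1.0 + g), 1.0 - 1.0 / (1.0 + b), a)


def apply_kernel(image, kernel):
    return [[kernel(p) for p in row] for row in image]


def find_mismatches(reference, candidate, coords, rel_tol=1e-5, abs_tol=1e-6):
    """Compare sampled pixels channel by channel within a tolerance."""
    mismatches = []
    for x, y in coords:
        ref_px, cand_px = reference[y][x], candidate[y][x]
        same = all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in zip(ref_px, cand_px)
        )
        if not same:
            mismatches.append(((x, y), ref_px, cand_px))
    return mismatches


WIDTH, HEIGHT = 60, 40  # tiny stand-in for the 24-megapixel image
SAMPLE_COORDS = [(0, 0), (59, 0), (0, 39), (59, 39), (30, 20)]

image = make_gradient(WIDTH, HEIGHT)
reference_out = apply_kernel(image, reference_kernel)
candidate_out = apply_kernel(image, candidate_kernel)
mismatches = find_mismatches(reference_out, candidate_out, SAMPLE_COORDS)
print("parity OK" if not mismatches else mismatches)
```

Comparing with a tolerance rather than exact equality matters here, since two GPU toolchains are free to order floating-point operations differently.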
The OpenCL benchmark outputs before/after images for visual inspection as well.
Results
| Kernel Variant | OpenCL GPU (ms) | Mojo GPU (ms) | Speedup (Mojo vs OpenCL) |
|---|---|---|---|
| RGB Ratio | 1.43 | 1.52 | 0.94x |
| Per-Channel | 3.46 | 1.56 | 2.22x |
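For readers who want to check the last column: speedup here is the OpenCL time divided by the Mojo time, so values above 1 mean Mojo was faster. A small sketch that recomputes it from the timings in the table (the variable and function names are mine):

```python
# Timings copied from the table above: (OpenCL ms, Mojo ms).
timings_ms = {
    "RGB Ratio": (1.43, 1.52),
    "Per-Channel": (3.46, 1.56),
}


def speedup(opencl_ms, mojo_ms):
    """Speedup of Mojo relative to OpenCL; > 1 means Mojo is faster."""
    return opencl_ms / mojo_ms


for name, (opencl_ms, mojo_ms) in timings_ms.items():
    print(f"{name}: {speedup(opencl_ms, mojo_ms):.2f}x")
```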
Code
I invite you to explore my fork of Darktable and see if I made any silly mistakes. benchmark/ is the folder of interest.
The idea is simple: represent the image as a 3D LayoutTensor of width x height x 4 (because RGBA) and then crunch through it with elementwise() (shoutout to puzzle 23).
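As a rough mental model of that layout, here is a plain-Python analogue. It is not the Mojo code and does not use LayoutTensor or elementwise(): the row-major ordering, the helper names and the per-pixel callback are all assumptions for illustration.

```python
CHANNELS = 4  # r, g, b, a


def make_image(width, height, fill=(0.0, 0.0, 0.0, 1.0)):
    """Flat buffer holding width * height pixels, 4 floats per pixel."""
    return [fill[c] for _ in range(width * height) for c in range(CHANNELS)]


def pixel_offset(x, y, width):
    """Index of the first channel of pixel (x, y), assuming row-major order."""
    return (y * width + x) * CHANNELS


def elementwise_pixels(buffer, width, height, fn):
    """Apply fn(r, g, b, a) to every pixel independently.

    On a GPU, each loop iteration would be handled by its own thread.
    """
    out = [0.0] * len(buffer)
    for idx in range(width * height):
        base = idx * CHANNELS
        r, g, b, a = buffer[base:base + CHANNELS]
        out[base:base + CHANNELS] = fn(r, g, b, a)
    return out


image = make_image(3, 2)
start = pixel_offset(2, 1, 3)
image[start:start + CHANNELS] = [1.0, 2.0, 3.0, 1.0]

# Double the color channels and leave alpha untouched.
doubled = elementwise_pixels(image, 3, 2, lambda r, g, b, a: (r * 2, g * 2, b * 2, a))
print(doubled[start:start + CHANNELS])
```

Because no pixel depends on any other, this kind of operation maps naturally onto one GPU thread per pixel.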
At a high level, the findings seem reasonable: Mojo and OpenCL are roughly on par when doing simple computation, but Mojo performs significantly better on more complex computation.
benchmark_sigmoid.c is a benchmark wrapper around the OpenCL kernel that ships with Darktable.
sigmoid_benchmark_gpu.mojo is a 2-in-1 implementation and benchmark (uses Mojo’s built-in bench).
Discussion and Future work
Other modules
I picked Sigmoid semi-randomly (my educated guess at what might be the most commonly used module).
There are many others that do non-trivial color manipulations.
Based on the results so far, every module that does per-channel computations could potentially benefit from a Mojo rewrite.
Full module rewrite
Most of the Sigmoid implementation resides in sigmoid.c, which calls sigmoid.cl.
It would be interesting to see what kind of speedup we could get if the whole module were rewritten in Mojo.
Shipping improvements to users
Darktable maintainers are wary of anything that looks like what they call a “weekend project”.
The likelihood of them adopting Mojo in the near future is roughly zero.
I’m not sure at this time how to bring performance improvements to Darktable users (share your thoughts if you do!), but I imagine it will require C/Mojo interop.
Setup
- Radeon RX 9070 GPU
- Linux 6.19.3
- ROCm/HIP 7.2
- Darktable 5.4.1
- Mojo/Max 0.26.2 (Feb 22 build)
- Clang 21.1.8
Feel free to reach out if you find this kind of thing interesting and are excited about Mojo applications beyond neural net training/inference.
~Max
