From OpenCL to Mojo, part 2: integrating into an existing project

Intro

In "From OpenCL towards Mojo GPU kernels in digital photo editing (first look)" I explored the performance of an OpenCL kernel ported to Mojo.
To my delight, depending on the workload, the Mojo-based kernel performed either on par with the OpenCL one or significantly better.

As the next step, I decided to try integrating a Mojo-based kernel into an existing project.
For simplicity, I reused the kernel from the benchmarking experiment (sigmoid tonemapper).

Motivation

While it is a lot of fun to write new software from scratch, the “OpenCL graveyard” is probably too big to rewrite in full.
Even if software that uses OpenCL for GPU acceleration would generally benefit from switching to Mojo, there is simply too much of it.
A more practical approach is to rewrite only performance-critical parts.
That’s what I attempted here: I rewrote a specific plugin for Darktable, a GPU-accelerated RAW photo editing application, to find out how easy or difficult it would be.

Overview

Darktable is a C-based project that works a bit like a guitar pedalboard, but for photo editing: a RAW image is fed into the signal chain as input, every module applies its transformation to the signal in sequence, and the final step exports the signal into one of the popular image formats like JPG or PNG.
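
To make the pedalboard analogy concrete, here is a minimal sketch of a sequential module chain. It is not Darktable's actual module API: the names module_fn, exposure, clip and run_chain are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module interface: each stage transforms the pixel buffer in place. */
typedef void (*module_fn)(float *pixels, size_t count);

/* Illustrative stage: brighten by one stop (multiply by 2). */
void exposure(float *pixels, size_t count)
{
  for(size_t i = 0; i < count; i++) pixels[i] *= 2.0f;
}

/* Illustrative stage: clip values above 1.0. */
void clip(float *pixels, size_t count)
{
  for(size_t i = 0; i < count; i++)
    if(pixels[i] > 1.0f) pixels[i] = 1.0f;
}

/* Run every module in order; the output of one stage is the input of the next. */
void run_chain(const module_fn *chain, size_t n_modules, float *pixels, size_t count)
{
  assert(chain != NULL && pixels != NULL);
  for(size_t m = 0; m < n_modules; m++) chain[m](pixels, count);
}
```

Calling run_chain with the chain { exposure, clip } on the values 0.1, 0.4 and 0.8 first doubles them and then clips the last one to 1.0. Replacing one stage with another implementation does not affect the rest of the chain, which is what makes swapping a single module practical.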

Every processing module is its own library that gets dynamically linked (thank god).
That makes it easy to integrate Mojo (compile my own lib and drop it into the correct location).

One complication that I encountered was UI code in the same library.
Darktable is GTK-based and each module has GUI-related bits of code in it.
Instead of trying to figure out how to work with GTK in Mojo, I decided to reuse the existing module as much as possible and modified it to call the Mojo-based library in the places where it would otherwise call the OpenCL kernel.
Like this:

  • before: Darktable → libsigmoid.so → sigmoid.cl
  • after: Darktable → libsigmoid.so → libsigmoid_mojo.so

Implementation details

You can check out the diff on my fork of Darktable.
The whole thing comes down to this: wherever the original code called OpenCL, it now uses dlopen/dlsym to call into the library that I compiled from Mojo code.

Like this:

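// signature of the function exported by the Mojo library
typedef void (*mojo_per_channel_fn)(uintptr_t ctx, float *in, float *out, int32_t width, int32_t height, void *params);
// ...
// load the Mojo-compiled shared library at runtime
mojo_lib = dlopen("libsigmoid_mojo.so", RTLD_LAZY | RTLD_LOCAL);
// ...
// look up the exported kernel entry point by name
mojo_per_channel = (mojo_per_channel_fn)dlsym(mojo_lib, "sigmoid_mojo_per_channel");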
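
The snippet above leaves out error handling. Below is a minimal sketch of how the load could be made defensive, so the module can keep using its existing code path when the Mojo library is missing. The struct mojo_backend_t and the functions mojo_backend_load and mojo_backend_unload are hypothetical names for illustration; the fork may handle this differently.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Same signature as the typedef shown above. */
typedef void (*mojo_per_channel_fn)(uintptr_t ctx, float *in, float *out,
                                    int32_t width, int32_t height, void *params);

/* Hypothetical holder for the library handle and the resolved entry point. */
typedef struct mojo_backend_t
{
  void *lib;
  mojo_per_channel_fn per_channel;
} mojo_backend_t;

/* Returns 1 on success, 0 if the library or the symbol cannot be found. */
int mojo_backend_load(mojo_backend_t *b, const char *path, const char *symbol)
{
  assert(b != NULL && path != NULL && symbol != NULL);
  b->lib = NULL;
  b->per_channel = NULL;

  b->lib = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if(!b->lib)
  {
    const char *err = dlerror();
    fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    return 0;
  }

  dlerror(); /* clear any stale error before calling dlsym */
  /* Assign through void ** to avoid a direct object-to-function pointer cast. */
  *(void **)(&b->per_channel) = dlsym(b->lib, symbol);
  const char *err = dlerror();
  if(err || !b->per_channel)
  {
    fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
    dlclose(b->lib);
    b->lib = NULL;
    b->per_channel = NULL;
    return 0;
  }
  return 1;
}

/* Release the library; safe to call after a failed load. */
void mojo_backend_unload(mojo_backend_t *b)
{
  if(b->lib) dlclose(b->lib);
  b->lib = NULL;
  b->per_channel = NULL;
}
```

A caller would try mojo_backend_load(&backend, "libsigmoid_mojo.so", "sigmoid_mojo_per_channel") once at module initialization and fall back to the existing OpenCL or CPU path whenever it returns 0. On older glibc versions this needs linking with -ldl.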

Results

Behold!
This is probably the first picture on the internet that was tonemapped by code written in Mojo.

Below is the same picture without sigmoid applied, in case you want to see before/after.

Discussion

My experiments with replacing OpenCL have gone very well so far.
I would be interested in hearing whether anyone else has attempted something similar, and what the motivation behind it was.
Darktable is fast as is.
If time permits, I’m thinking of porting something more compute-intensive.