MojoR (A "Numba" for R)

As a PhD in Statistics, I have long been frustrated that R has no native equivalent of Numba or GPUArray. In Mojo terms, R lacks a LayoutTensor that maps efficiently onto its column-major memory, so writing GPU kernels means dropping down to C++/CUDA. This has pushed me to other languages (Python and Julia) for my research projects.

Mojo seemed the ideal tool to solve this: its control over memory layout and its ability to compile to bare metal make it a natural fit for building a “Numba for R.”

I am building MojoR, a JIT compiler that maps dynamic R semantics onto static Mojo kernels. As a first benchmark, I ran a bivariate Gibbs sampler (sequentially dependent, and therefore impossible to vectorize in R) through the JIT.

The Results (N = 100k, thin = 100, i.e. 10M sampler iterations):

  • GNU R: 19.28s

  • MojoR: 0.16s

  • Speedup: ~117x

# The Code (Standard R syntax)
gibbs_kernel <- function(N, thin) {
  mat <- matrix(0, N, 2)
  x <- 0.0
  y <- 0.0
  # Sequential loop: notoriously slow in R, fast in MojoR
  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3.0, y * y + 4.0)                              # shape, rate
      y <- rnorm(1, 1.0 / (x + 1.0), 1.0 / sqrt(2.0 * (x + 1.0)))  # mean, sd
    }
    mat[i, ] <- c(x, y)   # keep every thin-th pair of draws
  }
  return(mat)
}

# JIT Compile
fast_gibbs <- mojor::jit(gibbs_kernel)
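Once compiled, the JIT’d function should be a drop-in replacement for the original (a sketch, assuming `mojor` is installed and `mojor::jit` preserves the original signature):

```r
# Sketch: call the compiled kernel like any R function
draws <- fast_gibbs(100000L, 100L)            # same signature as gibbs_kernel
stopifnot(identical(dim(draws), c(100000L, 2L)))
colMeans(draws)                               # Monte Carlo means of x and y
```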

I am currently building out the IR to reliably accelerate R code through JIT, creating Mojo kernels that run on the CPU/GPU without leaving the R session. Perhaps this can lead to a “MojoR language” that brings Mojo’s systems-level performance directly into the R ecosystem.


As a long-time R user (18+ years), this is interesting as a specialized compiler, but how does it map to more general R data in memory, like matrices, vectors, or data.frames? Is there any interprocess communication back and forth with Mojo, like in the ‘reticulate’ package for Python? Can it inline raw Mojo code like the ‘inline’ package for C, C++, or Fortran?

Currently, I’m focusing on numeric array support (vectors, matrices, and higher-dimensional arrays) mapped to LayoutTensor representations. At this stage it works as a transpiler: the generated Mojo code is just-in-time compiled. There is no interprocess communication; the compiled functions are invoked from R through a C-callable ABI (via `.Call`). I’m planning to expand this toward columnar / data.frame-like structures, at least for the common numeric cases.

So this is not a reticulate-style setup with a separate runtime and message passing. The current workflow is: transpile R → generate Mojo → JIT compile → call the resulting function via .Call. Data movement is in-process, and handled through pointers and shape/stride metadata, rather than IPC.
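At the R level, the generated wrapper is a thin shim over base R’s in-process foreign function interface. The routine name below is hypothetical, purely to illustrate the shape of the glue code:

```r
# Hypothetical generated wrapper around the JIT-compiled Mojo kernel.
# .Call() is base R's in-process FFI: arguments are passed as SEXP
# pointers, so no serialization or IPC is involved -- the Mojo side
# reads the numeric buffer plus shape/stride metadata directly.
fast_gibbs_wrapper <- function(N, thin) {
  .Call("mojor_gibbs_kernel",   # symbol registered when the JIT'd DLL loads
        as.integer(N),
        as.integer(thin))
}
```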

I haven’t implemented an inline-style mechanism for embedding raw Mojo code yet, but given that the pipeline already emits and compiles Mojo, it’s a feasible extension to allow user-provided Mojo snippets or helper functions to be linked into the JIT step.
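If that lands, the user-facing API could mirror the ‘inline’ package. Everything below is a hypothetical sketch of what inlining might look like, not an implemented interface — `mojor::inline_mojo()`, its arguments, and the embedded Mojo body are all invented for illustration:

```r
# Hypothetical future API, loosely modeled on inline::cfunction();
# neither mojor::inline_mojo() nor this signature exists yet.
scale_vec <- mojor::inline_mojo(
  signature = c(x = "numeric", a = "double"),
  body = "
    fn kernel(x: LayoutTensor, a: Float64) -> LayoutTensor:
        return x * a
  "
)
scale_vec(c(1, 2, 3), 2.0)   # elementwise multiply on the Mojo side
```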