EmberRegex: An Exercise in Vibe Coding in Mojo

Historically I’ve admittedly been a bit of a pessimist about AI-generated code: “Sure, it’s impressive and fun for messing around, but it doesn’t really scale to complex projects”. In recent times it’s safe to say I’ve been proven wrong. Mojo in particular has been difficult to use with LLMs due to limited data in their training sets compared to languages like Python and JavaScript. So how far can you get with a fairly minimal setup (no special extra training or context other than the newly released skills)? Well, it turns out the answer is actually pretty far.

I’ve had great success employing LLMs in EmberJSON to find and fix tough bugs, and to implement small features I couldn’t be bothered to write myself. Over the past few days I decided to dive in and try to “vibe code” an entire project from scratch. I landed on a regex library, as it satisfied a few key criteria.

  1. Very unit testable
    • Having good unit tests acts as guardrails against regressions caused by the somewhat unpredictable nature of LLM generated code
  2. Easy to benchmark
    • Acts as hard evidence to guide optimization efforts
  3. Something I know nothing about and probably wouldn’t take the time to implement myself
    • A suboptimal version of something is better than nothing

So, introducing EmberRegex! A pure Mojo implementation of regular expressions, created almost exclusively by the grace of Claude Opus 4.6. For more in-depth examples you can refer to the (also entirely AI-generated) README in the repo.

from emberregex import compile

def main() raises:
    var re = compile("[a-z]+")
    var result = re.search("hello world")
    if result:
        print(result)  # MatchResult(start=0, end=5)

The repo also includes a fairly extensive benchmark suite comparing performance to Python’s built-in re module.

════════════════════════════════════════════════════════════════════════
  Results  (lower µs = faster;  ratio = Python ÷ Ember, >1x = Ember wins)
════════════════════════════════════════════════════════════════════════

  Benchmark                           Ember (µs)  Python (µs)   Ratio  Bar (10x = full)
  ────────────────────────────────────────────────────────────────────────────────
  throughput_literal_100B                  0.030        0.233    7.8x  ███████████████░░░░░
  throughput_literal_10KB                  0.230        4.213   18.3x  ████████████████████
  throughput_literal_100KB                 1.900       24.815   13.1x  ████████████████████
  throughput_literal_1MB                  17.410      247.505   14.2x  ████████████████████
  throughput_class_10KB                    3.910       72.085   18.4x  ████████████████████
  throughput_nomatch_100KB                 1.770       24.789   14.0x  ████████████████████
  anchor_bol                               0.020        0.219   11.0x  ████████████████████
  anchor_eol                               0.030        0.221    7.4x  ██████████████░░░░░░
  anchor_word_boundary                     0.080        0.289    3.6x  ███████░░░░░░░░░░░░░
  anchor_word_boundary_miss                0.040        0.317    7.9x  ███████████████░░░░░
  anchor_bol_miss_10KB                     0.010        0.079    7.9x  ███████████████░░░░░
  multiline_bol_findall_100_lines          3.650       15.303    4.2x  ████████░░░░░░░░░░░░
  multiline_eol_findall_100_lines          5.190       59.882   11.5x  ████████████████████
  dotall_multiline_body                    0.040        0.103    2.6x  █████░░░░░░░░░░░░░░░
  named_group_date                         0.070        0.145    2.1x  ████░░░░░░░░░░░░░░░░
  named_group_email                        0.100        0.169    1.7x  ███░░░░░░░░░░░░░░░░░
  positional_group_email                   0.100        0.167    1.7x  ███░░░░░░░░░░░░░░░░░
  neg_lookahead                            0.140        0.253    1.8x  ███░░░░░░░░░░░░░░░░░
  neg_lookbehind                           0.120        0.252    2.1x  ████░░░░░░░░░░░░░░░░
  password_validation_lookahead            0.230        0.355    1.5x  ███░░░░░░░░░░░░░░░░░
  alternation_4                            0.010        0.099    9.9x  ███████████████████░
  alternation_16                           0.010        0.101   10.1x  ████████████████████
  alternation_16_miss                      0.020        0.083    4.1x  ████████░░░░░░░░░░░░
  findall_3_matches                        0.210        0.217    1.0x  ██░░░░░░░░░░░░░░░░░░
  findall_100_matches                      2.570       12.071    4.7x  █████████░░░░░░░░░░░
  findall_500_dot_matches                  9.120       14.015    1.5x  ███░░░░░░░░░░░░░░░░░
  replace_50_matches                       2.390        6.914    2.9x  █████░░░░░░░░░░░░░░░
  replace_named_backref                    0.540        0.295    0.5x  █░░░░░░░░░░░░░░░░░░░
  split_100_parts                          2.430        8.666    3.6x  ███████░░░░░░░░░░░░░
  pathological_optional_16                 0.030     1256.715  41890.5x  ████████████████████
  pathological_dotstar_anchored_5K         1.610        1.349    0.8x  █░░░░░░░░░░░░░░░░░░░
  pathological_dotstar_miss_5K             1.640        3.186    1.9x  ███░░░░░░░░░░░░░░░░░
  pathological_triple_backref              0.120        0.112    0.9x  █░░░░░░░░░░░░░░░░░░░
  realworld_url_parse                      0.220        0.526    2.4x  ████░░░░░░░░░░░░░░░░
  realworld_phone                          0.020        0.297   14.8x  ████████████████████
  realworld_hex_color                      0.020        0.222   11.1x  ████████████████████
  realworld_semver                         0.100        0.394    3.9x  ███████░░░░░░░░░░░░░
  realworld_key_value_findall              0.430        0.844    2.0x  ███░░░░░░░░░░░░░░░░░
  realworld_html_tag_findall               0.470        0.717    1.5x  ███░░░░░░░░░░░░░░░░░
  realworld_ws_normalize                   0.240        0.743    3.1x  ██████░░░░░░░░░░░░░░
  realworld_log_search_1000_lines          4.610       10.073    2.2x  ████░░░░░░░░░░░░░░░░
  inline_ignorecase                        0.020        0.239   12.0x  ████████████████████
  inline_multiline_search                  0.080        0.310    3.9x  ███████░░░░░░░░░░░░░
  engine_dfa_no_capture                    0.010        0.264   26.4x  ████████████████████
  engine_pike_with_capture                 0.070        0.267    3.8x  ███████░░░░░░░░░░░░░
  engine_backtrack_with_backref            0.130        0.239    1.8x  ███░░░░░░░░░░░░░░░░░
  compile_wide_char_class                  4.640       13.047    2.8x  █████░░░░░░░░░░░░░░░
  compile_8_groups                        34.670       40.544    1.2x  ██░░░░░░░░░░░░░░░░░░
  compile_nested_alternation              24.540       33.997    1.4x  ██░░░░░░░░░░░░░░░░░░
  ────────────────────────────────────────────────────────────────────────────────
  EmberRegex faster: 46  |  slower: 3

  Times are µs per operation (best of 5 × 1000 iterations).
════════════════════════════════════════════════════════════════════════

For those interested, I figured I would also document my process for creating this library so far.

Initial Implementation

I had Claude create a 7-step plan for implementing the library. This simplified the work to be done at each step, and allowed tests to be implemented progressively to avoid major regressions. I’m not knowledgeable enough about these systems to make any other claims about the benefits of this approach, but I do think Claude has a much easier time implementing things this way than if I had asked it to tackle everything in one attempt.

Performance Optimizations

After all the features were done, it was time to start optimizing! The initial performance was quite poor, losing over 50% of the benchmark cases to Python. I gave Claude one pass of “make this thing faster”, which improved a few cases but was generally too unspecific to produce good results. Afterwards, I identified groups of 1-3 cases I figured were related and asked it to fix the performance of those particular cases. This approach was highly successful, as it allowed Claude to trace the logic paths those inputs would take and identify specific bottlenecks. There were still a handful of issues I ended up fixing myself (such as copying a value instead of moving it when a move was possible), but I’ll chalk that up to LLMs still not having a perfect picture of what idiomatic Mojo looks like.

After enough iterations of this, only 3 cases remain where we are still slower than re, and with additional effort (or rather, tokens) I imagine it will soon be faster in all cases.

Going forward

Perhaps the barrier between “vibe coded slop” and actually usable software is simply how much money you have to give Anthropic. Surely this effort could have been completed in a few hours had I not been constantly hitting session usage limits on the base-level Claude Code plan. For now I will continue pushing this project forward, and look forward to any feedback and input from the community!

Any thoughts on applying The Impossible Optimization, and the Metaprogramming To Achieve It?

That’s a great idea; I unfortunately forgot about that article.

That’s awesome!

Nice. Did you use Modular’s new Mojo skills at all while building it?

I believe the mojo-syntax skill ended up being fairly helpful.

For me, the way I use LLMs is quite different: I don’t just use one LLM, I use several. For example, my main ones are Google Gemini 3 Pro and Anthropic’s Claude.

One thing I noticed is that Claude is very impressive at utilizing previously trained data to generate amazing results; for example, the Claude C compiler written in Rust seems like a copy of MLIR.

On the other hand, Gemini is quite awesome at generating new patterns based on previous information. So if you combine their intelligence, it’s massive.

Good point; this may also fall under the umbrella of “the limit is how much money I’m willing to spend on tokens”. All the different models have their strengths and tendencies, so having access to multiple different ones is useful.

Spending money on tokens isn’t a pain; in the case of LLMs, it’s a merit.

But the greatest threat is hallucinations; however, there may be some measures we can take against that.

You can look at the solutions I gave and how we can implement them in systems like EmberRegex.

In the spirit of the project, I gave Claude the article and asked it to implement it. It doesn’t necessarily inline everything, since some patterns still create recursive cycles in the current implementation, but that can probably be fixed eventually. There does seem to be a significant speedup in some cases, however, due to eliminating all those dead branches.

Thanks again to @Verdagon for writing that post!

That’s great to see!

I was thinking that if you used a Pike VM instead of an NFA (nondeterministic finite automaton), you might achieve better performance in emberregex/static_backtrack.mojo.

Standard object graphs involve heavy pointer-chasing, which triggers CPU cache misses. By using a Pike VM approach, you flatten the NFA into a contiguous array of instructions.

Speed: Contiguous memory allows the CPU (or NPU) to prefetch instructions efficiently.

Efficiency: A smaller memory footprint (bytes per state vs. objects) keeps the entire state machine in the L1/L2 cache.

Information source: Google Gemini 3.1 Pro
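To make the flattening idea concrete, here is a toy Pike VM sketch in Python (purely illustrative; this is not EmberRegex’s actual implementation, and the pattern a+b is hardcoded as an example): the NFA is compiled into a contiguous list of instructions, and matching steps a set of “threads” through that list.

```python
# Toy Pike VM (illustrative sketch, not EmberRegex's implementation).
# The NFA for the pattern "a+b" is flattened into a contiguous list of
# instruction tuples; matching walks a set of "threads" (program counters)
# through the list, so there is no pointer-chasing through node objects.

# Instruction forms: ("char", c), ("split", x, y), ("jmp", x), ("match",)
PROG = [
    ("char", "a"),    # 0: consume one 'a'
    ("split", 0, 2),  # 1: either loop back for more 'a's, or fall through
    ("char", "b"),    # 2: consume 'b'
    ("match",),       # 3: accept
]

def add_thread(threads, seen, pc):
    """Follow epsilon edges (split/jmp) eagerly, deduplicating by pc."""
    if pc in seen:
        return
    seen.add(pc)
    op = PROG[pc]
    if op[0] == "jmp":
        add_thread(threads, seen, op[1])
    elif op[0] == "split":
        add_thread(threads, seen, op[1])
        add_thread(threads, seen, op[2])
    else:
        threads.append(pc)

def fullmatch(text):
    """Anchored whole-string match; O(len(text) * len(PROG)) worst case."""
    threads, seen = [], set()
    add_thread(threads, seen, 0)
    for ch in text:
        next_threads, next_seen = [], set()
        for pc in threads:
            op = PROG[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(next_threads, next_seen, pc + 1)
        threads, seen = next_threads, next_seen
    return any(PROG[pc][0] == "match" for pc in threads)
```

A real Pike VM also carries capture-offset slots per thread (exactly the copying overhead mentioned later in this thread), but the memory-layout point stands: the whole program is one flat array that stays cache-resident.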

Try it this way?

I have been experimenting with that, actually, for certain pathological cases that would otherwise invoke exponential behaviour in the NFA. However, the Pike VM implementation I have is currently much slower in every case I’ve tried than the compile-time-optimized NFA in StaticRegex, due to the Pike VM’s high constant overhead, though there may be some optimizations to be made using the extra compile-time information we have in this case.

Since it solves those pathological cases that would otherwise blow up, experimenting with it seems like a worthwhile approach.

If you’ve experimented separately on EmberRegex, I’d like to see the code base.

May I have the link for the updates, or is there nothing yet?

I’ve been spinning through IEEE research papers and websites, mostly because LLMs couldn’t give me enough intelligence on this Pike VM.

I’ve been looking into optimizations; here is what I extracted.

While the Pike VM is slower than a standard NFA or DFA because it must track and copy capture-group offsets for every “thread”, several optimizations can significantly close the gap:

I think the most effective optimization is an initial DFA pass: use a fast, capture-free DFA first.

Source: GeeksforGeeks, “Difference between DFA and NFA”
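Here is a rough sketch of that two-phase idea in Python (hypothetical dispatch, not EmberRegex’s actual API; the DFA table is hand-built for the example pattern a+b, and the stdlib re module stands in for the slow capture-tracking engine):

```python
# Hypothetical two-phase dispatch (not EmberRegex's API): a capture-free DFA
# answers "does this match at all?" cheaply, and the slower capture-tracking
# engine runs only when the DFA says yes.
import re  # stands in for the capture-tracking engine in this sketch

# Hand-built DFA for "a+b": state 0 = start, 1 = seen "a+", 2 = accept.
DFA = {
    (0, "a"): 1,
    (1, "a"): 1,
    (1, "b"): 2,
}

def dfa_fullmatch(text):
    """Fast path: a transition-table walk with no capture bookkeeping."""
    state = 0
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:
            return False  # dead state: reject immediately
    return state == 2

def match_with_captures(text):
    if not dfa_fullmatch(text):
        return None  # rejected without ever touching capture machinery
    # Slow path: only now pay the cost of tracking capture groups.
    return re.fullmatch("(a+)(b)", text).groups()
```

Most inputs in a typical workload don’t match, so they exit through the cheap DFA path and never pay the per-thread capture-copying cost.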

Another thing I found is this:

How do you see this approach, dude?

I generated a benchmark comparison between my StaticRegex and the PCRE2 JIT-compiled implementation. I don’t know much about PCRE2, so I’m assuming this is a fair comparison, but it appears it’s not only competitive but beats it in many cases!