Historically, I’ve admittedly been a bit of a pessimist about AI-generated coding: “Sure, it’s impressive and fun for messing around, but it doesn’t really scale to complex projects.” Recently, it’s safe to say I’ve been proven wrong. Mojo in particular has been difficult to use with LLMs due to limited data in their training sets compared to languages like Python and JavaScript. So how far can you get with a fairly minimal setup (no special extra training or context beyond the newly released skills)? It turns out the answer is: pretty far.
I’ve had great success using LLMs to find and fix tough bugs in EmberJSON, and to implement small features I couldn’t be bothered to write myself. Over the past few days I decided to dive in and try to “vibe code” an entire project from scratch. I landed on a regex library, since it satisfied a few key criteria:
- Very unit testable
  - Good unit tests act as guardrails against regressions caused by the somewhat unpredictable nature of LLM-generated code
- Easy to benchmark
  - Benchmarks act as hard evidence to guide optimization efforts
- Something I know nothing about and probably wouldn’t take the time to implement myself
  - A suboptimal version of something is better than nothing
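To sketch what those guardrail tests look like, here is a Python stand-in using the standard `re` module (the real suite is presumably written in Mojo against EmberRegex itself): each feature gets small, behavior-pinning assertions that any regression would trip.

```python
import re

def test_char_class_search():
    # Pin exact match boundaries so an engine change can't silently shift them
    m = re.compile("[a-z]+").search("Hello World")
    assert m is not None
    assert m.span() == (1, 5)  # "ello" (the capital "H" is excluded)

def test_no_match_returns_none():
    # A miss must return None, not an empty match
    assert re.compile("[0-9]+").search("no digits here") is None

test_char_class_search()
test_no_match_returns_none()
print("ok")
```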
So, introducing EmberRegex! A pure Mojo implementation of regular expressions, created almost exclusively by the grace of Claude Opus 4.6. For more in-depth examples, you can refer to the (also entirely AI-generated) README in the repo.
```mojo
from emberregex import compile

def main() raises:
    var re = compile("[a-z]+")
    var result = re.search("hello world")
    if result:
        print(result)  # MatchResult(start=0, end=5)
```
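For comparison, the equivalent search with Python’s built-in `re` module (the baseline the benchmarks below are measured against) looks like this:

```python
import re

# Python's re equivalent of the EmberRegex example above
pattern = re.compile("[a-z]+")
result = pattern.search("hello world")
if result:
    print(result.span())  # (0, 5): matches "hello"
```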
The repo also includes a fairly extensive benchmark suite comparing performance against Python’s built-in `re` module.
Results (lower µs = faster; ratio = Python ÷ Ember, >1x = Ember wins):

| Benchmark | Ember (µs) | Python (µs) | Ratio |
|---|---:|---:|---:|
| throughput_literal_100B | 0.030 | 0.233 | 7.8x |
| throughput_literal_10KB | 0.230 | 4.213 | 18.3x |
| throughput_literal_100KB | 1.900 | 24.815 | 13.1x |
| throughput_literal_1MB | 17.410 | 247.505 | 14.2x |
| throughput_class_10KB | 3.910 | 72.085 | 18.4x |
| throughput_nomatch_100KB | 1.770 | 24.789 | 14.0x |
| anchor_bol | 0.020 | 0.219 | 11.0x |
| anchor_eol | 0.030 | 0.221 | 7.4x |
| anchor_word_boundary | 0.080 | 0.289 | 3.6x |
| anchor_word_boundary_miss | 0.040 | 0.317 | 7.9x |
| anchor_bol_miss_10KB | 0.010 | 0.079 | 7.9x |
| multiline_bol_findall_100_lines | 3.650 | 15.303 | 4.2x |
| multiline_eol_findall_100_lines | 5.190 | 59.882 | 11.5x |
| dotall_multiline_body | 0.040 | 0.103 | 2.6x |
| named_group_date | 0.070 | 0.145 | 2.1x |
| named_group_email | 0.100 | 0.169 | 1.7x |
| positional_group_email | 0.100 | 0.167 | 1.7x |
| neg_lookahead | 0.140 | 0.253 | 1.8x |
| neg_lookbehind | 0.120 | 0.252 | 2.1x |
| password_validation_lookahead | 0.230 | 0.355 | 1.5x |
| alternation_4 | 0.010 | 0.099 | 9.9x |
| alternation_16 | 0.010 | 0.101 | 10.1x |
| alternation_16_miss | 0.020 | 0.083 | 4.1x |
| findall_3_matches | 0.210 | 0.217 | 1.0x |
| findall_100_matches | 2.570 | 12.071 | 4.7x |
| findall_500_dot_matches | 9.120 | 14.015 | 1.5x |
| replace_50_matches | 2.390 | 6.914 | 2.9x |
| replace_named_backref | 0.540 | 0.295 | 0.5x |
| split_100_parts | 2.430 | 8.666 | 3.6x |
| pathological_optional_16 | 0.030 | 1256.715 | 41890.5x |
| pathological_dotstar_anchored_5K | 1.610 | 1.349 | 0.8x |
| pathological_dotstar_miss_5K | 1.640 | 3.186 | 1.9x |
| pathological_triple_backref | 0.120 | 0.112 | 0.9x |
| realworld_url_parse | 0.220 | 0.526 | 2.4x |
| realworld_phone | 0.020 | 0.297 | 14.8x |
| realworld_hex_color | 0.020 | 0.222 | 11.1x |
| realworld_semver | 0.100 | 0.394 | 3.9x |
| realworld_key_value_findall | 0.430 | 0.844 | 2.0x |
| realworld_html_tag_findall | 0.470 | 0.717 | 1.5x |
| realworld_ws_normalize | 0.240 | 0.743 | 3.1x |
| realworld_log_search_1000_lines | 4.610 | 10.073 | 2.2x |
| inline_ignorecase | 0.020 | 0.239 | 12.0x |
| inline_multiline_search | 0.080 | 0.310 | 3.9x |
| engine_dfa_no_capture | 0.010 | 0.264 | 26.4x |
| engine_pike_with_capture | 0.070 | 0.267 | 3.8x |
| engine_backtrack_with_backref | 0.130 | 0.239 | 1.8x |
| compile_wide_char_class | 4.640 | 13.047 | 2.8x |
| compile_8_groups | 34.670 | 40.544 | 1.2x |
| compile_nested_alternation | 24.540 | 33.997 | 1.4x |

EmberRegex faster: 46 | slower: 3. Times are µs per operation (best of 5 × 1000 iterations).
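The exact harness lives in the repo; as a rough sketch of how the Python-side numbers can be produced, the “best of 5 × 1000 iterations” methodology looks something like this (the pattern and input here are illustrative, not the benchmark’s actual cases):

```python
import re
import timeit

def best_of(fn, repeat=5, number=1000):
    """Best-of-N timing in µs per operation, mirroring the
    'best of 5 x 1000 iterations' methodology quoted above."""
    times = timeit.repeat(fn, repeat=repeat, number=number)
    return min(times) / number * 1e6  # seconds per batch -> µs per op

pattern = re.compile("[a-z]+")
text = "hello world " * 100

us_per_op = best_of(lambda: pattern.search(text))
print(f"search: {us_per_op:.3f} µs/op")
```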
For those interested, I figured I would also document my process for creating this library so far.
Initial Implementation
I had Claude create a 7-step plan for implementing the library. This simplified the work needed at each step, and allowed tests to be implemented progressively to guard against major regressions. I’m not knowledgeable enough about these systems to make any stronger claims about the benefits of this approach, but Claude does seem to have a much easier time implementing things this way than if I had asked it to tackle everything in one attempt.
Performance Optimizations
After all the features were done, it was time to start optimizing! The initial performance was quite poor, losing over 50% of the benchmark cases to Python. I gave Claude one pass of “make this thing faster”, which improved a few cases but was generally too unspecific to produce good results. Afterwards, I identified groups of 1-3 cases I figured were related and asked it to fix the performance of those particular cases. This approach was highly successful, since it allowed Claude to trace which logic paths those inputs would take and identify specific bottlenecks. There were still a handful of issues I ended up fixing myself (such as copying a value where a move was possible), but I’ll chalk that up to LLMs still not having a perfect picture of what idiomatic Mojo looks like.
After enough iterations of this, only 3 cases remain where we are still slower than `re`, and with additional effort (or rather, tokens) I imagine it will soon be faster across the board.
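That 41890x outlier in pathological_optional_16 deserves a note: it is the classic exponential-backtracking trap. I’m guessing at the benchmark’s exact pattern here, but a case of the same shape in Python’s `re` looks like the following, where a backtracking engine explores roughly 2^n ways to split the input while an NFA/DFA-style engine (like EmberRegex’s dfa/pike engines in the table above) stays linear:

```python
import re

n = 16  # same order as the pathological_optional_16 case
# n optional 'a's followed by n mandatory 'a's, matched against n 'a's:
# a backtracking engine tries ~2^n combinations before succeeding.
pattern = re.compile("a?" * n + "a" * n)
text = "a" * n

m = pattern.fullmatch(text)
print(m is not None)  # True: the input matches, just very slowly
```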
Going Forward
Perhaps the barrier between “vibe coded slop” and actually usable software is simply how much money you give Anthropic. This effort could surely have been completed in a few hours had I not been constantly hitting my session usage limits on the base-level Claude Code plan. For now I will keep pushing this project forward, and I look forward to any feedback and input from the community!