Llama2.🔥 performance degradation after updating code to the latest Mojo compiler

Hi everyone.

I haven’t touched my llama2.:fire: GitHub repo in a long time. More precisely, since March 2024.

I finally refactored this legend today. The Mojo compiler version is 0.25.7.0.

https://github.com/tairov/llama2.mojo

What I immediately noticed is significantly degraded performance :eyes:

On the stories15M.bin model on my Mac M1, it shows ~170 tokens/sec throughput.

On Mojo version 24.3, though, it shows ~1000 tok/sec (yes, I still have the Mojo compiler from March 2024).

Of course, some parts of the calculations might not be done in an optimal way; I’m not sure and still need to investigate.

Most of the SIMD calculations are done over UnsafePointers, avoiding any heavy data copying, so it essentially reproduces the approach I used in the first versions of llama2.mojo with a custom Matrix struct.
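For context, the hot loops look roughly like this — a minimal sketch, not the actual repo code; the function name, width, and pointer API details are illustrative assumptions:

```mojo
from memory import UnsafePointer

# Hypothetical sketch: read lanes directly through an UnsafePointer,
# accumulate in a SIMD register, no intermediate copies.
fn dot[width: Int = 8](
    a: UnsafePointer[Float32], b: UnsafePointer[Float32], n: Int
) -> Float32:
    var acc = SIMD[DType.float32, width](0)
    var i = 0
    while i + width <= n:
        acc += a.load[width=width](i) * b.load[width=width](i)
        i += width
    var total = acc.reduce_add()
    while i < n:  # scalar tail for leftover elements
        total += a[i] * b[i]
        i += 1
    return total
```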

So I don’t see any reason why the code itself would be suboptimal.

If anyone from the Mojo compiler team could take a look and share insights on why there’s such a severe degradation (and, ideally, how to fix it :smiley:), I’d really appreciate it.

Just in case, the older version of the code is here: GitHub - tairov/llama2.mojo at old-mojo-24.3


After doing a deep round of profiling on macOS with Instruments (which was actually new to me; I didn’t know it existed), I finally tracked down the issue.
With a mix of manual profiling and a few back-and-forths with Gemini 3.0, I found the culprit: an Accumulator struct that was being allocated on the heap and tanking performance.
After switching it to stack_allocation, the performance actually ended up higher than before :fire: :clap:
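The shape of the change was roughly the following — a sketch under my assumptions, not the repo’s actual Accumulator struct; the function names and buffer size are made up for illustration, and `stack_allocation` is assumed to come from Mojo’s `memory` module:

```mojo
from memory import UnsafePointer, stack_allocation

alias WIDTH = 8

# Before (slow): scratch buffer hits the heap on every call,
# putting malloc/free traffic inside the hot token loop.
fn sum_heap(x: UnsafePointer[Float32], n: Int) -> Float32:
    var buf = UnsafePointer[Float32].alloc(WIDTH)
    for i in range(WIDTH):
        buf[i] = 0
    for i in range(n):
        buf[i % WIDTH] += x[i]
    var total: Float32 = 0
    for i in range(WIDTH):
        total += buf[i]
    buf.free()
    return total

# After (fast): same scratch space, but carved out of the current
# stack frame — no allocator calls, no free needed.
fn sum_stack(x: UnsafePointer[Float32], n: Int) -> Float32:
    var buf = stack_allocation[WIDTH, DType.float32]()
    for i in range(WIDTH):
        buf[i] = 0
    for i in range(n):
        buf[i % WIDTH] += x[i]
    var total: Float32 = 0
    for i in range(WIDTH):
        total += buf[i]
    return total
```

The key point is that the buffer’s lifetime is bounded by the function frame anyway, so there was never a reason for it to live on the heap.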


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.