Llama2.🔥 performance degradation after updating code to the latest Mojo compiler

After a deep round of profiling on macOS with Instruments (which was new to me; I didn’t know it existed), I finally tracked down the issue.
With a mix of manual profiling and a few back-and-forths with Gemini 3.0, I found the culprit: an Accumulator struct that was being allocated on the heap and tanking performance.
After switching it to stack_allocation, the performance actually ended up higher than before :fire: :clap:
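
For anyone wondering why this matters, here is a rough sketch of the general idea. It is written in Rust rather than Mojo and is not the actual Llama2.🔥 code: `WIDTH`, `dot_heap`, and `dot_stack` are made-up names, and the accumulator is just a small fixed-size buffer standing in for the `Accumulator` struct. The only difference between the two functions is where the buffer lives.

```rust
/// Number of accumulator lanes (hypothetical; stands in for a SIMD width).
const WIDTH: usize = 8;

/// Heap version: each call asks the allocator for a fresh buffer
/// and frees it again on return, which adds up inside a hot loop.
fn dot_heap(a: &[f32], b: &[f32]) -> f32 {
    let mut acc: Vec<f32> = vec![0.0; WIDTH]; // heap allocation per call
    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
        acc[i % WIDTH] += x * y;
    }
    acc.iter().sum()
}

/// Stack version: the buffer is a fixed-size array in the function's
/// stack frame, so no allocator call is made.
fn dot_stack(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; WIDTH]; // lives on the stack
    for (i, (x, y)) in a.iter().zip(b.iter()).enumerate() {
        acc[i % WIDTH] += x * y;
    }
    acc.iter().sum()
}
```

Both functions compute the same dot product. When something like this is called once per row of a matmul, the per-call allocation in the heap version can plausibly dominate the runtime, which matches what the profiler pointed at here.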
