Custom Multi-Head Self-Attention Transformer Training Phase using AMD RX 9070 XT 16GB: Python/PyTorch vs Mojo

DarinSimmons · January 23, 2026, 11:52pm

Welcome to the forum. @BradLarson recently posted "Calling all AMD RDNA users: help us bring full MAX support to your GPUs!" In other words, the RDNA experience is currently functional, but the performance is still being tuned.

When you say that Mojo will get 2.5 TFLOPS with a tiled implementation and PyTorch will likely get 30-50 TFLOPS, I'm curious where you got those numbers from.
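
For anyone who wants to check figures like these against their own runs, here is a rough sketch of the usual arithmetic: count 2·M·N·K floating-point operations for an (M×K) by (K×N) matmul and divide by the measured time. The helper names and the 10 ms timing are made up for illustration; they are not from either framework, and a real measurement would need warm-up runs and a GPU sync before stopping the clock.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for one (m x k) @ (k x n) matmul: one multiply and one add per inner-product term."""
    return 2 * m * n * k


def achieved_tflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and a measured wall-clock time into TFLOPS."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical numbers: a 4096 x 4096 x 4096 matmul with an assumed 10 ms runtime.
    flops = matmul_flops(4096, 4096, 4096)
    print(f"{achieved_tflops(flops, 0.010):.2f} TFLOPS")
```

Plugging in the measured kernel time from each side (Mojo and PyTorch) for the same shapes and dtype would give directly comparable numbers.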
