Pretty excited to say that as of the last nightly (commit), MAX can now run some text LLMs on AMD RDNA GPUs! This has been a progressive effort over time, with some of the background explained in this earlier call-to-action.
Briefly, modern AMD GPUs have two major, slightly divergent, architectures: RDNA and CDNA. CDNA GPUs, like the MI300X and MI355X, are used in data centers to drive the most intense AI workloads. Modular has put a lot of effort into those, with impressive results. RDNA GPUs are more accessible to local developers as integrated or discrete GPUs in laptops and workstations, but they have some differences from their CDNA siblings. Over the last few months, we’ve been gradually enhancing the Mojo standard library and kernels to account for these differences.
The result is that we’re now starting to see the first text LLMs working end-to-end in MAX on AMD RDNA GPUs, like a 780M integrated GPU in a laptop, an RX 9070, or a Strix Halo APU. There’s still a lot more to do here, such as expanding the range of supported models, optimizing kernels to make the most of RDNA 3 / 3.5 / 4 GPUs, and getting more of our suite of kernel unit tests to pass on RDNA.
The really exciting thing is that you can do all of this work purely in our open-source modular repository, thanks to recent advances in how we build libraries and models. If you want to help us out by adding new kernel paths for your hardware, optimizing for the unique capabilities of various generations, or otherwise hacking on MAX on RDNA GPUs, we welcome the assistance!