Calling all AMD RDNA users: help us bring full MAX support to your GPUs!

Ha, I just realized I never came back to update this thread: the original goal has been achieved, and we can indeed run MAX models on AMD RDNA GPUs today. I’ve also been hacking on the matmul and 2-D convolution enhancements for RDNA 3+ GPUs that I mentioned above, and they have significantly improved performance over our initial naive implementations of those kernels. Models like FLUX.2-klein actually run fairly well locally on an AMD Strix Halo system (Framework Desktop) using MAX in our latest nightlies.