Modular: Matrix Multiplication on Blackwell: Part 2 - Using Hardware Features to Optimize Matmul


This is a companion discussion topic for the original entry at https://www.modular.com/blog/matrix-multiplication-on-nvidias-blackwell-part-2-using-hardware-features-to-optimize-matmul

:exploding_head:

Strong work. So dense, so deep and still only halfway on the trip.

P.S. The post still uses the old picture with matrix B's k and n swapped, the one @christoph_schlumpf caught earlier.

lol, thanks for catching that, it’s fixed now.