This is a companion discussion topic for the original entry at https://www.modular.com/blog/matrix-multiplication-on-nvidias-blackwell-part-2-using-hardware-features-to-optimize-matmul
Strong work. So dense, so deep, and still only halfway through the trip.
P.S. The old picture with matrix B's k, n swapped was used, the same one @christoph_schlumpf caught earlier.
lol, thanks for catching that, it’s fixed now.