This is a companion discussion topic for the original entry at https://www.modular.com/blog/matrix-multiplication-on-nvidias-blackwell-part-1-introduction
I enjoyed reading this educational post and learned a lot!
BTW: I think the K and N of Matrix B in this image must be switched to be consistent.
I agree. Nice catch. The following two images (one for CUDA cores and one for Tensor Cores) would then also need the K and N of Matrix B swapped.
I grabbed a 3x5 card just to draw it out and remember my n-dimensional tensors and then thought, “The internet must have something”. Of course, it obliged. I’m sure there are better pictures than this but this one confirmed it for me.
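For anyone else checking the labels, here is a minimal sketch of the convention I was confirming, assuming the usual layout where A is M x K, B is K x N, and C is M x N. The function name and the sizes are my own illustration, not anything from the blog post.

```python
def matmul(a, b):
    """Naive matrix multiply: a is M x K, b is K x N, result is M x N."""
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    # The inner dimension K must match: columns of A == rows of B.
    if k != k_b:
        raise ValueError(f"inner dimensions differ: A has K={k}, B has K={k_b}")
    c = [[0] * n for _ in range(m)]
    for i in range(m):          # rows of A and C
        for j in range(n):      # columns of B and C
            for p in range(k):  # shared K dimension
                c[i][j] += a[i][p] * b[p][j]
    return c


M, K, N = 2, 3, 4
A = [[1] * K for _ in range(M)]  # M x K
B = [[1] * N for _ in range(K)]  # K x N: K indexes rows, N indexes columns
C = matmul(A, B)                 # M x N
```

If B were laid out as N x K instead, the inner dimensions would no longer line up and the check above would raise, which is the inconsistency the diagram labels suggested.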
There is, of course, the possibility that Modular is doing something my brain cannot conceptualize and/or visualize.
And also, way cool deep dive into matmul on the GPU, strong work Modular. I think I could even ELI5 now if I had to.
Thanks for catching this! Indeed, we mixed up the indices on those diagrams. They’re fixed now.