Modular: Matrix Multiplication on Blackwell: Part 1 - Introduction

Modular · August 29, 2025, 4:49pm

This is a companion discussion topic for the original entry at https://www.modular.com/blog/matrix-multiplication-on-nvidias-blackwell-part-1-introduction

christoph_schlumpf · August 30, 2025, 7:54am

I enjoy reading this educational post - learning a lot!

BTW: I think the K and N labels on matrix B in this image need to be swapped to be consistent.

DarinSimmons · August 30, 2025, 8:24pm

I agree. Nice catch. The following two images (one CUDA core and one tensor core) would then also need to have the K and N of matrix B swapped.

I grabbed a 3x5 card just to draw it out and remember my n-dimensional tensors and then thought, “The internet must have something”. Of course, it obliged. I’m sure there are better pictures than this but this one confirmed it for me.
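For anyone who would rather check the labels in code than on an index card, here is a minimal sketch. It assumes the usual convention C = A · B, with A of shape M x K, B of shape K x N, and C of shape M x N. The `matmul` helper and the sizes are purely illustrative and are not taken from the blog post.

```python
def matmul(a, b):
    """Naive matmul: a is M x K, b is K x N, the result is M x N."""
    m, k = len(a), len(a[0])
    k_b, n = len(b), len(b[0])
    # The shared (inner) dimension K must match: columns of A, rows of B.
    if k != k_b:
        raise ValueError(f"inner dimensions differ: A is {m}x{k}, B is {k_b}x{n}")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]

# Illustrative sizes only (an assumption, not from the article).
M, K, N = 2, 3, 4
A = [[i * K + p for p in range(K)] for i in range(M)]  # M rows, K columns
B = [[p * N + j for j in range(N)] for p in range(K)]  # K rows, N columns
C = matmul(A, B)
print(len(C), len(C[0]))  # prints "2 4", i.e. M x N
```

If B were labeled N x K instead, the inner dimensions would not line up and the check above would raise, which is the inconsistency in the original diagrams.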

There is of course the possibility that Modular is doing something my brain cannot conceptualize and / or visualize.

And also, way cool deep dive into matmul on the GPU, strong work Modular. I think I could even ELI5 now if I had to.

hogepodge · September 2, 2025, 11:54pm

Thanks for catching this! Indeed, we mixed up the indices on those diagrams. They’re fixed now.
