I had a great time reading the Mojo SOTA Blackwell matmul blog series and learned a lot about the newest GPU features and architecture. While digging more deeply into each part, I'm confused about TMA loading from global to shared memory in part 2, in particular by this explanation about consecutive TMA load operations:
What confuses me is that since the A and B matrices are both K-major (i.e. row-major), how do the columns of the matrix cores end up being contiguous in memory? Shouldn't it be the rows of the matrix cores instead?
This was brought up before. I think we're following the NVVM intrinsic nvvm.cp.async.bulk.tensor.shared.cluster.global as well as the PTX instruction, so the API becomes confusing. @JXL should we follow up on making the API more user-friendly?
I'm not familiar with the lower-level PTX instructions or NVVM intrinsics yet. Are you saying that even though the data is stored with a row-major layout in shared memory and global memory, the TMA will load columns of matrix cores from global memory at a time (as illustrated in the picture)?
That's right, and it is confusing! By referring to NVVM and PTX I meant that we're basically following the NVIDIA conventions for now. We're also looking at how to make the APIs more user-friendly, so this feedback is great. We'll resolve it soon.
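For anyone following along, here's a rough host-side sketch (not the Mojo code from the blog) of the underlying CUDA driver call that builds a TMA descriptor for a K-major (row-major) bf16 A matrix. The point it illustrates is the NVIDIA convention mentioned above: dimensions, strides, and tile ("box") sizes are listed innermost-first, so the contiguous K axis is dim 0, which is why the API can read "transposed" relative to row-major intuition. The matrix and tile sizes (M, K, BM, BK) and the swizzle choice are illustrative assumptions, not taken from the blog.

```cpp
// Sketch: encoding a TMA descriptor for a K-major M x K bf16 matrix with the
// CUDA driver API. Dimensions are given innermost-first (dim 0 = contiguous K).
#include <cuda.h>
#include <cstdio>
#include <cstdint>

int main() {
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);

    const uint64_t M = 4096, K = 4096;   // global matrix shape (illustrative)
    const uint32_t BM = 128, BK = 64;    // shared-memory tile shape (illustrative)

    CUdeviceptr a_gmem;
    cuMemAlloc(&a_gmem, M * K * sizeof(uint16_t));  // bf16 storage

    // Innermost-first: dim 0 is the contiguous K axis, dim 1 is M.
    uint64_t global_dim[2]    = {K, M};
    uint64_t global_stride[1] = {K * sizeof(uint16_t)};  // byte stride of dim 1; dim 0 is implicit
    uint32_t box_dim[2]       = {BK, BM};                // tile extents, also innermost-first
    uint32_t elem_stride[2]   = {1, 1};

    CUtensorMap a_desc;
    CUresult res = cuTensorMapEncodeTiled(
        &a_desc,
        CU_TENSOR_MAP_DATA_TYPE_BFLOAT16,
        /*tensorRank=*/2,
        (void*)a_gmem,
        global_dim,
        global_stride,      // strides for dims 1..rank-1 only
        box_dim,
        elem_stride,
        CU_TENSOR_MAP_INTERLEAVE_NONE,
        CU_TENSOR_MAP_SWIZZLE_128B,
        CU_TENSOR_MAP_L2_PROMOTION_NONE,
        CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE);
    printf("cuTensorMapEncodeTiled returned %d\n", (int)res);

    cuMemFree(a_gmem);
    cuCtxDestroy(ctx);
    return 0;
}
```

The corresponding PTX tile-mode load (cp.async.bulk.tensor.2d.shared::cluster.global) takes its tensor coordinates in the same innermost-first order, which is the ordering the Mojo TMA API currently mirrors.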