I was experimenting with a tiled GEMM kernel written in Mojo. Here is the code. I am running this kernel on an NVIDIA A100.
It works fine with the float32 DType but fails with bfloat16/float16. It seems that MAX layout tensors don't support elements smaller than 4 bytes in `copy_from_async`, as the following error message says:
```
Traceback (most recent call last):
File "/home/ubuntu/avishnoi/need-for-speed/.venv/lib/python3.12/site-packages/max/engine/api.py", line 389, in load
_model = self._impl.compile_from_object(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Failed to run the MOToMGP pass manager:
-:1:1: error: failed to run the pass manager for offload functions
<unknown>:0:0: error: function instantiation failed
<unknown>:0:0: note: call expansion failed with parameter value(s): (..., ..., ..., ..., ...)
<unknown>:0:0: note: function instantiation failed
/home/ubuntu/avishnoi/need-for-speed/kernels/gemm/mojo/kernels/matmul.mojo:127:51: note: call expansion failed with parameter value(s): (..., ..., ..., ...)
/home/ubuntu/avishnoi/need-for-speed/kernels/gemm/mojo/kernels/matmul.mojo:21:4: note: function instantiation failed
/home/ubuntu/avishnoi/need-for-speed/kernels/gemm/mojo/kernels/matmul.mojo:73:61: note: call expansion failed with parameter value(s): (..., "swizzle": false, "eviction_policy": 0, "num_threads": 1024, "block_dim_count": 1, ..., ..., "dst.address_space`3": 3, ..., ..., ..., "dst.masked`7": false, "dst.alignment`8": 2, "src.mut`9": true, ..., ..., "src.address_space`13": 0, ..., ..., ..., "src.masked`17": false, "src.alignment`18": 2)
max/kernels/src/layout/layout_tensor.mojo:6895:4: note: function instantiation failed
max/kernels/src/layout/layout_tensor.mojo:6938:6: note: call expansion failed with parameter value(s): (..., ..., "swizzle": false, "eviction_policy": 0, "num_threads": 1024, "block_dim_count": 1, ..., ..., "dst.address_space`3": 3, ..., ..., ..., "dst.masked`7": false, "dst.alignment`8": 2, "src.mut`9": true, ..., ..., "src.address_space`13": 0, ..., ..., ..., "src.masked`17": false, "src.alignment`18": 2)
max/kernels/src/layout/layout_tensor.mojo:6724:4: note: function instantiation failed
max/kernels/src/layout/layout_tensor.mojo:6862:10: note: call expansion failed with parameter value(s): ("mut": true, ..., ..., "address_space": 3, ..., ..., ..., "is_masked": false, ..., "eviction_policy": 0, "src.mut`2x1": true, ..., ..., "src.address_space`2x5": 0, ..., ..., ...)
max/kernels/src/layout/layout_tensor.mojo:5503:8: note: function instantiation failed
max/kernels/src/layout/layout_tensor.mojo:5641:9: note: constraint failed: copy_from_async only allows 4, 8, 16 bytes element
```
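To make the last line concrete, here is a small Python sketch of the byte arithmetic behind that constraint. It assumes the only thing that matters is the per-element byte size (the 4/8/16 values are taken from the error message and match the copy sizes NVIDIA's `cp.async` instruction accepts); `copy_bytes` and `valid_vector_widths` are hypothetical helpers for illustration, not MAX APIs.

```python
# Byte sizes of the dtypes discussed in this post.
DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}

# Per-element copy sizes accepted by copy_from_async, per the error message
# (the same 4/8/16-byte sizes that NVIDIA's cp.async instruction supports).
ALLOWED_COPY_BYTES = (4, 8, 16)


def copy_bytes(dtype: str, vector_width: int) -> int:
    """Bytes moved per element when `vector_width` scalars form one element."""
    return DTYPE_BYTES[dtype] * vector_width


def valid_vector_widths(dtype: str) -> list[int]:
    """Vector widths for which one element is exactly 4, 8 or 16 bytes."""
    return [w for w in (1, 2, 4, 8, 16) if copy_bytes(dtype, w) in ALLOWED_COPY_BYTES]


for dt in DTYPE_BYTES:
    print(dt, "scalar bytes:", copy_bytes(dt, 1), "valid widths:", valid_vector_widths(dt))
```

By this arithmetic a scalar float32 element (4 bytes) already satisfies the constraint, which is consistent with the float32 kernel working, while a scalar bfloat16/float16 element is only 2 bytes and needs at least 2 values grouped per copy. My unverified guess is that the tiles would have to be vectorized before the async copy for 16-bit types.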
I'm wondering: is there a specific reason bfloat16/float16 isn't supported?