I am trying the following code to see whether I can use an AVX-512 k-register (mask register) as the output of an inline-assembly operation. I get an error in the register assignment. Is there a good workaround?
When using AVX-512 instructions with Mojo's inline assembler, the compiler incorrectly assigns the vector operand to a 128-bit (xmm) register instead of the expected 512-bit (zmm) register, despite the explicit declaration of a 512-bit vector (SIMD[DType.int16, 32]).
Reproduction steps:
- Define a function using SIMD[DType.int16, 32]:
from sys._assembly import inlined_assembly

@always_inline("nodebug")
fn cmpgt_epi16_mask(val:SIMD[DType.int16, 32], rsh:SIMD[DType.int16, 32]) -> UInt32:
    var mask: UInt32 = 0
    mask = inlined_assembly[
        """
        vpcmpw $$5, $1,$2, $0
        """,
        UInt32,
        constraints = "=k,v,v",
        # constraints = "=k,z,z",
        has_side_effect = False
    ](mask, val, rsh)
    return mask

fn main():
    var vec1 = SIMD[DType.int16, 32](0)
    var vec2 = SIMD[DType.int16, 32](0)
    for i in range(0,32):
        vec1[i] = Int16(i-16)
        vec2[i] = Int16(16-i)
    print(vec1,vec2,hex(Int(cmpgt_epi16_mask(vec1,vec2))))
- Compile and observe the resulting error:
error: <inline asm>:2:26: invalid operand for instruction
        vpcmpw $5, %xmm0,%zmm1, %k0
                         ^~~~~
Expected behavior:
The compiler should assign both operands (val and rsh) to zmm registers, consistent with their declared vector size (512 bits).
Actual behavior:
The compiler incorrectly assigns one operand to a 128-bit (xmm) register, causing an instruction mismatch.
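
Possible workaround (an unverified sketch, not a confirmed fix): this assumes that inlined_assembly maps the declared result type to the output operand $0 and maps every call argument to an input operand. If so, passing mask as the first argument makes $1 the 32-bit scalar mask, which the v constraint could place in an xmm register, and makes $2 the first vector. That would match the %xmm0,%zmm1 pairing in the error. Under that assumption, passing only the two vectors may let both land in zmm registers:

```mojo
from sys._assembly import inlined_assembly

@always_inline("nodebug")
fn cmpgt_epi16_mask(val: SIMD[DType.int16, 32], rsh: SIMD[DType.int16, 32]) -> UInt32:
    # Assumption: the result type (UInt32) supplies output operand $0 ("=k"),
    # so only the two vector inputs ($1 and $2) are passed as call arguments.
    return inlined_assembly[
        """
        vpcmpw $$5, $1,$2, $0
        """,
        UInt32,
        constraints = "=k,v,v",
        has_side_effect = False
    ](val, rsh)
```

This still needs a target with AVX-512BW, since vpcmpw on zmm registers with a 32-bit mask result belongs to that extension. Separately, the immediate 5 selects the "not less than" (>=) predicate for vpcmpw. If a strict greater-than is intended, as the function name suggests, the immediate would be 6.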