This works:
var tmp: SIMD[DType.uint8, 8] = SIMD[DType.uint8, 8](1, 1, 1, 1, 1, 1, 1, 1)
var mask: SIMD[DType.bool, 8] = bitcast[DType.bool](tmp)
But this doesn’t (mojo 25.1.0.dev2024121705 (67a9f701)):
alias mask_array = InlineArray[UInt8, 8](1, 1, 1, 1, 1, 1, 1, 1)
var tmp: SIMD[DType.uint8, 8] = strided_load[8](mask_array.unsafe_ptr(), 1)
var mask: SIMD[DType.bool, 8] = bitcast[DType.bool](tmp)
Invalid bitcast
%25 = bitcast <8 x i8> %24 to <8 x i1>
${MODULAR_HOME}/envs/max/bin/mojo: error: failed to lower module to LLVM IR for archive compilation, translate module to LLVMIR failed
strided_load doesn’t return SIMD[DType.bool, 8] either:
alias mask_array = InlineArray[Bool, 8](1, 1, 1, 1, 1, 1, 1, 1)
var mask: SIMD[DType.bool, 8] = strided_load[8](mask_array.unsafe_ptr(), 1)
error: invalid call to 'strided_load': failed to infer parameter 'type'
And when I specify the type:
alias mask_array = InlineArray[Bool, 8](1, 1, 1, 1, 1, 1, 1, 1)
var mask: SIMD[DType.bool, 8] = strided_load[type=DType.bool, simd_width=8](mask_array.unsafe_ptr(), 1)
error: invalid call to 'strided_load': failed to infer implicit parameter 'address_space' of argument 'addr' type 'UnsafePointer'
var mask: SIMD[DType.bool, 8] = strided_load[type=DType.bool, simd_width=8](mask_array.unsafe_ptr(), 1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: failed to infer parameter #2, parameter isn't used in any argument
var mask: SIMD[DType.bool, 8] = strided_load[type=DType.bool, simd_width=8](mask_array.unsafe_ptr(), 1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
(strided_load requires a boolean mask, unlike __m256 _mm256_maskload_ps(float const *mem_addr, __m256i mask) for vmaskmovps.)
Are these compiler bugs? If not, how do I generate a boolean mask at runtime (from an array like [1, ..., 1, 0, ..., 0] with a variable offset) for strided_load? In C++, I can do this:
int32_t mask_array[16] = {-1, -1, -1, -1, -1, -1, -1, -1,
0, 0, 0, 0, 0, 0, 0, 0};
__m256i mask = _mm256_loadu_si256(reinterpret_cast<__m256i *>(&mask_array[8 - i]));
__m256 x = _mm256_maskload_ps(data, mask); // loads i floats from data
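To make the intent of the C++ snippet concrete, here is a minimal scalar sketch of what it computes, written without intrinsics so it runs on any target. The helper names window_mask and masked_load are hypothetical, introduced only for illustration. The sketch assumes 0 <= i <= 8 and models the documented behaviour of _mm256_maskload_ps (a lane is read only when the sign bit of its mask element is set; other lanes are zeroed). It is not the vectorized code itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scalar model of the sliding-window mask.
// Reading 8 elements starting at mask_array[8 - i] yields i lanes of -1
// (all bits set) followed by 8 - i lanes of 0.
std::array<std::int32_t, 8> window_mask(std::size_t i) {
    static const std::int32_t mask_array[16] = {-1, -1, -1, -1, -1, -1, -1, -1,
                                                 0,  0,  0,  0,  0,  0,  0,  0};
    assert(i <= 8);  // assumption: at most one full vector of lanes
    std::array<std::int32_t, 8> mask{};
    for (std::size_t k = 0; k < 8; ++k) {
        mask[k] = mask_array[8 - i + k];
    }
    return mask;
}

// Hypothetical helper: scalar model of _mm256_maskload_ps.
// data[k] is read only where the mask element is negative (sign bit set),
// so memory past the first i floats is never touched; other lanes are 0.
std::array<float, 8> masked_load(const float* data,
                                 const std::array<std::int32_t, 8>& mask) {
    std::array<float, 8> out{};
    for (std::size_t k = 0; k < 8; ++k) {
        if (mask[k] < 0) {
            out[k] = data[k];
        }
    }
    return out;
}
```

For example, masked_load(data, window_mask(3)) reads only data[0] through data[2] and leaves the remaining five lanes at zero, which is the behaviour I am trying to reproduce with a SIMD[DType.bool, 8] mask in Mojo.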