Question: vpermi2b inline assembly output incorrect in loop context due to register allocation

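Instead of inline assembly, you could try calling the LLVM intrinsic directly. vpermi2b writes its result over the index register, which is easy to get wrong in hand-written asm constraints; with the intrinsic, the compiler handles register allocation itself: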

from sys import llvm_intrinsic

alias T = SIMD[DType.int8, 64]

@always_inline("nodebug")
fn vpermi2b(a: T, b: T, idx: T) -> T:
  # The LLVM intrinsic takes its operands as (a, idx, b), not (a, b, idx).
  return llvm_intrinsic["llvm.x86.avx512.vpermi2var.qi.512", T](a, idx, b)