Support GPTQ quant in MoE layer

At the moment the MoE layer in MAX does not support quantization. This blocks implementing popular models such as Qwen3-30B-A3B with GPTQ (Text, Coder, and Omni variants). As far as I can tell, the only missing piece is a quantization-aware version of the grouped_matmul_ragged kernel (see the sketch below for roughly what it would need to do).
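To make the request concrete, here is a minimal NumPy reference sketch of the semantics such a kernel would need: per-expert GPTQ int4 weights (packed 8 values per int32, with per-group scales and zero points) consumed inside the ragged per-expert matmul loop. All names here (`qweight`, `scales`, `zeros`, `group_size`, `expert_offsets`) are illustrative assumptions about the usual GPTQ layout, not MAX's actual API; a real kernel would of course fuse the dequantization instead of materializing the full-precision weights.

```python
import numpy as np

def dequant_gptq_int4(qweight, scales, zeros, group_size):
    """Unpack int4 GPTQ weights (8 values per int32 along K) and apply
    per-group scale/zero-point: w = scale * (q - zero). Illustrative only;
    real GPTQ checkpoints may differ in packing order and zero handling."""
    K_packed, N = qweight.shape                       # qweight: [K // 8, N] int32
    shifts = np.arange(8, dtype=np.uint32) * 4
    q = (qweight[:, None, :].astype(np.uint32) >> shifts[None, :, None]) & 0xF
    q = q.reshape(K_packed * 8, N).astype(np.float32)  # [K, N]
    groups = np.arange(K_packed * 8) // group_size     # group index per row of K
    return scales[groups] * (q - zeros[groups])        # scales/zeros: [K // group_size, N]

def grouped_matmul_ragged_gptq(x, qweights, scales, zeros,
                               expert_offsets, group_size=128):
    """x: [T, K] tokens already sorted by expert; expert_offsets: [E + 1]
    ragged boundaries; qweights/scales/zeros: per-expert GPTQ tensors."""
    out = np.zeros((x.shape[0], qweights[0].shape[1]), dtype=np.float32)
    for e in range(len(expert_offsets) - 1):
        start, stop = expert_offsets[e], expert_offsets[e + 1]
        if start == stop:
            continue                                   # no tokens routed to this expert
        w = dequant_gptq_int4(qweights[e], scales[e], zeros[e], group_size)
        out[start:stop] = x[start:stop] @ w
    return out
```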

Can the team add support for this? It would unlock a lot of MoE models on hardware with less VRAM.
