Support GPTQ quant in MoE layer

At the moment the MoE layer in MAX does not support quantization. This blocks implementing popular models such as Qwen3-30B-A3B with GPTQ (Text, Coder, and Omni variants). As far as I can tell, the only missing piece is a quantization-aware version of the grouped_matmul_ragged kernel (see the sketch below for roughly what it would need to do).
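To make the request concrete, here is a minimal NumPy reference sketch of the semantics such a kernel would need: per-expert GPTQ int4 weights (packed 8 values per int32, with per-group scales and zero points) consumed inside the ragged per-expert matmul loop. All names here (`qweight`, `scales`, `zeros`, `group_size`, `expert_offsets`) are illustrative assumptions about the usual GPTQ layout, not MAX's actual API; a real kernel would of course fuse the dequantization instead of materializing the full-precision weights.

```python
import numpy as np

def dequant_gptq_int4(qweight, scales, zeros, group_size):
    """Unpack int4 GPTQ weights (8 values per int32 along K) and apply
    per-group scale/zero-point: w = scale * (q - zero). Illustrative only;
    real GPTQ checkpoints may differ in packing order and zero handling."""
    K_packed, N = qweight.shape                       # qweight: [K // 8, N] int32
    shifts = np.arange(8, dtype=np.uint32) * 4
    q = (qweight[:, None, :].astype(np.uint32) >> shifts[None, :, None]) & 0xF
    q = q.reshape(K_packed * 8, N).astype(np.float32)  # [K, N]
    groups = np.arange(K_packed * 8) // group_size     # group index per row of K
    return scales[groups] * (q - zeros[groups])        # scales/zeros: [K // group_size, N]

def grouped_matmul_ragged_gptq(x, qweights, scales, zeros,
                               expert_offsets, group_size=128):
    """x: [T, K] tokens already sorted by expert; expert_offsets: [E + 1]
    ragged boundaries; qweights/scales/zeros: per-expert GPTQ tensors."""
    out = np.zeros((x.shape[0], qweights[0].shape[1]), dtype=np.float32)
    for e in range(len(expert_offsets) - 1):
        start, stop = expert_offsets[e], expert_offsets[e + 1]
        if start == stop:
            continue                                   # no tokens routed to this expert
        w = dequant_gptq_int4(qweights[e], scales[e], zeros[e], group_size)
        out[start:stop] = x[start:stop] @ w
    return out
```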

Can the team add support for this? It would unlock a lot of MoE models on hardware with less VRAM.
