Compiler OOM on DGX Spark GB10 / Unified Memory machines even for 0.5b size models

gchauhan · June 21, 2026, 11:06pm

On DGX Spark GB10 and other unified-memory machines, the default setting MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=90 causes an OOM during compilation, even for very small models like Qwen/Qwen2.5-0.5B-Instruct.

After spending a whole day trying various debug settings and code paths, I finally figured out that the following works.

Set a lower value, for example export MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=15, to get things to work. With that change, the memory spike dropped from 108G to 33G, and the model loaded and produced correct output in 96s.
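To get a rough feel for why the percentage matters on a unified-memory box, here is a minimal sketch that converts a percentage into GiB. It assumes the percentage is applied to the total memory pool, which on a unified-memory machine is shared with the CPU; that interpretation is my assumption, not something confirmed by the MAX documentation. The helper name budget_gib is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAX): how many GiB is PERCENT of TOTAL_GIB?
# Assumption: MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT is taken
# against the total memory pool, which on unified memory is also system RAM.
budget_gib() {
    total=$1     # total memory in GiB
    percent=$2   # candidate value for the SIZE_PERCENT variable
    echo $(( total * percent / 100 ))   # integer division, rounds down
}

# On Linux, read total RAM from /proc/meminfo (reported in kB) and compare
# the 90% setting from this post with the lowered 15% setting.
if [ -r /proc/meminfo ]; then
    total_gib=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 }' /proc/meminfo)
    echo "90% of ${total_gib} GiB: $(budget_gib "$total_gib" 90) GiB"
    echo "15% of ${total_gib} GiB: $(budget_gib "$total_gib" 15) GiB"
fi
```

For a 128 GiB pool this gives 115 GiB at 90% and 19 GiB at 15%. These are only upper-bound estimates of the reservation under the stated assumption; the 108G and 33G spikes reported above are measured values and include other allocations, so they will not match exactly.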

Working example:

docker run -d --gpus=all \
  -e HF_HUB_OFFLINE=1 -e HF_HOME=/hf \
  -e MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=15 \
  -v ~/.cache/huggingface:/hf -p 8000:8000 \
  docker.modular.com/modular/max-nvidia-full:nightly \
  --model Qwen/Qwen2.5-0.5B-Instruct --devices gpu:0 \
  --max-length 2048 --max-batch-size 1

Hope it helps and saves you time from chasing down different rabbit holes.
