Compiler OOM on DGX Spark GB10 / Unified Memory machines even for 0.5b size models

On DGX Spark GB10 and Unified memory machines, the default settings for MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=90 result in compiler OOMing even for very small models like Qwen/Qwen2.5-0.5B-Instruct.

After spending a whole day trying various debug settings, code paths, finally figured out the following works.

Change to a lower value like export MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=15 to get things to work. Memory spike dropped from 108G → 33G and model loaded with correct output in 96s.

Working example:

docker run -d --gpus=all \ -e HF_HUB_OFFLINE=1 -e HF_HOME=/hf \ -e MODULAR_DEVICE_CONTEXT_MEMORY_MANAGER_SIZE_PERCENT=15 \ -v ~/.cache/huggingface:/hf -p 8000:8000 \ ``docker.modular.com/modular/max-nvidia-full:nightly`` \
--model Qwen/Qwen2.5-0.5B-Instruct --devices gpu:0 \
--max-length 2048 --max-batch-size 1

Hope it helps and saves you time from chasing down different rabbit holes.