DeepSeek-OCR inference on CPU

Can I run DeepSeek-OCR inference with MAX, using CPU only?

Among the built-in architectures in MAX, DeepseekV2ForCausalLM and DeepseekV3ForCausalLM are present and in active development. However, these architectures are oriented toward GPU serving; given their size, we have had no requests yet from anyone to run them on CPU.

The relatively new DeepSeek-OCR uses the DeepseekOCRForCausalLM architecture, which differs notably from the above. It’s possible to build it out in MAX, but the Modular team hasn’t yet created a version of it. Furthermore, to be practical on CPU it would need a heavily quantized model, and so far we’ve only optimized CPU-based quantization for the Llama family of models.
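For context, serving a quantized Llama-family model on CPU with MAX looks roughly like the sketch below. The model path is only an illustrative example, and the exact flags may differ between MAX releases, so treat this as an assumption and check the MAX documentation for the current invocation:

```shell
# Sketch (assumed invocation): serve a quantized GGUF Llama model with MAX.
# The model path is an example repo, not a recommendation; consult the MAX
# docs for device selection flags if CPU isn't picked automatically.
max serve --model-path=modularai/Llama-3.1-8B-Instruct-GGUF
```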


Thanks for your feedback, appreciated.

Running DeepSeek-OCR on CPU alone isn’t really practical for inference, even with MAX: the model and its dependencies are designed around GPU acceleration and CUDA support, so CPU-only execution is very slow or may not work at all. Most guides and community discussions recommend an NVIDIA GPU setup with adequate VRAM for reasonable throughput, and advise against CPU-only usage for real-world tasks.

CPUs can reach fairly high memory bandwidth, with AMD’s EPYC Turin able to easily blow past all non-5090 consumer GPUs. Similarly, AVX-512 provides a very reasonable number of flops when you consider that 256 Zen 5 cores have the same aggregate vector width as 128 NVIDIA SMs, meaning that if you are compute-bound (as many convnet models are), a large CPU is not a horrible choice versus a GPU, especially at higher batch counts. CPU instances on public clouds are often quite a bit cheaper as well, so sometimes the economics work out in favor of the CPU.
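As a back-of-the-envelope check of the vector-width comparison, here is a small arithmetic sketch. The per-core pipe count for Zen 5 and the 128 FP32 lanes per SM are my own assumptions about the hardware, not figures stated in this thread:

```python
# Rough per-cycle vector width: 256 Zen 5 cores vs 128 NVIDIA SMs.
# Assumptions (not from the thread): each Zen 5 core has four 512-bit
# FP pipes; each NVIDIA SM has 128 FP32 lanes of 32 bits each.
zen5_cores = 256
zen5_pipes_per_core = 4              # assumed: e.g. 2 FMA + 2 FADD units
zen5_bits = zen5_cores * zen5_pipes_per_core * 512

sm_count = 128
sm_bits = sm_count * 128 * 32        # 128 FP32 lanes per SM

print(zen5_bits, sm_bits)            # both come out to 524288 bits/cycle
```

Under those assumptions the two aggregate widths match exactly, which is the sense in which a large Zen 5 part can be compute-competitive with a mid-sized GPU.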