Hello. We’re exploring Modular specifically for its portability across platforms. We are mostly interested in image and video model inference, as well as LoRA training. I see some examples of vision models, but nothing for image/video generation or LoRA training for an image model. Is this a use case for Modular? If so, is there any community content anyone could recommend for us to research? Thanks in advance!
We don’t currently have specific examples of image classification, object detection, etc. models built in MAX. Our initial targets for high-level model architectures have been the most widely-used ones in generative AI, such as text → text and text + image → text models.
However, many of the operations needed to build those models are already exposed in the MAX Graph API, such as 2-D convolution and pooling. I’ll caution that the kernels for some of these image operations haven’t yet been as finely tuned as the ones used in LLMs and multimodal models, given our focus to date on generative text models.
This is a use case that MAX definitely can support, and we’d love to see community exploration of this, but we haven’t yet constructed examples that show this.
There is an implementation of Stable Diffusion 1.5, but you’ll need to find the weights somewhere else since Runway took them off Hugging Face.
Porting nodes from ComfyUI over to MAX shouldn’t be that difficult, especially since a lot of the basic operations are already in place.