Hello. We’re exploring Modular specifically for its portability across platforms. We are mostly interested in image and video model inference, as well as LoRA training. I see some examples of vision models, but nothing for image/video generation or LoRA training for an image model. Is this a use case for Modular? If so, is there any community content anyone could recommend for us to research? Thanks in advance!
We don’t currently have specific examples of image classification, object detection, etc. models built in MAX. Our initial targets for high-level model architectures have been the most widely-used ones in generative AI, such as text → text and text + image → text models.
However, many of the operations needed to build those models are already exposed in the MAX Graph API, such as 2-D convolution and pooling. I’ll caution that the kernels for some of these image operations haven’t yet been as finely tuned as the ones used in LLMs and multimodal models, given our focus to date on generative text models.
This is a use case that MAX definitely can support, and we’d love to see community exploration of this, but we haven’t yet constructed examples that show this.
There is an implementation of Stable Diffusion 1.5, but you’ll need to find the weights somewhere else since Runway took them off Hugging Face.
Porting nodes from ComfyUI over to MAX shouldn’t be that difficult, especially since a lot of the basic operations are already in place.