Project GitHub: https://github.com/BlueFlame202/stable-diffusion-max (Stable Diffusion with MAX)
Project Status: Unfinished
Over the course of the hackathon, I started exploring what it would take to bring Stable Diffusion inference (initially targeting SDXL) to the MAX platform. I am quite new to hackathons and GPU programming, so the implementation is still in its early stages and not yet producing valid outputs (it currently returns NaNs). The goal was to begin mapping out the architecture and to figure out the integration points between PyTorch-style models and MAX graphs.
So far, this has involved:
- Loading the official SD v1.4 weights into a partial MAX pipeline (see the first sketch after this list).
- Building a single class that wraps the PyTorch implementations of the UNet and VAE, so those components can be swapped out for MAX-native versions later (second sketch below).
- Beginning to investigate how custom kernels might eventually be implemented in Mojo for performance-critical pieces.
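To make the first item concrete, here is a minimal sketch of pulling apart the official checkpoint, assuming the CompVis sd-v1-4.ckpt file; the key prefixes follow the original latent-diffusion layout, and the grouping code is illustrative rather than what the repo actually does:

```python
# Minimal sketch: load the official SD v1.4 checkpoint and group its
# state dict by submodule before mapping tensors into MAX weights.
# The filename and grouping logic are illustrative, not the repo's code.
import torch

ckpt = torch.load("sd-v1-4.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"]

# In the original latent-diffusion layout, the UNet lives under
# "model.diffusion_model." and the VAE under "first_stage_model.".
unet_weights = {
    k.removeprefix("model.diffusion_model."): v
    for k, v in state_dict.items()
    if k.startswith("model.diffusion_model.")
}
vae_weights = {
    k.removeprefix("first_stage_model."): v
    for k, v in state_dict.items()
    if k.startswith("first_stage_model.")
}

print(f"UNet tensors: {len(unet_weights)}, VAE tensors: {len(vae_weights)}")
```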
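And a sketch of the wrapper idea from the second item, using the diffusers implementations as the PyTorch stand-ins; the class and method names here are hypothetical, not the repo's:

```python
# Sketch: keep the PyTorch UNet and VAE behind one interface so each
# can later be swapped for a MAX-native implementation independently.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel


class HybridSDPipeline:
    def __init__(self, model_id: str = "CompVis/stable-diffusion-v1-4"):
        self.unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
        self.vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")

    @torch.no_grad()
    def denoise_step(self, latents, timestep, text_embeddings):
        # PyTorch for now; the plan is to route this through a MAX graph.
        return self.unet(latents, timestep, encoder_hidden_states=text_embeddings).sample

    @torch.no_grad()
    def decode(self, latents):
        # SD v1.x latents are scaled by 0.18215 before decoding.
        return self.vae.decode(latents / 0.18215).sample
```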
Challenges and Observations:
- One of the big hurdles was figuring out how memory management worked, as well as (slightly embarrassingly) getting confused by the documentation, since I was working with the nightly version.
- Mojo’s low-level control and MAX’s layout system are exciting, but take time to get used to—especially when working with large, complex models like SDXL.
- I didn’t yet get to implementing actual Mojo kernels or fully hooking up the graphs, but the structure is in place to support future iterations.
- Despite following the Hugging Face diffusers SDXL pipeline (src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py on GitHub) for inspiration (I was originally trying to build SDXL), I ran into NaN values when coding a simpler version.
What’s Next:
- Debugging the NaNs and confirming basic correctness with small test cases (a hook-based sketch follows this list).
- Gradually replacing components with native MAX graphs and Mojo kernels for performance (see the minimal graph sketch below).
- Wrapping tests and benchmarking into Pixi tasks to streamline reproducibility and validation (see the pixi.toml sketch below).
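For the NaN hunt, the plan is to bisect with PyTorch forward hooks against the reference implementation; a minimal sketch (the helper name is mine):

```python
# Sketch: register a forward hook on every submodule so the first layer
# that produces non-finite values raises immediately, naming itself.
import torch


def install_nan_hooks(model: torch.nn.Module) -> None:
    def make_hook(name: str):
        def hook(module, inputs, output):
            tensors = output if isinstance(output, tuple) else (output,)
            for t in tensors:
                if torch.is_tensor(t) and not torch.isfinite(t).all():
                    raise RuntimeError(f"non-finite output first appears in: {name}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))
```

Running the reference UNet with these hooks installed should pinpoint where values first blow up, and the same activations can then be compared against the MAX side.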
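For the graph migration, the starting point would be something like the smallest possible MAX graph, then porting layers into it one at a time. This follows the MAX Graph Python examples as I understood them during the hackathon; the nightly API was moving, so treat the exact names and signatures as assumptions:

```python
# Sketch of a trivial MAX graph (elementwise add), as a template for
# porting individual layers. API names follow the MAX Graph Python
# examples at the time of writing and may differ in current nightlies.
import numpy as np
from max import engine
from max.dtype import DType
from max.graph import Graph, TensorType, ops

with Graph(
    "tiny_add",
    input_types=(
        TensorType(DType.float32, (4,)),
        TensorType(DType.float32, (4,)),
    ),
) as graph:
    x, y = graph.inputs
    graph.output(ops.add(x, y))

session = engine.InferenceSession()
model = session.load(graph)
result = model.execute(np.ones(4, np.float32), np.full(4, 2.0, np.float32))
```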
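And for the Pixi piece, a [tasks] table in pixi.toml would give one-command reproducibility; a sketch, with task names and script paths that are illustrative rather than the repo's:

```toml
# Sketch of a pixi.toml [tasks] table; names and paths are illustrative.
[tasks]
test = "pytest tests/"
benchmark = "python scripts/benchmark.py"
generate = "python scripts/generate.py --prompt 'an astronaut riding a horse'"
```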
Impact:
While this submission is mainly a starting point, my hope is that it contributes to the broader effort of speeding up Stable Diffusion with Modular and Mojo, and of making it more cross-platform.
Thanks to the hackathon organizers and community—looking forward to continuing this work beyond the weekend!