[Hackathon] Creating a YOLOv10 architecture using MAX graphs

1 Like

cool!

2 Likes

Thanks @alix! This looks fantastic. One thing that would help us out is an end-to-end example of loading weights and running basic inference on a few test cases. max serve doesn’t support image classification workflows, so the best path forward here would be a direct Python script.

It’s really exciting to see the work on vision classification models coming up. Thanks for sharing this work!

2 Likes

Great start! From the structure of the tests and documentation, it looks like you’ve been pretty successful in using an agentic coding tool to help generate the architecture here. How did you find that experience? How hard was it to get the agent to follow our design and documentation?

One suggestion: right now the tests are the bare minimum that Claude will generate (can it load a Python module?), and Claude will happily mark that as passing and functional even though the model doesn’t work. I’ve found that to get Claude and other agents to really dig into the meat of the model, you need to give them specific test targets and goals. For example: “produce a script that will take these images as input and return correct bounding boxes and classifications that match this example model’s output” or “run this model using max generate --custom-architecture XXX and verify it produces correct text”. Making the agent actually run the model forces it to find and correct runtime issues, and to verify that it builds out weight loading and everything else needed to get the model to function.
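
To make that concrete, a goal-driven test might look something like the sketch below. Everything here is illustrative: the run_inference entry point, the fixture paths, and the expected-output JSON are placeholders for whatever you end up building; the point is that the test asserts on real detection output, not on whether a module imports.

    # Illustrative goal-driven test. run_inference and the fixture paths are
    # hypothetical; the assertion is the part that matters: every expected box
    # must be matched by a prediction of the same class with IoU > 0.5.
    import json

    def iou(a, b):
        # Intersection-over-union of two [x1, y1, x2, y2] boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def test_detections_match_reference():
        # Forces the agent to build real weight loading and inference:
        # run_inference must return [{"box": [...], "class": "..."}, ...].
        from run_inference import run_inference
        preds = run_inference("tests/fixtures/invoice_001.png")
        with open("tests/fixtures/invoice_001_expected.json") as f:
            expected = json.load(f)
        for exp in expected:
            assert any(
                p["class"] == exp["class"] and iou(p["box"], exp["box"]) > 0.5
                for p in preds
            ), f"no prediction matched expected {exp}"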

1 Like

Hello Everyone,

Thanks so much for the kind words — I really appreciate it!

I completely agree about the value of an end-to-end example. I’ve been working toward that, but ran into a few challenges along the way that I wanted to share — along with some broader context.

Background & Motivation

This project is something I’ve been trying to crack for a while now — building a reliable pipeline that can detect and classify layout elements (like table rows, fields, or line items) from real-world document images, especially invoices. I’ve explored multiple approaches over time, including using LLMs, object detection models, and layout-aware vision models — so this submission is the latest iteration of that journey.

Exploring Phi-3 Vision

Alongside working with MAX Graph, I’ve also been experimenting with Microsoft’s Phi-3 Vision model (context here) to extract layout elements from documents. While Phi-3 Vision is powerful for layout detection, it’s not optimized for YOLO-style object detection — so I’ve been considering a hybrid approach:

  • Use Phi-3 Vision for coarse document structure (e.g., header, footer, line-item block), and
  • Use MAX Graph + YOLOv10 for fine-grained bounding boxes and class predictions.

My goal is to bring these together in a Python script that loads models, processes real invoices, and returns predictions both as JSON output and visual annotations — suitable for real document automation workflows.
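
As a rough sketch of how I picture the two stages composing, here is a hypothetical skeleton. The two detect_* helpers are stubs standing in for the Phi-3 Vision and MAX Graph/YOLOv10 calls that are still being wired up:

    # Hypothetical skeleton of the hybrid pipeline; the detect_* helpers are
    # stubs for the real Phi-3 Vision and MAX Graph/YOLOv10 stages.
    import json

    def detect_regions(image_path):
        # Stub: Phi-3 Vision would propose coarse structural regions here.
        return [{"label": "line_item_block", "box": [0, 300, 1200, 900]}]

    def detect_elements(image_path, region):
        # Stub: the YOLOv10 MAX graph would run on the cropped region here.
        return [{"class": "table_row", "box": [10, 320, 1190, 360], "score": 0.91}]

    def process_invoice(image_path):
        results = []
        for region in detect_regions(image_path):            # coarse pass
            elements = detect_elements(image_path, region)   # fine pass
            results.append({"region": region["label"], "elements": elements})
        return json.dumps(results, indent=2)

    if __name__ == "__main__":
        print(process_invoice("invoice.png"))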

Experience Using Cursor

I used Cursor throughout the development process, especially for navigating the modular codebase, prototyping MAX Graph modules, and generating/refining initial test scripts. It was incredibly helpful for context-aware code suggestions and for juggling multiple design iterations quickly. That said, I still had to iterate manually quite a bit to get MAX Graph modules wired up correctly, especially when it came to shape mismatches and layer definitions.

Challenges I Encountered

Input Preprocessing & Output Postprocessing
Replicating YOLOv10’s preprocessing pipeline required careful alignment with the original implementation — including image resizing, channel reordering, and normalization. Postprocessing is still in progress and involves decoding raw tensor outputs into bounding boxes, mapping class labels, restoring coordinates to the original image size, and applying Non-Maximum Suppression (NMS). These steps are critical to ensure the outputs are usable in real-world scenarios.
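
For what it’s worth, the preprocessing I’m replicating looks roughly like the sketch below: a standard YOLO-style letterbox resize, channel reordering, and normalization in NumPy/Pillow. The input size and padding value here are the common defaults, not necessarily what the original implementation uses.

    # Sketch of YOLO-style preprocessing: letterbox to a square input,
    # HWC uint8 -> CHW float32 in [0, 1], with the scale/offsets kept so
    # boxes can later be mapped back to the original image.
    import numpy as np
    from PIL import Image

    def preprocess(path, size=640):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        scale = size / max(w, h)
        new_w, new_h = int(round(w * scale)), int(round(h * scale))
        img = img.resize((new_w, new_h), Image.BILINEAR)

        # Letterbox: pad the short side with gray (114) to keep aspect ratio.
        canvas = np.full((size, size, 3), 114, dtype=np.uint8)
        top, left = (size - new_h) // 2, (size - new_w) // 2
        canvas[top:top + new_h, left:left + new_w] = np.asarray(img)

        x = np.transpose(canvas.astype(np.float32) / 255.0, (2, 0, 1))[None, ...]
        return x, (scale, left, top)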

I’m now working on:

  • A run_inference.py script that loads weights, processes a test image, and outputs both visual and structured predictions (rough skeleton after this list).
  • Wrapping both Phi-3 and YOLO logic into a hybrid pipeline.
  • Writing goal-driven tests to validate weight loading and output correctness, not just module compilation.
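
The skeleton I’m working from for run_inference.py is below. The max.engine calls reflect my current understanding of the MAX Python API and may need adjusting, the model path and input name are placeholders, and decode_predictions is exactly the postprocessing described above that is still in progress.

    # Planned skeleton for run_inference.py. The max.engine usage is my best
    # guess at the API and may need adjusting; paths and input names are
    # placeholders. preprocess is the letterbox sketch shown earlier.
    import json
    from max import engine  # assumed import path for the MAX inference API

    def decode_predictions(outputs, meta):
        # TODO: decode raw tensors into boxes/classes, undo the letterbox
        # using meta = (scale, left, top), and apply NMS if needed.
        raise NotImplementedError

    def run_inference(image_path):
        session = engine.InferenceSession()
        model = session.load("yolov10.maxgraph")   # placeholder artifact path
        x, meta = preprocess(image_path)           # letterbox sketch above
        outputs = model.execute(images=x)          # input name is a guess
        return decode_predictions(outputs, meta)

    if __name__ == "__main__":
        preds = run_inference("tests/fixtures/invoice_001.png")
        print(json.dumps(preds, indent=2))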

Thanks again for the thoughtful feedback and support — it’s been a very rewarding experience digging into MAX Graph, and I’m excited to keep refining this!

Best,

1 Like

Thank you, Chris. I am going to take a long screenshot of this comment, frame it, and look at it every morning!

Thank you for the suggestion, Chris. I will be working on it for sure.