Modular weekend hack project - NMS (non-maximum suppression) kernel in Mojo + PyTorch integration with YOLOv10

pbanavara · June 29, 2025, 6:14pm

I tried this in a previous hackathon and couldn’t get it to compile. Now I have the kernel ready. I'll try integrating it with PyTorch + YOLOv10 and see what improvements I can achieve.

This is largely experimental. GPU speedups only show up at around 1000 boxes or more per image. In practice this may not matter much, since you mostly see < 100 detections per image.

BradLarson · June 29, 2025, 7:19pm

The big wins for this may be simply the ability to keep the inputs and outputs on GPU, rather than incurring device->host copies, so I wouldn’t worry too much about the performance of the kernel itself. Just having it work reliably on GPU in a way that matches reference implementations is valuable.
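
For checking that a GPU kernel matches a reference, a minimal sketch of a CPU-side greedy NMS in plain Python might look like the following. The names `iou` and `nms_reference`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions for illustration, not taken from the project. The rule of suppressing boxes whose IoU is strictly greater than the threshold follows the convention documented for `torchvision.ops.nms`.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed default, tune per model
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes in order of decreasing score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_reference(boxes, scores)
```

Comparing the kept indices from a sketch like this against the kernel's output on random boxes is one way to test correctness before measuring speed.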
