[Hackathon] YOLOv8 Performance Benchmark: PyTorch vs. Modular MAX

TL;DR:

  1. Used MAX to port a minimal implementation of YOLOv8 nano extracted from the Ultralytics library.
  2. Accuracy is lower than the PyTorch model's. My suspicion is the interpolation mode: only BICUBIC is available in MAX, while PyTorch could be using bilinear or nearest.
  3. On the inference benchmark, the MAX version shows at least a 3x improvement over PyTorch.
  4. On the accuracy benchmark, the result is consistently low, so I didn’t include it.

Note: AI was heavily used in this; if you find any inaccuracy, please feel free to correct it.

Appreciation: shoutout to kapa.ai for answers.

GitHub URL: june-hackathon

Benchmark Results

Detailed Differences Between the PyTorch and MAX Implementations

High-Level Answer: Is the Change Significant or “Meh”?

The change is highly significant. It’s not just a minor syntax update—it represents a fundamental shift in the execution paradigm:

  • PyTorch: Uses an eager execution model. Networks are defined and run dynamically, operation by operation—ideal for flexibility and research.
  • Mojo/Modular MAX: Uses a graph-based, ahead-of-time (AOT) compilation model. It defines computation as a static graph using Python syntax, which is then compiled by the MAX Engine into a highly optimized binary targeting specific hardware.

Think of it like this:

  • PyTorch (Eager): An interpreter reading and executing your code line-by-line.
  • Modular MAX (Graph): A compiler that analyzes and optimizes your program into a fast, standalone application.
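
To make the analogy concrete, here is a toy sketch in plain Python. It is not the MAX API; the `Node` class and `compile_graph` function are hypothetical names invented for illustration. It only shows the idea that graph-style code records operations first and runs them later, after a separate compile step.

```python
# Toy illustration only: this is NOT the MAX API, just the idea of deferred execution.

class Node:
    """A graph node that records an operation instead of running it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def compile_graph(node):
    """Turn the recorded graph into a plain Python callable (the 'compile' step)."""
    if node.op == "input":
        return lambda feeds: feeds[node.value]
    if node.op == "const":
        return lambda feeds: node.value
    left, right = (compile_graph(n) for n in node.inputs)
    if node.op == "add":
        return lambda feeds: left(feeds) + right(feeds)
    if node.op == "mul":
        return lambda feeds: left(feeds) * right(feeds)
    raise ValueError(f"unknown op: {node.op}")


# Eager style: every line computes a number immediately.
x = 3.0
eager_result = x * 2.0 + 1.0

# Graph style: these lines only describe the computation.
gx = Node("input", value="x")
graph = gx * Node("const", value=2.0) + Node("const", value=1.0)

run = compile_graph(graph)      # one-off compile step
graph_result = run({"x": 3.0})  # execution happens here
```

Both paths give the same number; the difference is when the work happens, and that a real compiler can optimize the whole recorded graph before running it.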

Detailed Breakdown of Key Changes

1. The Core Engine: torch vs. max

PyTorch:

  • Uses torch and torch.nn.
  • Executes ops immediately via the PyTorch runtime.

import torch
import torch.nn as nn

# This object IS the runnable layer
conv_layer = nn.Conv2d(3, 64, 3)
input_tensor = torch.randn(1, 3, 64, 64)  # Example NCHW input (one 64x64 RGB image)
output = conv_layer(input_tensor)  # Execution happens here

Modular MAX:

  • Uses max.graph.ops to build a graph (no immediate execution).

from max.graph import ops

# This object DESCRIBES a convolution
conv_out = ops.conv2d(x, self.weight, ...)  # Adds a node to the graph
# No computation has happened yet.

2. Model Definition: nn.Module vs. Graph-Building Classes

PyTorch:

  • Uses nn.Module subclasses (Conv, C2f, SPPF) containing layers and nn.Parameters.
  • The forward method defines dynamic logic.

class Conv(nn.Module):
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), ...)
        self.bn = nn.BatchNorm2d(c2)
        # ...

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

Modular MAX:

  • Defines custom classes like MaxConv, MaxSPPF that describe the computation.

class MaxConv:
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True, name_prefix=""):
        self.weight = Weight(name=f"{name_prefix}.conv.weight", ...)
        self.bias = Weight(name=f"{name_prefix}.conv.bias", ...)
        # ...

    def __call__(self, x):
        conv_out = ops.conv2d(x, self.weight, ...)
        biased_out = conv_out + self.bias.to(x.device).reshape(...)
        return self.act(biased_out)

3. Weight Handling: Direct Loading vs. Fusion

PyTorch:

  • Loads weights with state_dict; BN is separate.

model.load_state_dict(state_dict, strict=True)

Modular MAX:

  • Explicitly fuses BatchNorm into Conv weights (for inference optimization).

# Fusing Conv + BN
bn_weight_key = f"{bn_prefix}.bn.weight"
if bn_weight_key in state_dict:
    # ... math to fuse bn_w, bn_b, bn_rm, bn_rv into conv_w ...
    fused_w = conv_w * scale.view(-1, 1, 1, 1)
    fused_b = bn_b - bn_rm * scale
    fused_weights_temp[target_key] = fused_w.numpy()
    fused_weights_temp[target_bias_key] = fused_b.numpy()
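
The elided fusion math can be sketched in NumPy as follows. This is a minimal sketch, not the code from the repo: it assumes the convolution has no bias of its own (which is what `fused_b = bn_b - bn_rm * scale` implies), and the helper names `fuse_conv_bn` and `conv1x1` are made up for illustration. The `eps` value has to match whatever the model's BatchNorm layers actually use. A 1x1 convolution is used for the check only because it is easy to write with `einsum`; the per-output-channel scaling is the same for larger kernels.

```python
import numpy as np


def fuse_conv_bn(conv_w, bn_w, bn_b, bn_rm, bn_rv, eps=1e-5):
    """Fold BatchNorm (weight, bias, running mean/var) into a bias-free conv."""
    scale = bn_w / np.sqrt(bn_rv + eps)            # one factor per output channel
    fused_w = conv_w * scale.reshape(-1, 1, 1, 1)  # scale each output filter
    fused_b = bn_b - bn_rm * scale                 # what is left becomes the bias
    return fused_w, fused_b


def conv1x1(x, w, b=None):
    """Reference 1x1 convolution on NCHW input, used only to check the math."""
    y = np.einsum("oi,nihw->nohw", w[:, :, 0, 0], x)
    if b is not None:
        y = y + b.reshape(1, -1, 1, 1)
    return y


rng = np.random.default_rng(0)
c_in, c_out, eps = 3, 4, 1e-5
x = rng.standard_normal((2, c_in, 5, 5))
conv_w = rng.standard_normal((c_out, c_in, 1, 1))
bn_w = rng.standard_normal(c_out)
bn_b = rng.standard_normal(c_out)
bn_rm = rng.standard_normal(c_out)
bn_rv = rng.random(c_out) + 0.5  # variances must be positive


def per_channel(v):
    return v.reshape(1, -1, 1, 1)


# Unfused path: conv, then BatchNorm in inference mode.
z = conv1x1(x, conv_w)
reference = (z - per_channel(bn_rm)) / np.sqrt(per_channel(bn_rv) + eps)
reference = reference * per_channel(bn_w) + per_channel(bn_b)

# Fused path: a single conv with the folded weights and bias.
fused_w, fused_b = fuse_conv_bn(conv_w, bn_w, bn_b, bn_rm, bn_rv, eps)
fused = conv1x1(x, fused_w, fused_b)

matches = np.allclose(reference, fused)
```

If `matches` is false for a real layer, a mismatched `eps` is one of the first things worth checking.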

4. Inference Pipeline: Implicit vs. Explicit Compile

PyTorch:

  • Simple pipeline: eval() + inference.

model = load_yolo_model_from_pt(...)
model.eval()
with torch.no_grad():
    predictions = model(image_tensor)

Modular MAX:

  • Requires an explicit compile step before execution.

max_model = load_yolo_model_from_pt(...)  # Loads weights into placeholders
session = engine.InferenceSession()       # Creates a runtime session
max_model.compile(session)                # <-- THE CRITICAL COMPILE STEP
processed_output = max_model(max_tensor)  # Executes compiled graph
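
Because compilation is a one-off cost, a fair inference benchmark should time only the call to the already-compiled model, after a few warmup runs. Below is a minimal timing harness using only the standard library. It is an assumption about how such a benchmark could be structured, not the script used for the numbers in this post; `benchmark` and `dummy_inference` are hypothetical names, and the dummy workload stands in for `model(image_tensor)` or `max_model(max_tensor)`.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Time fn() over several runs, discarding warmup iterations."""
    for _ in range(warmup):
        fn()  # lets caches, lazy initialization, etc. settle first

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
    }


def dummy_inference():
    """Stand-in workload; replace with the real model call."""
    return sum(i * i for i in range(10_000))


stats = benchmark(dummy_inference, warmup=2, runs=10)
print(stats)
```

On a GPU the device usually runs asynchronously, so the timed function would also need to wait for the result (for example by copying the output back to the host) before the clock stops.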

5. Data Handling: torch.Tensor vs. max.driver.Tensor and NumPy

PyTorch:

  • Entire pipeline uses torch.Tensor.

Modular MAX:

  • Uses max.driver.Tensor internally.
  • Requires conversion from NumPy → max Tensor → NumPy (for post-processing).
  • Post-processing (like dfl_numpy, dist2bbox_numpy) is written in pure NumPy.
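
For readers who have not seen the YOLOv8 head, this is roughly what NumPy versions of those two steps look like. It is a sketch modeled on the Ultralytics DFL and dist2bbox logic, not a copy of `dfl_numpy` or `dist2bbox_numpy` from the repo; the function names, argument shapes, and `reg_max=16` default are assumptions.

```python
import numpy as np


def dfl_sketch(box_logits, reg_max=16):
    """(batch, 4 * reg_max, anchors) logits -> (batch, 4, anchors) distances.

    Each box side is predicted as a distribution over reg_max bins;
    the decoded distance is the expected bin index under a softmax.
    """
    b, _, a = box_logits.shape
    logits = box_logits.reshape(b, 4, reg_max, a)
    logits = logits - logits.max(axis=2, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=2, keepdims=True)
    bins = np.arange(reg_max, dtype=probs.dtype).reshape(1, 1, reg_max, 1)
    return (probs * bins).sum(axis=2)


def dist2bbox_sketch(distance, anchor_points):
    """(left, top, right, bottom) distances -> (cx, cy, w, h) boxes.

    distance: (batch, 4, anchors); anchor_points: (2, anchors) as (x, y).
    """
    lt, rb = distance[:, :2], distance[:, 2:]
    x1y1 = anchor_points[None] - lt
    x2y2 = anchor_points[None] + rb
    return np.concatenate([(x1y1 + x2y2) / 2, x2y2 - x1y1], axis=1)


# One anchor at (10, 10) with all-zero logits: every bin is equally likely,
# so each side decodes to the mean bin index (0 + 15) / 2 = 7.5.
logits = np.zeros((1, 4 * 16, 1))
anchors = np.array([[10.0], [10.0]])

distances = dfl_sketch(logits)
boxes = dist2bbox_sketch(distances, anchors)
```

With equal distances on all four sides, the decoded box is centered on the anchor with width and height 15.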

Conclusion

The Mojo/Modular MAX script is a re-architecture, not a rewrite.

| Feature | PyTorch Script (Eager Execution) | Mojo/Modular Script (Graph Compilation) | Significance |
| --- | --- | --- | --- |
| Paradigm | Dynamic, flexible, interpreter-like | Static, optimized, compiler-like | Massive |
| Core Lib | torch.nn | max.engine, max.graph | Massive |
| Model Code | Defines runnable nn.Modules | Defines graph-describing classes | Significant |
| Weights | Loads weights directly, BN is separate | BN is fused manually into Conv layers | Significant |
| Inference | model(tensor) | max_model.compile(session), then max_model(tensor) | Significant |
| Post-Proc | Done with Torch tensors | Done with NumPy arrays | Minor |

Final Thoughts

This Mojo/Modular MAX implementation is an excellent example of moving a model from a flexible research framework (PyTorch) to a production-grade, high-performance system using AOT compilation. The changes aren’t just about performance—they represent a full stack shift from dynamic to optimized execution.

Outputs:

PyTorch


Max


How are you measuring whether the MAX model is generating results that match the PyTorch version?

The logic for the weight handling is interesting. How does it translate the PyTorch weights into a format that MAX can use?

For accuracy, I am just comparing the outputs of both, using:

if np.allclose(pytorch_np_transposed, max_np, rtol=1e-3, atol=1e-4):

where rtol is the relative tolerance and atol is the absolute tolerance.
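
A single True/False from `np.allclose` hides how far off the outputs are, so a slightly richer report can help when debugging. The sketch below is one way to do it: `compare_outputs` is a hypothetical helper, the random arrays stand in for real model outputs, and the NCHW to NHWC transpose is an assumption based on the variable name `pytorch_np_transposed`.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-4):
    """Summarize how closely candidate matches reference."""
    abs_diff = np.abs(reference - candidate)
    # np.allclose(a, b) checks |a - b| <= atol + rtol * |b| for every element.
    tolerance = atol + rtol * np.abs(candidate)
    return {
        "allclose": bool(np.allclose(reference, candidate, rtol=rtol, atol=atol)),
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "fraction_outside_tol": float((abs_diff > tolerance).mean()),
    }


rng = np.random.default_rng(0)
pytorch_nchw = rng.standard_normal((1, 8, 4, 4)).astype(np.float32)

# Bring the PyTorch-style output into NHWC before comparing.
pytorch_np_transposed = pytorch_nchw.transpose(0, 2, 3, 1)

# Simulated second output: tiny numerical noise, well inside the tolerances.
max_np = pytorch_np_transposed + np.float32(1e-5)
ok_report = compare_outputs(pytorch_np_transposed, max_np)

# Simulated mismatch: one element is off by 0.1.
bad_np = max_np.copy()
bad_np[0, 0, 0, 0] += np.float32(0.1)
bad_report = compare_outputs(pytorch_np_transposed, bad_np)
```

Looking at `max_abs_diff` and `fraction_outside_tol` makes it easier to tell a handful of outliers apart from a systematic difference such as a different interpolation mode.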

Its output:

For translation, we read the weights and fuse the BatchNorm parameters into the convolutions, rather than loading them as separate layers, in the function:

def load_and_fuse_state_dict(self, state_dict):

The output is a dictionary whose keys are strings matching the MAX graph’s Weight placeholders.

graph TD
    A[PyTorch .pt file] --> B{torch.load};
    B --> C["PyTorch state_dict<br/>(dict of torch.Tensors, NCHW layout)"];
    C --> D{load_and_fuse_state_dict method};
    D --> E;

    subgraph S [Conversion Logic]
        E{For each weight in state_dict} --> F{Has BatchNorm?};
        F -- Yes --> G["Fuse Conv+BN Math<br/>(in PyTorch)"];
        F -- No --> H[Use Original Conv Weight];
        G --> I{"Convert Tensor to NumPy & Transpose Layout"};
        H --> I;
        I --> J[Store in new dictionary];
    end

    J --> K["Final 'fused_weights' dictionary<br/>(dict of np.ndarrays, NHWC layout)"];
    K --> L{"max_model.compile()"};
    L --> M[Compiled, Optimized MAX Engine Executable];
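
The "Transpose Layout" step is just an axis permutation in NumPy. Here is a minimal sketch with made-up helper names and a hypothetical weight key. The activation transpose (NCHW to NHWC) is standard; the filter target layout shown here, (kH, kW, in, out), is an assumption and should be checked against what the conv op in the graph actually expects.

```python
import numpy as np


def nchw_to_nhwc(x):
    """Activations: (batch, channels, H, W) -> (batch, H, W, channels)."""
    return np.ascontiguousarray(x.transpose(0, 2, 3, 1))


def oihw_to_hwio(w):
    """Conv filters: (out, in, kH, kW) -> (kH, kW, in, out). Assumed target layout."""
    return np.ascontiguousarray(w.transpose(2, 3, 1, 0))


def convert_layouts(fused_weights):
    """Transpose every 4-D array (treated as a conv filter); leave biases alone."""
    return {
        name: oihw_to_hwio(arr) if arr.ndim == 4 else arr
        for name, arr in fused_weights.items()
    }


# Hypothetical key names, following the f"{name_prefix}.conv.weight" pattern.
fused_weights = {
    "model.0.conv.weight": np.arange(16 * 3 * 3 * 3, dtype=np.float32).reshape(16, 3, 3, 3),
    "model.0.conv.bias": np.zeros(16, dtype=np.float32),
}
converted = convert_layouts(fused_weights)

image_nchw = np.zeros((1, 3, 640, 640), dtype=np.float32)
image_nhwc = nchw_to_nhwc(image_nchw)
```

`np.ascontiguousarray` matters here because `transpose` only returns a view with different strides; code that reads the raw buffer would otherwise still see the old memory order.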