Learning MAX Graph API Through Working Examples

Sharing my learning journey: Six progressive examples from relu(x * 2 + 1) to production transformers

Repository: GitHub - DataBooth/max-learning: Learn MAX Graph API through 6 progressive examples (element-wise → transformers). Working code, minimal examples, benchmarks, and tests. An ongoing learning journey shared.
Version: 0.3.0

Key Features:

  • Minimal examples highlighting pure MAX Graph API without abstractions
  • DistilBERT achieving 5.58x speedup vs PyTorch on M1 CPU
  • First reported MAX Graph inference on Apple Silicon GPU (element-wise ops)
  • 49 comprehensive tests with correctness validation
  • Complete benchmarking framework with MAX vs PyTorch comparisons
  • MLP regression and CNN MNIST implementations

Perfect for: Anyone wanting to learn MAX Graph API through working code, building from basics to transformers.
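As a taste of the starting point, the first example's computation, relu(x * 2 + 1), is just element-wise arithmetic. A plain-Python sketch of what that graph computes (illustrative only, not the repo's MAX Graph code):

```python
def relu(values):
    # Element-wise rectified linear unit: negative entries become 0.0
    return [max(0.0, v) for v in values]

def first_example(x):
    # y = relu(x * 2 + 1), the computation the repo's first
    # MAX Graph example builds and executes as a graph
    return relu([v * 2 + 1 for v in x])

print(first_example([-2.0, 0.0, 1.5]))  # [0.0, 1.0, 4.0]
```

The later examples build the same kind of computation as a MAX graph and run it through an `InferenceSession`, scaling the idea up to MLPs, CNNs, and transformers.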

Ready for community feedback! All tests passing, all examples working.

MIT Licence | Sponsored by DataBooth

5 Likes

This is really amazing, thank you for breaking this down to be so easily digestible!

1 Like

Thank you! Really appreciate the encouragement again.

One small piece of feedback from building these examples: while learning, I struggled with the InferenceSession terminology in the tutorial’s simple addition example. The term “inference” has strong ML model connotations (to me at least - likely others?) so it took a while to realise MAX uses it more broadly to mean “graph execution” (which makes sense, aligning with TensorRT, ONNX Runtime, etc.).

I submitted docs feedback suggesting a one-sentence clarification where InferenceSession is first introduced. Though if MAX aims to position itself as a general computational framework beyond ML inference (which, from your talks, it does), names like GraphSession or ExecutionSession might communicate that more clearly. Just a thought; I completely understand if alignment with existing inference-engine conventions (TensorRT, etc.) is more valuable.

My full list of possible names: ComputeSession, ExecutionSession, GraphEngine, GraphExecutor, GraphRuntime, GraphSession, or RuntimeSession. :slight_smile:

Thanks again for the positive response to the repo! :fire:

As an update, the latest MAX nightly contains a fallback path for matrix multiplication on Apple silicon GPUs, so ops.matmul now compiles and runs on those GPUs. There are significant performance optimizations to be made in matmul on that architecture, but at least it’ll work. Therefore, your linear layer example now builds using the latest nightly.
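For reference, the linear layer in question boils down to a matmul plus a bias add, which is the operation `ops.matmul` now supports on Apple silicon GPUs. A plain-Python sketch of that computation (names here are illustrative, not the repo's actual code):

```python
def matmul(a, b):
    # Naive matrix multiply: (m x k) @ (k x n) -> (m x n)
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def linear(x, w, bias):
    # y = x @ w + bias: the computation a linear layer lowers to,
    # with the matmul part now handled by the new fallback path
    y = matmul(x, w)
    return [[y[i][j] + bias[j] for j in range(len(bias))]
            for i in range(len(y))]

print(linear([[1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]))
# [[1.5, 1.5]]
```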

One caveat is that max.driver.Tensor has been renamed to max.driver.Buffer, so you’ll need to make that change in your code.

As a recommendation: you may want to lock the specific MAX version used in your projects within your pixi.toml to make sure they continue to build as breaking API changes come in. You can then move that version up and modernize your code manually as you are ready to use new versions. In this case, I think changing the line in your pixi.toml to read

modular = "==26.1.0.dev2026012005"

would lock to this morning’s nightly release.
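For context, in a pixi project that pin lives in the dependencies table. A sketch of what the relevant pixi.toml section might look like (the table name is assumed from the standard pixi layout; the version string is the one quoted above):

```toml
[dependencies]
# Pin an exact nightly so builds keep working as breaking API changes land;
# bump this deliberately when you're ready to modernize your code.
modular = "==26.1.0.dev2026012005"
```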

2 Likes

Hi Brad Larson, the core matrix multiplication operation (ops.matmul, as you phrase it) on Apple GPUs clearly requires significant optimization. Now that max.driver.Tensor has been renamed to max.driver.Buffer, I'd be interested in understanding the importance of that shift for either compatibility or performance.

I'm also interested in why the code would break if I don't pin the version in pixi.toml as you stated.

Thanks for the feedback and pointers :slight_smile: I'll try to get to them today.

Thanks for the info and the suggestion. I think this is now done, and I added a script that updates to the latest nightly and re-tests, rolling back if the tests fail.
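That update-and-rollback flow might follow a pattern like this sketch (the pixi subcommands and file names here are assumptions, not the repo's actual script):

```shell
#!/usr/bin/env sh
# Hypothetical sketch of an "update, re-test, roll back on failure" flow.
# The pixi commands are assumed; the snapshot/restore pattern is generic.
update_with_rollback() {
    cp pixi.lock pixi.lock.bak || return 1    # snapshot the current pin
    if pixi update modular && pixi run pytest; then
        rm -f pixi.lock.bak                   # tests pass: keep the nightly
    else
        mv pixi.lock.bak pixi.lock            # tests fail: restore the pin
        return 1
    fi
}
```

Snapshotting the lockfile before updating keeps the rollback cheap: a failed test run just restores the previous pin instead of re-resolving the environment.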

1 Like