Thank you! Really appreciate the encouragement again.
One small piece of feedback from building these examples: while learning, I struggled with the InferenceSession terminology in the tutorial’s simple addition example. The term “inference” has strong ML model connotations (to me at least - likely others?) so it took a while to realise MAX uses it more broadly to mean “graph execution” (which makes sense, aligning with TensorRT, ONNX Runtime, etc.).
I submitted docs feedback suggesting a one-sentence clarification where InferenceSession is first introduced. Though if MAX aims to position itself as a general computational framework beyond just ML inference (which from your talks it does), names like GraphSession or ExecutionSession might communicate that more clearly. Just a thought—completely understand if alignment with existing inference engine conventions (TensorRT, etc.) is more valuable.
My full list of possible names: ComputeSession, ExecutionSession, GraphEngine, GraphExecutor, GraphRuntime, GraphSession, or RuntimeSession.
Thanks again for the positive response to the repo!
As an update, the latest MAX nightly contains a fallback path for matrix multiplication on Apple silicon GPUs, so ops.matmul now compiles and runs on those GPUs. There are significant performance optimizations to be made in matmul on that architecture, but at least it’ll work. Therefore, your linear layer example now builds using the latest nightly.
One caveat is that max.driver.Tensor has been renamed to max.driver.Buffer, so you’ll need to make that change in your code.
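For anyone updating an existing project, a bulk rename can save some tedium. Here's a hedged sketch of one way to do it (assumes GNU sed and the two common usage patterns, the qualified name and the direct import; it won't catch every spelling, so review the diff afterwards):

```shell
# Rewrite max.driver.Tensor -> max.driver.Buffer across a project's Python
# sources. Covers the fully qualified name and the "from ... import" form.
find . -name '*.py' -exec sed -i \
  -e 's/max\.driver\.Tensor/max.driver.Buffer/g' \
  -e 's/from max\.driver import Tensor/from max.driver import Buffer/g' {} +
```

On macOS, the BSD sed shipped with the system needs `sed -i ''` instead of `sed -i`.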
As a recommendation: you may want to pin the specific MAX version used in your projects in your pixi.toml to make sure they continue to build as breaking API changes come in. You can then bump that pin and modernize your code manually as you're ready to adopt new versions. In this case, pinning the max dependency in your pixi.toml to the nightly you've tested against should do it.
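As a rough illustration of what that pin might look like (the version string below is a placeholder, not a real release; substitute the exact build you've verified, e.g. from `pixi list`):

```toml
[dependencies]
# Pin to the exact nightly you have tested, rather than a floating range.
# "YYYY.M.D" is a placeholder version string, not a real MAX release.
max = "=YYYY.M.D"
```

You can then loosen or advance this pin deliberately when you're ready to absorb breaking changes.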
Hi Brad Larson, as you note, the core matrix multiplication operation (ops.matmul) on Apple GPUs still needs substantial optimization. Regarding the rename from max.driver.Tensor to max.driver.Buffer: I'd be interested in understanding the significance of that shift, whether it's motivated by compatibility or by performance.