Thank you! Really appreciate the encouragement again.
One small piece of feedback from building these examples: while learning, I struggled with the InferenceSession terminology in the tutorial’s simple addition example. The term “inference” has strong ML model connotations (to me at least - likely others?) so it took a while to realise MAX uses it more broadly to mean “graph execution” (which makes sense, aligning with TensorRT, ONNX Runtime, etc.).
I submitted docs feedback suggesting a one-sentence clarification where InferenceSession is first introduced. Though if MAX aims to position itself as a general computational framework beyond just ML inference (which from your talks it does), names like GraphSession or ExecutionSession might communicate that more clearly. Just a thought—completely understand if alignment with existing inference engine conventions (TensorRT, etc.) is more valuable.
My full list of possible names: ComputeSession, ExecutionSession, GraphEngine, GraphExecutor, GraphRuntime, GraphSession, or RuntimeSession.
Thanks again for the positive response to the repo!
As an update, the latest MAX nightly contains a fallback path for matrix multiplication on Apple silicon GPUs, so ops.matmul now compiles and runs on those GPUs. There are significant performance optimizations to be made in matmul on that architecture, but at least it’ll work. Therefore, your linear layer example now builds using the latest nightly.
One caveat is that max.driver.Tensor has been renamed to max.driver.Buffer, so you’ll need to make that change in your code.
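For anyone updating an existing project, a bulk rename can save some tedium. Here's a hedged sketch of one way to do it (assumes GNU sed and the two common usage patterns, the qualified name and the direct import; it won't catch every spelling, so review the diff afterwards):

```shell
# Rewrite max.driver.Tensor -> max.driver.Buffer across a project's Python
# sources. Covers the fully qualified name and the "from ... import" form.
find . -name '*.py' -exec sed -i \
  -e 's/max\.driver\.Tensor/max.driver.Buffer/g' \
  -e 's/from max\.driver import Tensor/from max.driver import Buffer/g' {} +
```

On macOS, the BSD sed shipped with the system needs `sed -i ''` instead of `sed -i`.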
As a recommendation: you may want to pin the specific MAX version used in your projects in your pixi.toml to make sure they continue to build as breaking API changes come in. You can then bump that pin and modernize your code manually as you're ready to adopt new versions. In this case, pinning the max dependency in your pixi.toml to the nightly you've tested against should do it.
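As a rough illustration of what that pin might look like (the version string below is a placeholder, not a real release; substitute the exact build you've verified, e.g. from `pixi list`):

```toml
[dependencies]
# Pin to the exact nightly you have tested, rather than a floating range.
# "YYYY.M.D" is a placeholder version string, not a real MAX release.
max = "=YYYY.M.D"
```

You can then loosen or advance this pin deliberately when you're ready to absorb breaking changes.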
Hi Brad Larson, as you note, the core matrix multiplication operation (ops.matmul) on Apple GPUs still needs substantial optimization. Regarding the rename from max.driver.Tensor to max.driver.Buffer: I'd be interested in understanding the significance of that shift, whether it's motivated by compatibility or by performance.