Welcome to the wild world of modern machine learning! You’re right, a lot has changed since the late nineties. We’ve gone from hand-coding backpropagation to frameworks, and in some cases back to hand-coding backpropagation again.
Before I get into how things map to Modular’s stack: since you mention working with Keras, I’d recommend François Chollet’s “Deep Learning with Python”, which has a promising-looking third edition on the way. That and Jeremy Howard and Sylvain Gugger’s “Deep Learning for Coders with fastai and PyTorch” are the two references I tend to point people to when they’re getting back into modern ML. Earlier editions of those books predate the generative AI boom, so I’d also point to Sebastian Raschka’s “Build a Large Language Model (From Scratch)” if you’re interested in modern LLM design.
Modular is building a vertically integrated stack of technologies for making AI workloads more performant and easier to deploy (and doing much more with accelerators beyond that). As such, there are multiple points at which you can hook into our stack for doing machine learning work.
At the very top level, we’ve made it really easy to serve LLMs in production on NVIDIA GPUs (AMD support coming soon!), pulling together everything Modular has been working on. Using a batteries-included Docker image or our Magic package and environment manager, you can deploy many common LLM architectures with MAX simply by pointing at weights for a model on Hugging Face. There’s an incredible amount of technology underpinning these deployments. Give it a try; it’s really cool to chat with a high-performance Llama 3.1 model running on GPU via MAX.
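Just to give a flavor of what that looks like once a server is running: here’s a minimal sketch of talking to it from Python through an OpenAI-compatible client. The port, base URL, and model id below are assumptions for illustration, so check the serving docs and your own deployment for the exact values.

```python
# Minimal sketch: chat with a locally served model through an
# OpenAI-compatible endpoint. The base_url, port, and model id below
# are assumptions -- adjust them to match your actual deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model id
    messages=[
        {"role": "user", "content": "Summarize what MAX does in one sentence."}
    ],
)
print(response.choices[0].message.content)
```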
The model architectures themselves are built using our relatively new Python API, so if you’ve already started building familiarity with Python, it’s not too much of a leap to start looking into MAX at that level. The ML world today lives in Python, and we want to make it easy to progressively introduce MAX into the massive existing Python codebases out there.
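To give a rough feel for working at that level, here’s a minimal sketch of building and running a tiny graph. The module paths and signatures (`max.graph.Graph`, `TensorType`, `ops`, `engine.InferenceSession`) are assumptions based on how I remember the API around this release, so treat this as a sketch and check the current docs rather than copying it verbatim.

```python
# Hedged sketch of building a tiny computation graph with the MAX Graph
# Python API. Module paths and signatures are assumptions from memory --
# verify against the current documentation before relying on them.
import numpy as np

from max import engine
from max.dtype import DType
from max.graph import Graph, TensorType, ops

# A graph that adds two float32 vectors of length 4.
with Graph(
    "vector_add",
    input_types=(
        TensorType(DType.float32, (4,)),
        TensorType(DType.float32, (4,)),
    ),
) as graph:
    lhs, rhs = graph.inputs
    graph.output(ops.add(lhs, rhs))

# Compile the graph with the MAX Engine and run it on some inputs.
session = engine.InferenceSession()
model = session.load(graph)
result = model.execute(
    np.ones(4, dtype=np.float32),
    np.arange(4, dtype=np.float32),
)
print(result)
```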
The MAX Engine contains a powerful optimizing graph compiler, and our APIs start at a much more atomic level than existing frameworks like PyTorch. We’ve begun building higher-level pieces on top of these graph computation elements, and much more will continue to grow at those upper levels of abstraction. Chris links above to community projects like Endia and Basalt, which are experimenting with higher-level ML APIs built on these computational graph components.
Our initial focus has been on accelerating inference workloads, rather than training, because the former dominates compute loads in production. However, as the frameworks linked above show, there’s nothing preventing you from using our low-level primitives to perform the graph transformations needed to set up the backward pass. We just want to make sure we fully meet the needs of accelerated ML inference before expanding into training.
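To illustrate the idea in the abstract (plain NumPy here, not MAX code): setting up the backward pass amounts to walking the forward computation in reverse and emitting a gradient operation for each forward operation, which is exactly the kind of graph-to-graph transformation those low-level primitives allow.

```python
# Generic illustration of a backward pass as a transformation of the
# forward computation: each forward op contributes a matching gradient op.
# This is plain NumPy for clarity, not MAX-specific code.
import numpy as np

def forward_backward(x, w):
    # Forward "graph": z = x @ w, y = relu(z), loss = sum(y)
    z = x @ w
    y = np.maximum(z, 0.0)
    loss = y.sum()

    # Backward "graph": traverse the forward ops in reverse order,
    # emitting the gradient computation for each one.
    dy = np.ones_like(y)        # d(loss)/dy for the sum reduction
    dz = dy * (z > 0)           # gradient through relu
    dw = x.T @ dz               # gradient of the matmul w.r.t. w
    return loss, dw

x = np.random.randn(3, 5).astype(np.float32)
w = np.random.randn(5, 2).astype(np.float32)
loss, dw = forward_backward(x, w)
print(loss, dw.shape)
```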
And then at the heart of MAX is Mojo, which we want to be the best way to program GPUs (and more!). With the 24.6 release, we’ve only just started to show how we use Mojo to write the high-performance operations at the core of the computational graphs inside ML models. Beyond that, you’ve most likely seen the amazing things the broader community has built with Mojo as a fast, Python-family host language; it’s a very powerful language for high-performance computing. I’ve talked a lot about our newer Python APIs for building model graphs because I wanted to make sure you knew you don’t need to learn Mojo to use MAX to accelerate your ML models.
That’s a lot, but I wanted to show that depending on what you want to accomplish, there are many entry points for you in Modular’s stack as you progress on your deep learning journey. I’d be glad to provide additional detail on any of these areas, as well as places to start for tutorials, etc.