Getting started with deep learning

Hi, a bit of a beginner's question.

Recently I have been learning how to do machine learning with Keras in Python.

The last time I did neural nets, I wrote all the code myself in C after reading a paper on backpropagation, and that was in around 1996!! Despite gaining a master's degree in AI in 1998, I was too early to the field, and spent most of my career writing enterprise applications in Java, then doing technical architecture. Amazing what has happened in the meantime…

Now that I have skilled up a bit in Python, I would like to learn Mojo alongside it. Since I am starting new projects from scratch, and Mojo seems like a better Python, I may as well attempt to do them in Mojo from the get-go, where possible.

Let's say I have a small machine learning problem, and have coded up in Python and Keras a solution that trains a network satisfactorily via deep learning. The trained model is in TensorFlow, and I can run it in Python too, or export it and run it some other way.

What are the equivalent pieces of this stack in the Mojo world, and roughly speaking how would I replicate this example problem and solution here?

Sorry my question is a little awkward but as you can hopefully see I am new to this and just trying to get my head around what the various pieces are and how they fit together. A few pointers to get me going in the right direction would be appreciated. :pray:


Sorry, I don't have any answers, but I just wanted to +1, because these are the kind of questions I have as someone not coming from an AI/ML background. I know Mojo is certainly not intended for 'beginners' as it stands, but I think at some point an eye must be trained on people who want to come into Mojo/Max as newcomers as the tech heads towards maturity.


Yes, maybe there is no native equivalent of Keras for Mojo yet, and that is the missing piece that would let me build deep learning nets in a similar fashion to how I currently do in Python.

Looking at the docs for Max I see:

-graph | Modular Docs

“The MAX Graph API provides a low-level programming interface for high-performance inference graphs written in Mojo. It’s an API for graph-building only, and it does not implement support for training.”

Ok

-tensor | Modular Docs

Basic operations on tensors seem to be covered. So I could roll my own backprop optimizer using these two libraries?

Is this a correct assessment, i.e. there is no deep learning building/training library yet?

About the graphs library:

-graph | Modular Docs

Does this support auto-differentiation of compute graphs? I think auto-differentiation is not needed for inference, only for training? Without autodiff it would be quite a bit more work to write an optimizer, as I would need to work out the derivatives manually.
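For concreteness, here is the kind of manual derivative bookkeeping I mean, sketched in plain NumPy (standing in for whatever tensor library is used; the function names here are my own, not from MAX) for a single dense layer with a sigmoid activation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W, b):
    z = x @ W + b          # pre-activation
    a = sigmoid(z)         # activation
    return z, a

def backward(x, z, a, grad_out):
    # d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z)) = a * (1 - a),
    # worked out by hand -- exactly what autodiff would do for us.
    dz = grad_out * a * (1.0 - a)
    dW = x.T @ dz          # gradient w.r.t. weights
    db = dz.sum(axis=0)    # gradient w.r.t. bias
    dx = dz @ W.T          # gradient passed to the previous layer
    return dW, db, dx

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs
W = rng.normal(size=(3, 2))
b = np.zeros(2)

z, a = forward(x, W, b)
dW, db, dx = backward(x, z, a, np.ones_like(a))
print(dW.shape, db.shape, dx.shape)  # (3, 2) (2,) (4, 3)
```

Doing this for every layer type and chaining the results is exactly the work an autodiff system automates.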

Hi Will!

I have a couple of suggestions for places you might want to start. You’re right that the MAX Graph API is targeted towards inference right now. There are some Mojo projects out there that might give you a good start:

Endia
Basalt
Momograd

I’ve also found Understanding Deep Learning to be a solid resource for getting up to speed on the practical theory with solid exercises that you could implement in Mojo.


Thanks for that. Skimming the READMEs of Endia, Basalt and Momograd, it seems like things in this area are "in their infancy". Endia has an impressive-sounding version stamp of 24.5.0, until I realise that it has been chosen to match the Mojo version rather than to say anything about the maturity of the library.

Looks like I should probably stick with doing training under python and experiment with exporting models and running them under Mojo/Max.

One question: if I do this, but create a setup and runtime that runs the Python code under Mojo, will that all work nicely with no issues? The only reason for trying this would be to gain familiarity with using the Mojo toolchain day-to-day; since I read that Mojo runs a standard Python interpreter when used this way, there will be no advantage in performance or language features.

Also: is there some kind of package library for Mojo? A website I could have gone to and searched for "deep learning" to find those libraries you linked to myself? How do I "discover" the ecosystem here, other than through word-of-mouth recommendations on where to look on GitHub? Or does this kind of thing not exist yet either?

It’s still early days, but we’re building out a platform for hosting community packages along with the prefix.dev team. You can find details about it here: GitHub - modular/modular-community: A repo to hold community-submitted rattler-build recipes, to make community packages available via the modular-community prefix.dev channel


I get a 404 on that channel (https://repo.prefix.dev/modular-community), but no big deal; I am sure it will be available when it's ready. And I can see that prefix.dev indexes all its channels and makes them searchable, which is nice.

Thanks for all your helpful pointers; I feel like I have enough of an idea to at least make a start playing around with it.

Welcome to the wild world of modern machine learning! You're right, a lot has changed since the late nineties. We've gone from hand-coding backpropagation, to frameworks, back to hand-coding backpropagation again in some cases.

Before I get into how things map to Modular's stack: because you mention working with Keras, I thought I'd recommend François Chollet's "Deep Learning with Python", which has a neat-looking third edition upcoming. That and Jeremy Howard and Sylvain Gugger's "Deep Learning for Coders with fastai and PyTorch" are the two references I tend to provide to people getting back into modern ML. Earlier versions of those books were written before the onset of the generative AI revolution, so I'd also point to Sebastian Raschka's "Build a Large Language Model (From Scratch)" if you're interested in modern LLM design.

Modular is building a vertically integrated stack of technologies for making AI workloads more performant and easier to deploy (and doing much more with accelerators beyond that). As such, there are multiple points at which you can hook into our stack for doing machine learning work.

At the very top level, we've made it really easy to serve LLMs in production on NVIDIA GPUs (AMD coming soon!) in a way that pulls together everything Modular has been working on. Using a batteries-included Docker image or our Magic package and environment manager, you can deploy many common LLM architectures with MAX simply by pointing at weights for a model on Hugging Face. There's an incredible amount of technology underpinning these deployments. Give them a try; it's really cool to use a high-performance Llama 3.1 chat interface running on GPU via MAX.

The model architectures themselves are built using our relatively new Python API, so if you’ve already started building familiarity with Python it’s not too much of a leap to start looking into MAX at that level. The ML world today lives in Python, and we want to make it easy to progressively start introducing MAX into these existing massive Python codebases.

The MAX Engine contains within it a powerful optimizing graph compiler, and our APIs start at a much more atomic level than existing frameworks like PyTorch. We’ve started building those higher levels on top of these graph computation elements, but much will continue to grow at these upper levels of abstraction. Chris above links to community projects like Endia and Basalt where higher-level ML APIs are being experimented with on these computational graph components.

Our initial focus has been on accelerating inference workloads, rather than training, because the former dominates compute loads in production. However, as can be seen in the above-linked frameworks, there’s nothing preventing the use of our low-level primitives to do graph transformations for setting up the backward pass. We just want to make sure we fully meet the needs of accelerated ML inference before expanding into training.

And then at the heart of MAX is Mojo, which we want to be the best way to program GPUs (and more!). With the 24.6 release, we’ve only just started to show how we use Mojo to write the high-performance operations at the core of our computational graphs inside ML models. Beyond that, you’ve most likely seen the amazing things that the broader Mojo community has built with Mojo as a fast, Python-family host language. It is a very powerful language for high-performance computing. I’ve talked a lot about our newer Python APIs for building graphs for ML models, because I wanted to make sure that you knew that you didn’t need to learn Mojo to use MAX for accelerating your ML models.

That’s a lot, but I wanted to show that depending on what you want to accomplish, there are many entry points for you in Modular’s stack as you progress on your deep learning journey. I’d be glad to provide additional detail on any of these areas, as well as places to start for tutorials, etc.


I am actually reading and working through "Deep Learning with Python". Intriguingly, early in the book there is an exercise to implement a simple densely connected net using TensorFlow and then implement a batch update operation to train it. It occurred to me that I could try repeating that exercise in Mojo/Max, because I think all the parts needed are there, except maybe autodiff? But it should be OK; the example only uses two types of activation function.
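To sanity-check my reading, something like the following toy version is what I have in mind (plain NumPy here, as my own sketch rather than the book's code): a one-hidden-layer dense net with hand-derived gradients and a full-batch update, fitting XOR:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: XOR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer (tanh), sigmoid output -- two activation types, as in the exercise.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

losses = []
for step in range(3000):
    # Forward pass.
    a1 = np.tanh(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    losses.append(((a2 - y) ** 2).mean())

    # Backward pass: mean-squared-error gradients worked out by hand.
    d2 = 2.0 * (a2 - y) / len(X) * a2 * (1.0 - a2)  # dL/dz2
    dW2 = a1.T @ d2
    db2 = d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1.0 - a1 ** 2)              # tanh' = 1 - tanh^2
    dW1 = X.T @ d1
    db1 = d1.sum(axis=0)

    # One full-batch gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.4f}")
```

The forward pass and the parameter updates map onto basic tensor operations; it's the hand-derived backward pass that an autodiff system would otherwise generate.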

Thanks for your extensive reply, it is very helpful in setting the scene and filling me in on the current state of things. I also did not appreciate that I could do python+Max too, good to know!

I'm pretty keen as well on the ML/DL side of Mojo, since this is what I currently study in school. AFAIK, with Mojo still actively being developed, there are some ML/DL frameworks currently in development. There is also a Discord server for the community, if y'all by chance have not yet heard.

One of our community members has a port of llm.c in Mojo as a proof-of-concept project, llm.mojo; you can run it locally on your machine.
