New MAX Docker Containers in 24.6

We’re excited to announce a major addition in the 24.6 release: new Docker containers that provide optimized MAX installations across cloud platforms. While these containers are featured in our Deploy Llama3 on GPU tutorial, they deserve special attention as standalone packages.

You can find two distinct images in the Modular Docker Hub repository:

  1. A standard image for cloud deployments and local testing.
  2. A specialized image optimized for our Helm charts.

Both containers come pre-configured with everything needed to run MAX on NVIDIA GPUs locally and in cloud environments like AWS, GCP, and Azure. You can learn more about how to use the containers directly from the Docker Hub README files. We welcome your feedback on these containers here. Looking ahead to 2025, we’re excited to bring further optimizations, not just for LLM performance but also to reduce container size and streamline deployment times in your infrastructure.

Is there any chance of a quay.io mirror? I think it’s free as long as you only use it with public repositories, and the main customer benefit is that they don’t rate limit. Academic institutions often have restricted access to an organizational Docker Hub account and also get rate limited by IP address, since it only takes a few students pulling down containers to hit the limit.

Secondly, is there a planned image where we can hand the container a volume mount with a checkout of the repo, so that we can avoid sending model weights to Hugging Face?

Thanks for the feedback, Owen! I’ll share this with the product team in charge of containers. Rate limiting from Docker Hub is something that’s been on our mind. Have you experienced this firsthand?

It’ll be a while before we can fully respond. The company is out on holiday leave right now, so there won’t be any new containers or mirrors coming in the next few weeks.

Yes, my university is more or less always at the rate limit unless it’s 10pm to ~4am. A lot of universities use NAT, so everyone on campus shows up as one IP address, which Docker Hub limits to 100 pulls per 6 hours. One person playing with Kubernetes is enough to use a substantial share of that limit on their own. Combine that with classes that teach containers and require you to use them for homework, and you end up with very limited access. This is probably less of a concern for companies, which can set up a caching proxy like Artifactory and tell everyone to use it, but it is an issue for academia.
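To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes the 100-pulls-per-6-hours figure mentioned above; the user counts and the 40-pull Kubernetes experiment are hypothetical values picked for illustration, not measurements.

```python
# Back-of-the-envelope: how a shared NAT address divides a pull budget.
# The limit and window come from the post above; user counts are hypothetical.

PULL_LIMIT = 100      # pulls allowed per window for one IP address
WINDOW_HOURS = 6      # length of the rate-limit window


def pulls_per_user(active_users: int, limit: int = PULL_LIMIT) -> float:
    """Average pulls available to each user sharing one IP address."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return limit / active_users


def budget_used(pulls: int, limit: int = PULL_LIMIT) -> float:
    """Fraction of the shared budget consumed by a given number of pulls."""
    return min(pulls / limit, 1.0)


if __name__ == "__main__":
    for users in (1, 30, 200):
        share = pulls_per_user(users)
        print(f"{users:>3} users -> {share:.1f} pulls each per {WINDOW_HOURS} h")
    # A hypothetical Kubernetes experiment that pulls 40 images:
    print(f"40 pulls use {budget_used(40):.0%} of the shared budget")
```

Even with modest assumptions, the per-person share drops below one pull per window once a few hundred people sit behind the same address.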

This is a “when it’s convenient” request, no rush.