Hey everyone,
I’ve been really interested in learning GPU programming with Mojo, but I don’t have a GPU, and I’m not planning on buying an NVIDIA card anytime soon. So, I decided to try out a serverless GPU service to get some practice in.
My first thought was to use GCP, since I already use Google Cloud Run services (see their announcement here). However, I quickly found out that I didn’t have a GPU quota by default and would need to request one. To get started faster, I decided to go with RunPod.io instead.
The Basic Idea
The approach is pretty simple:
- I send my Mojo code to a serverless endpoint.
- The serverless function executes the code on a GPU and sends back the result.
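In other words, the request is just a JSON object mapping filenames to Base64 strings, wrapped in RunPod's standard `{"input": ...}` envelope. A minimal sketch of the round trip (the Mojo snippet itself is purely illustrative):

```python
from base64 import b64encode, b64decode

# A tiny Mojo program we want to run remotely (contents are illustrative).
mojo_source = b'def main():\n    print("hello from the GPU")\n'

# The request body: filenames mapped to Base64-encoded contents,
# wrapped in RunPod's {"input": ...} envelope.
payload = {"input": {"main.mojo": b64encode(mojo_source).decode("utf-8")}}

# The worker decodes each entry back to the original bytes before writing it.
assert b64decode(payload["input"]["main.mojo"]) == mojo_source
```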
The Implementation
First, I created a new GitHub repository for the project.
Here’s the `Dockerfile`. A quick note: if you swap `max-core` for the full `modular` package, the Docker image size jumps from about 1.5GB to around 4GB.
```dockerfile
FROM ubuntu:24.04

WORKDIR /gpu-intro

RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/* && \
    curl -fsSL https://pixi.sh/install.sh | sh && \
    /root/.pixi/bin/pixi init . \
        -c https://conda.modular.com/max-nightly/ -c conda-forge && \
    /root/.pixi/bin/pixi add max-core flask && \
    /root/.pixi/bin/pixi add --pypi runpod && \
    rm -rf /root/.cache

COPY server.py .

CMD ["/gpu-intro/.pixi/envs/default/bin/python", "server.py"]
```
And here’s the `server.py` file that runs on the serverless instance. It takes Base64-encoded files, writes them to disk, and executes `main.mojo`.
```python
import subprocess
from base64 import b64decode

import runpod


def handler(event):
    input_data = event["input"]

    if "main.mojo" not in input_data:
        return {"error": "main.mojo file not found in json"}

    try:
        # Decode and write all provided files to disk
        for filename, base64_content in input_data.items():
            file_content = b64decode(base64_content)
            with open(filename, 'wb') as f:
                f.write(file_content)

        # Run the mojo command
        result = subprocess.run(
            "/root/.pixi/bin/pixi run mojo main.mojo",
            shell=True,
            capture_output=True,
            text=True,
            check=True
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        error_msg = (
            f"Execution failed with code {e.returncode}:\n"
            f"STDOUT:\n{e.stdout}\n"
            f"STDERR:\n{e.stderr}"
        )
        return {"error": error_msg}
    except Exception as e:
        return {"error": str(e)}


# Start the serverless handler
runpod.serverless.start({"handler": handler})
```
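Before deploying, you can sanity-check the subprocess pattern locally by swapping the `pixi run mojo` invocation for any command available on your machine (`echo` here, purely as a stand-in):

```python
import subprocess

# Same call shape as the handler uses, with a harmless stand-in command
# replacing "/root/.pixi/bin/pixi run mojo main.mojo".
result = subprocess.run(
    "echo hello-from-subprocess",
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # → hello-from-subprocess
```

With `check=True`, a non-zero exit code raises `CalledProcessError`, which is what the handler's `except` branch turns into the error payload.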
Next, head over to RunPod.io and create a new Serverless Endpoint. You’ll point it to your new GitHub repository, select a GPU that’s compatible with Mojo, and you can leave most of the other settings at their defaults. After a short wait, RunPod will automatically build your Docker image and deploy the endpoint.
Calling the Endpoint
To call the endpoint, you can use a simple Python script like this. It reads your `main.mojo` file, encodes it, and sends it to the RunPod API.

`call.py`
```python
import requests
from base64 import b64encode

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your_api_key>'
}

FILENAMES = ["main.mojo"]

data = {
    'input': {i: b64encode(open(i, mode='rb').read()).decode('utf-8') for i in FILENAMES}
}

response = requests.post(
    'https://api.runpod.ai/v2/<your_project_id>/runsync',
    headers=headers,
    json=data,
)
print(response.json()['output'])
```
Final Thoughts
Just a heads-up, this is a very basic and barebones implementation, but it gets the job done!
Cost: For this kind of light use, it’s practically free. A cold start takes about 10 seconds to respond, while a warm start is around 3 seconds, and since serverless billing is per-second, the cost of simple experiments is negligible.
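For a rough sense of scale (the per-second rate below is a made-up placeholder, not RunPod’s actual pricing; substitute the real rate for whichever GPU you pick):

```python
# Back-of-the-envelope cost estimate. PRICE_PER_SECOND is purely
# illustrative; look up the actual per-second rate for your GPU tier.
PRICE_PER_SECOND = 0.0005  # USD, assumed placeholder
COLD_START_SECONDS = 10

runs = 100
cost = runs * COLD_START_SECONDS * PRICE_PER_SECOND
print(f"~${cost:.2f} for {runs} cold-start runs")  # → ~$0.50 for 100 cold-start runs
```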
Hope this helps anyone else out there who wants to tinker with Mojo on a GPU without the hardware!