Hey everyone,
I’ve been really interested in learning GPU programming with Mojo, but I don’t have a GPU, and I’m not planning on buying an NVIDIA card anytime soon. So, I decided to try out a serverless GPU service to get some practice in.
My first thought was to use GCP, since I already use Google Cloud Run services (see their announcement here). However, I quickly found out that I didn’t have a GPU quota by default and would need to request one. To get started faster, I decided to go with RunPod.io instead.
The Basic Idea
The approach is pretty simple:
- I send my Mojo code to a serverless endpoint.
- The serverless function executes the code on a GPU and sends back the result.
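In other words, the request is just a JSON object mapping filenames to Base64 strings, wrapped in RunPod's standard `{"input": ...}` envelope. A minimal sketch of the round trip (the Mojo snippet itself is purely illustrative):

```python
from base64 import b64encode, b64decode

# A tiny Mojo program we want to run remotely (contents are illustrative).
mojo_source = b'def main():\n    print("hello from the GPU")\n'

# The request body: filenames mapped to Base64-encoded contents,
# wrapped in RunPod's {"input": ...} envelope.
payload = {"input": {"main.mojo": b64encode(mojo_source).decode("utf-8")}}

# The worker decodes each entry back to the original bytes before writing it.
assert b64decode(payload["input"]["main.mojo"]) == mojo_source
```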
The Implementation
First, I created a new GitHub repository for the project.
Here’s the `Dockerfile`. A quick note: if you swap `max-core` for the full `modular` package, the Docker image size jumps from about 1.5GB to around 4GB.
```dockerfile
FROM ubuntu:24.04

WORKDIR /gpu-intro

RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/* && \
    curl -fsSL https://pixi.sh/install.sh | sh && \
    /root/.pixi/bin/pixi init . \
        -c https://conda.modular.com/max-nightly/ -c conda-forge && \
    /root/.pixi/bin/pixi add max-core flask && \
    /root/.pixi/bin/pixi add --pypi runpod && \
    rm -rf /root/.cache

COPY server.py .

CMD ["/gpu-intro/.pixi/envs/default/bin/python", "server.py"]
```
And here’s the `server.py` file that runs on the serverless instance. It takes Base64-encoded files, writes them to disk, and executes `main.mojo`.
```python
import subprocess
from base64 import b64decode

import runpod


def handler(event):
    input_data = event["input"]

    if "main.mojo" not in input_data:
        return {"error": "main.mojo file not found in json"}

    try:
        # Decode and write all provided files to disk
        for filename, base64_content in input_data.items():
            file_content = b64decode(base64_content)
            with open(filename, 'wb') as f:
                f.write(file_content)

        # Run the mojo command
        result = subprocess.run(
            "/root/.pixi/bin/pixi run mojo main.mojo",
            shell=True,
            capture_output=True,
            text=True,
            check=True
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        error_msg = (
            f"Execution failed with code {e.returncode}:\n"
            f"STDOUT:\n{e.stdout}\n"
            f"STDERR:\n{e.stderr}"
        )
        return {"error": error_msg}
    except Exception as e:
        return {"error": str(e)}


# Start the serverless handler
runpod.serverless.start({"handler": handler})
```
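Before deploying, you can sanity-check the subprocess pattern locally by swapping the `pixi run mojo` invocation for any command available on your machine (`echo` here, purely as a stand-in):

```python
import subprocess

# Same call shape as the handler uses, with a harmless stand-in command
# replacing "/root/.pixi/bin/pixi run mojo main.mojo".
result = subprocess.run(
    "echo hello-from-subprocess",
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # → hello-from-subprocess
```

With `check=True`, a non-zero exit code raises `CalledProcessError`, which is what the handler's `except` branch turns into the error payload.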
Next, head over to RunPod.io and create a new Serverless Endpoint. You’ll point it to your new GitHub repository, select a GPU that’s compatible with Mojo, and you can leave most of the other settings at their defaults. After a short wait, RunPod will automatically build your Docker image and deploy the endpoint.
Calling the Endpoint
To call the endpoint, you can use a simple Python script like this. It reads your `main.mojo` file, encodes it, and sends it to the RunPod API.

`call.py`
```python
import requests
from base64 import b64encode

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your_api_key>'
}

FILENAMES = ["main.mojo"]

data = {
    'input': {i: b64encode(open(i, mode='rb').read()).decode('utf-8') for i in FILENAMES}
}

response = requests.post(
    'https://api.runpod.ai/v2/<your_project_id>/runsync',
    headers=headers,
    json=data,
)
print(response.json()['output'])
```
Final Thoughts
Just a heads-up, this is a very basic and barebones implementation, but it gets the job done!
Cost: For this kind of light use, it’s practically free. A cold start takes about 10 seconds to respond, while a warm start is around 3 seconds, and since serverless billing is per-second, the cost of simple experiments is negligible.
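For a rough sense of scale (the per-second rate below is a made-up placeholder, not RunPod’s actual pricing; substitute the real rate for whichever GPU you pick):

```python
# Back-of-the-envelope cost estimate. PRICE_PER_SECOND is purely
# illustrative; look up the actual per-second rate for your GPU tier.
PRICE_PER_SECOND = 0.0005  # USD, assumed placeholder
COLD_START_SECONDS = 10

runs = 100
cost = runs * COLD_START_SECONDS * PRICE_PER_SECOND
print(f"~${cost:.2f} for {runs} cold-start runs")  # → ~$0.50 for 100 cold-start runs
```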
Hope this helps anyone else out there who wants to tinker with Mojo on a GPU without the hardware!