Build an LLM in MAX from scratch 📖

The MAX Experimental API is a model-building framework we've been developing to help you create custom models in the MAX Framework. You can now get hands-on with this experimental API by following our MAX LLM tutorial.

In this tutorial, you'll build each component of the model yourself: embeddings, attention mechanisms, and feed-forward layers. You'll see how they fit together into a complete language model by working through the sequential coding challenges in the accompanying GitHub repository. At the end of the tutorial, you'll be able to generate text with your model and compare it to GPT-2 on Hugging Face.
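As a framework-agnostic preview of the attention math the tutorial has you implement, here is a plain NumPy sketch of a single causal attention head. This is not the MAX API; the shapes and the use of the same tensor for queries, keys, and values are purely illustrative:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head causal attention: softmax(q @ k^T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)  # (seq, seq) similarity scores
    # Causal mask: each position may only attend to itself and earlier tokens.
    mask = np.triu(np.ones(scores.shape[-2:], dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq, d = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq, d))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first output position attends only to itself, so `out[0]` equals `v[0]` exactly. The tutorial builds this up inside the MAX graph API rather than NumPy, but the math is the same.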

The MAX Experimental API will change over time and expand to include new features and functionality. As it evolves, we plan to update the tutorial accordingly. If you try the tutorial, please share your thoughts and feedback below. If you encounter errors, log them as GitHub issues or submit a pull request with improvements. We hope you enjoy the tutorial and being among the first to use this exciting new API.

12 Likes

Thanks for the book, pretty cool. Running through it now.

For step07 the test has:

```python
test_input = Tensor.randn(
    batch_size, seq_length, config.n_embd, dtype=DType.float32, device=CPU()
)
```

Should it be?

```python
from max.experimental import random

test_input = random.normal(
    (batch_size, seq_length, config.n_embd), dtype=DType.float32, device=CPU()
)
```

I was guessing at this `random` module. There are some other minor things, like missing `sys` and `path` imports; I can open a pull request later.

Hey, for `test.step_10` I didn't see any `randint`. I filled the input with ones; what should we use on step 10?
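For context, here is a plain NumPy sketch of the kind of placeholder input I mean. The vocab size and shapes here are guesses on my part, not taken from the tests, and converting the array into a MAX tensor would depend on the API version:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_length = 2, 4
vocab_size = 50257  # GPT-2's vocabulary size

# Random integer token IDs in [0, vocab_size). Filling with ones also
# passes shapes through, but random IDs exercise more of the embedding table.
token_ids = rng.integers(0, vocab_size, size=(batch_size, seq_length), dtype=np.int64)
print(token_ids.shape)  # (2, 4)
```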

I am running into the same issue.

Here you can find some pointers on how to solve some of these issues, for instance the `randn` issue from testing step 07: Many issues · Issue #7 · modular/max-llm-book · GitHub

1 Like

I have submitted a PR which allows the tests to complete: Fix issues with Tensor.rand* not being available and other by winding-lines · Pull Request #15 · modular/max-llm-book · GitHub

Let me know if this works for you,

Marius

1 Like

Thanks for your contribution @mseritan! PR #15 does resolve these issues and is now merged.

1 Like

I have a basic NVIDIA A4000. Although all the tests are passing, on steps 5 and 6 I get a test-failed message at the end that says:

```
❌ Functional test failed: Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
```

I am seeing this same issue, Marius. Several people have reported issues past this point, so I am not certain why we are blocked.

Here is the output I'm seeing on an Intel i9 box with an NVIDIA 4090 video card:

```
✅ Output shape is correct: (2, 4, 768)
❌ Functional test failed: Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
   Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
```

Here is information about the system:

```
(Modular) toddb@fidfast max-llm-book % lscpu
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             39 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      32
  On-line CPU(s) list:       0-31
Vendor ID:                   GenuineIntel
  Model name:                Intel(R) Core(TM) i9-14900KF
    CPU family:              6
    Model:                   183
    Thread(s) per core:      2
    Core(s) per socket:      24
    Socket(s):               1
    Stepping:                1
    CPU(s) scaling MHz:      16%
    CPU max MHz:             6000.0000
    CPU min MHz:             800.0000
    BogoMIPS:                6374.40
```

The OS:

```
(Modular) toddb@fidfast max-llm-book % uname -a
Linux fidfast 6.14.0-37-generic #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 20 10:25:38 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```

And the graphics card:

```
(Modular) toddb@fidfast max-llm-book % nvidia-smi
Wed Dec 17 08:05:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0 Off |                  Off |
|  0%   30C    P8              7W / 450W  |     40MiB / 24564MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                              Usage       |
|=========================================================================================|
|    0   N/A  N/A      1742      G   /usr/lib/xorg/Xorg                              9MiB |
|    0   N/A  N/A      1860      G   /usr/bin/gnome-shell                           10MiB |
+-----------------------------------------------------------------------------------------+
```

– end –

If anyone can help, please share anything you can think of.

Thanks,
Todd B.

Thanks for the detailed report. I will dive into the steps and tests (this may be an issue in the try/except block in the test suite) and work to improve the functionality and error messaging here.