I am working through the book at llm.modular.com, and I am seeing the same error on both a MacBook Pro M1 and an Nvidia DGX Spark. This is the error when I attempt to run “pixi run s05”:
(Modular) toddb@fidspark max-llm-book % pixi run s05
Pixi task (s05): python tests/test.step_05.py
Running tests for Step 05: Token Embeddings...
Results:
✅ Embedding is correctly imported from max.nn.module_v3
✅ Module is correctly imported from max.nn.module_v3
✅ GPT2Embeddings class exists
✅ GPT2Embeddings inherits from Module
✅ self.wte embedding layer is created correctly
✅ config.vocab_size is used correctly
✅ config.n_embd is used correctly
✅ self.wte is called with input_ids in __call__ method
✅ All placeholder 'None' values have been replaced
✅ GPT2Embeddings class can be instantiated
✅ GPT2Embeddings.wte is initialized
tmb: (1) We are on track.
✅ GPT2Embeddings forward pass executes without errors
tmb: (2) We are on track.
✅ Output shape is correct: (2, 4, 768)
tmb: (3) We are on track.
❌ Functional test failed: Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
Failed to compile and execute graph! Please file an issue. This error should have been caught at op creation time.
============================================================
⚠️ Some checks failed. Review the hints above and try again.
============================================================
Please let me know what my next step should be.
Thank you.
-Todd B.
P.S. To get to this point on the DGX Spark, I had to modify the step_05 test, but the same thing happened on the MacBook Pro with no modification.
It could be a different issue with the nightlies, but one possibility is that compilation is failing when targeting the DGX Spark’s GPU. Unfortunately, we don’t yet support the brand-new devices in the DGX Spark and Jetson Thor: we use an internal version of libnvptxcompiler to do ahead-of-time compilation of PTX for NVIDIA hardware, and it hasn’t yet been updated to support CUDA 13. CUDA 13 drops support for NVIDIA hardware older than Turing, as well as driver versions below 580, so we’re working on a solution that lets us add support for the new hardware without dropping the older NVIDIA GPUs.
In the meantime, if that’s the case, you may be able to change this line in main.py to read
device = CPU()
and force execution on the CPU rather than the GPU. I thought this would default to the CPU on the MacBook Pro as well, though, so there may be something else going on here.
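As a concrete sketch of that edit: this assumes `CPU` lives in MAX’s `max.driver` module (as in current releases), and the import is guarded so the snippet is harmless on a machine without MAX installed.

```python
# Hypothetical sketch of the suggested main.py change. The guarded
# import means this runs even where MAX isn't available; in the real
# main.py you would simply replace the accelerator selection.
try:
    from max.driver import CPU  # assumed MAX driver API
    device = CPU()  # pin execution to the CPU instead of the default GPU
    print("Using device:", device)
except ImportError:
    print("MAX not installed; in main.py, set `device = CPU()` to force CPU execution.")
```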
Yes, I fixed the problem on the DGX. The MacBook Pro is failing with the llm.modular.com book right out of the box. I ran git status to verify I had the newest version (no changes since Friday).
If you come up with anything, please let me know. Until then, I’ll see what I can do to fix the problem on the DGX (graph compiler), but I can’t guarantee I’ll fix anything, since I do this in my free time.
It would be cool to get it working on both so I can compare the performance.
Also, I should be able to work around the graph compiler problem by disabling it, if there is an easy way to do that.
Just putting this here as Orin was not mentioned specifically – I am hitting the same error on nightly with an Orin Nano. I have yet to confirm whether the suggested workaround of using the CPU works, but I anticipate that it should.
I think the problem with steps 5+ is that on GPU-equipped systems there’s a device mismatch: the input tensors in the test cases were being placed on the CPU, while the graph was running by default on the GPU. Additionally, when running on GPU a datatype of bfloat16 is assumed, and NumPy can’t handle that datatype by default.
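As a quick illustration of the NumPy side of this (plain NumPy, no MAX required): stock NumPy has no native bfloat16 dtype, so a graph output in bfloat16 has to be cast (e.g. to float32) on the way out before it can be viewed as a NumPy array.

```python
import numpy as np

# Stock NumPy does not recognize bfloat16 as a dtype name, so asking
# for it raises TypeError unless an extension such as ml_dtypes has
# registered it.
try:
    np.dtype("bfloat16")
    print("bfloat16 available (a dtype extension is registered)")
except TypeError:
    print("bfloat16 not understood by stock NumPy")
```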
I’ve put up a PR here that should fix all but steps 8 and 12. Those require a little more investigation; they were failing with rebind errors and shape mismatches even after these fixes.
Thanks @BradLarson! So if I understand right, this should unblock the noted steps for Apple silicon, but newer Nvidia edge hardware still suffers from the incompatibility you specified before, is that right?
If you pull the latest from the repository, the Orin Nano should now work for all but steps 8 and 12. We weren’t handling the case of an attached NVIDIA GPU correctly. I’m working on the last two steps now.
Apple silicon should have been working, because we were treating those systems as if they didn’t have a MAX-supported GPU just yet.