Error running offline_inference.py

Hi, I’m new to Modular and I’m trying to run the offline inference code from Quickstart | Modular.

from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig


def main():
    model_path = "modularai/Llama-3.1-8B-Instruct-GGUF"
    pipeline_config = PipelineConfig(model_path=model_path)
    llm = LLM(pipeline_config)

    prompts = [
        "In the beginning, there was",
        "I believe the meaning of life is",
        "The fastest way to learn python is",
    ]

    print("Generating responses...")
    responses = llm.generate(prompts, max_new_tokens=50)
    for i, (prompt, response) in enumerate(zip(prompts, responses)):
        print(f"========== Response {i} ==========")
        print(prompt + response)
        print()


if __name__ == "__main__":
    main()

But I got an error with this log:

[2025-05-29 09:06:53] WARNING memory_estimation.py:142: Truncated model's default max_length from 131072 to 94767 to fit in memory.
[2025-05-29 09:06:53] INFO memory_estimation.py:190: 

	Estimated memory consumption:
	    Weights:                4.58 GiB
	    KVCache allocation:     23.14 GiB
	    Total estimated:        27.72 GiB used / 30.80 GiB free
	Auto-inferred max sequence length: 94767
	Auto-inferred max batch size: 1

Exception ignored in: <function LLM.__del__ at 0x734964a084c0>
Traceback (most recent call last):
  File "/home/zetwhite/.local/lib/python3.10/site-packages/max/entrypoints/llm.py", line 72, in __del__
    self._pc.set_canceled()
AttributeError: 'LLM' object has no attribute '_pc'
Traceback (most recent call last):
  File "/home/zetwhite/quickstart/offline.py", line 25, in <module>
    main()
  File "/home/zetwhite/quickstart/offline.py", line 8, in main
    llm = LLM(pipeline_config)
TypeError: LLM.__init__() missing 1 required positional argument: 'pipeline_config'

I’m using Ubuntu 22.04 / Python 3.10 / modular == 25.3.0.
What causes this error and how can I fix it?

Ah, I found this post, Quickstart Docs, and passing a settings object to LLM fixed my problem!
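
For reference, here is a minimal sketch of the working script. The root cause is that in 25.3.0 LLM.__init__() expects a settings object as its first positional argument before the pipeline config, so LLM(pipeline_config) raises the TypeError above (the earlier __del__ AttributeError is just fallout: the constructor fails before self._pc is assigned, so the destructor can't find it). I'm assuming Settings is imported from max.serve.config, as in the linked post; verify the import path against your installed version:

from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig
from max.serve.config import Settings  # assumed import path in 25.3; verify locally


def main():
    pipeline_config = PipelineConfig(
        model_path="modularai/Llama-3.1-8B-Instruct-GGUF"
    )
    # LLM.__init__ takes the settings object first, then the pipeline config.
    llm = LLM(Settings(), pipeline_config)

    prompts = ["In the beginning, there was"]
    responses = llm.generate(prompts, max_new_tokens=50)
    for prompt, response in zip(prompts, responses):
        print(prompt + response)


if __name__ == "__main__":
    main()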


Glad it was helpful.

Considering we’re seeing multiple people report the same problem, I’m working to get the docs and code example updated.
