Model Correctness Check

I am currently exploring the Modular repository and have understood several aspects of the MAX pipeline so far. However, I am stuck at one specific point and would appreciate some clarification.
Integration test models directory

In the integration test models directory, there are model-specific tests along with supporting files. One such file is verify_pipelines.py. From my understanding, this script verifies model outputs against a reference implementation using predefined tolerance thresholds (for example, cos_dist_threshold and kl_div_threshold).
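To make the comparison concrete, here is a minimal sketch of checking one set of logits against a reference using a cosine distance and a KL divergence. It is not taken from verify_pipelines.py. The helper names (softmax, cosine_distance, kl_divergence, within_tolerance) are hypothetical, and the assumption that the thresholds apply to per-token logits is mine. Only the threshold names cos_dist_threshold and kl_div_threshold come from the script.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def within_tolerance(ref_logits, test_logits, cos_dist_threshold, kl_div_threshold):
    """Hypothetical check: both metrics must stay under their thresholds."""
    cos = cosine_distance(ref_logits, test_logits)
    kl = kl_divergence(softmax(ref_logits), softmax(test_logits))
    return cos <= cos_dist_threshold and kl <= kl_div_threshold
```

With this sketch, identical logits pass any non-negative threshold, while reordered logits such as [1, 2, 3] versus [3, 2, 1] fail a tight threshold like 1e-3.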

As I understand it, we compute these tolerances ourselves by running the following command:

./bazelw run //max/tests/integration/pipelines/python:verify_pipelines -- --pipeline "Qwen/Qwen2.5-7B-Instruct-bfloat16" --devices='gpu' --find-tolerances --print-suggested-tolerances

As an example, I used a Qwen model that is already supported in MAX. The command prints suggested tolerances like those in the image below:

These tolerance values are not computed or validated by verify_pipelines.py itself. Instead, they are predefined based on prior analysis and are enforced to ensure that the numerical behavior of a model does not regress within the MAX pipeline over time. In practice, it seems that tolerances are chosen based on observed divergence from a reference implementation, often with an additional margin (for example, ~20–30%) to account for variability.
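If my reading is right, choosing a tolerance could look roughly like the sketch below: take the worst divergence observed against the reference and pad it by a margin. The function name suggest_tolerance and the 25% default are hypothetical. I have not confirmed that --find-tolerances works this way.

```python
def suggest_tolerance(observed_divergences, margin=0.25):
    """Hypothetical rule: worst observed divergence plus a relative margin."""
    if not observed_divergences:
        raise ValueError("need at least one observed divergence")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return max(observed_divergences) * (1.0 + margin)


# Example: per-prompt cosine distances measured against the reference.
observed = [0.001, 0.004, 0.002]
suggested = suggest_tolerance(observed)  # roughly 0.005 with a 25% margin
```

A larger margin makes the test less flaky across hardware and library versions, but it also lets larger regressions pass unnoticed.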

Please let me know if this understanding is correct.

If it is, my follow-up question is: when adding support for a new model in MAX, how should one verify that the model integration is actually correct? I have run inference using the added model and observed that it produces the expected output for a given prompt. Is this sufficient to consider the model integration correct within MAX, or are there additional recommended validation steps?

Thank you in advance for your guidance.

It looks like the code to actually create a golden test output to validate against is in verify.py, which also explains how to run the creation with torch/max and then the verification: