Model Correctness Check

I am currently exploring the Modular repository and have come to understand several aspects of the MAX pipeline. However, I am stuck on one specific point and would appreciate some clarification.
Integration test models directory

In the integration test models directory, there are model-specific tests along with supporting files. One such file is verify_pipelines.py. From my understanding, this script verifies model outputs against a reference implementation using predefined tolerance thresholds (for example, cos_dist_threshold and kl_div_threshold).
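For context on what I believe these thresholds bound: cosine distance and KL divergence between the model's outputs and the reference implementation's. The following is a minimal, hypothetical sketch of those two metrics over a single logit vector (not the actual verify_pipelines.py code; the example logits are made up):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a single logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def kl_divergence(ref_logits, test_logits):
    # KL(P_ref || Q_test) between the softmax distributions of two logit vectors.
    p = softmax(ref_logits)
    q = softmax(test_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical logits from a reference run and a MAX run for one token position.
ref = [2.0, 1.0, 0.1]
out = [2.05, 0.95, 0.12]
print(cosine_distance(ref, out))  # small when the two outputs agree closely
print(kl_divergence(ref, out))
```

Both metrics go to zero when the two implementations produce identical logits, so small thresholds amount to a tight numerical-equivalence check.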

As I understand it, we compute these tolerances ourselves using the following command:

./bazelw run //max/tests/integration/pipelines/python:verify_pipelines -- --pipeline "Qwen/Qwen2.5-7B-Instruct-bfloat16" --devices='gpu' --find-tolerances --print-suggested-tolerances

As an example, I took the Qwen model, which is already supported in MAX; the command gives suggested tolerances like those in the image below:

If I understand correctly, these tolerance values are not recomputed or validated on every run of verify_pipelines.py. Instead, they are predefined from prior analysis and enforced so that a model's numerical behavior does not regress within the MAX pipeline over time. In practice, tolerances appear to be chosen based on the observed divergence from a reference implementation, with an additional margin (for example, ~20–30%) to absorb run-to-run variability.
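To make the margin arithmetic concrete, this is the kind of padding I have in mind (a hypothetical sketch with made-up numbers, not code or values from the repository):

```python
def suggested_tolerance(worst_observed, margin=0.25):
    # Pad the worst observed divergence by a safety margin (here 25%)
    # so that normal run-to-run numerical noise does not trip the check.
    return worst_observed * (1.0 + margin)

# Hypothetical worst-case divergences measured against the reference:
print(suggested_tolerance(0.004))  # candidate cos_dist_threshold, ~0.005
print(suggested_tolerance(0.08))   # candidate kl_div_threshold, ~0.1
```

That is, the threshold is not the observed divergence itself but the divergence plus headroom, which is why a passing check does not imply bit-exact agreement.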

Please let me know if this understanding is correct.

If it is, my follow-up question is: when adding support for a new model in MAX, how should one verify that the integration is actually correct? I have run inference with the newly added model and observed that it produces the expected output for a given prompt. Is that sufficient to consider the integration correct within MAX, or are there additional recommended validation steps?

Thank you in advance for your guidance.