How to import and run an exported MAX Model from MEF

We can export a compiled MAX Model via export_compiled_model. What exactly is the MEF, and is there a way to import and execute such an exported model at a later stage (in the Mojo API)?
I’d love to learn more about this :slight_smile:
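
Roughly what I have in mind (a sketch; `export_compiled_model` is the call named above, assuming its spelling carries over to the Python API, and `import_compiled_model` is purely hypothetical, it's the counterpart I'm looking for):

```python
from max import engine
from max.dtype import DType
from max.graph import Graph, TensorType, ops

# Build and compile a trivial graph (exact TensorType/device arguments
# vary between MAX versions).
with Graph("double", input_types=(TensorType(DType.float32, (2, 2)),)) as graph:
    graph.output(ops.add(graph.inputs[0], graph.inputs[0]))

session = engine.InferenceSession()
model = session.load(graph)

# This part works today:
model.export_compiled_model("double.mef")

# This is what I'm asking about -- hypothetical, a way to get a
# runnable Model back from the exported MEF later on:
restored = session.import_compiled_model("double.mef")
```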

I found MEF instances in MAX's model cache, which are probably meant to avoid unnecessary recompilation of topologically similar graphs:

.magic/envs/default/share/max/.max_cache/mof/mef

While playing around with that built-in model caching and my own Dictionary-based caching mechanism, I noticed that the built-in one, which presumably loads compiled models from disk, is still pretty slow in comparison and rather unusable for actual JIT-style work. Will or can this be improved in the future?
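
For reference, my hand-rolled variant is essentially the following (a sketch; `build_graph` stands in for whatever produces a graph for a given input shape):

```python
from max import engine

# Compiled models held in a plain dict keyed by input shape; the hot
# path is a lookup plus execute() on an already-resident model.
_model_cache: dict[tuple[int, ...], engine.Model] = {}

def run_cached(session: engine.InferenceSession, build_graph, shape, *inputs):
    model = _model_cache.get(shape)
    if model is None:
        # Cold path: compile once, keep the live Model object around.
        model = session.load(build_graph(shape))
        _model_cache[shape] = model
    # Hot path: no disk I/O, no deserialization, just execution.
    return model.execute(*inputs)
```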

MEF is a device- and MAX-version-specific cache format produced by the graph compiler. It is not portable between devices or MAX versions, and is primarily intended for caching the results of compiling a MAX Graph on a single system for repeated runs of the same graph. The methods for manually saving and loading MEFs were put in place at a time when automatic graph caching didn't work as reliably as it does today, and are largely obsolete; automatic caching of graph compilation is pretty solid now.

As for performance, it’s unclear what you’re comparing here. I will say that the Mojo and Python Graph APIs do diverge in how they handle graph caching: the Mojo API inlines model weights with the compiled graph, whereas the Python API by default keeps the weights external to the graph. The latter speeds up loading the cached graph and massively reduces its footprint on disk. This technique was developed as we were working on the Python Graph API and hasn’t yet been brought to the Mojo Graph API.
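
To make the difference concrete, the external-weights path in the Python API looks roughly like this (a sketch; exact `Weight`, `TensorType`, and `weights_registry` spellings vary between MAX versions):

```python
import numpy as np
from max import engine
from max.dtype import DType
from max.graph import Graph, TensorType, Weight

# The graph carries a named placeholder for the weight instead of an
# inlined constant.
with Graph("linear", input_types=(TensorType(DType.float32, (1, 4)),)) as graph:
    w = graph.add_weight(Weight("w", DType.float32, (4, 4)))
    graph.output(graph.inputs[0] @ w)

session = engine.InferenceSession()

# The actual weight bytes are supplied at load time, so the cached
# compiled graph on disk stays small and loads fast.
model = session.load(
    graph,
    weights_registry={"w": np.ones((4, 4), dtype=np.float32)},
)
```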

Thank you very much for your reply, very insightful. Please don’t get me wrong here, I love that this caching mechanism exists. :fire:

Performance-wise, I meant comparing it to keeping a compiled model as an object in e.g. a list or dictionary, fetching it from that collection each iteration, and running it on some input. In my experience this is often around 100x faster than the automatic caching mechanism that MAX provides. Could you elaborate on which part of loading a model from the cache takes so much time in comparison? Could the cached models be kept closer to the running process to reduce loading time?
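
For concreteness, my comparison has roughly this shape (crude wall-clock timing; `session`, `graph`, `x`, and the dict-held `resident_model` are assumed to come from a setup like the sketch in my first post, and both paths hit a warm cache):

```python
import time

def bench(fn, iters: int = 100) -> float:
    # Average wall-clock time per call; crude, but enough to show the
    # order-of-magnitude gap I'm describing.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

kept = bench(lambda: resident_model.execute(x))       # dict-held model
auto = bench(lambda: session.load(graph).execute(x))  # reload via MAX's cache
print(f"resident: {kept * 1e6:.1f} us/iter, auto-cache: {auto * 1e6:.1f} us/iter")
```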