Version compatibility for .mojopkg files

Currently, .mojopkg files (the output of the mojo package command) are just serialized MLIR. Internally at Modular we’ve seen many cases where someone would generate a package, then make some changes in our MLIR dialects, rebuild the Mojo compiler, and try to read the old package, only to get a crash. We want to guard against this behavior before 1.0.

So we’ve been thinking about adding some version checks, so that if the .mojopkg file was from a different (older or newer) version of Mojo, and the compiler can’t guarantee that the MLIR binary representation is compatible, it’d warn or error.

So far we have not found a way to do a fine-grained version check on MLIR dialects (those change all the time), so the intent is to just say “.mojopkg files from release 0.9 are not compatible with the 1.0 compiler” or something along those lines.
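To make the coarse check concrete, here is a minimal Python sketch of a release-level comparison. It is only an illustration of the idea, not how the Mojo compiler does or will do it: the names `check_package`, `parse_release`, and `IncompatiblePackageError` are hypothetical, and it assumes the package somehow records the release string of the compiler that produced it. The `strict` flag switches between the "error" and "warn" behaviors mentioned above.

```python
import warnings


class IncompatiblePackageError(Exception):
    """Raised when a package was built by a different compiler release (hypothetical)."""


def parse_release(text):
    """Reduce a version string such as '1.0' or '1.0.3' to a (major, minor) pair."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def check_package(package_release, compiler_release, strict=True):
    """Return True if the releases match; otherwise raise (strict) or warn and return False."""
    if parse_release(package_release) == parse_release(compiler_release):
        return True
    message = (
        f".mojopkg built with release {package_release} "
        f"is not compatible with compiler release {compiler_release}"
    )
    if strict:
        raise IncompatiblePackageError(message)
    warnings.warn(message)
    return False
```

With `strict=False` the mismatch is reported but the caller can still attempt to load the package, which is the trade-off between warning and erroring that this thread is asking about.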

This would imply that the standard way to redistribute Mojo packages would be through the source, not through the binary packages. I think this is already the case (since nobody can guarantee MLIR compatibility).

I would like to ask for community thoughts on this. Does my reasoning above align with how the .mojopkg files are used?

Tagging @fcormack who’s working on this.

Thanks,

Denis.

Hello, after working a lot with Mojo, I must admit I never understood why we needed mojopkg in the first place.

Is this to load source files faster since they’re already parsed? If so, isn’t it the job of the compiler cache instead?

Is this to have one file per package? In which case, wouldn’t dumping all the source files into a zip do the job?

If we want to decide on how to handle mojopkg files the community should understand the goals of such files.

I don’t know the full history, but here is my take: at the moment, the only value of .mojopkg files is to speed up the compilation process. Think of them as “precompiled headers”: loading from those files is faster than parsing the corresponding source.

From this standpoint, the files should not be visible to users; they are merely an implementation detail. That’s exactly why we propose to add version checking.

Unfortunately they are exposed via the mojo package command and maybe users use those files in different ways, hence the check.

To recap, your summary is correct to the best of my understanding. If/when we have time to speed up mojo compiler, removing mojopkg and replacing those with zipped source would be an option to consider.
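For reference, the zipped-source alternative could be as simple as the following Python sketch. This is an assumption about what such a bundle might look like, not an existing Mojo feature: `bundle_sources` is a hypothetical helper, and it assumes a package is a directory tree of `.mojo` files.

```python
import zipfile
from pathlib import Path


def bundle_sources(package_dir, archive_path):
    """Zip every .mojo file under package_dir, keeping paths relative to its parent.

    Returns the list of archive member names in the order they were written.
    """
    root = Path(package_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for source in sorted(root.rglob("*.mojo")):
            # Keep the package directory name as the top-level folder in the archive.
            arcname = source.relative_to(root.parent).as_posix()
            archive.write(source, arcname)
            names.append(arcname)
    return names
```

A bundle like this stays readable by any compiler version, at the cost of re-parsing the source on every build.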

I guess I’m also confused about what purpose mojopkg files serve. Right now, I’m using them as a kind of intermediate build artifact for a library. First I compile the library to a mojopkg, then I compile the downstream executable using the library mojopkg.

I suppose I had hoped that mojopkg files could be a convenient way to ship mojo apps to production. For example: compile all the libs into mojopkg files, compile the app to a mojopkg file, ship all the mojopkg files to the compute node, then just before running the app, compile all the mojopkg files on the target hardware to the final executable.

It sounds like as long as we ship the same compiler with the app that we used to build the mojopkg files, this should work? Even post 1.0?

While I’m here, what even is the intended way to build and ship a mojo app to production with multiple libraries? I can’t find much in the docs about actually deploying mojo. Maybe few people have gotten this far?

One can create multiple .mojopkg files, and then just use them in mojo build, that’s how Modular internally does it (the standard library and the kernels library are two separate mojopkg files).

Yes, the scenario you described, using these files as an intermediate step should work and will continue to work.

To summarize so far:
In the long run, we’d love to find a way to either hide these files completely, or make them robust to version changes.

For now, it seems that Fraser’s change is the right thing to do. I also found an old GitHub issue describing this version mismatch problem.

Thanks,

Denis

In the long run, we’d love to find a way to either hide these files completely …

Hide the mojopkg files from whom? End users? Developers? Both? That seems to contradict your earlier statement:

using [mojopkg] files as an intermediate step should work and will continue to work.

I’m still confused. It sounds like we’ve established that using mojopkg files is the current deployment strategy, including shipping them onto target hardware where end users can see them.

What does the deployment strategy look like after the changes you’re suggesting get implemented? Including possibly hiding mojopkg files completely?

As things stand right now, it is ok to ship these files to different hardware, as long as the .mojopkg file is produced and consumed by the same version of the Mojo compiler.

At this time, the only strategy to deploy a library (written in Mojo) to another person using a different version of the Mojo compiler is to give them the source.

As you said, this is not a satisfactory situation. Mojo needs proper libraries/packages, and .mojopkg files, as they are now, are not a good solution.

Hence my original question, does anyone in the community use these files as means to send libraries to other people, or to author those files on one compiler version and use on a different one?

I think that .mojopkg is generally good to have. It’s an unfortunate reality that Mojo needs some way for companies that don’t want to or can’t disclose source to distribute libraries. The primary suspects here would be a user-space driver written in Mojo for an accelerator. My preference would be to warn on mismatched versions, since I can easily see a scenario where you need to deal with a closed source library that is perfectly compatible with a new version but hasn’t updated yet.

However, I think that it would actually be more beneficial to allow embedding json-like data into the package. This lets users have version information, license info, SBOMs, SLSA artifacts, etc embedded in the distribution artifact. This also provides a good place to throw things like “required linker flags” or “public defines” as I discussed in the thread about stability.
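As a rough illustration of the metadata idea, here is a Python sketch of a container that puts a JSON block in front of an opaque payload. Everything about the layout is made up for the example (the `MPKG` magic bytes, the 4-byte length field, the function names `wrap` and `unwrap`); it says nothing about how .mojopkg files are actually laid out.

```python
import json
import struct
from typing import Dict, Tuple

MAGIC = b"MPKG"  # hypothetical marker, not the real .mojopkg format


def wrap(payload: bytes, metadata: Dict) -> bytes:
    """Prefix an opaque payload with a length-delimited JSON metadata block."""
    meta = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # Layout: magic (4 bytes) + metadata length (uint32, little-endian) + metadata + payload
    return MAGIC + struct.pack("<I", len(meta)) + meta + payload


def unwrap(blob: bytes) -> Tuple[Dict, bytes]:
    """Split a wrapped blob back into (metadata, payload)."""
    if blob[:4] != MAGIC:
        raise ValueError("not a wrapped package")
    (meta_len,) = struct.unpack_from("<I", blob, 4)
    start = 8
    end = start + meta_len
    if end > len(blob):
        raise ValueError("truncated metadata block")
    metadata = json.loads(blob[start:end].decode("utf-8"))
    return metadata, blob[end:]
```

Because the metadata sits ahead of the payload, a tool could read the compiler version, license, or linker flags without deserializing any MLIR.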

I think that mojopkgs have a reasonable use for library distribution for situations where you don’t need source code, since it’s a single file to download instead of tons of small ones.

To recap, your summary is correct to the best of my understanding. If/when we have time to speed up mojo compiler, removing mojopkg and replacing those with zipped source would be an option to consider.

I think there still needs to be a place to put metadata in that scenario, but I suspect that there will still be a need for binary-only distribution.

Hence my original question, does anyone in the community use these files as means to send libraries to other people, or to author those files on one compiler version and use on a different one?

I have used them in a few places to help with particularly nasty build times, but they are a poor substitute for incremental builds. I’ve encountered some rather interesting bugs trying to share between versions so I always rebuild when I move to a new compiler version. I suspect that we will see an increase in sharing .mojopkg files when there is less fragmentation around compiler versions (since a good chunk of the ecosystem is currently on nightly).

Ok, thanks for clarifying. I think I see the source of the confusion. We might be thinking of slightly different concepts of deployment.

In the case of me, a developer, building an app that uses a third-party library, my preferred way of consuming that library is in source form. I probably wouldn’t end up using any closed source libraries in this project, so I don’t have any insights on how to improve the experience there. But, if the mojo build toolchain produces mojopkg files on the way to the executable on my local machine, I think that’s fine. If it doesn’t, that’s fine too. If that’s what you mean by hiding mojopkg files, then I’m all for it.

But, when I’m building an app to copy to end user hardware, I would much prefer to send some intermediate representation rather than the source tree(s). Also, we need to send the same compiler we used to build the app and libraries IR to produce the final ASM. Ideally, one mojopkg file for the whole app and all its libraries would be best. But if the libraries need their own mojopkg files too, that’s probably fine. In this scenario, I’m not sure what hiding mojopkg files would mean, but that doesn’t sound like something I’d want.

Hi Owen, this is an interesting use case, thanks for bringing it up!

How does this relate to other programming languages?

In Java, I’d send an obfuscated .jar. In Rust, Swift, C++, the only choice is between a source and a binary, right?

Rust will, sort-of, let you send .rlib files around. You aren’t supposed to, but it does work. You can also hide the “crown jewels” algorithm in a C-ABI library (C, C++, Rust, etc for the actual source) you distribute as a library and wrap that with a Rust library. Of course, contract law is a far stronger protection than just distributing binaries, but some companies really don’t like handing out source code, especially if they want to do B2C distribution of their library or low-friction (minimal lawyer involvement) B2B.

A lot of hardware is still very deeply closed source, and depending on how things are designed agreements with EDA vendors may require that some parts of the driver and/or firmware are kept closed source regardless of whether or not the company making the accelerator would like to upstream the source code into Mojo. For instance, the source code to various parts of AMD’s FPGA stack likely falls under US export controls and thus cannot be open sourced, but they would very much like people to use them for AI. My assumption was that Modular would be responsible for managing and distributing a build of Mojo + MAX that has all of these extra bits baked into it for companies that want to support Mojo and MAX but who are unable/unwilling to open source for one reason or another.

You might be able to get better information from talking to your partners over at AMD on what the requirements to ship the compiler components of Vitis AI with Mojo would be. This software covers FPGAs, Xilinx SoCs, and AMD’s consumer NPUs and AMD has kept it closed.