Comptime read_file("schema.json")

If there was one compile-time feature I’d love in Mojo, it’s the ability to bring in external data for codegen.

I realize something like comptime fetch(url) is a non-starter (non-determinism, security, etc.), but it raises a narrower question:

Should Mojo support deterministic compile-time file access?
e.g. comptime read_file("schema.json")

This seems like a useful middle ground:

  • keeps builds reproducible (local, versioned inputs)

  • enables code generation from schemas/config

  • avoids pushing everything into external build steps

Curious if this is aligned with Mojo’s design goals, or if even file I/O at comptime is considered out of scope?

That would be a really useful feature to have, but sadly the same capability also allows:

comptime ssh_key = read_file("~/.ssh/id_rsa")

As a result, we need to figure out how to sandbox this. Even workspace scoping isn’t enough since .env files often have sensitive information in them.

This is EXTREMELY desirable to have. Even C has it now (#embed in C23), albeit in a weaker form. But there are wide-reaching security implications that cause headaches.

We could restrict it to relative paths with an allowed suffix, define the set of readable suffixes as a compiler argument (a regex), default it to “(txt|yang|json)”, and never allow .env or resolve ‘~’.
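To make the suffix idea concrete, here is a rough sketch of the check the compiler might run before honoring a read. It is written in Python only because it is easy to run; the function name and the dotfile rule are my own assumptions, not anything Mojo has today.

```python
import re
from pathlib import PurePosixPath

# Default taken from the suggestion above; a real compiler flag would override it.
DEFAULT_SUFFIX_PATTERN = r"(txt|yang|json)"


def is_readable_at_comptime(path: str, suffix_pattern: str = DEFAULT_SUFFIX_PATTERN) -> bool:
    """Hypothetical policy check: relative paths with an allowed suffix only."""
    p = PurePosixPath(path)
    # Never resolve '~' and never accept absolute paths.
    if path.startswith("~") or p.is_absolute():
        return False
    # Assumption: dotfiles such as ".env" are rejected whatever the pattern says.
    if p.name.startswith("."):
        return False
    # The suffix (without the leading dot) must match the whole pattern.
    return re.fullmatch(suffix_pattern, p.suffix.lstrip(".")) is not None


for candidate in ["schema.json", "~/.ssh/id_rsa", ".env", "../../secrets.json"]:
    print(candidate, is_readable_at_comptime(candidate))
```

Note that a relative path such as ../../secrets.json still passes this check, so on its own it only filters by name, not by location.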

Or go further and instead of:

read_file("anything on disk")

We define explicit compile-time inputs, and only those are readable:

comptime schema = read_file("@inputs/schema.json")

And the build config declares:

[comptime_inputs]
schema = "schemas/schema.json"

Now:

  • ❌ ~/.ssh/id_rsa → inaccessible

  • ❌ .env → inaccessible unless explicitly declared

  • ✅ schemas/schema.json → allowed

This flips the model from:

“comptime can read files”

to:

“comptime can read declared artifacts”

File extensions are a lie, so that doesn’t help much, and sensitive stuff gets stored in JSON all the time. I think allowing only relative paths makes path confusion pretty easy, and standard setups like GitHub workspaces have well-known relative paths to sensitive information.
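As a quick illustration of the path confusion problem, here is a small Python sketch (the function name and the directory layout are hypothetical). A "relative" path can still walk out of the workspace, so any check would have to run on the resolved path rather than on the string.

```python
import tempfile
from pathlib import Path


def escapes_workspace(workspace: Path, relative: str) -> bool:
    """True if `relative`, resolved against `workspace`, lands outside it."""
    root = workspace.resolve()
    # resolve() collapses '..' segments and follows any symlinks that exist.
    target = (root / relative).resolve()
    return not target.is_relative_to(root)  # Path.is_relative_to needs Python 3.9+


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "project"
    workspace.mkdir()
    for candidate in ["schemas/schema.json", "../secrets.json", "schemas/../../secrets.json", ".env"]:
        print(candidate, escapes_workspace(workspace, candidate))
```

Even with this containment check, .env is reported as staying inside the workspace, so scoping by location alone does not keep sensitive files out of reach.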

The build config is the kind of sandboxing we need to do, but there are still discussions about where to put that information and when it should be included. Bazel also complicates things.