Rethinking the AI Tool Use Policy for the agentic era

Hi Modular community,

I’ve been contributing to the stdlib using AI-assisted workflows and wanted to share some thoughts on the AI Tool Use Policy. I think the intent is great, but I believe some rules are optimized for a past that’s already behind us.

I think it’s clear that all code will be AI-written. The question is how we handle human responsibility.

We’re not heading toward a world where AI assists humans; we’re heading toward one where AI writes 100% of the code and humans direct, review, and own it. The sooner we design contribution workflows aimed at that reality, the better positioned the project will be.

I think policies that treat AI generation as an exception will need to change anyway, so we might as well start adapting now.

Draft PRs could be the right boundary for agentic workflows.

We’ve moved from Copilot autocomplete to autonomous agents that can run in parallel, open branches, write tests, and iterate on feedback. The right human checkpoint in this model isn’t “did a human type the code”, it’s “did a human review and own this before it went to reviewers.”

A practical policy update would be: allow agents to autonomously create draft PRs, and require the human author to self-review and manually mark them as ready for review. That’s a clear, verifiable, enforceable boundary that doesn’t require policing how the code was produced.

Prohibiting AI-written PR descriptions seems to me the wrong lever

The current policy asks contributors to write PR descriptions themselves, on the theory that this forces self-review. But it creates a weird workflow: the AI writes the code, and the human copies the AI’s summary and pastes it in with light edits. That’s not more human, it’s just slower. A better option, IMO, is to invest in prompts and tooling that produce excellent PR descriptions (e.g. not too detailed), and then have the human review and refine the description before marking the PR as ready. The draft PR stage is where that review happens.

Proposal

I suggest updating the AI tool use policy so:

  • It allows agents to create and update draft PRs.
  • It requires the human author to review the diff and mark the PR as ready for review; that step becomes the accountability checkpoint.
  • It keeps the Assisted-by: AI label for transparency.

This keeps accountability where it matters while removing unnecessary friction.
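To make the checkpoint concrete, here is a minimal sketch of how the “review only after the author marks it ready” boundary could be enforced with a GitHub Actions workflow. The `ready_for_review` trigger and `gh pr edit --add-reviewer` are standard GitHub features, but the team slug `modular/stdlib-maintainers` is a made-up placeholder, and the default `GITHUB_TOKEN` may need extra permissions to request a team review:

```yaml
# Hypothetical workflow: maintainer review is requested only once the
# human author flips the PR from draft to "ready for review".
name: request-review-on-ready
on:
  pull_request:
    types: [ready_for_review]

jobs:
  request-review:
    runs-on: ubuntu-latest
    steps:
      # "modular/stdlib-maintainers" is a placeholder team slug;
      # requesting a team review may require a token with broader
      # permissions than the default GITHUB_TOKEN.
      - name: Request review from maintainers
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: >
          gh pr edit ${{ github.event.pull_request.number }}
          --repo ${{ github.repository }}
          --add-reviewer modular/stdlib-maintainers
```

The nice property of this shape is that draft PRs generate no review notifications at all; the human’s explicit “ready” click is the only thing that puts work on a maintainer’s queue.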

Curious whether others are running into the same friction and how you’re handling it.

Hey Manuel - agentic coding is rapidly rewriting the way software is built, and as a project we have to reckon with that.

I agree that as a project we need to focus on how we best enable AI coding to be effective with MAX & Mojo.

Internally, Modular’s maintainer team is discussing policy changes and will 100% follow up. Transparently, my main concern is the code review burden on maintainers. I think we can somewhat alleviate that with various policies (small PRs, adding testing and PR review automations, etc.), but I want to be sure we approach things the right way.

Thanks for kicking off this discussion! Lots more to come.

1 Like

I totally understand the concern; I’ve run into the same issue at the company I work at. Most of those problems have now been solved by improving the AI workflow instead (improvements to CLAUDE.md or skills, GHA automations, improving test coverage, etc.), making us 3-5× more productive without losing quality (actually reducing tech debt).

To illustrate my point, imagine a contributor submits a PR with a major issue: a bug, too much duplicated logic (violating DRY), overly verbose code, or something that takes a long time to review.

IMO there are two possible approaches:

  • Ban or restrict AI usage, reducing review burden but shifting many more hours of work back to the contributor.

  • Focus on the actual issue and ask the contributor to refine their AI workflow so the same problem doesn’t happen again, without hurting productivity. This could also lead to improvements in Modular’s AI infra (e.g. refining CLAUDE.md) so that new contributors don’t repeat that issue.

Personally, I think the second approach scales better as AI-driven development becomes the norm.

Working with draft PRs could be key, IMO. However, I’ve noticed that even draft PRs often get reviewed by maintainers. A small process change, like not reviewing draft PRs until the author has marked them ready (after ensuring they meet all the requirements: small PRs, test coverage, CI passing, etc.), could make a big difference.

1 Like

Yes, I comply with it, but this legacy policy should be changed.

You don’t always have to hand-draft the parts of a PR that can easily be automated by an AI; the main job is to architect systems, not to hand-write syntax in your codebase. I’ve been quietly watching for this, though I haven’t encountered it myself.

These are great ideas for removing friction. And we do appear to be going in the direction that you’ve laid out.

One problem though is that this massively increases the burden on Modular. If dozens or hundreds (or more) people have their agents create draft PRs, the sheer number will quickly become overwhelming. For many open-source projects, that already seems to be happening.

Looking at Modular’s PR history, it’s already going from a sea of purple PRs to, more recently, a sea of red PRs. This is common for open-source projects as they become more popular, but it looks like it’s already being exacerbated by AI, which is decreasing the signal-to-noise ratio there.

AI is reducing submission cost far more than it is currently reducing review cost.

When supply outstrips review capacity, you usually want to increase friction rather than decrease it.

This is what some are already doing, just a couple of recent examples below:

  1. ghostty: first-time contributors now need a personal vouch, all PRs must be tied to previously accepted issues, contributors must declare which AI tools were used and how much is AI-assisted, and agent-opened PRs are banned: Updated AI usage policy for contributions by mitchellh · Pull Request #10412 · ghostty-org/ghostty · GitHub
  2. tldraw: no longer accepting external contributions at all: Contributions policy · Issue #7695 · tldraw/tldraw · GitHub

AI agents will hopefully allow contributors to increase their support towards Modular’s success but I think it might be worth slowing down here, just my 2 cents!

3 Likes

We mitigated review burden at my company by leaning on draft PRs and adding more checkpoints through GitHub Actions, many of them using an “LLM-as-a-judge” approach (using Claude Sonnet 4.6).

Even though Copilot already does a decent job, having AI reviewers tailored to Mojo/MAX best practices could be a game changer. By the time a PR is marked as “ready for review,” it should already be free of the common pitfalls that maintainers typically burn out catching.

And whenever a maintainer catches something in a PR that could become a recurring issue, it’s just a matter of incorporating that case into the reviewer prompt.

If you’d like, I can open a draft PR showing what an initial Claude Code review GitHub Action might look like.
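As a rough sketch of what such an action could look like: this assumes the public anthropics/claude-code-action and an `ANTHROPIC_API_KEY` secret; the input names are from memory and may differ between versions of the action, so check its README before relying on this:

```yaml
# Hypothetical "LLM-as-a-judge" first-pass review, triggered only when a
# PR leaves draft state. The review prompt below is a placeholder.
name: ai-first-pass-review
on:
  pull_request:
    types: [ready_for_review]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Review this diff against the Mojo stdlib contribution guide.
            Flag duplicated logic, missing or weak tests, overly verbose
            code, and API changes that would need a proposal first.
```

The prompt file then becomes the place where recurring review feedback accumulates, which is exactly the loop described above.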

I want to add a bit of a controversial take.

What is the benefit of a PR made by a human when all the work was done by an agent?

As I see it there are two benefits:

  • The main benefit is that the person making the PR had an idea that could be implemented by an agent
  • They validated this idea by spending time and tokens

Now, the code itself is actually a byproduct that can potentially be recreated, given the prompt and info about the agent that was used.

So in this case the contribution can be code-less: just write down the prompt and agent info. Someone from Modular can schedule a run of this prompt and merge the PR internally; there is no need for human communication overhead, and the responsibility lies with the people who will actually maintain this code in the future.

I think this might reduce the spam pressure on the Modular team and can lead to more organic repo evolution.
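To make the “prompt as contribution” idea concrete, here is a hypothetical GitHub issue form (standard issue-form YAML; the filename and field ids are made up) that would capture everything a maintainer needs to replay the run:

```yaml
# Hypothetical .github/ISSUE_TEMPLATE/prompt-contribution.yml
name: Prompt contribution
description: Propose a change as a replayable prompt instead of a PR
body:
  - type: input
    id: agent
    attributes:
      label: Agent and model used
      placeholder: e.g. Claude Code
    validations:
      required: true
  - type: textarea
    id: prompt
    attributes:
      label: Prompt
      description: The exact prompt a maintainer could replay against the repo
    validations:
      required: true
  - type: textarea
    id: validation
    attributes:
      label: How you validated the result
      description: Tests run, benchmarks, manual checks, etc.
```

A required “how you validated the result” field keeps the contributor’s time-and-tokens investment visible, which is the part of the contribution this post argues actually carries the value.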

5 Likes

I do agree that contributors using AI should also help improve the AI setup itself, by refining prompts based on the pitfalls they encounter and creating follow-up PRs with improvements to CLAUDE.md, docs, CC skills, etc. That said, there’s a lot more work involved than just writing code.

Your approach could end up shifting a significant burden onto maintainers, who may need to spend much more time validating implementations, running benchmarks, and verifying correctness. It could also lead to a flood of changes that wouldn’t have been upstreamed otherwise because they weren’t properly validated, for example, potential “optimizations” that actually degrade performance in practice.

I think it is the contrary. As a maintainer I need to validate everything anyway, since I am the one who will have to maintain the code committed by an external contributor. So when I get a PR with code, I need to thoroughly review that code.

Now here is the thing: if I can just copy-paste the prompt into the tool and review the generated code directly, on my terms, and maybe tweak things ad hoc, that cuts out the middleman. The time I as a maintainer need to spend on the PR is less, because I only need to communicate with the agent. I don’t have to give feedback to a human who relays it to the agent and then posts the agent’s output so I can review it again. I certainly still can do that, if the result I get from the agent is not great and I want to loop in the contributor of the initial idea/prompt to get their opinion.

The only thing I save when someone else contributes tool-generated code is the cost of tokens, which is IMHO negligible. Meanwhile I do lose the info about how it was prompted (the context) and, as you mentioned, whether CLAUDE.md, docs, skills, etc. need tweaking.

1 Like

I hate the idea of using AI tools, especially prompting them to generate entire codebases, to submit PRs. If it’s just for generating boilerplate, that’s fine. But in an open-source project, if you’re simply prompting away a complex problem, it means you don’t understand it well enough to tackle it. You need to research it thoroughly to contribute meaningfully.

AI tools should be used on your own issues, where you have the freedom to do as you like, not in open-source projects, where the burden falls on maintainers.

“Good First Issues” shouldn’t be solved using AI, as they are meant to help new contributors get familiar with the project. Using AI tools in this context only adds extra burden on maintainers. But that’s just my opinion.

My general feeling is that if it’s immediately clear that you used AI by some mechanism other than Assisted-by: AI, then the PR has problems. Personally, I prefer to hold AI-assisted coding to the exact same standards as human coding, which means that if I can’t tell it’s written by AI without that tag, then it’s not a problem, although I might give it an extra correctness review.

I think that calling out AI generation specifically is still warranted, because reviewing large piles of automated submissions is a good way to quickly burn through limited review resources. To put it another way, if you don’t think a PR is worth doing some manual work to present to other humans, why is it worth it for me to review it? I don’t even care if you write the PR description first and then feed it into the LLM as a prompt.

I see no reason why the current policy prevents you from using agents against your copy of the repo, before manually opening a PR to the main one once it’s ready.

Prohibiting AI-written PR descriptions seems to me the wrong lever

In my opinion, if you don’t understand what the AI wrote well enough to hand-write a description, you didn’t review it carefully enough. Super long PR descriptions aren’t necessary in most cases; many of the longest I’ve seen were long only because they included detailed benchmarks, so I don’t expect this to be much of a burden.

4 Likes

Hi, a small perspective on this.
At scale, at least two things matter a lot:

  • how big a change is
  • how often changes come

According to the Nyquist-Shannon theorem (sampling theory, signal processing):
in order to reconstruct a signal faithfully,
we have to be able to sample at a rate of at least twice the highest rate of change.

Now with AI, context is getting bigger and usage is increasing.
Which means that over time, PRs will appear more often and bring larger changes.

So if we want to know the actual state of the system,
we need to read the codebase at least twice as often as it changes.
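The analogy is the standard sampling condition, with reading the codebase playing the role of sampling:

```latex
% Nyquist-Shannon sampling condition, mapped onto code review:
% the sampling (review) rate must be at least twice the highest
% frequency component of the signal (the rate of change).
f_{\text{review}} \geq 2\, f_{\text{change}}
```

It is only an analogy, of course, but it does capture why a rising change rate forces review capacity to rise faster than linearly with it.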

What can be done ?

  • Maybe allow only one vibecode-week per month.
  • Maybe pick one area that can be vibe coded at a time, and change it when needed.
  • Ask PR author to be the first human reviewer.
  • Categorize change types (structural, refactor, algo, new feature)
  • Process PRs in order from small changes to large changes
    (but a team reviewer can pick a larger one if thought important)
  • Any ideas ?

I have never used any big agentic tool,
but I do use some GPT to ask questions and help the thinking.

I think that if a lot of AIs make a lot of changes often,
the bigger problem is keeping up with knowing “what is the current state”.

The reason AI’s future in code generation is quite promising is previous powerful use cases, such as how Anthropic’s Claude could build the Claude C Compiler (CCC) entirely in Rust.

Then why do we need external contributors at all? :upside_down_face:

If everything can be written by AI, then maintainers can spin multiple AI agents and wait for them to finish. As I mentioned above, the only benefit would be to offset the cost of code generation to contributors, which is quite silly.

Currently lots of open source projects battle a swarm of contributions because for the contributor who does everything through an agent, the cost of contribution is minimal and the benefit is still significant.

In this podcast (https://www.youtube.com/watch?v=RJyPVLMyyuA), Dimitri mentioned the Jevons paradox (Jevons paradox - Wikipedia, starts at about 1:07:05) and how, from an economic standpoint, if some process costs me 99 cents and I get even 100 cents of value out of it, my goal is to repeat this process indefinitely. As a potential contributor, if I can instruct an agent to do some PRs that cost me almost nothing but give me some benefits (reputation and such), I will generate PRs, especially because I am not a maintainer and am not responsible for this project working. As a contributor, I am the chicken, to use the typical agile-methodology lingo (The Chicken and the Pig - Wikipedia). Maintainers are the pigs: they have to keep up with the swarm of contributions and be able to fix and evolve the codebase.

1 Like

Yeah, it would be wise to start with a conservative approach on this, or at least a gradual one.
We also need a strategy to offload pre-reviews from maintainers.
In addition, a strategy for pre-submit checks too.

I could see a future where many features are submitted,
but not prioritized (example: a module for x, y, z).

So the roadmap plays a huge role there in keeping the flow relevant.

To me, the key isn’t how the code is written (AI, manual, or mixed), but the value of the outcome and the contributor’s ownership.

For example, @mzaks, you’ve worked on hashers from the beginning and earned trust in that area; you’re effectively an SME there. That kind of ownership and commitment matters more than how the code was produced.

I believe we are in a great moment to contribute using AI, faster than ever before. However, with the advent of 1.0, I feel there are a lot of things underneath Modular we don’t actually understand, with a lot of maintainers heads-down, and it’s harder to add value. I remember creating the `Counter` struct with no problem, back when Mojo was about becoming a superset of Python. Now I get pushback just for adding a `String.capitalize()` method, and I completely get it; I actually have no problem with that, precisely because I did not spend an hour on it but 2-5 minutes and tokens from an AI subscription I am already paying for anyway.

I feel we need guidance on what is welcome in the stdlib these days more than constraints on AI usage, especially because AI can be “superhuman” at many tasks: the time for an AI to, e.g., look for optimization opportunities, spot them, implement them, create benchmarks and run them to prove the optimization, and open a draft PR for the author to review is literally minutes versus many human hours.

For what it’s worth, I don’t think this is unique to Mojo, it’s the natural evolution of almost any large OSS project.

When Mojo was first open sourced, there were fewer rules, guardrails, more room for experimentation, etc… That’s very common early on, when a project is still in its infancy and the boundaries have not been defined yet. But as a project matures (especially one backed by a for-profit company and moving toward a 1.0 release) it becomes necessary to be more selective about what gets added and what long term maintenance burden the project is willing to take on.

The rate at which community contributions can grow tends to far outpace the rate at which maintainers and reviewers can grow, and AI only amplifies that.

I think your comment actually highlights an important part of the problem: the time and effort required to create a contribution is no longer a good proxy for the time and effort required to review and maintain it. A contributor can now produce a PR in a matter of minutes, but the review cost remains very real. Every AI-generated patch, revision, explanation, and optimization still has to be reviewed by a human. And in OSS, that cost does not fall on the contributor’s agent, it falls on maintainers and reviewers, who also inherit the long term responsibility for the code once it lands.

I do agree with this.

The stdlib team should do a better job of communicating its current goals and priorities. Not just for 1.0, but more broadly for Mojo as a whole. That would make it easier for contributors to focus effort on the areas that are most likely to be valuable and accepted.

One concrete way to improve this may be to keep refining the GitHub process around accepted issues and clearer up-front signals on what work is actually desired. That gives contributors a better sense of where to spend their time, effort, AI tokens, and it also helps reduce reviewer time spent on PRs that were unlikely to align with project priorities in the first place.

Part of the challenge is simply bandwidth. There is a lot happening, and like most startup teams, we are often stretched thin. But I do think it would be worthwhile for us to align more clearly on what we want and then communicate that back to the OSS community. That guidance will naturally evolve over time, but having it written down would still be a big improvement.

1 Like

I like this take.

One model I’ve seen work well in larger OSS projects is to build a better review funnel before something reaches maintainer eyes. That can mean having a set of trusted contributors who help with issue triage, design feedback, and early PR reviews so that by the time a maintainer looks at something, the scope/discussion is much more focused.

I do think that kind of model has to be handled carefully, though. I would not want us to simply offload all of the front-line review burden onto OSS contributors and treat them as free labor :flushed_face:. But I do think there is room for a healthier middle layer. Contributors who have built trust in certain areas and can help with this :slight_smile:

Computers scale; humans don’t. Having the interface be human on one side and automated on the other feels like the beginning of trouble. I get that Manuel still keeps humans in the loop, except that the design part of the PR is gone.

Not all PRs are the same. Some, I assert, need human thought and design; others not so much. Why was this decided? What are the issues being grappled with? If there is any tweak to the PR policy (and I think there should be none as we go from pre-1.0 to 1.0 beta), it should be as small as possible. Baby steps, so that the limiting factor can be identified and assessed.

That’s the crux of the issue and a problem for all of OSS: responsibility. I don’t think I saw anyone mention this: IMO, advancements in LLMs and autonomous coding agents didn’t create new problems, they only amplified existing ones: misalignment of incentives and absence of skin in the game. There have always been low-quality contributions; it’s just that now the volume has increased.

The path forward (for all of OSS) is pretty obvious to me: align incentives and introduce skin in the game. ghostty (which I use :grimacing:) solved it their way. As someone wrote earlier: 1) an approved contributor needs to vouch for you; and 2) I’m assuming both have something to lose (skin in the game) if the newcomer pushes low-quality code.

I like the idea of requiring prospective contributors to be involved in the community. First discuss with maintainers what they need help with, agree on the approach/solution, then get to work. And if the issue isn’t assigned to you, your PR is automatically closed (even better if it cannot be created in the first place). That’s essentially proof-of-work and maybe there is a way to implement proof-of-stake too (back to skin in the game).

Just to dream about the future a little bit: perhaps when autonomous coding agents are better and cheaper, OSS projects won’t need external code contributions at all. Only product-focused discussions on what gets built and how the product evolves. Once there is agreement and buy-in, prompt and leave it overnight to iterate. Maintenance isn’t a problem if a swarm of agents is doing it. But to get there, the technology needs to exist and work well (project-wide reasoning).

The more I think about the idea, the cooler and more inevitable (in the long run) it sounds. We don’t even need a deterministic process. A stochastic one will do as long as it converges. OP is already living in the future while we are still here in the present.

2 Likes