
CONTRIBUTING.md: Guidelines relevant to AI-assisted contributions #14052


Open · wants to merge 1 commit into trunk

Conversation

@gasche (Member) commented May 24, 2025:

This is a proposal for guidelines, to include in CONTRIBUTING.md, on AI-assisted contributions.

A short summary would be as follows:

  • These guidelines do not try to prevent the use of AI-assisted tools, or to treat AI-assisted contributions differently from human-authored ones.
  • We state the obvious: contributors are responsible for the quality of their contributions, and in particular they should have written or read all of it.
  • The guidelines demand that if a substantial part of a contribution is authored by an automated tool (this does not apply to automated copy-editing, for example), the use of the automated tool should be explicitly disclosed. (Proposed test for the substantial-authorship criterion: if a human had written this part, would including it without mention be plagiarism?)

Related work I'm aware of:

difficulties, or the generation of human-approved refactoring and
code-improvement suggestions. Unreasonable usage is unfortunately also
getting more common, for example low-effort productions of code
changes that that submitters do not understand themselves,
A contributor commented:

Suggested change
changes that that submitters do not understand themselves,
changes that submitters do not understand themselves,

(This typo was spotted by ChatGPT.)


## Guidelines relevant to AI-assisted contributions

AI-assistance tools are getting increasingly used to produce text and
A contributor commented:

I think that should be either "AI assistance tools" or "AI-assisted tools" (probably better).

can be more difficult than reviewing human-written code and
requires a different approach -- this is one reason why we ask in
(3) to disclose large-scale automation usage. Low-effort posting of
low-quality contributions -- whether human- or program-produced may
A contributor commented:

Suggested change
low-quality contributions -- whether human- or program-produced may
low-quality contributions -- whether human- or program-produced -- may

@alainfrisch (Contributor):

This looks very reasonable to me.

@goldfirere (Contributor):

I'm against having this policy. I think it makes us look hostile to programmers who use AI assistance routinely as they code. In particular, some workflows make it hard to distinguish what code is human-written and what code is not. Is this text even human-written? I (a human) am pushing mechanical switches on my desk. You are reading some text which hopefully corresponds with thoughts that are in my head. But lots has happened in the meantime, to get these thoughts from my brain to yours! I recognize that somehow AI feels different, but I'm not sure this is a distinction with a difference.

One thing to think about: this text was spell-checked as I wrote it, correctly identifying a few typos. Did I double-check that my OS's dictionary was properly licensed? No. Have you?

I do not currently use AI tools in my workflow, because I have not found an AI tool yet that doesn't drive me bonkers. This is probably my failing, though, and I look forward to a day soon where this changes. If I were considering contributing to a new language community and saw this policy, I would shop elsewhere.

@gasche (Member, Author) commented May 27, 2025:

The intent of the guidelines is not to discourage the use of AI tools to contribute to the compiler distribution. I tried to write them in a way that does not carry a positive or negative judgment of their use in general -- a particular check against this is to make sure that every guideline also makes sense if AI tools are replaced by human labor, and I think they do.

When you report that you think these guidelines would discourage you as a prospective contributor, I wonder if you mean that:

  • there are specific parts of the guidelines that you found discouraging, which could be reformulated, or
  • the general vibe (no pun intended) of these guidelines is discouraging, or maybe even their mere existence

@goldfirere (Contributor):

Yes, good questions. I think I worry about the burden of adhering to the guidelines. That is, assume my routine coding environment has an AI running in the background suggesting completions as I go, which I occasionally use. Is my work primarily the output of AI? It’s reasonable to think that the majority of characters are AI-produced, especially considering that AI sometimes offers multi-line completions. So it’s unclear how I would conform to guideline (3). And I have no idea how I would conform to point (5).

I agree that these rules still make literal sense when reinterpreted to be about collaborating with a human (and that this is a good goal). But the way that humans collaborate is so different from the way an AI assistant might enhance one’s code that the analogy does not quite work out, to me.

Mostly, I feel that rules such as these have the opposite effect of a code of conduct. I see a code of conduct as saying, essentially, “we want to be welcoming to all contributors, and we’re going (to try) to be civil no matter who you are”. These rules, on the other hand, seem to say “we’re wary of contributions from AI, though we’ll tolerate them”. The former might subtly encourage more people to participate (even though we would have likely been civil regardless of having a code of conduct), while the latter might subtly encourage fewer people to participate (even though we would review AI-generated code with due diligence). Maybe I’m crazy here! I’d be quite happy to hear I’m in the minority opinion about my reaction to this.

@nojb (Contributor) left a comment:

I don't have a strong opinion. Each of the points made seems reasonable, but I am sympathetic to @goldfirere's opinion that we do not yet know what form AI collaboration will take, so trying to dictate guidelines and expectations about it could be a bit premature.

If we still wanted to write something about this, we could simply include a generic statement, something along the lines of:

AI contributions: apply your judgement. Just as you would not submit a PR by blindly copying someone else's code, you are expected to behave similarly with respect to AI-generated content.

Of course, if we start noticing a problematic increase in low-quality submissions due to AI, then yes, perhaps more specific guidelines may become useful. But then again, the real problem is not low-quality submissions, but rather submissions that look very much like high-quality work, but are not. And if someone is going to the trouble of crafting such submissions, it is unlikely that any guidelines will discourage them.

their consent beforehand. This section documents our expectations
regarding their usage to contribute to the OCaml compiler
distribution. These guidelines also apply, or have a direct
counterpart, with entirely human-produced content.
A contributor commented:

Suggested change
counterpart, with entirely human-produced content.
counterpart, to entirely human-produced content.

@goldfirere (Contributor):

To be clear, I have no qualms with @nojb's

AI contributions: apply your judgement. Just as you would not submit a PR by blindly copying someone else's code, you are expected to behave similarly with respect to AI-generated content.

I don't personally feel such a statement is necessary (I think it's implicit that when you submit a PR, you take responsibility for what you are submitting), but I'm fine if others would like to add this. (Once upon a time, I also did not think codes of conduct were necessary. But I have seen enough people saying that a CoC would make it more likely for them to contribute, so I changed my stance. Maybe there is something similar here.)

Also, just to clarify: I don't actually feel that strongly on this whole point. I do have an opinion, which I've tried to explain: I think a policy like the one in this PR does more harm than good. But I'm a relative newcomer here, I admit that my opinion might be wrong in any number of ways, and I'm happy to stand aside if that seems to be the best course of action.

@Octachron (Member):

Overall, I also don't like the proposal, because it spends a lot of time trying to prescribe how contributors may use AI in order to avoid a currently nonexistent problem.

Prescribing what usage of AI is reasonable feels out of place in a contributing guideline. I don't think it is our place to decide for others which uses of tools are reasonable or not (in the same way that we don't prescribe a vim configuration).

Similarly, I don't think it makes sense to special-case AI: I expect every contribution to have been reviewed for correctness by its authors, and to have a well-defined set of authors who have agreed to contribute to the PR. This applies to all authors, sapient or not. Typically, the ACM or Linux Foundation policies seem to be mostly a reminder that the authorship of AI-generated content is a legally grey (or black) area.

Nevertheless, I partially agree that it might be useful to add a reminder that the three previous points (correctness, authorship, authors' agreement) are less trivially true in an AI setting. For this purpose, my feeling is that @nojb's proposal strikes a better balance, with maybe a slight amendment to account for the authorship question:

AI contributions: apply your judgement. Just as you would not submit a PR written by someone else, or by blindly copying someone else's code, you are expected to behave similarly with respect to AI-generated content.
