CONTRIBUTING.md: Guidelines relevant to AI-assisted contributions #14052
base: trunk
Conversation
This is a proposal for guidelines, to include in CONTRIBUTING.md, on AI-assisted contributions.

Related work I'm aware of:
- Linux Foundation: https://www.linuxfoundation.org/legal/generative-ai
- ACM: https://www.acm.org/publications/policies/new-acm-policy-on-authorship
difficulties, or the generation of human-approved refactoring and
code-improvement suggestions. Unreasonable usage is unfortunately also
getting more common, for example low-effort productions of code
changes that that submitters do not understand themselves,
Suggested change:
- changes that that submitters do not understand themselves,
+ changes that submitters do not understand themselves,
(This typo has been spotted by ChatGPT)
## Guidelines relevant to AI-assisted contributions

AI-assistance tools are getting increasingly used to produce text and
I think that should be either "AI assistance tools" or "AI-assisted tools" (probably better).
can be more difficult than reviewing human-written code and
requires a different approach -- this is one reason why we ask in
(3) to disclose large-scale automation usage. Low-effort posting of
low-quality contributions -- whether human- or program-produced may
Suggested change:
- low-quality contributions -- whether human- or program-produced may
+ low-quality contributions -- whether human- or program-produced -- may
This looks very reasonable to me.
I'm against having this policy. I think it makes us look hostile to programmers who use AI assistance routinely as they code. In particular, some workflows make it hard to distinguish what code is human-written and what code is not.

Is this text even human-written? I (a human) am pushing mechanical switches on my desk. You are reading some text which hopefully corresponds with thoughts that are in my head. But lots has happened in the meantime, to get these thoughts from my brain to yours! I recognize that somehow AI feels different, but I'm not sure this is a distinction with a difference.

One thing to think about: this text was spell-checked as I wrote it, correctly identifying a few typos. Did I double-check that my OS's dictionary was properly licensed? No. Have you?

I do not currently use AI tools in my workflow, because I have not found an AI tool yet that doesn't drive me bonkers. This is probably my failing, though, and I look forward to a day soon where this changes.

If I were considering contributing to a new language community and saw this policy, I would shop elsewhere.
The intent of the guidelines is not to discourage the use of AI tools to contribute to the compiler distribution. I tried to write it in a way that does not carry positive or negative judgment of their use in general -- a particular check against this is to make sure that every guideline also makes sense if AI tools are replaced by human labor, and I think they do. When you report that you think these guidelines would discourage you as a prospective contributor, I wonder if you mean that:
Yes, good questions. I think I worry about the burden of adhering to the guidelines. That is, assume my routine coding environment has an AI running in the background suggesting completions as I go, which I occasionally use. Is my work primarily the output of AI? It's reasonable to think that the majority of characters are AI-produced, especially considering that AI sometimes offers multi-line completions. So it's unclear how I would conform to guideline (3). And I have no idea how I would conform to point (5).

I agree that these rules make literal sense if reinterpreted to be about collaborating with a human (and that this is a good goal). But the way that humans collaborate is so different from the way an AI assistant might enhance one's code that the analogy doesn't quite work out, to me.

Mostly, I feel that rules such as these have the opposite effect of a code of conduct. I see a code of conduct as saying, essentially, “we want to be welcoming to all contributors, and we're going (to try) to be civil no matter who you are”. These rules, on the other hand, seem to say “we're wary of contributions from AI, though we'll tolerate them”. The former might subtly encourage more people to participate (even though we would likely have been civil regardless of having a code of conduct), while the latter might subtly discourage people from participating (even though we would review AI-generated code with due diligence).

Maybe I'm crazy here! I'd be quite happy to hear I'm in the minority opinion about my reaction to this.
I don't have a strong opinion. Each of the points made seems reasonable, but I am sympathetic to @goldfirere's opinion that we do not yet know what form AI collaboration will take, so trying to dictate guidelines and expectations about it could be a bit premature.
If we still wanted to write something about this, we could simply include a generic statement, something along the lines of:
> AI contributions: apply your judgement. Just as you would not submit a PR by blindly copying someone else's code, you are expected to behave similarly with respect to AI-generated content.
Of course, if we start noticing a problematic increase in low-quality submissions due to AI, then yes, perhaps more specific guidelines may become useful. But then again, the real problem is not low-quality submissions, but rather submissions that look very much like high-quality work, but are not. And if someone is going to the trouble of crafting such submissions, it is unlikely that any guidelines will discourage them.
their consent beforehand. This section documents our expectations
regarding their usage to contribute to the OCaml compiler
distribution. These guidelines also apply, or have a direct
counterpart, with entirely human-produced content.
Suggested change:
- counterpart, with entirely human-produced content.
+ counterpart, to entirely human-produced content.
To be clear, I have no qualms with @nojb's suggestion above.
I don't personally feel such a statement is necessary (I think it's implicit that when you submit a PR, you take responsibility for what you are submitting), but I'm fine if others would like to add this. (Once upon a time, I also did not think codes of conduct were necessary. But I have seen enough people saying that a CoC would make it more likely for them to contribute, so I changed my stance. Maybe there is something similar here.)

Also, just to clarify: I don't actually feel that strongly on this whole point. I do have an opinion, which I've tried to explain: I think a policy like the one in this PR does more harm than good. But I'm a relative newcomer here, I admit that my opinion might be wrong in any number of ways, and I'm happy to stand aside if that seems to be the best course of action.
Overall, I also don't like the proposition, because it spends a lot of time trying to prescribe how contributors may use AI in order to avoid a currently non-existent problem. Prescribing what usage of AI is reasonable feels out of place in a contributing guideline. I don't think it is our place to decide for others which uses of tools are reasonable or not (in the same way that we don't prescribe a vim configuration).

Similarly, I don't think that it makes sense to special-case AI: I expect every contribution to have been reviewed for correctness by its authors, and to have a well-defined set of authors who have agreed to contribute to the PR. This applies to all authors, sapient or not. Typically, the ACM or Linux Foundation policies seem to be mostly a reminder that the authorship of AI-generated content is a legally grey (or black) area.

Nevertheless, I partially agree that it might be useful to add a reminder that the three previous points (correctness, authorship, authors' agreement) are less trivially true in an AI setting. For this purpose, my feeling is that @nojb's proposition strikes a better balance, with maybe a slight amendment to account for the authorship question.