Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a workflow for automated backporting #5227

Merged
merged 1 commit into from Feb 1, 2023
Merged

Add a workflow for automated backporting #5227

merged 1 commit into from Feb 1, 2023

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Jan 26, 2023

  1. Find the latest release branch
  2. For each commit in main and not in release branch (compared by title), find the corresponding PR.
  3. If the PR fixes an issue labeled "bug", and neither the PR nor the issue are labeled "no-backport", cherry-pick the commits from the PR onto the release branch, and create a PR with these changes.

The resulting backport PRs: https://github.com/timescale/timescaledb/labels/pr-backport

Disable-check: force-changelog-changed

@codecov
Copy link

codecov bot commented Jan 26, 2023

Codecov Report

Merging #5227 (d6cadb7) into main (44cd71a) will increase coverage by 0.09%.
The diff coverage is n/a.

❗ Current head d6cadb7 differs from pull request most recent head 9e1bfb2. Consider uploading reports for the commit 9e1bfb2 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5227      +/-   ##
==========================================
+ Coverage   89.02%   89.12%   +0.09%     
==========================================
  Files         225      225              
  Lines       51838    51726     -112     
==========================================
- Hits        46151    46099      -52     
+ Misses       5687     5627      -60     
Impacted Files Coverage Δ
src/bgw/scheduler.c 83.41% <0.00%> (-2.01%) ⬇️
src/loader/bgw_message_queue.c 88.63% <0.00%> (-0.57%) ⬇️
src/loader/loader.c 94.38% <0.00%> (-0.34%) ⬇️
tsl/src/debug.c 49.36% <0.00%> (-0.33%) ⬇️
src/bgw/job_stat.c 91.87% <0.00%> (-0.32%) ⬇️
tsl/src/nodes/decompress_chunk/decompress_chunk.c 94.89% <0.00%> (-0.20%) ⬇️
src/ts_catalog/continuous_agg.c 94.69% <0.00%> (-0.12%) ⬇️
tsl/test/src/test_ddl_hook.c 93.18% <0.00%> (-0.08%) ⬇️
src/cache.c 94.93% <0.00%> (-0.07%) ⬇️
tsl/src/bgw_policy/job.c 87.45% <0.00%> (-0.05%) ⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44cd71a...9e1bfb2. Read the comment docs.

@akuzm akuzm marked this pull request as ready for review January 26, 2023 15:59
@github-actions
Copy link

@RafiaSabih, @shhnwz: please review this pull request.

Powered by pull-review

scripts/backport.py Outdated Show resolved Hide resolved
@svenklemm
Copy link
Member

Can we have a sanity check that aborts when latest-dev/reverse-dev is touched since those would require manual intervention.

scripts/backport.py Outdated Show resolved Hide resolved
@akuzm
Copy link
Member Author

akuzm commented Jan 27, 2023

Can we have a sanity check that aborts when latest-dev/reverse-dev is touched since those would require manual intervention.

Sure. They'll probably fail because of the conflicts anyway, but I added a check to ignore such PRs.

@akuzm
Copy link
Member Author

akuzm commented Jan 27, 2023

Another problem, the actions don't start on PRs created by the github-actions bot: peter-evans/create-pull-request#48. We'll have to make a bot for that, preferably not Mats :)

Copy link
Member

@jnidzwetzki jnidzwetzki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR is merged, I would recommend to:

  • Mention this action in our eng-database call. So, everybody is aware of this new behavior.
  • Update the release documentation.

.github/workflows/backport.yaml Outdated Show resolved Hide resolved
scripts/backport.py Outdated Show resolved Hide resolved
scripts/backport.py Outdated Show resolved Hide resolved
scripts/backport.py Outdated Show resolved Hide resolved
scripts/backport.py Outdated Show resolved Hide resolved
@akuzm
Copy link
Member Author

akuzm commented Feb 1, 2023

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions about the workflow below that might be good to discuss.

It would be better to split it up into more functions to make it easier to maintain. In the current state, it is quite hard to read and follow the code.

GITHUB_TOKEN: ${{ secrets.ORG_AUTOMATION_TOKEN }}
run: |
git remote --verbose
scripts/backport.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being explicit is better if the environment is not under your control. This avoids it accidentally trying to run as Python 2.x.

Suggested change
scripts/backport.py
python3 scripts/backport.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 2 is long deprecated, no? And environment is under our control, this workflow requires ubuntu-latest. I will change the shebang to #/usr/bin/env python3 because the .py script is not compatible with python 2. Can't really tell what's the recommended approach these days: https://peps.python.org/pep-0394/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually prefer to play it safe, but if you feel confident it will always be python3, go with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the shebang to require python3.

.github/workflows/changelog-check.yaml Outdated Show resolved Hide resolved
scripts/backport.py Show resolved Hide resolved
Comment on lines 152 to 165
# Find out what is the branch corresponding to the previous version compared to
# main. We will backport to that branch.
version_config = {
x[0].strip(): x[1].strip()
for x in [
line.split("=")
for line in subprocess.check_output(
f"git show {source_remote}/main:version.config",
shell=True,
text=True,
).splitlines()
if line
]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little easier to read. You can also split it into multiple lines.

Suggested change
# Find out what is the branch corresponding to the previous version compared to
# main. We will backport to that branch.
version_config = {
x[0].strip(): x[1].strip()
for x in [
line.split("=")
for line in subprocess.check_output(
f"git show {source_remote}/main:version.config",
shell=True,
text=True,
).splitlines()
if line
]
}
# Find out what is the branch corresponding to the previous version compared to
# main. We will backport to that branch.
version_config = dict(
re.split(r"\s+=\s+", line)
for line in subprocess.check_output(
f"git show {source_remote}/main:version.config",
shell=True,
text=True,
).splitlines()
if line
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter tells me to use dict comprehension instead of dict(), but re.split might be nicer indeed, I'll try to incorporate it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion, just a suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrote it even simpler with capture groups.

Comment on lines +174 to +188
# Find out which commits are unique to main and target branch. Also build sets of
# the titles of these commits. We will compare the titles to check whether a
# commit was backported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this risk missing some commits if they have similar titles?

Since we never modify the master, can't we use the SHA on the master to decide if we should backport the commit or not. We could add the original SHA either to the commit message using interpret-trailer or to a note using git notes. (I would prefer the former since that does not require special treatment to push and pull notes.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar -- no. Exactly the same titles -- yes, they will be missed. But how likely this is in practice? I think the bug fixes usually have very distinct titles. I like the titles because this doesn't require any parsing.

$ git log --oneline --pretty=format:%s origin/main | sort | uniq -c | sort -nr | head -100
     18 Prepare the repo for next development cycle
      5 Prepare the repo for the next development cycle
      5 Add missing gitignore entry
      4 Fix flaky pg_dump test
      4 Bump postgres versions used in CI
      3 Remove unused functions
      3 Fix typos in comments
      3 Fix memory overflow
      2 Update CHANGELOG
      2 Upadting docs
      2 Run pgindent on code
      2 Remove unused functions from utils.c
      2 Remove dead code
      2 Release 2.0.0
      2 Move enterprise features to community
      2 Make older_than / newer_than error message user friendly
      2 Fix various misspellings
      2 Fix issues discovered by coverity
      2 Fix flaky bgw_custom test
      2 Fix compression_ddl isolation test
      2 Fix CAgg on CAgg variable bucket size validation
      2 Fix assertion failure in cursor_fetcher_rewind
      2 cleanup

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the SHA of the original commit as a trailer would not require any parsing either. It would also allow finding the original commit easy to make sure that no mistakes were done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm using cherry-pick -x, which adds (cherry picked from commit c5e496a554e9f4d04578f39669108554c22c918d) to the end of the message. I think we have to support this kind of trailer, because this is a git-standard thing that you can also use when cherry-picking manually.

So matching by SHA is going to require:

  1. some minimal parsing (e.g. re.match);
  2. human operators not forgetting to add -x when they cherry-pick manually -- this one I'm worried about the most;
  3. some special handling when you actually do a different manual fix in the release branch. You'll have to pretend it's a cherry-pick not to confuse the automation.

Using titles solves all three problems, because it's easy and natural to use the same title for manual cherry-pick/separate fix, and instead of parsing I just do git log --pretty=format:%s.

Let's observe for a while how problematic this title matching is going to be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I'm using cherry-pick -x, which adds (cherry picked from commit c5e496a554e9f4d04578f39669108554c22c918d) to the end of the message. I think we have to support this kind of trailer, because this is a git-standard thing that you can also use when cherry-picking manually.

So matching by SHA is going to require:

1. some minimal parsing (e.g. re.match);

Yeah, this one is simpler.

2. human operators not forgetting to add `-x` when they cherry-pick manually -- this one I'm worried about the most;

Since human operators can be quite creative in what they are doing we're probably going to need some separate CI checks for release branches any way.

3. some special handling when you actually do a different manual fix in the release branch. You'll have to pretend it's a cherry-pick not to confuse the automation.

Not sure that is a big problem. If you do a manual fix, you would probably do something like this anyway:

$ git cherry-pick -x <whatever>
<hack hack hack>
$ git commit -a -m'My tweaks'
<test test test>
$ git rebase -i upstream/main
<merge it all, and keep the original message>

Using titles solves all three problems, because it's easy and natural to use the same title for manual cherry-pick/separate fix, and instead of parsing I just do git log --pretty=format:%s.

Let's observe for a while how problematic this title matching is going to be?

Seems it's already decided. :)

Comment on lines 199 to 204
# We will do backports per-PR, because one PR, though not often, might contain
# many commits. So as the first step, go through the commits unique to main, find
# out which of them have to be backported, and remember the corresponding PRs.
# We also have to remember which commits to backport. The list from PR itself is
# not what we need, these are the original commits from the PR branch, and we
# need the resulting commits in master. Store them as dict(pr number -> PRINfo).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... quite often PRs contain multiple commit messages because they are fixing separate issues but are added to a single PR for convenience. In that case, it might make sense to cherry-pick them separately and create a PR for each.

Copy link
Member Author

@akuzm akuzm Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't really handle that, because we can't understand which commit fixes which issue. One might need to be backported, and the other not. I can skip PRs with multiple commits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can skip PRs with multiple commits.

Did that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't really handle that, because we can't understand which commit fixes which issue.

Why not? Each commit would contain a "fixed" label in this case, so it should be straightforward to cherry-pick these and create separate PRs?

One might need to be backported, and the other not. I can skip PRs with multiple commits.

Not sure why one should be backported and the other one not. Both are committed in a PR, so both should be part of the release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why one should be backported and the other one not. Both are committed in a PR, so both should be part of the release.

If both should be backported together, separate PRs seem to contradict it?

In the case you described above, we have two commits fixing two different issues in one PR. The current rule for backporting is "if the fixed issue is labeled 'bug' and not labeled 'no-backport'". These labels can differ for these issues, so the decision to backport will also differ.

I think it is more convenient to make backport decision on a per-PR basis. Commits often don't have "fixes" references (and I think it's good, because they tend to spam the issues endlessly at each rebase). We will also need more code to support both per-PR and per-commit basis, and pretty tricky logic to combine these approaches, because they might easily give different results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case you described above, we have two commits fixing two different issues in one PR. The current rule for backporting is "if the fixed issue is labeled 'bug' and not labeled 'no-backport'". These labels can differ for these issues, so the decision to backport will also differ.

But then they should not be merged as a single PR. They should actually be separate PRs.

To be able to backport on a per-PR basis, it is important that if a PR consists of multiple commits, it is because they are impossible to submit as separate PRs. In that case, and only in that case, you can safely assume that the entire PR should be backported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits often don't have "fixes" references (and I think it's good, because they tend to spam the issues endlessly at each rebase).

All mine have. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more convenient to make backport decision on a per-PR basis. [...] We will also need more code to support both per-PR and per-commit basis, and pretty tricky logic to combine these approaches, because they might easily give different results.

I am leaning towards the per-PR basis for another reason than "more code to support both". It is in some situations necessary to have two commits for a single change. For example, if you rename a lot of files and then modify them, it is better to have one commit for the renaming and one for the following changes. In this case, the PR would be the glue for these two PRs.

Comment on lines 292 to 306
if (
subprocess.run(
f"git rev-parse {target_remote}/{backport_branch} > /dev/null 2>&1",
shell=True,
check=False,
).returncode
== 0
):
print(
f'Backport branch {backport_branch} for PR #{original_pr.number}: "{original_pr.title}" already exists. Skipping.'
)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a lot easier to read if you split out the git commands into separate functions and use them in the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so... the first thing I need to know when I look at this code is which git command it is running. Putting this command somewhere else just makes it more complicated. I agree these lines look kinda ugly, especially when formatted by "black", but it's all just boilerplate around the only meaningful thing, the git command.

I can instead move boilerplate to functions, e.g. git_check_output and git_returncode, to remove the amount of visual noise. Thankfully I only need two these modes for subprocess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankfully I only need two these modes for subprocess.

Actually three, but anyway, done that, looks better now.

)

original_description = original_pr.body
# Comment out the keywords like 'Fixes #1234' to avoid confusion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what confusion you have in mind.

It is good if the backport PRs reference the issue as well so that you can track that it was actually backported. This will allow you to eliminate that our backporting system is making a mistake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to reference the issue, but I don't want Github to treat it as a "fixes" reference. Granted, most of the time it's going to be closed already, so it's not important. But there might be some weird case when the original issue was reopened, but the backport of the old fix is in flight, and then it closes the issue. Or maybe we would want to reason about backporting another way, starting from issues, and then it would help to have only one PR that fixes an issue. I think it is more straightforward to keep the "issue"<->"PR that fixes it" a one-to-one correspondence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to reference the issue

e.g. it says "Original issue #xxxx" in the description of the backport PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. it says "Original issue #xxxx" in the description of the backport PR

That works for me.

1. Find the latest release branch
1. For each commit in main and not in release branch (compared by
   title), find the corresponding PR.
1. If the PR fixes an issue labeled "bug", and neither the PR nor the
   issue are labeled "no-backport", cherry-pick the commits from the PR
onto the release branch, and create a PR with these changes.
@akuzm akuzm merged commit 739fd00 into main Feb 1, 2023
@akuzm akuzm deleted the aku/backport branch February 1, 2023 12:25
@akuzm
Copy link
Member Author

akuzm commented Feb 1, 2023

Oh my, I forgot to disable the auto merge 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants