Add a workflow for automated backporting #5227

akuzm · 2023-01-26T15:01:20Z

Find the latest release branch
For each commit in main and not in release branch (compared by title), find the corresponding PR.
If the PR fixes an issue labeled "bug", and neither the PR nor the issue are labeled "no-backport", cherry-pick the commits from the PR onto the release branch, and create a PR with these changes.

The resulting backport PRs: https://github.com/timescale/timescaledb/labels/pr-backport

Disable-check: force-changelog-changed

codecov · 2023-01-26T15:21:31Z

Codecov Report

Merging #5227 (d6cadb7) into main (44cd71a) will increase coverage by 0.09%.
The diff coverage is n/a.

❗ Current head d6cadb7 differs from pull request most recent head 9e1bfb2. Consider uploading reports for the commit 9e1bfb2 to get more accurate results

@@            Coverage Diff             @@
##             main    #5227      +/-   ##
==========================================
+ Coverage   89.02%   89.12%   +0.09%     
==========================================
  Files         225      225              
  Lines       51838    51726     -112     
==========================================
- Hits        46151    46099      -52     
+ Misses       5687     5627      -60

Impacted Files	Coverage Δ
src/bgw/scheduler.c	`83.41% <0.00%> (-2.01%)`	⬇️
src/loader/bgw_message_queue.c	`88.63% <0.00%> (-0.57%)`	⬇️
src/loader/loader.c	`94.38% <0.00%> (-0.34%)`	⬇️
tsl/src/debug.c	`49.36% <0.00%> (-0.33%)`	⬇️
src/bgw/job_stat.c	`91.87% <0.00%> (-0.32%)`	⬇️
tsl/src/nodes/decompress_chunk/decompress_chunk.c	`94.89% <0.00%> (-0.20%)`	⬇️
src/ts_catalog/continuous_agg.c	`94.69% <0.00%> (-0.12%)`	⬇️
tsl/test/src/test_ddl_hook.c	`93.18% <0.00%> (-0.08%)`	⬇️
src/cache.c	`94.93% <0.00%> (-0.07%)`	⬇️
tsl/src/bgw_policy/job.c	`87.45% <0.00%> (-0.05%)`	⬇️
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44cd71a...9e1bfb2. Read the comment docs.

.github/workflows/backport.yaml

github-actions · 2023-01-26T16:06:56Z

@RafiaSabih, @shhnwz: please review this pull request.

Powered by pull-review

scripts/backport.py

svenklemm · 2023-01-27T06:41:03Z

Can we have a sanity check that aborts when latest-dev/reverse-dev is touched since those would require manual intervention.

scripts/backport.py

akuzm · 2023-01-27T08:12:34Z

Can we have a sanity check that aborts when latest-dev/reverse-dev is touched since those would require manual intervention.

Sure. They'll probably fail because of the conflicts anyway, but I added a check to ignore such PRs.

akuzm · 2023-01-27T10:39:08Z

Another problem, the actions don't start on PRs created by the github-actions bot: peter-evans/create-pull-request#48. We'll have to make a bot for that, preferably not Mats :)

jnidzwetzki

Before this PR is merged, I would recommend to:

Mention this action in our eng-database call. So, everybody is aware of this new behavior.
Update the release documentation.

.github/workflows/backport.yaml

scripts/backport.py

akuzm · 2023-02-01T09:02:17Z

Added a couple of lines to the wiki: https://github.com/timescale/eng-database/wiki/Releasing-TimescaleDB#automated-cherry-picking-of-bug-fixes

mkindahl

I have some questions about the workflow below that might be good to discuss.

It would be better to split it up into more functions to make it easier to maintain. In the current state, it is quite hard to read and follow the code.

mkindahl · 2023-01-31T14:39:12Z

.github/workflows/backport.yaml

+          GITHUB_TOKEN: ${{ secrets.ORG_AUTOMATION_TOKEN }}
+        run: |
+          git remote --verbose
+          scripts/backport.py


Being explicit is better if the environment is not under your control. This avoids it accidentally trying to run as Python 2.x.

Suggested change

scripts/backport.py

python3 scripts/backport.py

Python 2 is long deprecated, no? And environment is under our control, this workflow requires ubuntu-latest. I will change the shebang to #/usr/bin/env python3 because the .py script is not compatible with python 2. Can't really tell what's the recommended approach these days: https://peps.python.org/pep-0394/

I usually prefer to play it safe, but if you feel confident it will always be python3, go with it.

Changed the shebang to require python3.

.github/workflows/changelog-check.yaml

scripts/backport.py

mkindahl · 2023-02-01T09:12:38Z

scripts/backport.py

+# Find out what is the branch corresponding to the previous version compared to
+# main. We will backport to that branch.
+version_config = {
+    x[0].strip(): x[1].strip()
+    for x in [
+        line.split("=")
+        for line in subprocess.check_output(
+            f"git show {source_remote}/main:version.config",
+            shell=True,
+            text=True,
+        ).splitlines()
+        if line
+    ]
+}


A little easier to read. You can also split it into multiple lines.

Suggested change

# Find out what is the branch corresponding to the previous version compared to

# main. We will backport to that branch.

version_config = {

x[0].strip(): x[1].strip()

for x in [

line.split("=")

for line in subprocess.check_output(

f"git show {source_remote}/main:version.config",

shell=True,

text=True,

).splitlines()

if line

]

}

# Find out what is the branch corresponding to the previous version compared to

# main. We will backport to that branch.

version_config = dict(

re.split(r"\s+=\s+", line)

for line in subprocess.check_output(

f"git show {source_remote}/main:version.config",

shell=True,

text=True,

).splitlines()

if line

)

The linter tells me to use dict comprehension instead of dict(), but re.split might be nicer indeed, I'll try to incorporate it.

No strong opinion, just a suggestion.

Wrote it even simpler with capture groups.

mkindahl · 2023-02-01T09:27:39Z

scripts/backport.py

+# Find out which commits are unique to main and target branch. Also build sets of
+# the titles of these commits. We will compare the titles to check whether a
+# commit was backported.


Doesn't this risk missing some commits if they have similar titles?

Since we never modify the master, can't we use the SHA on the master to decide if we should backport the commit or not. We could add the original SHA either to the commit message using interpret-trailer or to a note using git notes. (I would prefer the former since that does not require special treatment to push and pull notes.)

Similar -- no. Exactly the same titles -- yes, they will be missed. But how likely this is in practice? I think the bug fixes usually have very distinct titles. I like the titles because this doesn't require any parsing.

$ git log --oneline --pretty=format:%s origin/main | sort | uniq -c | sort -nr | head -100 18 Prepare the repo for next development cycle 5 Prepare the repo for the next development cycle 5 Add missing gitignore entry 4 Fix flaky pg_dump test 4 Bump postgres versions used in CI 3 Remove unused functions 3 Fix typos in comments 3 Fix memory overflow 2 Update CHANGELOG 2 Upadting docs 2 Run pgindent on code 2 Remove unused functions from utils.c 2 Remove dead code 2 Release 2.0.0 2 Move enterprise features to community 2 Make older_than / newer_than error message user friendly 2 Fix various misspellings 2 Fix issues discovered by coverity 2 Fix flaky bgw_custom test 2 Fix compression_ddl isolation test 2 Fix CAgg on CAgg variable bucket size validation 2 Fix assertion failure in cursor_fetcher_rewind 2 cleanup

Adding the SHA of the original commit as a trailer would not require any parsing either. It would also allow finding the original commit easy to make sure that no mistakes were done.

Now I'm using cherry-pick -x, which adds (cherry picked from commit c5e496a554e9f4d04578f39669108554c22c918d) to the end of the message. I think we have to support this kind of trailer, because this is a git-standard thing that you can also use when cherry-picking manually.

So matching by SHA is going to require:

some minimal parsing (e.g. re.match);

human operators not forgetting to add -x when they cherry-pick manually -- this one I'm worried about the most;

some special handling when you actually do a different manual fix in the release branch. You'll have to pretend it's a cherry-pick not to confuse the automation.

Using titles solves all three problems, because it's easy and natural to use the same title for manual cherry-pick/separate fix, and instead of parsing I just do git log --pretty=format:%s.

Let's observe for a while how problematic this title matching is going to be?

Now I'm using cherry-pick -x, which adds (cherry picked from commit c5e496a554e9f4d04578f39669108554c22c918d) to the end of the message. I think we have to support this kind of trailer, because this is a git-standard thing that you can also use when cherry-picking manually.

So matching by SHA is going to require:

1. some minimal parsing (e.g. re.match);

Yeah, this one is simpler.

2. human operators not forgetting to add `-x` when they cherry-pick manually -- this one I'm worried about the most;

Since human operators can be quite creative in what they are doing we're probably going to need some separate CI checks for release branches any way.

3. some special handling when you actually do a different manual fix in the release branch. You'll have to pretend it's a cherry-pick not to confuse the automation.

Not sure that is a big problem. If you do a manual fix, you would probably do something like this anyway:

$ git cherry-pick -x <whatever> <hack hack hack> $ git commit -a -m'My tweaks' <test test test> $ git rebase -i upstream/main <merge it all, and keep the original message>

Using titles solves all three problems, because it's easy and natural to use the same title for manual cherry-pick/separate fix, and instead of parsing I just do git log --pretty=format:%s.

Let's observe for a while how problematic this title matching is going to be?

Seems it's already decided. :)

mkindahl · 2023-02-01T09:30:37Z

scripts/backport.py

+# We will do backports per-PR, because one PR, though not often, might contain
+# many commits. So as the first step, go through the commits unique to main, find
+# out which of them have to be backported, and remember the corresponding PRs.
+# We also have to remember which commits to backport. The list from PR itself is
+# not what we need, these are the original commits from the PR branch, and we
+# need the resulting commits in master. Store them as dict(pr number -> PRINfo).


Hmm.... quite often PRs contain multiple commit messages because they are fixing separate issues but are added to a single PR for convenience. In that case, it might make sense to cherry-pick them separately and create a PR for each.

I think we can't really handle that, because we can't understand which commit fixes which issue. One might need to be backported, and the other not. I can skip PRs with multiple commits.

I can skip PRs with multiple commits.

Did that.

I think we can't really handle that, because we can't understand which commit fixes which issue.

Why not? Each commit would contain a "fixed" label in this case, so it should be straightforward to cherry-pick these and create separate PRs?

One might need to be backported, and the other not. I can skip PRs with multiple commits.

Not sure why one should be backported and the other one not. Both are committed in a PR, so both should be part of the release.

Not sure why one should be backported and the other one not. Both are committed in a PR, so both should be part of the release.

If both should be backported together, separate PRs seem to contradict it?

In the case you described above, we have two commits fixing two different issues in one PR. The current rule for backporting is "if the fixed issue is labeled 'bug' and not labeled 'no-backport'". These labels can differ for these issues, so the decision to backport will also differ.

I think it is more convenient to make backport decision on a per-PR basis. Commits often don't have "fixes" references (and I think it's good, because they tend to spam the issues endlessly at each rebase). We will also need more code to support both per-PR and per-commit basis, and pretty tricky logic to combine these approaches, because they might easily give different results.

In the case you described above, we have two commits fixing two different issues in one PR. The current rule for backporting is "if the fixed issue is labeled 'bug' and not labeled 'no-backport'". These labels can differ for these issues, so the decision to backport will also differ.

But then they should not be merged as a single PR. They should actually be separate PRs.

To be able to backport on a per-PR basis, it is important that if a PR consists of multiple commits, it is because they are impossible to submit as separate PRs. In that case, and only in that case, you can safely assume that the entire PR should be backported.

Commits often don't have "fixes" references (and I think it's good, because they tend to spam the issues endlessly at each rebase).

All mine have. :)

I think it is more convenient to make backport decision on a per-PR basis. [...] We will also need more code to support both per-PR and per-commit basis, and pretty tricky logic to combine these approaches, because they might easily give different results.

I am leaning towards the per-PR basis for another reason than "more code to support both". It is in some situations necessary to have two commits for a single change. For example, if you rename a lot of files and then modify them, it is better to have one commit for the renaming and one for the following changes. In this case, the PR would be the glue for these two PRs.

mkindahl · 2023-02-01T09:32:40Z

scripts/backport.py

+    if (
+        subprocess.run(
+            f"git rev-parse {target_remote}/{backport_branch} > /dev/null 2>&1",
+            shell=True,
+            check=False,
+        ).returncode
+        == 0
+    ):
+        print(
+            f'Backport branch {backport_branch} for PR #{original_pr.number}: "{original_pr.title}" already exists. Skipping.'
+        )
+        continue


It is a lot easier to read if you split out the git commands into separate functions and use them in the code.

I don't think so... the first thing I need to know when I look at this code is which git command it is running. Putting this command somewhere else just makes it more complicated. I agree these lines look kinda ugly, especially when formatted by "black", but it's all just boilerplate around the only meaningful thing, the git command.

I can instead move boilerplate to functions, e.g. git_check_output and git_returncode, to remove the amount of visual noise. Thankfully I only need two these modes for subprocess.

Thankfully I only need two these modes for subprocess.

Actually three, but anyway, done that, looks better now.

mkindahl · 2023-02-01T09:35:20Z

scripts/backport.py

+    )
+
+    original_description = original_pr.body
+    # Comment out the keywords like 'Fixes #1234' to avoid confusion.


Not sure what confusion you have in mind.

It is good if the backport PRs reference the issue as well so that you can track that it was actually backported. This will allow you to eliminate that our backporting system is making a mistake.

It's going to reference the issue, but I don't want Github to treat it as a "fixes" reference. Granted, most of the time it's going to be closed already, so it's not important. But there might be some weird case when the original issue was reopened, but the backport of the old fix is in flight, and then it closes the issue. Or maybe we would want to reason about backporting another way, starting from issues, and then it would help to have only one PR that fixes an issue. I think it is more straightforward to keep the "issue"<->"PR that fixes it" a one-to-one correspondence.

It's going to reference the issue

e.g. it says "Original issue #xxxx" in the description of the backport PR

e.g. it says "Original issue #xxxx" in the description of the backport PR

That works for me.

1. Find the latest release branch 1. For each commit in main and not in release branch (compared by title), find the corresponding PR. 1. If the PR fixes an issue labeled "bug", and neither the PR nor the issue are labeled "no-backport", cherry-pick the commits from the PR onto the release branch, and create a PR with these changes.

akuzm · 2023-02-01T12:28:22Z

Oh my, I forgot to disable the auto merge 😅

github-actions bot assigned akuzm Jan 26, 2023

akuzm commented Jan 26, 2023

View reviewed changes

.github/workflows/backport.yaml Outdated Show resolved Hide resolved

akuzm marked this pull request as ready for review January 26, 2023 15:59

github-actions bot requested review from RafiaSabih and shhnwz January 26, 2023 16:06

akuzm commented Jan 26, 2023

View reviewed changes

scripts/backport.py Show resolved Hide resolved

akuzm commented Jan 26, 2023

View reviewed changes

scripts/backport.py Outdated Show resolved Hide resolved

jnidzwetzki reviewed Jan 27, 2023

View reviewed changes

scripts/backport.py Outdated Show resolved Hide resolved

jnidzwetzki reviewed Jan 27, 2023

View reviewed changes

scripts/backport.py Show resolved Hide resolved

jnidzwetzki approved these changes Jan 30, 2023

View reviewed changes

.github/workflows/backport.yaml Outdated Show resolved Hide resolved

scripts/backport.py Outdated Show resolved Hide resolved

scripts/backport.py Outdated Show resolved Hide resolved

scripts/backport.py Outdated Show resolved Hide resolved

scripts/backport.py Outdated Show resolved Hide resolved

svenklemm approved these changes Feb 1, 2023

View reviewed changes

akuzm force-pushed the aku/backport branch from d6cadb7 to 703e9df Compare February 1, 2023 09:04

akuzm enabled auto-merge (rebase) February 1, 2023 09:04

mkindahl reviewed Feb 1, 2023

View reviewed changes

akuzm force-pushed the aku/backport branch from bacfa97 to 9e1bfb2 Compare February 1, 2023 11:51

akuzm merged commit 739fd00 into main Feb 1, 2023

akuzm deleted the aku/backport branch February 1, 2023 12:25

akuzm mentioned this pull request Feb 1, 2023

Improve automated backports #5268

Merged

Add a workflow for automated backporting #5227

Add a workflow for automated backporting #5227

Conversation

akuzm commented Jan 26, 2023 • edited

codecov bot commented Jan 26, 2023 • edited

Codecov Report

github-actions bot commented Jan 26, 2023

svenklemm commented Jan 27, 2023

akuzm commented Jan 27, 2023

akuzm commented Jan 27, 2023

jnidzwetzki left a comment

Choose a reason for hiding this comment

akuzm commented Feb 1, 2023

mkindahl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

akuzm Feb 1, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

akuzm commented Feb 1, 2023

akuzm commented Jan 26, 2023 •

edited

codecov bot commented Jan 26, 2023 •

edited

akuzm Feb 1, 2023 •

edited