
[Pytorch AutoRevert] - Improves autorevert check heuristics #6853


Merged
jeanschmidt merged 4 commits into main from jeanschmidt/autorevert_improvements on Jun 30, 2025

Conversation

jeanschmidt
Contributor

@jeanschmidt jeanschmidt commented Jun 27, 2025

Makes some improvements to the back-analysis for the revert logic, with the goal of improving precision and recall and validating autorevert as a viable strategy.

Checked against the workflows: pull, trunk, inductor, linux-binary-manywheel

Old code:

Timeframe: 720 hours
Commits checked: 6177
Auto revert patterns detected: 188
Actual reverts inside auto revert patterns detected: 24 (12.8%)
Total revert commits in period: 115
Reverts that don't match any auto revert pattern detected: 91

Newer code:

Workflow(s): pull, trunk, inductor, linux-binary-manywheel
Timeframe: 720 hours
Commits checked: 5403
Auto revert patterns detected: 442
Actual reverts inside auto revert patterns detected (precision): 48 (10.9%)
Total revert commits in period: 115
Reverts that don't match any auto revert pattern detected: 67 of 115 (58.3%)
Per workflow precision:
  pull: 45 reverts out of 411 patterns (10.9%)
  trunk: 1 revert out of 8 patterns (12.5%)
  inductor: 2 reverts out of 20 patterns (10.0%)
  linux-binary-manywheel: 0 reverts out of 3 patterns (0.0%)
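
For reference, a quick sketch (not from the PR) showing how the headline percentages follow from the raw counts above:

```python
# Reported counts for the newer code over the 720-hour window.
patterns_detected = 442
reverts_inside_patterns = 48
total_reverts = 115
unmatched_reverts = 67

precision = reverts_inside_patterns / patterns_detected  # 0.1086 -> 10.9%
unmatched_share = unmatched_reverts / total_reverts      # 0.5826 -> 58.3%
print(f"precision {precision:.1%}, unmatched reverts {unmatched_share:.1%}")
```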

Critical implemented changes:

  • Look both forward and backward for the first commit that actually ran the failed job, instead of assuming it is always the one immediately before or after.
  • Job names have parts we don't care about, such as shard indices. Since a failure can happen in any shard, we want to match the same failure across all shards (see the sketch below).
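
A minimal sketch of both heuristics, assuming hypothetical helpers (a commits list ordered by push time and a ran_job predicate); the split-on-( normalization matches what is discussed in the review below:

```python
def normalize_job_name(job_name: str) -> str:
    # Drop everything from the first '(' onward, removing shard indices
    # (and, as the review notes, the test config as well), e.g.
    # "linux-jammy / test (default, 2, 5)" -> "linux-jammy / test".
    return job_name.split("(")[0].strip()


def nearest_commit_that_ran(commits, start_idx, job_name, ran_job, step):
    # Walk forward (step=+1) or backward (step=-1) from start_idx until
    # we find a commit that actually ran the job, instead of trusting
    # that the immediate neighbor did.
    idx = start_idx + step
    while 0 <= idx < len(commits):
        if ran_job(commits[idx], normalize_job_name(job_name)):
            return commits[idx]
        idx += step
    return None
```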

Things I tried that did not lead to great results:

  • ignoring error classification: precision too low, with no significant increase in recall
  • not requiring error repetition: precision too low, with no significant increase in recall

My take:
With a precision of 10%, the cost of re-running jobs to confirm redness status is justified. Even though this is not possible to test directly, I suspect that requiring the same failure output twice for all 3 signals should raise precision to a very high standard. Unfortunately, the only way to verify is to run this in shadow mode.

With a recall of roughly 55%, this points to being able to capture most of the introduced trunk redness errors. Many reverts may not be caused by CI redness at all, especially not in the workflows we are analyzing (they could be due to performance degradation, ghfirst/internal reasons, and many others). This number seems comfortable enough to provide a substantial gain in CI quality.

@pytorch-bot pytorch-bot bot added the ci-no-td label Jun 27, 2025

vercel bot commented Jun 27, 2025

The latest updates on your projects:

1 Skipped Deployment
  torchci: ⬜️ Ignored, updated Jun 30, 2025 4:08pm (UTC)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 27, 2025
@jeanschmidt jeanschmidt changed the title Jeanschmidt/autorevert improvements [Pytorch AutoRevert] - Improves autorevert check heuristics Jun 27, 2025
Contributor

@clee2000 clee2000 left a comment


These comments aren't aimed at the content of this PR specifically, but I'm curious about things you've tried for autorevert:

  • it seems like you're separating by (, which cuts off shard information, but also the test config (like default, dynamo, etc); if you add that back in, does that improve anything?
  • I only see classification_rule used; have you considered using the line instead?

I'm also curious whether you know what the other ~50% not caught are. Could it be that most are ghfirst or lint?
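
For illustration, a hedged sketch of the reviewer's suggestion (hypothetical helper, not code from this PR): keep the test config while still dropping shard indices:

```python
import re

def normalize_keep_config(job_name: str) -> str:
    # Keep the first parenthesized argument (the test config, e.g.
    # "default" or "dynamo") but drop the shard indices that follow,
    # e.g. "pull / test (dynamo, 2, 5)" -> "pull / test (dynamo)".
    m = re.match(r"(?P<base>[^(]+)\((?P<args>[^)]*)\)", job_name)
    if not m:
        return job_name.strip()
    config = m.group("args").split(",")[0].strip()
    return f"{m.group('base').strip()} ({config})"
```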

@jeanschmidt
Contributor Author

@clee2000

it seems like you're separating by (, which cuts off shard information, but also the test config (like default, dynamo, etc); if you add that back in, does that improve anything?

I did not test with matching test configurations, but this might be something to look into. Still, right now I am more interested in improving recall than precision, given that fully retrying the same job for the 3 commits should (in my opinion) be a robust enough signal to clear up any false positives.

I only see classification_rule used; have you considered using the line instead?

Yes, and it is not great: precision doesn't increase substantially, and recall falls to very low numbers.
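
For context, a hypothetical illustration of the two grouping keys being compared here (field names assumed from the discussion, not taken from this PR's code):

```python
def failure_key(job: dict, use_line: bool = False) -> str:
    # Group failures either by the classifier's rule name (coarser; what
    # the PR uses) or by the exact captured log line (finer; per the
    # discussion, precision barely moves and recall drops sharply).
    return job["line"] if use_line else job["classification_rule"]
```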

@jeanschmidt jeanschmidt merged commit 5f86d76 into main Jun 30, 2025
6 checks passed
@jeanschmidt jeanschmidt deleted the jeanschmidt/autorevert_improvements branch June 30, 2025 18:33