[Pytorch AutoRevert] - Improves autorevert check heuristics #6853
Conversation
Fixed annotations in:
- aws/lambda/pytorch-auto-revert/pytorch_auto_revert/autorevert_checker.py
- aws/lambda/pytorch-auto-revert/pytorch_auto_revert/testers/autorevert.py
…nschmidt/autorevert_improvements
These comments aren't aimed at the content in this PR specifically, but I'm curious about stuff you've tried for autorevert:
- it seems like you're separating by `(`, which cuts off shard information but also the test config (like default, dynamo, etc). If you add that back in, does that improve anything?
- I only see `classification_rule` used; have you considered using the `line` instead?
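To make the first point concrete, here is a hedged sketch of the two grouping behaviors being discussed. The helper names and the example job name are illustrative, not taken from the PR's code:

```python
# Sketch of the reviewer's point: splitting a job name on "(" drops both the
# shard counter and the test config. Names below are illustrative only.

def base_name(job: str) -> str:
    # described current behavior: keep only the part before "("
    return job.split("(")[0].strip()

def name_with_config(job: str) -> str:
    # alternative: keep the test config (e.g. "default", "dynamo"),
    # dropping only the numeric shard counter
    if "(" not in job:
        return job
    head, _, inside = job.partition("(")
    config = inside.rstrip(")").split(",")[0].strip()
    # a shard-only suffix like "(1, 3)" starts with a number; drop it either way
    return head.strip() if config.isdigit() else f"{head.strip()} ({config})"

job = "linux-jammy-py3.9 / test (dynamo, 1, 3)"
print(base_name(job))         # "linux-jammy-py3.9 / test"
print(name_with_config(job))  # "linux-jammy-py3.9 / test (dynamo)"
```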
I'm also curious whether you know what the other ~50% that aren't caught are; could it be that most are ghfirst or lint?
I did not test with matching test configurations, but this might be something to look into. Still, I am more interested in improving recall than precision right now. Given that we fully retry the same job for the 3 commits, that should (in my opinion) be a robust enough signal to clear up any false positives.
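The confirmation idea above could be sketched roughly as follows; the function and the classification strings are hypothetical, not the actual checker in `autorevert_checker.py`:

```python
# Hedged sketch: only treat a failure as revert-worthy if the same
# classification_rule reproduces at least twice (original run + retry)
# on each of the 3 commits in the window.
from collections import Counter
from typing import Dict, List, Optional

def confirmed_failure(runs_per_commit: Dict[str, List[str]]) -> Optional[str]:
    """runs_per_commit maps commit sha -> classification_rule of each run.

    Returns the rule if every commit reproduced the same rule at least
    twice, else None.
    """
    common = None
    for sha, rules in runs_per_commit.items():
        repeated = {r for r, n in Counter(rules).items() if n >= 2}
        common = repeated if common is None else common & repeated
        if not common:
            return None
    return next(iter(common)) if common else None

runs = {
    "abc123": ["pytest failure", "pytest failure"],
    "def456": ["pytest failure", "pytest failure", "infra flake"],
    "789fed": ["pytest failure", "pytest failure"],
}
print(confirmed_failure(runs))  # "pytest failure"
```

A one-off flake that appears only once on some commit never reaches the `n >= 2` bar, so it cannot trigger the signal on its own.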
Yes, it is not great. Precision doesn't increase substantially, and recall falls to very low numbers.
This PR improves the back-analysis for the revert logic, with the goal of improving precision and recall and validating autorevert as a viable strategy.
Checked against the workflows: `pull`, `trunk`, `inductor`, `linux-binary-manywheel`
Old code:
Newer code:
Critical implemented changes:
Things I tried and don't lead to great results:
My take:
A precision of 10% justifies the cost of re-running jobs in order to confirm redness status. Even though it is not possible to test this directly, I suspect that requiring the same output 2 times for all 3 signals should elevate precision to a very high standard. Unfortunately, the only way to verify that is to run this in shadow mode.
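A rough, back-of-envelope model of why the double-confirmation should lift precision sharply. The reproduction probability for flaky failures is a made-up assumption here, not a measured number:

```python
# Assumptions (illustrative): a flaky failure reproduces identically on a
# retry with probability p, independently per signal; a real breakage
# always reproduces. Base precision of 10% is the figure from this PR.

def posterior_precision(base_precision: float, p_flaky_repro: float, signals: int = 3) -> float:
    tp = base_precision                                   # real breakages survive confirmation
    fp = (1 - base_precision) * p_flaky_repro ** signals  # flakes must repro on all signals
    return tp / (tp + fp)

# e.g. if a flake reproduces identically 20% of the time:
print(round(posterior_precision(0.10, 0.2), 3))  # 0.933
```

Under these (optimistic, independence-assuming) numbers, confirmation would take precision from 10% to above 90%, which is the intuition behind the claim; only shadow mode can tell whether the assumptions hold.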
A recall of 55% suggests we can capture most of the introduced trunk redness errors. Many reverts are likely not caused by CI redness, especially not in the workflows we are analyzing (they could be due to performance degradation, GHF/internal reasons, and many others). This number seems comfortable enough to provide a substantial benefit to CI quality.