Fix extraction when there are multiple reverts #127
Comparing the ptwiki datasets before and after this change (using this gist) shows that the following edits will no longer be added to the dataset:
Conversely, the following edits will now be added (they were not in the previous version):
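The gist used for the comparison is not reproduced here. A minimal sketch of the same idea follows. It assumes each dataset is a JSON-lines file whose records carry a revision identifier under a hypothetical `rev_id` key; the function names are illustrative only, not the project's code.

```python
import json


def load_rev_ids(path, key="rev_id"):
    """Collect revision IDs from a JSON-lines dataset (hypothetical file layout)."""
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)[key] for line in f if line.strip()}


def diff_datasets(before_ids, after_ids):
    """Return (dropped, added): IDs only in the old dataset, IDs only in the new one."""
    before_ids, after_ids = set(before_ids), set(after_ids)
    return sorted(before_ids - after_ids), sorted(after_ids - before_ids)
```

Called as `diff_datasets(load_rev_ids("before.jsonl"), load_rev_ids("after.jsonl"))` (placeholder file names), it yields the two lists described above.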
    if verbose:
        sys.stderr.write("r")
        sys.stderr.flush()
else:
    # This revision is not a revert. Get the new labels
    # FIXME: what if last_labels contains labels from one of
    # reverted edits?
    new_labels = project_labels - last_labels
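To make the FIXME concrete, here is a hypothetical single-pass replay of the excerpt. It assumes (this is not shown in the diff) that reverts are skipped without refreshing `last_labels`, and it uses a made-up history where a label "B" is added, reverted, and later re-added legitimately.

```python
def naive_new_labels(revisions):
    """Single pass over (rev_id, labels, is_revert) tuples in page order.

    Assumption: revert revisions are skipped and do not update last_labels.
    """
    last_labels = set()
    added = {}
    for rev_id, project_labels, is_revert in revisions:
        if is_revert:
            continue  # last_labels keeps the labels of the edit that was just reverted
        new_labels = project_labels - last_labels
        if new_labels:
            added[rev_id] = new_labels
        last_labels = project_labels
    return added


history = [
    ("r1", {"stub"}, False),
    ("r2", {"stub", "B"}, False),  # later reverted by r3
    ("r3", {"stub"}, True),        # revert of r2
    ("r4", {"stub", "B"}, False),  # legitimate re-addition of B
]
print(naive_new_labels(history))  # {'r1': {'stub'}, 'r2': {'B'}}
```

Both failure modes show up: r2 is credited with "B" even though it was reverted (a single pass cannot know that yet), and r4's legitimate addition is missed because `last_labels` still holds r2's labels.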
Since we're storing every single revision in `revisions`, we could also store their labels and take a second pass to find the revisions that were not reverted.
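A minimal sketch of that second pass, assuming the first pass has already stored `(rev_id, labels)` pairs in page order and produced the set of reverted revision IDs (both shapes are hypothetical). Reverted edits are skipped, and every remaining revision is compared with the last remaining one rather than with the previous revision in the history.

```python
def labels_added(revisions, reverted):
    """Second pass: labels introduced by each revision whose edit survived.

    revisions: list of (rev_id, labels) in page order, stored during the first pass.
    reverted: set of rev_ids whose changes were undone.
    """
    added = {}
    last_labels = set()
    for rev_id, labels in revisions:
        if rev_id in reverted:
            continue  # a reverted edit neither adds labels nor becomes the baseline
        new_labels = labels - last_labels
        if new_labels:
            added[rev_id] = new_labels
        last_labels = labels
    return added
```

A revert that stands simply restores the labels of the last kept revision, so its difference is empty and it needs no special case.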
* Invert the reverted status recursively
* Add tests for the new behavior
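The idea behind "invert the reverted status recursively" can be sketched as follows: if a revert is itself reverted, the edits it undid count as restored, and so on down the chain. This is an illustration, not the PR's code; it assumes plain identity reverts detected by matching content checksums, with revisions given as hypothetical `(rev_id, checksum)` pairs.

```python
def reverted_revisions(revisions):
    """Return the rev_ids whose changes are undone in the end.

    revisions: list of (rev_id, checksum) in page order. A revision whose
    checksum matches an earlier one is treated as an identity revert of
    every revision in between.
    """
    reverted_by = {}
    last_seen = {}  # checksum -> index of the most recent revision with that content
    for i, (rev_id, checksum) in enumerate(revisions):
        j = last_seen.get(checksum)
        if j is not None:
            for k in range(j + 1, i):
                # Keep the latest reverter so nested reverts resolve correctly.
                reverted_by[revisions[k][0]] = rev_id
        last_seen[checksum] = i

    def is_reverted(rev_id):
        reverter = reverted_by.get(rev_id)
        if reverter is None:
            return False
        # A revert that was itself reverted no longer counts: flip the status.
        return not is_reverted(reverter)

    return {rev_id for rev_id, _ in revisions if is_reverted(rev_id)}
```

For the history A, B, A, B the second A is undone by the final B, so r2's edit is restored and only r3 ends up reverted. The recursion always moves to a later revision, so it terminates, though a very long revert war could hit Python's recursion limit.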
c3b9b31 to 2476d22
Codecov Report
@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   52.07%   52.21%   +0.13%
==========================================
  Files          47       47
  Lines        1373     1377       +4
==========================================
+ Hits          715      719       +4
  Misses        658      658
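The percentages above are Hits / Lines. A small check, assuming Codecov's default of two decimal places rounded down (the `coverage.precision` and `coverage.round` settings in codecov.yml), reproduces the reported figures, including the +0.13% delta, which is taken from the unrounded values.

```python
import math
from fractions import Fraction


def pct_floor(value):
    """Format a non-negative ratio as a percentage with two decimals, rounded down."""
    hundredths = math.floor(value * 10000)  # percentage expressed in hundredths
    return f"{hundredths // 100}.{hundredths % 100:02d}"


before = Fraction(715, 1373)  # hits / lines on master
after = Fraction(719, 1377)   # hits / lines with #127
print(pct_floor(before), pct_floor(after), pct_floor(after - before))  # 52.07 52.21 0.13
```

Exact fractions are used so that floating-point error cannot shift a value across a rounding boundary.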
Merged as part of #129.
See https://phabricator.wikimedia.org/T252152.