Add damaging and goodfaith models for huwiki #142

Closed
tgr wants to merge 4 commits into wikimedia:master from tgr:huwiki-damaging-goodfaith

@tgr
Contributor

@tgr tgr commented Mar 13, 2018

Also update the list of trusted groups.

Bug: T185903

@codecov

codecov bot commented Mar 13, 2018

Codecov Report

Merging #142 into master will decrease coverage by 7.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #142      +/-   ##
==========================================
- Coverage   20.93%   13.91%   -7.03%     
==========================================
  Files          58       56       -2     
  Lines        1070      999      -71     
==========================================
- Hits          224      139      -85     
- Misses        846      860      +14
| Impacted Files | Coverage Δ | |
|---|---|---|
| editquality/codegen/generate.py | 0% <0%> (-100%) | ⬇️ |
| editquality/utilities/generate_make.py | 0% <0%> (ø) | ⬆️ |
| editquality/utilities/fetch_labels.py | 0% <0%> (ø) | ⬆️ |
| editquality/utilities/merge_labels.py | 0% <0%> (ø) | ⬆️ |
| editquality/tests/test_config.py | | |
| editquality/codegen/tests/test_generate.py | | |
| editquality/config.py | | |
| editquality/feature_lists/lvwiki.py | | |
| editquality/codegen/config.py | 0% <0%> (ø) | |
| editquality/codegen/util.py | 0% <0%> (ø) | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 81fed22...5d11a50.

- bureaucrat
- editor
- templateeditor
- interface-editor
Member

Adding these now will cause some warnings, but no big issues. Essentially, we'll have labels for some edits where "needs_review" is false and "reason" == "trusted_group".
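
To see how many observations are affected, the auto-labeled edits can be tallied by their ("needs_review", "reason") pair. This is only a rough sketch: it assumes the label file holds one JSON object per line, that "needs_review" is a JSON boolean and "reason" a string, and `count_by_review_reason` is a hypothetical helper, not part of editquality. The sample rows are made up for illustration.

```python
import json
from collections import Counter


def count_by_review_reason(lines):
    """Tally observations by their (needs_review, reason) pair.

    Assumes one JSON object per line; missing keys are counted as None.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obs = json.loads(line)
        counts[(obs.get("needs_review"), obs.get("reason"))] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    '{"rev_id": 1, "needs_review": false, "reason": "trusted_group"}',
    '{"rev_id": 2, "needs_review": true}',
    '{"rev_id": 3, "needs_review": false, "reason": "trusted_group"}',
]

counts = count_by_review_reason(sample)
trusted_auto_labeled = counts[(False, "trusted_group")]
print(trusted_auto_labeled)
```

Running the same tally before and after changing the trusted groups list would show which auto-labels no longer match the configuration.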

Makefile Outdated
damaging \
roc_auc.labels.true \
--label-weight "true=$(damaging_weight)" \
--pop-rate "true=0.0872" \
Member

Roughly 1/10th of all edits are damaging? That seems wrong.

@tgr tgr force-pushed the huwiki-damaging-goodfaith branch from b2c538b to 8abf153 on March 19, 2018 08:52
@tgr
Contributor Author

tgr commented Mar 19, 2018

@halfak pointed out that I used the percentage of hand-labeled edits where I should have used the percentage of total edits, so I redid the commits using the ratio between

 ack '"goodfaith": true' datasets/huwiki.labeled_revisions.40k_2016.json | wc -l

and

ack '"rev_id"' datasets/huwiki.labeled_revisions.40k_2016.json | wc -l

I'll update the docs once the PR passes review.
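
For reference, the same ratio can be computed by parsing the file instead of matching text, which avoids depending on the exact spacing in `"goodfaith": true`. A minimal sketch, assuming the dataset is one JSON object per line (as the line counts above suggest); `label_rate` is a hypothetical helper, not something in editquality, and the sample rows are made up.

```python
import json


def label_rate(lines, label, value=True):
    """Return the fraction of observations whose `label` equals `value`.

    Only lines that parse to an object with a "rev_id" key are counted
    in the denominator, mirroring the `ack '"rev_id"' | wc -l` count.
    """
    total = 0
    matching = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obs = json.loads(line)
        if "rev_id" not in obs:
            continue
        total += 1
        if obs.get(label) == value:
            matching += 1
    if total == 0:
        raise ValueError("no observations with a rev_id")
    return matching / total


# Made-up sample rows, for illustration only.
sample = [
    '{"rev_id": 1, "goodfaith": true, "damaging": false}',
    '{"rev_id": 2, "goodfaith": true, "damaging": false}',
    '{"rev_id": 3, "goodfaith": false, "damaging": true}',
    '{"rev_id": 4, "goodfaith": true, "damaging": true}',
]

goodfaith_rate = label_rate(sample, "goodfaith")
damaging_rate = label_rate(sample, "damaging")
print(goodfaith_rate, damaging_rate)
```

With the real file, the lines would come from `open("datasets/huwiki.labeled_revisions.40k_2016.json")` rather than the sample list.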

@tgr
Contributor Author

tgr commented Mar 20, 2018

Running

 ack '"rev_id"' datasets/huwiki.labeled_revisions.40k_2016.json | wc -l

gives 39678, so I am not sure whether that's the correct denominator or whether I should have used 40000 instead (although it doesn't make much difference). I suppose the missing 322 revisions are deleted ones?

@halfak
Member

halfak commented Mar 20, 2018

I'd recommend using 39678 as the denominator, but it shouldn't make too much of a difference. I think you're right that the 322 are revisions to deleted pages or revisions that have been rev-deleted.
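
To put a number on "not too much of a difference": for any fixed count of matching edits, switching the denominator from 40000 to 39678 scales the resulting rate by the same factor. A quick sketch of that arithmetic, using only the two totals quoted above (the variable names are illustrative):

```python
labeled = 39678   # observations actually present in the file
nominal = 40000   # size the sample was drawn at

missing = nominal - labeled

# For a fixed numerator k, (k / labeled) / (k / nominal) = nominal / labeled,
# so the relative change in the rate does not depend on k.
relative_shift = nominal / labeled - 1

print(missing)
print(f"{relative_shift:.4%}")
```

The shift is under one percent of the rate itself, so either denominator gives nearly the same pop-rate.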

@adamwight
Contributor

Needs a rebase. The master branch has an updated merge_labels algorithm which simplifies the Makefile logic.

min_samples_leaf: 13
pop_rate_true: 0.014812583163867339
build_number: 1
damaging:
Contributor

This patch should drop the huwiki.reverted model too, AIUI.

@adamwight
Contributor

P.S., you should have privileges on this repo, and if the branch were under the org repo I could push my rebase: https://github.com/wiki-ai/editquality/compare/awight-huwiki

@tgr tgr force-pushed the huwiki-damaging-goodfaith branch from 8abf153 to 5d11a50 on April 11, 2018 23:27
@tgr
Contributor Author

tgr commented Apr 11, 2018

Closing in favor of #152 which uses a local branch (it did not seem possible to change the source for this one).

@tgr tgr closed this Apr 11, 2018