Add is_scholarlyarticle feature to wikidatawiki #144

Merged
Ladsgroup merged 1 commit into wikimedia:master from micgro42:isScholarlyArticleOnMaster on Aug 26, 2020

Conversation

@micgro42
Collaborator

Scholarly articles are structured differently from other items and often have few labels other than the one in the original language. This affects their predicted quality more than is appropriate.

Note that this effect likely cannot be seen in the current training data, because it was collected during a time when there were no scholarly articles and thus contains none.
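
As a rough illustration of what an `is_scholarlyarticle` signal could check, here is a minimal sketch that works on a plain Wikidata item JSON dict. It is not the code merged in this PR and does not use the repository's feature framework. It assumes a scholarly article is recognized by an "instance of" (P31) statement pointing at Q13442814; the function name `is_scholarly_article` and the helper `example_item` are hypothetical.

```python
SCHOLARLY_ARTICLE = "Q13442814"  # Wikidata item "scholarly article"
INSTANCE_OF = "P31"              # Wikidata property "instance of"


def is_scholarly_article(item: dict) -> bool:
    """Return True if any 'instance of' statement points at scholarly article.

    `item` is assumed to follow the Wikidata entity JSON layout:
    item["claims"][property_id] is a list of statements, each with a "mainsnak".
    """
    for statement in item.get("claims", {}).get(INSTANCE_OF, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "novalue"/"somevalue" snaks carry no datavalue
        value = snak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == SCHOLARLY_ARTICLE:
            return True
    return False


def example_item(instance_of_qid: str) -> dict:
    """Hypothetical helper: build a tiny item with one 'instance of' statement."""
    return {
        "labels": {"en": {"language": "en", "value": "Example"}},
        "claims": {
            INSTANCE_OF: [
                {
                    "mainsnak": {
                        "snaktype": "value",
                        "property": INSTANCE_OF,
                        "datavalue": {
                            "type": "wikibase-entityid",
                            "value": {"entity-type": "item", "id": instance_of_qid},
                        },
                    }
                }
            ]
        },
    }


if __name__ == "__main__":
    print(is_scholarly_article(example_item(SCHOLARLY_ARTICLE)))
    print(is_scholarly_article(example_item("Q5")))  # Q5 is "human"
```

A boolean like this lets a model learn that a single-label item is expected for scholarly articles instead of treating it as a sign of low quality, but whether it helps can only be judged after retraining on data that contains such items.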

@micgro42
Collaborator Author

(I'm unsure whether I should add the model and model_info from the build on my local device to this commit or whether there exists a reference machine for that purpose.)

Scholarly articles have a different structure and often don't have many
labels other than the one in the original language. This impacts them to
a degree larger than what would be appropriate.

Note that this effect likely cannot be seen in the current training data,
because it was collected during a time when there were no scholarly
articles and thus contains none.
@codecov-commenter

Codecov Report

Merging #144 into master will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #144      +/-   ##
==========================================
+ Coverage   49.40%   49.47%   +0.07%     
==========================================
  Files          49       49              
  Lines        1429     1431       +2     
==========================================
+ Hits          706      708       +2     
  Misses        723      723              
| Impacted Files | Coverage Δ |
| --- | --- |
| articlequality/feature_lists/wikidatawiki.py | 88.13% <100.00%> (+0.41%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7448dbb...c914047.

@halfak
Member

halfak commented Jul 30, 2020

We have a reference machine to build this model on. But for now, we don't expect any improvements in model performance. I wonder if we could add some more training data to the model. Would you be willing to help recruit editors to label the quality of items? This might be a good opportunity to pull in some scholarly article items.

@micgro42
Collaborator Author

That work is already ongoing (https://labels.wmflabs.org/stats/wikidatawiki/95) and hopefully it will result in better training data where we can see whether this feature makes any difference :)

@halfak
Member

halfak commented Aug 4, 2020

Great news!

Member

@Ladsgroup Ladsgroup left a comment

This has my blessing. It's good to go once we have new data (it doesn't make much sense to merge it without the new data).

@Ladsgroup
Member

Let's just merge this as the model is not going to be retrained yet.

@Ladsgroup Ladsgroup merged commit ae86ab2 into wikimedia:master Aug 26, 2020
@halfak
Member

halfak commented Aug 26, 2020

Do you folks need help building models? ores-misc-01 makes the work relatively painless. One concern with merging features like this without rebuilding is that we don't know whether they have a positive, negative, or neutral effect on model fitness. Adding features that provide no utility adds complexity. Adding features that do improve signal, without our knowing it, might lead us to attribute that improvement to a different change.

4 participants