Revscoring: Implement fast scoring for fast cross validation #388

Merged
halfak merged 1 commit into master from fast_score on Dec 28, 2017

Conversation

codez266
Contributor

@codez266 codez266 commented Dec 27, 2017

  • Implements a new method, score_many, in sklearn.py that scores a batch of instances in a single call, taking advantage of numpy's optimized matrix operations.
  • _cross_score now calls this new method to speed up cross-validation. The original score method is preserved for use in ORES.
  • Adds tests for score and score_many, plus a test for the probability classifier in test_sklearn, to improve coverage.
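
To make the batching idea above concrete, here is a minimal sketch. It is not the revscoring implementation: `ToyLogisticModel`, its weights, and the shape of the returned score dictionaries are all hypothetical stand-ins, and a toy logistic scorer replaces the real scikit-learn estimator. It only shows the pattern of keeping a per-instance `score` next to a `score_many` that handles the whole batch with one matrix product.

```python
import numpy as np


class ToyLogisticModel:
    """Hypothetical stand-in for a fitted binary classifier (not revscoring's class)."""

    def __init__(self, weights, bias=0.0):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def score(self, feature_values):
        """Score a single instance (the path a one-at-a-time caller would use)."""
        x = np.asarray(feature_values, dtype=float)
        return self._format(self._sigmoid(x @ self.weights + self.bias))

    def score_many(self, feature_values_list):
        """Score a batch of instances with one matrix-vector product."""
        X = np.asarray(feature_values_list, dtype=float)  # shape (n, d)
        probs = self._sigmoid(X @ self.weights + self.bias)  # shape (n,)
        return [self._format(p) for p in probs]

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    @staticmethod
    def _format(p):
        # Assumed score layout: a boolean prediction plus per-label probabilities.
        p = float(p)
        return {"prediction": p >= 0.5, "probability": {True: p, False: 1.0 - p}}


model = ToyLogisticModel(weights=[1.0, -2.0], bias=0.5)
rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]

batched = model.score_many(rows)
one_by_one = [model.score(row) for row in rows]
```

Both paths should produce the same scores; the batched path simply avoids paying the per-call overhead once for every instance, which is where a cross-validation loop would benefit.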

@codecov

codecov bot commented Dec 27, 2017

Codecov Report

Merging #388 into master will decrease coverage by 0.01%.
The diff coverage is 85.29%.


@@            Coverage Diff             @@
##           master     #388      +/-   ##
==========================================
- Coverage   86.73%   86.71%   -0.02%     
==========================================
  Files         257      257              
  Lines        8231     8335     +104     
==========================================
+ Hits         7139     7228      +89     
- Misses       1092     1107      +15
| Impacted Files | Coverage Δ |
|---|---|
| revscoring/scoring/models/tests/test_sklearn.py | 100% <100%> (+2.85%) ⬆️ |
| revscoring/scoring/models/model.py | 87.27% <100%> (+0.23%) ⬆️ |
| revscoring/scoring/models/sklearn.py | 83.45% <72.22%> (-9.33%) ⬇️ |
| revscoring/languages/tests/test_finnish.py | 100% <0%> (ø) ⬆️ |
| revscoring/languages/romanian.py | 80% <0%> (+4%) ⬆️ |
| revscoring/languages/finnish.py | 89.47% <0%> (+4.85%) ⬆️ |
| revscoring/languages/norwegian.py | 84% <0%> (+5.05%) ⬆️ |
| revscoring/languages/arabic.py | 84% <0%> (+5.05%) ⬆️ |
| revscoring/languages/hungarian.py | 84% <0%> (+5.05%) ⬆️ |

... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d4d6eeb...f41b5e2.

@halfak
Member

halfak commented Dec 28, 2017

How much faster is this? It adds a lot of complexity, so I think we should quantify the benefit.

@codez266
Contributor Author

codez266 commented Dec 28, 2017

RF classifier parameters used for benchmarking:
num_samples - 10k
n_estimators - 400
max_depth - 4
features - word2vec (300 dimensions)
binary classification with True/False

Time for current cv_train - 414.65737676620483
Time for new fast cv_train - 81.77100491523743

The new path is roughly 5 times faster (414.66 / 81.77 ≈ 5.07).
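
For anyone wanting to check a comparison like this on their own data, a small timing harness along these lines might help. It is only a sketch under stated assumptions: it does not reproduce the random forest benchmark above, it uses a plain dot-product scorer on random data as a stand-in, and `score_one_at_a_time`, `score_batched`, and `timed` are hypothetical helper names. The last lines just recompute the ratio from the two times reported above.

```python
import time

import numpy as np

# Synthetic data sized like the benchmark inputs (10k samples, 300 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 300))
w = rng.normal(size=300)


def score_one_at_a_time(X, w):
    """Stand-in for calling a per-instance score method in a Python loop."""
    return np.array([row @ w for row in X])


def score_batched(X, w):
    """Stand-in for a batched method: one matrix-vector product."""
    return X @ w


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


slow_scores, slow_seconds = timed(score_one_at_a_time, X, w)
fast_scores, fast_seconds = timed(score_batched, X, w)
print(f"per-instance: {slow_seconds:.4f}s, batched: {fast_seconds:.4f}s")

# Ratio of the two cv_train times reported in the benchmark.
reported_speedup = 414.65737676620483 / 81.77100491523743
print(f"reported speedup: {reported_speedup:.2f}x")
```

Measured times will vary by machine and by model, so the harness is best used to compare the two code paths on the same inputs rather than to predict the gain for a real classifier.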

@halfak halfak merged commit 085ba90 into master Dec 28, 2017
3 of 4 checks passed
@halfak halfak deleted the fast_score branch Dec 28, 2017