
Conversation

@DavisVaughan
Member

Closes #93. Below are three examples that were computed incorrectly before and are now correct. The main issue was how duplicate probability values were handled.

This also affected the PR AUC values, which are now computed correctly for each example. For this, it's important that the first precision value is 1 rather than NA. Without it, the AUC for these examples won't actually be 1, as it should be, because the curve won't cover the full precision/recall range.
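To see why that initial point matters, here is a minimal sketch of trapezoidal integration over the curve points from the first example below. This is an illustration only, not yardstick's internals, and trap_auc() is a hypothetical helper:

trap_auc <- function(recall, precision) {
  # Trapezoidal rule: segment widths times average segment heights
  sum(diff(recall) * (head(precision, -1) + tail(precision, -1)) / 2)
}

# Points from the corrected "no duplicates" curve below
recall    <- c(0, 0.5, 1, 1, 1)
precision <- c(1, 1, 1, 0.667, 0.5)

trap_auc(recall, precision)          # 1, the correct value
trap_auc(recall[-1], precision[-1])  # 0.5, because recall 0 to 0.5 is no longer covered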


No duplicates (this was correct before and after the change)

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   *
a      .8         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .4         2   1   1 = 2/2   .67 = 2/(2+1)   *
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 5 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0       1    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.4    1       0.667
#> 5        0.3    1       0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 5 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.4    1       0.667
#> 5        0.3    1       0.5

Duplicates at the beginning

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   -
a      .9         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .4         2   1   1 = 2/2   .67 = 2/(2+1)   *
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf        0     1    
#> 2        0.9      1     1    
#> 3        0.4      1     0.667
#> 4        0.3      1     0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.4    1       0.667
#> 4        0.3    1       0.5

Duplicates at the end

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   *
a      .8         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .3         2   1   1 = 2/2   .67 = 2/(2+1)   -
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .3, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0         1  
#> 2        0.9    0.5       1  
#> 3        0.8    1         1  
#> 4        0.3    1         0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .3, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.3    1       0.667

- Initialize with first case
- Reverse increment / append order
- …ned (`tp / (tp + fp)` when tp = 0, fp = 0). This ensures the curve always starts in the correct location (see the sketch below).
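A sketch of the convention described in the last commit above (a paraphrase of the intent, not the actual C++ code):

# At .threshold = Inf nothing is predicted positive, so tp = 0 and fp = 0
# and tp / (tp + fp) is 0/0; defining precision as 1 there makes the curve
# start at (recall = 0, precision = 1).
precision_point <- function(tp, fp) {
  if (tp == 0 && fp == 0) 1 else tp / (tp + fp)
}
precision_point(0, 0)  # 1
precision_point(2, 1)  # 0.667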
@DavisVaughan changed the title from "Issue 93" to "Correct the pr_curve() implementation when duplicates exist" on Mar 14, 2019
@codecov-io

codecov-io commented Mar 14, 2019

Codecov Report

Merging #95 into master will decrease coverage by 0.11%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   96.25%   96.14%   -0.11%     
==========================================
  Files          44       44              
  Lines        2294     2311      +17     
==========================================
+ Hits         2208     2222      +14     
- Misses         86       89       +3
Impacted Files       Coverage        Δ
src/pr-curve.cpp     100% <100%>     (ø) ⬆️
R/prob-pr_curve.R    98.46% <100%>   (+0.02%) ⬆️
R/selectors.R        80% <0%>        (-2.06%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6960823...55a0faa.

@DavisVaughan
Member Author

This known (pathological) example demonstrates the fix for a bug where order_cpp() did not return the correct order when duplicate class probabilities are present.

truth  estimate  tp  fp  recall      precision       used
a      .9         1   0  .33 = 1/3   1   = 1/(1+0)   -
b      .9         1   1  .33 = 1/3   .5  = 1/(1+1)   *
a      .4         2   1  .67 = 2/3   .67 = 2/(2+1)   *
a      .3         3   1    1 = 3/3   .75 = 3/(3+1)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "b", "a", "a"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf    0         1    
#> 2        0.9  0.333     0.5  
#> 3        0.4  0.667     0.667
#> 4        0.3  1         0.75

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "b", "a", "a"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf     0           NA
#> 2        0.9   0.25         1
#> 3        0.4   0.75         1
#> 4        0.3   1            1
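Putting the examples together, here is a standalone base-R sketch of the tie handling they describe: every observation tied at a threshold is counted before a point is emitted, so only the last row of a tied group is "used". pr_points() is a hypothetical helper for illustration, not yardstick's implementation:

pr_points <- function(truth, estimate, event = "a") {
  ord <- order(estimate, decreasing = TRUE)
  truth <- truth[ord]
  estimate <- estimate[ord]
  n_event <- sum(truth == event)

  rows <- lapply(unique(estimate), function(threshold) {
    keep <- estimate >= threshold  # all ties at a threshold count together
    tp <- sum(truth[keep] == event)
    fp <- sum(truth[keep] != event)
    data.frame(
      .threshold = threshold,
      recall = tp / n_event,
      precision = tp / (tp + fp)
    )
  })

  # Degenerate first point: nothing predicted positive, precision defined as 1
  rbind(
    data.frame(.threshold = Inf, recall = 0, precision = 1),
    do.call(rbind, rows)
  )
}

# Reproduces the corrected output above for the pathological example:
# recall 0, 1/3, 2/3, 1 with precision 1, 0.5, 0.667, 0.75
pr_points(factor(c("a", "b", "a", "a")), c(.9, .9, .4, .3))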

@DavisVaughan
Member Author

DavisVaughan commented Mar 15, 2019

This PR now also implements average_precision() as an alternative to pr_auc() that sidesteps the ambiguity of what precision should be when recall == 0. Closes #96
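For reference, average precision is usually defined as AP = sum_n (R_n - R_{n-1}) * P_n. A minimal sketch of that definition (an illustration only, not yardstick's implementation) shows why the precision value at recall == 0 never enters the sum:

ap_sketch <- function(recall, precision) {
  # diff(recall) pairs each step with the precision at its right endpoint,
  # so precision[1] (the ambiguous recall == 0 value) is dropped entirely
  sum(diff(recall) * precision[-1])
}

# Points from the pathological example above
ap_sketch(c(0, 1/3, 2/3, 1), c(1, 0.5, 2/3, 0.75))  # ~0.639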

@DavisVaughan
Member Author

@dariyasydykova at this point, this PR fixes all of the issues with pr_curve() except for the value of precision at the initial point, where recall == 0 and fp == 0. I'm not sure what to do about that point.

It doesn't look like your tweet got any solid advice either:
https://twitter.com/dariyasydykova/status/1106617500811309056

@fmannhardt

Hi, I am using yardstick to evaluate ML models and it seems I got hit by this issue in the pr_curve() calculation. Any plans for when this will be merged and released?

@DavisVaughan
Member Author

I'll likely make a pass at all the outstanding yardstick issues and PRs after useR (July 9-12), and then release a new version. Things are pretty packed until then.

@fmannhardt

Thanks. Will use this branch until then.

@DavisVaughan merged commit 019ad81 into master on Jul 30, 2019
@DavisVaughan deleted the issue-93 branch on July 30, 2019 at 13:01
@github-actions

github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions bot locked and limited the conversation to collaborators on Mar 6, 2021