
Conversation

@DavisVaughan
Member

Closes #93. Below are three examples that were computed incorrectly before and are now correct. The main issue was how duplicate probability values were handled.

This also affected the PR AUC values, which are now computed correctly for each example. For this, it's important that the first precision value is 1 rather than NA. Without it, the AUC for these examples won't actually be 1, as it should be, because the curve won't cover the full precision/recall range.
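To see why that initial point matters, here is a minimal sketch of trapezoidal integration over the curve points from the first example below. This is an illustration only, not yardstick's internals, and trap_auc() is a hypothetical helper:

trap_auc <- function(recall, precision) {
  # Trapezoidal rule: segment widths times average segment heights
  sum(diff(recall) * (head(precision, -1) + tail(precision, -1)) / 2)
}

# Points from the corrected "no duplicates" curve below
recall    <- c(0, 0.5, 1, 1, 1)
precision <- c(1, 1, 1, 0.667, 0.5)

trap_auc(recall, precision)          # 1, the correct value
trap_auc(recall[-1], precision[-1])  # 0.5, because recall 0 to 0.5 is no longer covered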


No duplicates (this was correct before and after the change)

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   *
a      .8         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .4         2   1   1 = 2/2   .67 = 2/(2+1)   *
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 5 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0       1    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.4    1       0.667
#> 5        0.3    1       0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 5 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.4    1       0.667
#> 5        0.3    1       0.5

Duplicates at the beginning

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   -
a      .9         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .4         2   1   1 = 2/2   .67 = 2/(2+1)   *
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf        0     1    
#> 2        0.9      1     1    
#> 3        0.4      1     0.667
#> 4        0.3      1     0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.4    1       0.667
#> 4        0.3    1       0.5

Duplicates at the end

truth  estimate  tp  fp  recall     precision       used
a      .9         1   0  .5 = 1/2   1   = 1/(1+0)   *
a      .8         2   0   1 = 2/2   1   = 2/(2+0)   *
b      .3         2   1   1 = 2/2   .67 = 2/(2+1)   -
b      .3         2   2   1 = 2/2   .5  = 2/(2+2)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .3, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0         1  
#> 2        0.9    0.5       1  
#> 3        0.8    1         1  
#> 4        0.3    1         0.5

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "a", "b", "b"))
estimate <- c(.9, .8, .3, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf      0      NA    
#> 2        0.9    0.5     1    
#> 3        0.8    1       1    
#> 4        0.3    1       0.667

- Initialize with first case
- Reverse increment / append order
- …ned (`tp / (tp + fp)` when tp = 0, fp = 0). This ensures the curve always starts in the correct location (see the sketch below).
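A sketch of the convention described in the last commit above (a paraphrase of the intent, not the actual C++ code):

# At .threshold = Inf nothing is predicted positive, so tp = 0 and fp = 0
# and tp / (tp + fp) is 0/0; defining precision as 1 there makes the curve
# start at (recall = 0, precision = 1).
precision_point <- function(tp, fp) {
  if (tp == 0 && fp == 0) 1 else tp / (tp + fp)
}
precision_point(0, 0)  # 1
precision_point(2, 1)  # 0.667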
@DavisVaughan changed the title from "Issue 93" to "Correct the pr_curve() implementation when duplicates exist" on Mar 14, 2019
@codecov-io

codecov-io commented Mar 14, 2019

Codecov Report

Merging #95 into master will decrease coverage by 0.11%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #95      +/-   ##
==========================================
- Coverage   96.25%   96.14%   -0.11%     
==========================================
  Files          44       44              
  Lines        2294     2311      +17     
==========================================
+ Hits         2208     2222      +14     
- Misses         86       89       +3
Impacted Files       Coverage        Δ
src/pr-curve.cpp     100% <100%>     (ø) ⬆️
R/prob-pr_curve.R    98.46% <100%>   (+0.02%) ⬆️
R/selectors.R        80% <0%>        (-2.06%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6960823...55a0faa.

@DavisVaughan
Member Author

This known (pathological) example demonstrates the fix for a bug where order_cpp() did not return the correct order when duplicate class probabilities are present.

truth  estimate  tp  fp  recall      precision       used
a      .9         1   0  .33 = 1/3   1   = 1/(1+0)   -
b      .9         1   1  .33 = 1/3   .5  = 1/(1+1)   *
a      .4         2   1  .67 = 2/3   .67 = 2/(2+1)   *
a      .3         3   1    1 = 3/3   .75 = 3/(3+1)   *

After:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "b", "a", "a"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf    0         1    
#> 2        0.9  0.333     0.5  
#> 3        0.4  0.667     0.667
#> 4        0.3  1         0.75

Before:

suppressPackageStartupMessages(library(yardstick))
truth <- factor(c("a", "b", "a", "a"))
estimate <- c(.9, .9, .4, .3)
df <- data.frame(truth, estimate)
pr_curve(df, truth, estimate)
#> # A tibble: 4 x 3
#>   .threshold recall precision
#>        <dbl>  <dbl>     <dbl>
#> 1      Inf     0           NA
#> 2        0.9   0.25         1
#> 3        0.4   0.75         1
#> 4        0.3   1            1
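Putting the examples together, here is a standalone base-R sketch of the tie handling they describe: every observation tied at a threshold is counted before a point is emitted, so only the last row of a tied group is "used". pr_points() is a hypothetical helper for illustration, not yardstick's implementation:

pr_points <- function(truth, estimate, event = "a") {
  ord <- order(estimate, decreasing = TRUE)
  truth <- truth[ord]
  estimate <- estimate[ord]
  n_event <- sum(truth == event)

  rows <- lapply(unique(estimate), function(threshold) {
    keep <- estimate >= threshold  # all ties at a threshold count together
    tp <- sum(truth[keep] == event)
    fp <- sum(truth[keep] != event)
    data.frame(
      .threshold = threshold,
      recall = tp / n_event,
      precision = tp / (tp + fp)
    )
  })

  # Degenerate first point: nothing predicted positive, precision defined as 1
  rbind(
    data.frame(.threshold = Inf, recall = 0, precision = 1),
    do.call(rbind, rows)
  )
}

# Reproduces the corrected output above for the pathological example:
# recall 0, 1/3, 2/3, 1 with precision 1, 0.5, 0.667, 0.75
pr_points(factor(c("a", "b", "a", "a")), c(.9, .9, .4, .3))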

@DavisVaughan
Member Author

DavisVaughan commented Mar 15, 2019

This PR now also implements average_precision() as an alternative to pr_auc() that sidesteps the ambiguity of what precision should be when recall == 0. Closes #96
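For reference, average precision is usually defined as AP = sum_n (R_n - R_{n-1}) * P_n. A minimal sketch of that definition (an illustration only, not yardstick's implementation) shows why the precision value at recall == 0 never enters the sum:

ap_sketch <- function(recall, precision) {
  # diff(recall) pairs each step with the precision at its right endpoint,
  # so precision[1] (the ambiguous recall == 0 value) is dropped entirely
  sum(diff(recall) * precision[-1])
}

# Points from the pathological example above
ap_sketch(c(0, 1/3, 2/3, 1), c(1, 0.5, 2/3, 0.75))  # ~0.639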

@DavisVaughan
Member Author

@dariyasydykova at this point, this PR fixes all of the issues with pr_curve() except for the value of precision at the initial point, where recall == 0 and fp == 0. I'm not sure what to do about that point.

It doesn't look like your tweet got any solid advice either:
https://twitter.com/dariyasydykova/status/1106617500811309056

@fmannhardt

Hi, I am using yardstick to evaluate ML models and it seems I got hit by this issue in the pr_curve() calculation. Any plans for when this will be merged and released?

@DavisVaughan
Member Author

I'll likely make a pass at all the outstanding yardstick issues and PRs after useR (July 9-12), and then release a new version. Things are pretty packed until then.

@fmannhardt

Thanks. Will use this branch until then.

@DavisVaughan merged commit 019ad81 into master on Jul 30, 2019
@DavisVaughan deleted the issue-93 branch on July 30, 2019 at 13:01
@github-actions

github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions bot locked and limited the conversation to collaborators on Mar 6, 2021