case weights #96

Open
topepo opened this issue Jul 27, 2021 · 4 comments

Comments


topepo commented Jul 27, 2021

It would be great to have the calculations for the curve take case weights into account (i.e. a non-negative numeric vector of the same length as the other data objects).

Owner

xrobin commented Aug 2, 2021

I agree this would be cool. Do you have a reference on how this is implemented in the context of ROC curves?

Author

topepo commented Sep 13, 2021

The curve would be based on the weighted versions of sensitivity and specificity.

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

data(pathology)
str(pathology)
#> 'data.frame':    344 obs. of  2 variables:
#>  $ pathology: Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ scan     : Factor w/ 2 levels "abnorm","norm": 1 1 1 1 1 1 1 1 1 1 ...

set.seed(1)
pathology$weights <- runif(nrow(pathology))

event <- "abnorm"

unweighted <- 
  sum(pathology$pathology == event & pathology$scan == event) /
  sum(pathology$pathology == event)
unweighted
#> [1] 0.8953488

# via yardstick:
sensitivity(pathology, pathology, scan)
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 sens    binary         0.895

weighted <- 
  sum( pathology$weights * (pathology$pathology == event & pathology$scan == event) ) /
  sum( pathology$weights * (pathology$pathology == event) )

weighted
#> [1] 0.9013333

Created on 2021-09-13 by the reprex package (v2.0.0)

@DavisVaughan has the start of changes that we will be making to yardstick here
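The same weighting carries over to specificity. Below is a minimal base R sketch of both metrics; `weighted_sens_spec` is a hypothetical helper written for illustration (it is not a yardstick or pROC function), and the toy data are made up.

```r
# Hypothetical helper (not part of yardstick or pROC): weighted sensitivity
# and specificity for a binary truth/estimate pair.
weighted_sens_spec <- function(truth, estimate, weights, event) {
  stopifnot(
    length(truth) == length(estimate),
    length(truth) == length(weights),
    all(weights >= 0)
  )
  is_event      <- truth == event
  is_nonevent   <- truth != event
  pred_event    <- estimate == event
  pred_nonevent <- estimate != event
  c(
    # Weighted true positives over weighted positives
    sensitivity = sum(weights * (is_event & pred_event)) /
      sum(weights * is_event),
    # Weighted true negatives over weighted negatives
    specificity = sum(weights * (is_nonevent & pred_nonevent)) /
      sum(weights * is_nonevent)
  )
}

# Toy data, made up for illustration
truth    <- c("abnorm", "abnorm", "abnorm", "norm", "norm", "norm")
estimate <- c("abnorm", "abnorm", "norm",   "norm", "norm", "abnorm")
w        <- c(2, 1, 1, 3, 1, 1)

res_weighted   <- weighted_sens_spec(truth, estimate, w, event = "abnorm")
res_unweighted <- weighted_sens_spec(truth, estimate, rep(1, 6), event = "abnorm")
```

With unit weights the helper reduces to the usual unweighted definitions, which is a convenient sanity check for any weighted implementation.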

Owner

xrobin commented Sep 15, 2021

I think I see. The easiest would be to directly update roc.utils.perfs.all.fast to calculate TP/FP taking the weights into account:

  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)
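To make the idea concrete, here is a self-contained base R sketch of a weighted ROC curve built from cumulative weighted TP/FP counts. It is not pROC's actual code: `weighted_roc` and its arguments are hypothetical names, the response is assumed to be coded 0/1 with higher predictor values indicating cases, and the AUC uses the trapezoidal rule. Note that in R `*` binds more tightly than `==`, so the comparison has to be parenthesized before multiplying by the weights.

```r
# Hypothetical sketch of a weighted ROC curve; not pROC's implementation.
weighted_roc <- function(response, predictor,
                         weights = rep(1, length(response))) {
  stopifnot(
    all(response %in% c(0, 1)),
    length(response) == length(predictor),
    length(response) == length(weights),
    all(weights >= 0)
  )
  # Sort by decreasing predictor so each row is a candidate threshold
  ord <- order(predictor, decreasing = TRUE)
  response.sorted  <- response[ord]
  weights.sorted   <- weights[ord]
  predictor.sorted <- predictor[ord]

  # Cumulative weighted true and false positives
  tp <- cumsum((response.sorted == 1) * weights.sorted)
  fp <- cumsum((response.sorted == 0) * weights.sorted)

  # Keep only the last row of each run of tied predictor values
  keep <- !duplicated(predictor.sorted, fromLast = TRUE)
  tpr <- c(0, tp[keep] / sum(weights[response == 1]))
  fpr <- c(0, fp[keep] / sum(weights[response == 0]))

  # Trapezoidal area under the weighted curve
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(tpr = tpr, fpr = fpr, auc = auc)
}

# Toy data, made up for illustration
response  <- c(1, 1, 0, 1, 0, 0)
predictor <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)

roc_unit     <- weighted_roc(response, predictor)
roc_weighted <- weighted_roc(response, predictor, weights = c(1, 1, 2, 1, 1, 1))
roc_scaled   <- weighted_roc(response, predictor, weights = 2 * c(1, 1, 2, 1, 1, 1))
```

Because both rates are normalized by the total weight of their class, multiplying all weights by a constant leaves the curve and the AUC unchanged.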

A few thoughts on the implementation:

  • The number of cases and controls might become fractional because of this change. I'm not sure what side-effects this could have.
  • There's a C++ algorithm that will need to be updated too. It's a loop so it should be quite straightforward. Alternatively it could be a good time to get rid of alternative algorithms and simplify the code.
  • It will be necessary to modify the roc objects and store the weights there, so that bootstrap functions re-use the weights appropriately.
  • At this point I'm not sure how many changes will be required in those bootstrapping functions. They've needed major refactoring for a long time but I never found the time to do so.
  • Issue #70 ("Refactor all-in-one plot calls") will get in the way. There's quite a lot of redundancy, as pROC has several functions that build ROC curves under the hood (e.g. auc, ci, etc.), which will all have to be updated.
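Two of the points above, fractional class sizes and bootstrap functions that re-use the weights, can be illustrated with a short base R sketch. Everything here is an assumption made for illustration: `weighted_auc` and `boot_weighted_auc` are hypothetical names, not pROC functions, the resampling is stratified by class, and the interval is a plain percentile interval. The key step is resampling indices, so that each weight stays attached to its own observation.

```r
# Hypothetical helpers; names are illustrative and not part of pROC.

# Weighted AUC as the weighted share of concordant case/control pairs
weighted_auc <- function(response, predictor, weights) {
  cases    <- response == 1
  controls <- response == 0
  # 1 if the case scores higher than the control, 0.5 for ties
  cmp <- outer(predictor[cases], predictor[controls], ">") +
    0.5 * outer(predictor[cases], predictor[controls], "==")
  pair_w <- outer(weights[cases], weights[controls])
  sum(cmp * pair_w) / sum(pair_w)
}

boot_weighted_auc <- function(response, predictor, weights, n_boot = 200) {
  cases    <- which(response == 1)
  controls <- which(response == 0)
  replicate(n_boot, {
    # Stratified resampling of indices: weights travel with observations
    idx <- c(
      cases[sample.int(length(cases), replace = TRUE)],
      controls[sample.int(length(controls), replace = TRUE)]
    )
    weighted_auc(response[idx], predictor[idx], weights[idx])
  })
}

# Simulated data, made up for illustration
set.seed(1)
n         <- 100
response  <- rep(c(0, 1), each = n / 2)
predictor <- rnorm(n, mean = response)
weights   <- runif(n, min = 0.1, max = 1)

# "Effective" class sizes are sums of weights, hence generally fractional
n_cases    <- sum(weights[response == 1])
n_controls <- sum(weights[response == 0])

aucs <- boot_weighted_auc(response, predictor, weights)
ci   <- quantile(aucs, c(0.025, 0.975))
```

Any code that assumes integer counts of cases and controls (for example, variance formulas that use the class sizes directly) would need to be reviewed against the fractional `n_cases` and `n_controls`.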

@aminadibi

I'd love this feature too. The WeightedROC package does it, but it doesn't produce CIs.
