Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coords is too slow with many thresholds #52

Closed
xrobin opened this issue Apr 28, 2019 · 3 comments
Closed

coords is too slow with many thresholds #52

xrobin opened this issue Apr 28, 2019 · 3 comments
Labels
speed The issue describes a performance problem that needs to be optimized

Comments

@xrobin
Copy link
Owner

xrobin commented Apr 28, 2019

response <- rbinom(1E5, 1, .5)
predictor <- rnorm(1E5)
r <- roc(response, predictor)
system.time(coords(r, "a"))
utilisateur     système      écoulé 
     47.791       0.088      47.867 

I would expect it to complete more or less instantly.

@xrobin xrobin added the speed The issue describes a performance problem that needs to be optimized label Apr 28, 2019
xrobin added a commit that referenced this issue May 4, 2019
@xrobin
Copy link
Owner Author

xrobin commented May 4, 2019

The function is mostly vectorized now, except the part that matches the user-supplied thresholds to ours. One difficulty is that we can't rely on the threshold values themselves and must look at the predictors instead. This step still relies on a call to sapply and is basically as slow as before.

@xrobin
Copy link
Owner Author

xrobin commented May 4, 2019

The thresholds are now determined with a for and while loop in something like O(n). Ugly but very efficient.

> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.236       0.020       0.257 

Keeping the issue open to ponder whether it is worth updating the function with input="se" and input="sp". Those still call match repeatedly, however it is unclear whether that will be a problem as we never go along all of them systematically like with x="all".

@xrobin
Copy link
Owner Author

xrobin commented May 5, 2019

Skipping the step through numeric thresholds, and skipping the calculation of extra coordinates when only the existing se/sp/thr are requested, saves even more time.

> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.003       0.000       0.003 

coords is now fairly competitive, and takes < 5% the time it takes to create a very large ROC curve.

> response <- rbinom(1E7, 1, .5)
> predictor <- rnorm(1E7)
> system.time(r <- roc(response, predictor))
Setting levels: control = 0, case = 1
Setting direction: controls < cases
utilisateur     système      écoulé 
     11.403       2.146      13.543 
> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.328       0.216       0.544 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
speed The issue describes a performance problem that needs to be optimized
Projects
None yet
Development

No branches or pull requests

1 participant