Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coords is too slow with many thresholds #52

Closed
xrobin opened this issue Apr 28, 2019 · 3 comments

Comments

Projects
None yet
1 participant
@xrobin
Copy link
Owner

commented Apr 28, 2019

response <- rbinom(1E5, 1, .5)
predictor <- rnorm(1E5)
r <- roc(response, predictor)
system.time(coords(r, "a"))
utilisateur     système      écoulé 
     47.791       0.088      47.867 

I would expect it to complete more or less instantly.

@xrobin xrobin added the speed label Apr 28, 2019

xrobin added a commit that referenced this issue May 4, 2019

@xrobin

This comment has been minimized.

Copy link
Owner Author

commented May 4, 2019

The function is mostly vectorized now, except the part that matches the user-supplied thresholds to ours. One difficulty is that we can't rely on the threshold values themselves and must look at the predictors instead. This step still relies on a call to sapply and is basically as slow as before.

xrobin added a commit that referenced this issue May 4, 2019

xrobin added a commit that referenced this issue May 4, 2019

xrobin added a commit that referenced this issue May 4, 2019

@xrobin

This comment has been minimized.

Copy link
Owner Author

commented May 4, 2019

The thresholds are now determined with a for and while loop in something like O(n). Ugly but very efficient.

> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.236       0.020       0.257 

Keeping the issue open to ponder whether it is worth updating the function with input="se" and input="sp". Those still call match repeatedly, however it is unclear whether that will be a problem as we never go along all of them systematically like with x="all".

@xrobin

This comment has been minimized.

Copy link
Owner Author

commented May 5, 2019

Skipping the step through numeric thresholds, and skipping the calculation of extra coordinates when only the existing se/sp/thr are requested, saves even more time.

> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.003       0.000       0.003 

coords is now fairly competitive, and takes < 5% the time it takes to create a very large ROC curve.

> response <- rbinom(1E7, 1, .5)
> predictor <- rnorm(1E7)
> system.time(r <- roc(response, predictor))
Setting levels: control = 0, case = 1
Setting direction: controls < cases
utilisateur     système      écoulé 
     11.403       2.146      13.543 
> system.time(coords(r, "a"))
utilisateur     système      écoulé 
      0.328       0.216       0.544 

xrobin added a commit that referenced this issue May 5, 2019

@xrobin xrobin closed this May 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.