Optimize threshold determination with algorithm=2 #44

Closed
xrobin opened this issue Mar 20, 2019 · 1 comment

xrobin commented Mar 20, 2019

Too much time is spent in roc.utils.R:60 in roc.utils.perfs.all.fast:

dups.sesp <- duplicated(matrix(c(se, sp), ncol=2), MARGIN=1)

There must be a better way to do it. Here is some benchmarking code:

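# Simulate one million observations with a binary outcome
n <- 1e6
dat <- data.frame(x = rnorm(n), y = sample(0:1, size = n, replace = TRUE))

# Profile ten ROC computations using algorithm 2
library(profvis)
profvis({
	for (i in 1:10) {
		pROC::roc(dat$y, dat$x, algorithm = 2)
	}
})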

xrobin added the speed label Mar 20, 2019
xrobin added a commit that referenced this issue Mar 31, 2019

xrobin commented Mar 31, 2019

It turns out that duplicated.matrix is slow. It can be replaced by two calls to duplicated on the se and sp vectors, combined with an elementwise &.
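The replacement can be sketched as below. This is a minimal illustration, not the code from the referenced commit: the se and sp values are made up, and the equivalence shown relies on the assumption that se is non-increasing and sp is non-decreasing along the thresholds, as on a ROC curve. The x and y vectors are hypothetical and only show that the shortcut does not hold for arbitrary, non-monotone columns.

```r
# Toy sensitivity/specificity vectors (made-up values).
# Assumption: se is non-increasing and sp is non-decreasing.
se <- c(1.0, 1.0, 0.8, 0.8, 0.8, 0.5, 0.0)
sp <- c(0.0, 0.0, 0.2, 0.2, 0.6, 0.6, 1.0)

# Original approach: row-wise duplicates of the (se, sp) matrix.
dups.matrix <- duplicated(matrix(c(se, sp), ncol = 2), MARGIN = 1)

# Replacement: two vector calls combined with an elementwise &.
dups.vector <- duplicated(se) & duplicated(sp)

# Both approaches flag the same rows on monotone input.
stopifnot(all(dups.matrix == dups.vector))

# Caveat: for non-monotone columns the two can disagree.
# Hypothetical vectors: the row (1, 3) is not a duplicate row, but each
# of its values has already appeared in its own column.
x <- c(1, 2, 1)
y <- c(2, 3, 3)
by.row <- duplicated(matrix(c(x, y), ncol = 2), MARGIN = 1)
by.col <- duplicated(x) & duplicated(y)
stopifnot(!by.row[3], by.col[3])
```

The vector version avoids building a per-row representation of the matrix, which is presumably where duplicated.matrix spends its time. The monotonicity assumption is what makes the shortcut safe here.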

Using the benchmarks from the cutpointr vignette, we are down to nearly the speed of ROCR, despite some remaining inefficient calls to sort, unique, duplicated and %in%.

[Two benchmark plots attached: Rplot, Rplot01]

xrobin closed this Mar 31, 2019
xrobin added a commit that referenced this issue Mar 31, 2019