Optimize threshold determination with algorithm=2 #44

Closed
xrobin opened this issue Mar 20, 2019 · 1 comment

xrobin (Owner) commented Mar 20, 2019:

Too much time is spent in roc.utils.R:60 in roc.utils.perfs.all.fast:

dups.sesp <- duplicated(matrix(c(se, sp), ncol=2), MARGIN=1)

There must be a better way to do it. Here is some benchmarking code:

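# Simulate 1e6 observations with a random binary response
n <- 1e6
dat <- data.frame(x = rnorm(n), y = sample(0:1, size = n, replace = TRUE))

# Profile 10 ROC computations using algorithm = 2
library(profvis)
profvis({
	for (i in 1:10) {
		pROC::roc(dat$y, dat$x, algorithm = 2)
	}
})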
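
To look at the suspected hotspot in isolation, without profvis or pROC, the row-wise duplicate check can be timed on its own. This is a minimal sketch: the `se` and `sp` vectors below are hypothetical stand-ins (sorted random values with ties), not the vectors pROC actually computes, and the timing will vary by machine.

```r
# Hypothetical stand-ins for the se/sp vectors built inside
# roc.utils.perfs.all.fast (assumption: monotone, with ties).
set.seed(1)
n <- 1e5
se <- sort(round(runif(n), 3), decreasing = TRUE)
sp <- sort(round(runif(n), 3))

# Time only the row-wise duplicate check quoted in the issue.
timing <- system.time(
  dups.sesp <- duplicated(matrix(c(se, sp), ncol = 2), MARGIN = 1)
)
print(timing[["elapsed"]])
```

Increasing `n` toward the 1e6 used in the benchmark above should make the cost of this single line more visible.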

xrobin added the speed label Mar 20, 2019

xrobin added a commit that referenced this issue Mar 31, 2019

xrobin (Owner, Author) commented Mar 31, 2019:

It turns out that duplicated.matrix is slow. It can be replaced by two calls to duplicated on plain vectors (one for se, one for sp), combined with an element-wise &.
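
The replacement described above can be sketched as follows. This is an illustration, not the committed pROC code: the `se` and `sp` vectors are toy data, and the equivalence relies on the assumption that both vectors are monotone (as sensitivity and specificity are along sorted thresholds). The second half shows that the two approaches can disagree when that assumption does not hold.

```r
# Toy monotone vectors with ties (hypothetical stand-ins for pROC's se/sp).
set.seed(42)
n <- 1e4
grid <- seq(0, 1, by = 0.01)
se <- sort(sample(grid, n, replace = TRUE), decreasing = TRUE)  # non-increasing
sp <- sort(sample(grid, n, replace = TRUE))                     # non-decreasing

# Row-wise check via duplicated.matrix (the slow call from the issue).
by.matrix <- duplicated(matrix(c(se, sp), ncol = 2), MARGIN = 1)

# Two vector calls combined with an element-wise &.
by.vector <- duplicated(se) & duplicated(sp)

# On monotone input the two agree.
stopifnot(all(by.matrix == by.vector))

# Without monotonicity they can differ: no row repeats here, yet each
# value in the last row has been seen before in its own column.
a <- c(1, 2, 1)
b <- c(1, 2, 2)
rows <- duplicated(matrix(c(a, b), ncol = 2), MARGIN = 1)
vecs <- duplicated(a) & duplicated(b)
print(rows)
print(vecs)
```

The shortcut is therefore only a drop-in replacement when the inputs are ordered the way ROC coordinates are; for arbitrary two-column data, a row-wise check is still needed.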

Using the benchmarks from the cutpointr vignette, we are down to nearly the speed of ROCR, despite some remaining inefficient calls to sort, unique, duplicated and %in%.

[Images: Rplot, Rplot01]

xrobin closed this Mar 31, 2019

xrobin added a commit that referenced this issue Mar 31, 2019
