Speedup predict.dummyVars #727

Merged
merged 1 commit into from Aug 31, 2017

@khotilov (Contributor) commented Aug 30, 2017

predict.dummyVars was very slow on large datasets with many factors because of the colnames(x) <- assignments inside nested loops. Here is an illustrative example:

# Create a local copy of predict.dummyVars, then edit it by hand
# (fix() opens an interactive editor) so that it matches this PR
predict_dummyVars <- caret:::predict.dummyVars
fix(predict_dummyVars)

n <- 400000
p <- 100
x <- data.frame(rep(list(x=rep(c('a','b'), n/2)), p/4)) # nominal
x <- cbind(x, data.frame(rep(list(x=rep(c('c','d','e','f'), n/4)), p/4)))
x <- cbind(x, data.frame(rep(list(x=rep(1, n)), p/2)))  # some numeric
colnames(x) <- paste0('x', 1:p)
dim(x)

dumm <- dummyVars("~ .", data = x, fullRank = TRUE)

# before
ptm <- proc.time()
x1 <- predict(dumm, x)
print(proc.time() - ptm)

# after
ptm <- proc.time()
x2 <- predict_dummyVars(dumm, x)
print(proc.time() - ptm)

all.equal(x1,x2)

This yields the following timings (in seconds):

> # before
   user  system elapsed 
  33.60   44.07   77.86 
> # after
   user  system elapsed 
   2.05    0.78    2.81 
> all.equal(x1,x2)
[1] TRUE
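
To illustrate why repeated colnames(x) <- assignments inside a loop can be costly, here is a minimal standalone sketch. It is not the caret code: the helper names rename_in_loop and rename_once are hypothetical, and the matrix size is an arbitrary assumption. It compares renaming one column at a time on a large matrix, where each assignment may copy the whole object, with building the names in a plain character vector and assigning them once.

```r
# Illustrative sketch only, not taken from caret.
# rename_in_loop and rename_once are hypothetical helper names.
n <- 50000
p <- 100
x <- matrix(0, nrow = n, ncol = p)

# Slow pattern: every `colnames(x)[j] <-` may duplicate the whole matrix.
rename_in_loop <- function(x) {
  colnames(x) <- paste0("v", seq_len(ncol(x)))  # placeholder names
  for (j in seq_len(ncol(x))) {
    colnames(x)[j] <- paste0("x", j)
  }
  x
}

# Faster pattern: update a small character vector, then assign once.
rename_once <- function(x) {
  nms <- character(ncol(x))
  for (j in seq_len(ncol(x))) {
    nms[j] <- paste0("x", j)
  }
  colnames(x) <- nms
  x
}

t_loop <- system.time(x1 <- rename_in_loop(x))
t_once <- system.time(x2 <- rename_once(x))

# Show user, system and elapsed times for both variants.
print(rbind(loop = t_loop, once = t_once)[, 1:3])

# Both variants must produce the same result.
stopifnot(identical(x1, x2))
```

The exact timings depend on the machine and R version, so none are quoted here; the point is only that the second variant touches the large object once instead of once per column.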

@codecov-io commented Aug 30, 2017

Codecov Report

Merging #727 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #727      +/-   ##
==========================================
- Coverage   16.97%   16.97%   -0.01%     
==========================================
  Files          90       90              
  Lines       13183    13185       +2     
==========================================
  Hits         2238     2238              
- Misses      10945    10947       +2

Impacted Files    Coverage Δ
R/dummyVar.R      0% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f27c68b...69269f8.

@topepo (Owner) commented Aug 31, 2017

Looks good. Thanks

@topepo topepo merged commit a01cabc into topepo:master Aug 31, 2017
2 of 3 checks passed
codecov/project: 16.97% (-0.01%) compared to f27c68b
codecov/patch: Coverage not affected when comparing f27c68b...69269f8
continuous-integration/travis-ci/pr: The Travis CI build passed