"rf" Error when using only one PC from PCA #1181

Closed
5 tasks done
baderstine opened this issue Nov 17, 2020 · 1 comment

If you are filing a bug, make sure these boxes are checked before submitting your issue— thank you!

  • Start a new R session
  • Install the latest version of caret: update.packages(oldPkgs="caret", ask=FALSE)
  • Write a minimal reproducible example
  • Do not use parallel processing in the code (unless you are certain that the issue is about parallel processing).
  • run sessionInfo()

Minimal, reproducible example:

library(caret)
dat = iris
dat$Species2Class = factor(dat$Species == "setosa", levels = c("TRUE", "FALSE"), labels = c("SETOSA", "OTHER"))

# constants used for random seeds
n = 2
r = 5

# set seeds to use for model reproducibility with parallel processing
set.seed(123)
seeds <- vector(mode = "list", length = (n*r)+1)

# for the cross-validation models
for(i in 1:((n*r))) seeds[[i]] <- sample.int(10000,5000)

# for the final model
seeds[[((n*r)+1)]] <- sample.int(1000,1)

fitControl <- trainControl(
  seeds = seeds,
  method = "repeatedcv",
  number = n,
  repeats = r,
  savePredictions = 'final',
  classProbs = TRUE,
  preProcOptions = list(pcaComp = 1),
  summaryFunction = prSummary)

set.seed(1)
response = "Species2Class"
preds = names(dat)[1:4]
model.fmla = as.formula(paste0(response, " ~ ", paste0(preds, collapse="+")))

# run the model 
fit_model <- train(form = model.fmla, 
                   data = dat, 
                   method = "rf",
                   trControl = fitControl,
                   preProcess = c("pca"),
                   metric = "AUC")

# OUTPUT: 
# Error in { : task 1 failed - "subscript out of bounds"
# In addition: There were 50 or more warnings (use warnings() to see the first 50)

The error occurs only when pcaComp = 1, that is, when only the first principal component is retained. With pcaComp set to 2 or more, the same call trains without error.
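
The thread does not state the root cause. As an illustration of one common R pitfall that produces failures only in the single-column case, the sketch below uses base R's prcomp (not caret) to show how selecting one principal component drops a matrix to a plain vector unless drop = FALSE is given. Whether this is what happens inside caret is an assumption, not a diagnosis; the variable names are made up for the example.

```r
# Base R only; this does not call caret and is not caret's internal code.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Selecting a single column silently drops the matrix to a numeric vector.
pc1_dropped <- pca$x[, 1]

# drop = FALSE keeps a 150 x 1 matrix, including the "PC1" column name.
pc1_matrix <- pca$x[, 1, drop = FALSE]

is.null(dim(pc1_dropped))  # the vector has no dimensions
dim(pc1_matrix)            # the matrix keeps rows and columns
colnames(pc1_matrix)

# Code written for matrices (two-index subsetting) fails on the vector.
failed <- inherits(try(pc1_dropped[, 1], silent = TRUE), "try-error")
failed
```

With two or more components the result of the same subsetting stays a matrix, which matches the pattern in the report where only the one-component case fails.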

Session Info:

>sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-86    ggplot2_3.3.1   lattice_0.20-41

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6         pillar_1.4.4         compiler_3.6.3       gower_0.2.1          plyr_1.8.6          
 [6] iterators_1.0.12     class_7.3-17         tools_3.6.3          rpart_4.1-15         ipred_0.9-9         
[11] lubridate_1.7.9      lifecycle_0.2.0      tibble_3.0.1         nlme_3.1-148         gtable_0.3.0        
[16] pkgconfig_2.0.3      rlang_0.4.6          Matrix_1.2-18        foreach_1.5.0        rstudioapi_0.11     
[21] prodlim_2019.11.13   stringr_1.4.0        withr_2.2.0          dplyr_1.0.0          MLmetrics_1.1.1     
[26] pROC_1.16.2          generics_0.0.2       vctrs_0.3.1          recipes_0.1.12       stats4_3.6.3        
[31] grid_3.6.3           nnet_7.3-14          tidyselect_1.1.0     data.table_1.12.8    glue_1.4.1          
[36] R6_2.4.1             survival_3.2-3       lava_1.6.7           ROCR_1.0-11          reshape2_1.4.4      
[41] purrr_0.3.4          magrittr_1.5         ModelMetrics_1.2.2.2 scales_1.1.1         codetools_0.2-16    
[46] ellipsis_0.3.1       MASS_7.3-51.6        splines_3.6.3        randomForest_4.6-14  timeDate_3043.102   
[51] colorspace_1.4-1     stringi_1.4.6        munsell_0.5.0        crayon_1.3.4        
topepo added the bug label Feb 9, 2021
topepo added a commit that referenced this issue May 12, 2021
topepo added a commit that referenced this issue May 12, 2021
topepo (Owner) commented May 12, 2021

Should work now
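
The comment does not name a release that contains the fix. A minimal sketch for checking whether an installed caret is at least newer than the 6.0-86 build shown in the report's sessionInfo() follows. The helper name is_newer_than_report is hypothetical, and a newer version number is only a necessary condition: it does not prove that the fix commits are included.

```r
# Version taken from the sessionInfo() output in the report.
reported_version <- "6.0-86"

# Hypothetical helper: TRUE when `installed` is strictly newer than the
# version the bug was reported against.
is_newer_than_report <- function(installed, reported = reported_version) {
  utils::compareVersion(as.character(installed), reported) > 0
}

# Usage, assuming caret is installed:
# is_newer_than_report(packageVersion("caret"))
```

If the check passes, rerunning the reproducible example with pcaComp = 1 is the direct way to confirm the behavior.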

topepo closed this as completed May 12, 2021