
Adjust optimism bootstrap for recipes #682

Merged: 1 commit merged into topepo:recipe_integration on Jul 7, 2017

Conversation

@asardaes (Contributor) commented Jul 5, 2017

Hello,

I adjusted the optimism bootstrap code for recipes. I put all the code in a separate function this time. I suppose my use of holdout_rec will have to be adjusted once you decide what to do when a recipe fails?

I also cleaned up the nominal workflow, since some leftover code remained from other approaches I tried while implementing the optimism bootstrap.

I tested the following cases both sequentially and in parallel; hopefully I didn't break anything:

library(caret)
library(recipes)

data("iris")
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

iris_rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

## Classification

knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 2L,
                 trControl = trainControl(method = "optimism_boot",
                                          number = 5L))

knnFit2 <- train(iris_rec,
                 data = iris,
                 method = "knn",
                 tuneLength = 2L,
                 trControl = trainControl(method = "optimism_boot",
                                          number = 5L))

## Regression

library(mlbench)
library(rpart)
data(BostonHousing)
boston_rec <- recipe(medv ~ ., data = BostonHousing)

lmFit1 <- train(medv ~ .,
                data = BostonHousing,
                method = "lm",
                trControl = trainControl(method = "optimism_boot",
                                         number = 5L))

lmFit2 <- train(boston_rec,
                data = BostonHousing,
                method = "lm",
                trControl = trainControl(method = "optimism_boot",
                                         number = 5L))

## With a custom metric

madSummary <- function(data,
                       lev = NULL,
                       model = NULL) {
  out <- mad(data$obs - data$pred,
             na.rm = TRUE)
  names(out) <- "MAD"
  out
}

robustControl <- trainControl(summaryFunction = madSummary,
                              method = "optimism_boot",
                              number = 5L)
marsGrid <- expand.grid(degree = 1, nprune = (1:2) * 2)

earthFit1 <- train(medv ~ .,
                   data = BostonHousing,
                   method = "earth",
                   tuneGrid = marsGrid,
                   metric = "MAD",
                   maximize = FALSE,
                   trControl = robustControl)

earthFit2 <- train(boston_rec,
                   data = BostonHousing,
                   method = "earth",
                   tuneGrid = marsGrid,
                   metric = "MAD",
                   maximize = FALSE,
                   trControl = robustControl)

## Classification with class probabilities

dat <- twoClassSim(100)
rec <- recipe(Class ~ ., data = dat) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

opt_acc_probs <- train(Class ~ ., data = dat,
                       method = "svmRadial",
                       preProc = c("center", "scale"),
                       tuneLength = 3L,
                       trControl = trainControl(method = "optimism_boot",
                                                classProbs = TRUE,
                                                number = 5L))

opt_acc_probs2 <- train(rec,
                        data = dat,
                        method = "svmRadial",
                        tuneLength = 3L,
                        trControl = trainControl(method = "optimism_boot",
                                                 classProbs = TRUE,
                                                 number = 5L))

opt_roc <- train(Class ~ ., data = dat,
                 method = "svmRadial",
                 preProc = c("center", "scale"),
                 tuneLength = 3L,
                 metric = "ROC",
                 trControl = trainControl(method = "optimism_boot",
                                          classProbs = TRUE,
                                          number = 5L,
                                          summaryFunction = twoClassSummary))

opt_roc2 <- train(rec,
                  data = dat,
                  method = "svmRadial",
                  tuneLength = 3L,
                  metric = "ROC",
                  trControl = trainControl(method = "optimism_boot",
                                           classProbs = TRUE,
                                           number = 5L,
                                           summaryFunction = twoClassSummary))

@coveralls commented Jul 5, 2017

Coverage increased (+0.03%) to 17.993% when pulling ceeb839 on asardaes:recipe_integration into c3b2aa6 on topepo:recipe_integration.

@topepo (Owner) commented Jul 7, 2017

Thanks! That's really clean.

@topepo topepo merged commit 6fd5f9c into topepo:recipe_integration Jul 7, 2017
@asardaes asardaes deleted the recipe_integration branch July 7, 2017 13:26
@topepo (Owner) commented Jul 11, 2017

One more thing... can you apply the simplified code to nominalTrainWorkflow too?

@asardaes (Contributor, Author) commented

Yeah, I think that should be fairly straightforward; I just have to check that changing the order doesn't break reproducibility. I don't think there are any models that have randomness when doing predictions, are there?

@topepo (Owner) commented Jul 13, 2017

> I don't think there are any models that have randomness when doing predictions, are there?

I can't think of any. I think that the regression tests would show those.
