
Adjust optimism bootstrap for recipes #682

Merged: 1 commit merged into topepo:recipe_integration on Jul 7, 2017

Conversation

@asardaes (Contributor) commented Jul 5, 2017

Hello,

I adjusted the optimism bootstrap code for recipes. I put all the code in a separate function this time. I suppose my use of holdout_rec will have to be adjusted once you decide what to do when a recipe fails?

I also cleaned up the nominal workflow, since some leftover code remained from other approaches I tried while implementing the optimism bootstrap.

I tested the following cases both sequentially and in parallel; hopefully I didn't break anything:

library(caret)
library(recipes)

data("iris")
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

iris_rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

## Classification

knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 2L,
                 trControl = trainControl(method = "optimism_boot",
                                          number = 5L))

knnFit2 <- train(iris_rec,
                 data = iris,
                 method = "knn",
                 tuneLength = 2L,
                 trControl = trainControl(method = "optimism_boot",
                                          number = 5L))

## Regression

library(mlbench)
library(rpart)
data(BostonHousing)
boston_rec <- recipe(medv ~ ., data = BostonHousing)

lmFit1 <- train(medv ~ .,
                data = BostonHousing,
                method = "lm",
                trControl = trainControl(method = "optimism_boot",
                                         number = 5L))

lmFit2 <- train(boston_rec,
                data = BostonHousing,
                method = "lm",
                trControl = trainControl(method = "optimism_boot",
                                         number = 5L))

## With a custom metric

madSummary <- function(data,
                       lev = NULL,
                       model = NULL) {
  out <- mad(data$obs - data$pred,
             na.rm = TRUE)
  names(out) <- "MAD"
  out
}

robustControl <- trainControl(summaryFunction = madSummary,
                              method = "optimism_boot",
                              number = 5L)
marsGrid <- expand.grid(degree = 1, nprune = (1:2) * 2)

earthFit1 <- train(medv ~ .,
                   data = BostonHousing,
                   method = "earth",
                   tuneGrid = marsGrid,
                   metric = "MAD",
                   maximize = FALSE,
                   trControl = robustControl)

earthFit2 <- train(boston_rec,
                   data = BostonHousing,
                   method = "earth",
                   tuneGrid = marsGrid,
                   metric = "MAD",
                   maximize = FALSE,
                   trControl = robustControl)

## Classification with class probabilities

dat <- twoClassSim(100)
rec <- recipe(Class ~ ., data = dat) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

opt_acc_probs <- train(Class ~ ., data = dat,
                       method = "svmRadial",
                       preProc = c("center", "scale"),
                       tuneLength = 3L,
                       trControl = trainControl(method = "optimism_boot",
                                                classProbs = TRUE,
                                                number = 5L))

opt_acc_probs2 <- train(rec,
                        data = dat,
                        method = "svmRadial",
                        tuneLength = 3L,
                        trControl = trainControl(method = "optimism_boot",
                                                 classProbs = TRUE,
                                                 number = 5L))

opt_roc <- train(Class ~ ., data = dat,
                 method = "svmRadial",
                 preProc = c("center", "scale"),
                 tuneLength = 3L,
                 metric = "ROC",
                 trControl = trainControl(method = "optimism_boot",
                                          classProbs = TRUE,
                                          number = 5L,
                                          summaryFunction = twoClassSummary))

opt_roc2 <- train(rec,
                  data = dat,
                  method = "svmRadial",
                  tuneLength = 3L,
                  metric = "ROC",
                  trControl = trainControl(method = "optimism_boot",
                                           classProbs = TRUE,
                                           number = 5L,
                                           summaryFunction = twoClassSummary))

@coveralls commented Jul 5, 2017

Coverage increased (+0.03%) to 17.993% when pulling ceeb839 on asardaes:recipe_integration into c3b2aa6 on topepo:recipe_integration.

@topepo (Owner) commented Jul 7, 2017

Thanks! That's really clean.

@topepo topepo merged commit 6fd5f9c into topepo:recipe_integration Jul 7, 2017
@asardaes asardaes deleted the recipe_integration branch July 7, 2017 13:26
@topepo (Owner) commented Jul 11, 2017

One more thing... can you apply the simplified code to nominalTrainWorkflow too?

@asardaes (Contributor, Author) commented

Yeah, I think that should be fairly straightforward; I just have to check that changing the order doesn't break reproducibility. I don't think there are any models that have randomness when doing predictions, are there?

@topepo (Owner) commented Jul 13, 2017

> I don't think there are any models that have randomness when doing predictions, are there?

I can't think of any. I think that the regression tests would show those.
