Error message for missing assessment set rows at prediction time #307

Merged
merged 4 commits into from Oct 26, 2020

juliasilge
Member

Closes #241

This PR adds an explicit check in predict_model() for whether all the original assessment rows are available, with a more direct error message:

library(tidymodels)

data(penguins)
set.seed(123)
peng_split <- initial_split(penguins, 
                            strata = species)

peng_train <- training(peng_split)
peng_test  <- testing(peng_split)

set.seed(345)
folds <- vfold_cv(peng_train, v = 6)
folds
#> #  6-fold cross-validation 
#> # A tibble: 6 x 2
#>   splits           id   
#>   <list>           <chr>
#> 1 <split [215/43]> Fold1
#> 2 <split [215/43]> Fold2
#> 3 <split [215/43]> Fold3
#> 4 <split [215/43]> Fold4
#> 5 <split [215/43]> Fold5
#> 6 <split [215/43]> Fold6

peng_rec <- 
  recipe(species ~ ., data=peng_train) %>%
  update_role(new_role = "id", island) %>%
  step_naomit(all_predictors(), all_outcomes()) %>%
  step_dummy(all_nominal(), -species)

peng_mod <- 
  multinom_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

peng_wflow <- 
  workflow() %>%
  add_model(peng_mod) %>%
  add_recipe(peng_rec)

peng_fit <- tune_grid(peng_wflow, resamples = folds)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loaded glmnet 4.0-2
#> x Fold1: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold2: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold3: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold4: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold5: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold6: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> Warning: All models failed. See the `.notes` column.

peng_fit$.notes[[1]][[1]]
#> [1] "preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set rows are not available at prediction time. Did your preprocessing steps filter or remove rows?"

Created on 2020-10-21 by the reprex package (v0.3.0.9001)

This has been a source of confusion for users who don't understand why their tuning is failing.

Thoughts on the error message?

@juliasilge
Member Author

We may also want to add something to the documentation of row filtering recipe steps.

@DavisVaughan
Member

I like the message!

What should the user have done? i.e. in most of these cases should the user have set skip = TRUE on this kind of step to only apply it before the model is fit? If so, it might be worth mentioning something like: "If a recipe was used, consider using skip = TRUE on steps that remove rows to avoid calling them on the assessment set."
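
To make the `skip = TRUE` suggestion concrete, here is a minimal sketch using only the recipes package. The toy data frames `train` and `assess` are hypothetical and stand in for one resample's analysis and assessment sets. `skip = TRUE` is set explicitly so the example does not depend on the default in any particular recipes version.

```r
library(recipes)

# Hypothetical toy data: each set contains one row with a missing predictor
train  <- data.frame(y = c(1, 2, 3, 4), x = c(0.5, NA, 1.5, 2.0))
assess <- data.frame(y = c(5, 6), x = c(NA, 3.0))

rec <- recipe(y ~ x, data = train) %>%
  # skip = TRUE: the step runs when the recipe is prepped on the training
  # data, but is skipped when baking new data
  step_naomit(all_predictors(), skip = TRUE)

prepped <- prep(rec, training = train)

# Processed training data: the incomplete row has been dropped
train_baked <- bake(prepped, new_data = NULL)

# Assessment data: the step is skipped, so every original row is kept
assess_baked <- bake(prepped, new_data = assess)

nrow(train_baked)
nrow(assess_baked) == nrow(assess)
```

Note that the assessment rows with missing values are then passed on to the model as-is, so whether prediction succeeds for those rows depends on the model engine; an imputation step may be a better fit in some cases.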

@juliasilge
Member Author

I changed this to check whether the workflow has a recipe and specialize the error message:

library(tidymodels)

data(penguins)
set.seed(123)
peng_split <- initial_split(penguins, 
                            strata = species)

peng_train <- training(peng_split)
peng_test  <- testing(peng_split)

set.seed(345)
folds <- vfold_cv(peng_train, v = 6)
folds
#> #  6-fold cross-validation 
#> # A tibble: 6 x 2
#>   splits           id   
#>   <list>           <chr>
#> 1 <split [215/43]> Fold1
#> 2 <split [215/43]> Fold2
#> 3 <split [215/43]> Fold3
#> 4 <split [215/43]> Fold4
#> 5 <split [215/43]> Fold5
#> 6 <split [215/43]> Fold6

peng_rec <- 
  recipe(species ~ ., data=peng_train) %>%
  update_role(new_role = "id", island) %>%
  step_naomit(all_predictors(), all_outcomes()) %>%
  step_dummy(all_nominal(), -species)

peng_mod <- 
  multinom_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

peng_wflow <- 
  workflow() %>%
  add_model(peng_mod) %>%
  add_recipe(peng_rec)

peng_fit <- tune_grid(peng_wflow, resamples = folds)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
#> Loaded glmnet 4.0-2
#> x Fold1: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold2: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold3: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold4: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold5: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> x Fold6: preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set ro...
#> Warning: All models failed. See the `.notes` column.

peng_fit$.notes[[1]][[1]]
#> [1] "preprocessor 1/1, model 1/1 (predictions): Error: Some assessment set rows are not available at prediction time. Consider using `skip = TRUE` on any recipe steps that remove rows to avoid calling them on the assessment set."

Created on 2020-10-21 by the reprex package (v0.3.0.9001)

I'm honestly not sure how this would happen with a formula preprocessor, but if it were to, somehow, the error message would be:

Error: Some assessment set rows are not available at prediction time. Did your preprocessing steps filter or remove rows?
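
For readers who want to see the shape of the check, the logic described in this thread can be sketched roughly as below. This is not the code merged in this PR: `check_assessment_rows()` and its arguments are hypothetical names, and only the two message texts are taken from the output above.

```r
# Hypothetical helper illustrating the check; not the tune internals.
# n_expected:  number of rows in the original assessment set
# n_available: number of rows left after preprocessing
# uses_recipe: whether the workflow's preprocessor is a recipe
check_assessment_rows <- function(n_expected, n_available, uses_recipe) {
  if (n_expected == n_available) {
    return(invisible(TRUE))
  }
  msg <- "Some assessment set rows are not available at prediction time. "
  if (uses_recipe) {
    msg <- paste0(
      msg,
      "Consider using `skip = TRUE` on any recipe steps that remove rows ",
      "to avoid calling them on the assessment set."
    )
  } else {
    msg <- paste0(msg, "Did your preprocessing steps filter or remove rows?")
  }
  stop(msg, call. = FALSE)
}

# Illustrative numbers only: 43 rows expected, 41 left after preprocessing
result <- tryCatch(
  check_assessment_rows(43, 41, uses_recipe = TRUE),
  error = function(e) conditionMessage(e)
)
result
```

Comparing row counts is enough here because the concern is rows being dropped by preprocessing; a stricter version could compare row indices instead.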

R/grid_helpers.R Outdated
@juliasilge juliasilge merged commit a8ef10e into master Oct 26, 2020
@juliasilge juliasilge deleted the error-row-mismatch branch October 26, 2020 21:08
@github-actions
github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Successfully merging this pull request may close these issues.

Warn user about using step_naomit with tuning