Warnings for non-positive input to step_BoxCox #713

Merged
8 commits merged into tidymodels:master on Jun 1, 2021

liamblake
Contributor

Closes #705. Warnings should appear whenever `prep()` or `bake()` is called on the Box-Cox step with non-positive data.
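
For readers who want a feel for the kind of check involved, here is a minimal base R sketch. It is not the code merged in this PR: the helper name `warn_if_non_positive()` and its message text are assumptions made up for illustration.

```r
# Hypothetical helper (not the recipes implementation): warn when a numeric
# vector contains zero or negative values, which Box-Cox cannot handle.
warn_if_non_positive <- function(x, name) {
  has_non_positive <- any(x <= 0, na.rm = TRUE)
  if (has_non_positive) {
    warning("Non-positive values in `", name, "`.", call. = FALSE)
  }
  invisible(has_non_positive)
}

x_ok  <- c(10, 12, 15)  # strictly positive
x_bad <- c(0, 1, 2)     # contains a zero

# Silent for strictly positive input.
warn_if_non_positive(x_ok, "x_ok")

# Raises a warning for non-positive input; caught here so the message prints.
tryCatch(
  warn_if_non_positive(x_bad, "x_bad"),
  warning = function(w) message("caught: ", conditionMessage(w))
)
```

The helper returns its result invisibly so that a caller could use the logical value to decide whether to skip the column.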

@juliasilge
Member

Thanks so much @liamblake! 🙌

After some edits, here is what the warnings look like:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
set.seed(123)
n <- 2e3
x1 <- rpois(n, lambda = 5) # has some zero vals
x2 <- rnorm(n)             # has some negative vals
x3 <- x1 + 10              # strictly positive


recipe(~ ., data = tibble(x1, x2, x3)) %>% 
  step_BoxCox(all_predictors()) %>%
  prep()
#> Warning: Non-positive values in selected variable.
#> Warning: Non-positive values in selected variable.
#> Warning: No Box-Cox transformation will be made to: 'x1', 'x2'
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>  predictor          3
#> 
#> Training data contained 2000 data points and no missing data.
#> 
#> Operations:
#> 
#> Box-Cox transformation on x3 [trained]


recipe(~ ., data = tibble(x1 = x1 + 10, x2 = x1 + 9, x3 = x1 + 8)) %>% 
  step_BoxCox(all_predictors()) %>%
  prep() %>%
  bake(new_data = tibble(x1, x2, x3))
#> Warning: Applying Box-Cox transformation to non-positive data in column x1
#> Warning: Applying Box-Cox transformation to non-positive data in column x2
#> # A tibble: 2,000 x 3
#>       x1       x2    x3
#>    <dbl>    <dbl> <dbl>
#>  1 1.30  NaN       1.24
#>  2 1.78  NaN       1.27
#>  3 1.30  NaN       1.24
#>  4 1.89  NaN       1.28
#>  5 1.99  NaN       1.28
#>  6 0.672   0.0392  1.21
#>  7 1.50   -2.43    1.25
#>  8 1.89    0.647   1.28
#>  9 1.50   -0.437   1.25
#> 10 1.50  NaN       1.25
#> # … with 1,990 more rows

Created on 2021-06-01 by the reprex package (v2.0.0)
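
The `NaN` entries in the baked `x2` column are consistent with how the textbook Box-Cox formula behaves on negative input. The sketch below uses the standard definition, `(x^lambda - 1) / lambda` for non-zero lambda and `log(x)` otherwise. The function name `box_cox()`, the `eps` tolerance, and the lambda value are assumptions for illustration, not the recipes internals.

```r
# Textbook Box-Cox transform for a fixed lambda (illustrative only).
box_cox <- function(x, lambda, eps = 0.001) {
  if (abs(lambda) < eps) {
    log(x)
  } else {
    (x^lambda - 1) / lambda
  }
}

lambda <- 0.5  # arbitrary non-integer value chosen for this example

box_cox(c(4, 1), lambda)  # positive input gives finite results
box_cox(0, lambda)        # 0^0.5 is 0, so a zero still gives a finite value
box_cox(-1, lambda)       # a negative base with a fractional power is NaN
```

Under these assumptions, zeros can still map to a finite number when lambda is positive, while negative values cannot, which matches the pattern of finite values in `x1` and `NaN` values in `x2` above.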

I also added a warning for the other case where no transformation is made, when a variable has fewer than `num_unique` distinct values:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
n <- 20
set.seed(1)
ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
                     x2 = 1/rnorm(n),
                     x3 = rep(1:2, each = n/2),
                     x4 = rexp(n))

recipe(~., data = ex_dat) %>%
  step_BoxCox(x1, x2, x3, x4) %>%
  prep()
#> Warning: Non-positive values in selected variable.
#> Warning: Fewer than `num_unique` values in selected variable.
#> Warning: No Box-Cox transformation will be made to: 'x2', 'x3'
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>  predictor          4
#> 
#> Training data contained 20 data points and no missing data.
#> 
#> Operations:
#> 
#> Box-Cox transformation on x1, x4 [trained]

Created on 2021-06-01 by the reprex package (v2.0.0)
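
The `num_unique` case can be pictured as a simple count of distinct values. The sketch below is a hypothetical stand-in, not the merged code: the helper name `has_too_few_unique()` is made up, and the threshold of 5 is assumed to match the `step_BoxCox()` default.

```r
# Hypothetical sketch of the "too few distinct values" check.
# Assumption: num_unique = 5 mirrors the step_BoxCox() default.
has_too_few_unique <- function(x, num_unique = 5) {
  length(unique(x[!is.na(x)])) < num_unique
}

n <- 20
x3 <- rep(1:2, each = n / 2)  # only two distinct values, as in ex_dat$x3
x4 <- seq_len(n)              # twenty distinct values

has_too_few_unique(x3)
has_too_few_unique(x4)
```

With this threshold, a two-level variable such as `x3` would be skipped, which is in line with the warning shown for `x3` in the output above.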

@juliasilge
Member

juliasilge commented Jun 1, 2021

Switched to backticks in warning messages:

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
set.seed(123)
n <- 2e3
x1 <- rpois(n, lambda = 5) # has some zero vals
x2 <- rnorm(n)             # has some negative vals
x3 <- x1 + 10              # strictly positive


recipe(~ ., data = tibble(x1, x2, x3)) %>% 
  step_BoxCox(all_predictors()) %>%
  prep()
#> Warning: Non-positive values in selected variable.
#> Warning: Non-positive values in selected variable.
#> Warning: No Box-Cox transformation will be made to: `x1`, `x2`
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>  predictor          3
#> 
#> Training data contained 2000 data points and no missing data.
#> 
#> Operations:
#> 
#> Box-Cox transformation on x3 [trained]


recipe(~ ., data = tibble(x1 = x1 + 10, x2 = x1 + 9, x3 = x1 + 8)) %>% 
  step_BoxCox(all_predictors()) %>%
  prep() %>%
  bake(new_data = tibble(x1, x2, x3))
#> Warning: Applying Box-Cox transformation to non-positive data in column `x1`
#> Warning: Applying Box-Cox transformation to non-positive data in column `x2`
#> # A tibble: 2,000 x 3
#>       x1       x2    x3
#>    <dbl>    <dbl> <dbl>
#>  1 1.30  NaN       1.24
#>  2 1.78  NaN       1.27
#>  3 1.30  NaN       1.24
#>  4 1.89  NaN       1.28
#>  5 1.99  NaN       1.28
#>  6 0.672   0.0392  1.21
#>  7 1.50   -2.43    1.25
#>  8 1.89    0.647   1.28
#>  9 1.50   -0.437   1.25
#> 10 1.50  NaN       1.25
#> # … with 1,990 more rows


n <- 20
set.seed(1)
ex_dat <- data.frame(x1 = exp(rnorm(n, mean = .1)),
                     x2 = 1/rnorm(n),
                     x3 = rep(1:2, each = n/2),
                     x4 = rexp(n))

recipe(~., data = ex_dat) %>%
  step_BoxCox(x1, x2, x3, x4) %>%
  prep()
#> Warning: Non-positive values in selected variable.
#> Warning: Fewer than `num_unique` values in selected variable.
#> Warning: No Box-Cox transformation will be made to: `x2`, `x3`
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>  predictor          4
#> 
#> Training data contained 20 data points and no missing data.
#> 
#> Operations:
#> 
#> Box-Cox transformation on x1, x4 [trained]

Created on 2021-06-01 by the reprex package (v2.0.0)

Review comment on R/BoxCox.R (outdated, resolved)
Co-authored-by: Max Kuhn <mxkuhn@gmail.com>
@liamblake
Contributor Author

Thanks for the improvements @juliasilge and @topepo!

@juliasilge juliasilge merged commit 6454451 into tidymodels:master Jun 1, 2021
@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 16, 2021
Issue closed by this pull request: Generate warning for non-positive input to step_BoxCox()