step_smote breaks with some over_ratios #119

Closed
rowanjh opened this issue Feb 8, 2023 · 1 comment · Fixed by #120
Labels
bug an unexpected problem or unintended behavior

rowanjh commented Feb 8, 2023

The problem

step_smote throws an error when a class is marginally below the target upsampling threshold. This only happens in the rare case where SMOTE is asked to synthesize a fraction of a sample (n < 1).

The issue occurs in themis:::smote_impl. When samples_needed contains a number smaller than 1, themis:::smote_data returns an empty matrix, subsequently causing an error at the line out_df[var] <- data[[names(samples_needed)[i]]][[var]][1]

One quick solution might be to floor ratio_target when checking which classes need to be upsampled in themis:::smote_impl, as below. This would skip classes that fall only fractionally below the target threshold.
which_upsample <- which(table(df[[var]]) < ratio_target) # orig
which_upsample <- which(table(df[[var]]) < floor(ratio_target)) # possible solution
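To make the arithmetic concrete, here is a minimal sketch of the class-selection step. It assumes that ratio_target is the majority-class count multiplied by over_ratio and that samples_needed is ratio_target minus each selected class count; `classes_to_upsample()` is a hypothetical helper for illustration, not a themis function. With 101 X and 50 Z, over_ratio = 0.5 gives a target of 50.5, so Z is selected but needs only 0.5 synthetic rows.

```r
# Hypothetical helper (not part of themis) mirroring the class-selection
# logic described for themis:::smote_impl. Assumes ratio_target is the
# majority-class count multiplied by over_ratio.
classes_to_upsample <- function(counts, over_ratio, use_floor = FALSE) {
  ratio_target <- max(counts) * over_ratio
  # Original check compares against ratio_target; the proposed fix floors it
  threshold <- if (use_floor) floor(ratio_target) else ratio_target
  which_upsample <- which(counts < threshold)
  # Number of synthetic rows each selected class would need
  ratio_target - counts[which_upsample]
}

counts <- table(c(rep("X", 101), rep("Z", 50)))

classes_to_upsample(counts, 0.49)                   # no class selected
classes_to_upsample(counts, 0.50)                   # Z selected, needs 0.5 (< 1)
classes_to_upsample(counts, 0.50, use_floor = TRUE) # Z skipped
classes_to_upsample(counts, 0.51, use_floor = TRUE) # Z selected, needs about 1.51
```

With the floored threshold, samples_needed is still computed from the unfloored ratio_target. Because class counts are integers, any class that passes `counts < floor(ratio_target)` needs at least one synthetic row.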

Reproducible example

library(recipes)
library(themis)
set.seed(123)

dat <- data.frame(
    outcome = c(rep("X", 101), rep("Z", 50)),
    X1 = rnorm(151))

rec_49 <- recipe(outcome ~ ., data = head(dat)) |>
    step_smote(outcome, over_ratio = 0.49, seed = 231)
rec_50 <- recipe(outcome ~ ., data = head(dat)) |>
    step_smote(outcome, over_ratio = 0.5, seed = 231)
rec_51 <- recipe(outcome ~ ., data = head(dat)) |>
    step_smote(outcome, over_ratio = 0.51, seed = 231)

prep(rec_49, dat) # works
prep(rec_51, dat) # works
prep(rec_50, dat) # does not work
#> Error in `step_smote()`:
#> Caused by error in `[<-.data.frame`:
#> ! replacement has 1 row, data has 0

#> Backtrace:
#>      x
#>   1. +-recipes::prep(rec_50, dat)
#>   2. +-recipes:::prep.recipe(rec_50, dat)
#>   3. | +-recipes:::recipes_error_context(...)
#>   4. | | +-base::withCallingHandlers(...)
#>   5. | | \-base::force(expr)
#>   6. | +-recipes::bake(x$steps[[i]], new_data = training)
#>   7. | \-themis:::bake.step_smote(x$steps[[i]], new_data = training)
#>   8. |   +-withr::with_seed(...)
#>   9. |   | \-withr::with_preserve_seed(...)
#>  10. |   \-themis:::smote_impl(...)
#>  11. |     +-base::`[<-`(`*tmp*`, var, value = `<fct>`)
#>  12. |     \-base::`[<-.data.frame`(`*tmp*`, var, value = `<fct>`)
#>  13. |       \-base::stop(...)
#>  14. \-base::.handleSimpleError(...)
#>  15.   \-recipes (local) h(simpleError(msg, call))
#>  16.     \-recipes:::stop_recipes_step(call = call(step_name), parent = cnd)
#>  17.       \-recipes:::stop_recipes(...)
#>  18.         \-rlang::abort(...)

Created on 2023-02-08 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Australia.1252
#>  ctype    English_Australia.1252
#>  tz       Europe/Berlin
#>  date     2023-02-08
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package      * version    date (UTC) lib source
#>  class          7.3-19     2021-05-03 [2] CRAN (R 4.1.2)
#>  cli            3.6.0      2023-01-09 [1] CRAN (R 4.1.3)
#>  codetools      0.2-18     2020-11-04 [2] CRAN (R 4.1.2)
#>  crayon         1.5.1      2022-03-26 [1] CRAN (R 4.1.3)
#>  digest         0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr        * 1.0.8      2022-02-08 [1] CRAN (R 4.1.3)
#>  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate       0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi          1.0.3      2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  fs             1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  future         1.26.1     2022-05-27 [1] CRAN (R 4.1.3)
#>  future.apply   1.8.1      2021-08-10 [1] CRAN (R 4.1.2)
#>  generics       0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  globals        0.15.0     2022-05-09 [1] CRAN (R 4.1.3)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.1.3)
#>  gower          1.0.0      2022-02-03 [1] CRAN (R 4.1.2)
#>  hardhat        1.2.0      2022-06-30 [1] CRAN (R 4.1.3)
#>  highr          0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  htmltools      0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
#>  ipred          0.9-12     2021-09-15 [1] CRAN (R 4.1.2)
#>  knitr          1.38       2022-03-25 [1] CRAN (R 4.1.3)
#>  lattice        0.20-45    2021-09-22 [2] CRAN (R 4.1.2)
#>  lava           1.6.10     2021-09-02 [1] CRAN (R 4.1.2)
#>  lifecycle      1.0.3      2022-10-07 [1] CRAN (R 4.1.3)
#>  listenv        0.8.0      2019-12-05 [1] CRAN (R 4.1.2)
#>  lubridate      1.8.0      2021-10-07 [1] CRAN (R 4.1.2)
#>  magrittr       2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  MASS           7.3-56     2022-03-23 [1] CRAN (R 4.1.3)
#>  Matrix         1.5-3      2022-11-11 [1] CRAN (R 4.1.3)
#>  nnet           7.3-16     2021-05-03 [2] CRAN (R 4.1.2)
#>  parallelly     1.32.0     2022-06-07 [1] CRAN (R 4.1.2)
#>  pillar         1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  prodlim        2019.11.13 2019-11-17 [1] CRAN (R 4.1.2)
#>  purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  R.cache        0.16.0     2022-07-21 [1] CRAN (R 4.1.3)
#>  R.methodsS3    1.8.1      2020-08-26 [1] CRAN (R 4.1.1)
#>  R.oo           1.24.0     2020-08-26 [1] CRAN (R 4.1.1)
#>  R.utils        2.11.0     2021-09-26 [1] CRAN (R 4.1.3)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.1.2)
#>  RANN           2.6.1      2019-01-08 [1] CRAN (R 4.1.3)
#>  Rcpp           1.0.8.3    2022-03-17 [1] CRAN (R 4.1.3)
#>  recipes      * 1.0.4      2023-01-11 [1] CRAN (R 4.1.3)
#>  reprex         2.0.2      2022-08-17 [1] CRAN (R 4.1.3)
#>  rlang          1.0.6      2022-09-24 [1] CRAN (R 4.1.3)
#>  rmarkdown      2.11       2021-09-14 [1] CRAN (R 4.1.2)
#>  ROSE           0.0-4      2021-06-14 [1] CRAN (R 4.1.3)
#>  rpart          4.1-15     2019-04-12 [2] CRAN (R 4.1.2)
#>  rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi        1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
#>  styler         1.7.0      2022-03-13 [1] CRAN (R 4.1.3)
#>  survival       3.2-13     2021-08-24 [2] CRAN (R 4.1.2)
#>  themis       * 1.0.0      2022-07-02 [1] CRAN (R 4.1.3)
#>  tibble         3.1.7      2022-05-03 [1] CRAN (R 4.1.3)
#>  tidyr          1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.1.3)
#>  timeDate       3043.102   2018-02-21 [1] CRAN (R 4.1.2)
#>  utf8           1.2.2      2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs          0.5.2      2023-01-23 [1] CRAN (R 4.1.3)
#>  withr          2.5.0      2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun           0.29       2021-12-14 [1] CRAN (R 4.1.2)
#>  yaml           2.2.2      2022-01-25 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/rowan/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------
@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Feb 8, 2023
@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2023