step_smote doesn't work #14

Closed
sebastien-foulle opened this issue Mar 6, 2020 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@sebastien-foulle

Hello,

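The function step_smote doesn't seem to work, whereas step_rose does. With step_rose the classes come out roughly balanced, but with step_smote the class counts are unchanged: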

library(dplyr)
library(themis)
dtf = otvPlots::bankData %>% filter(previous == 3) %>% select(campaign, target = y)
dtf %>% count(target)

target n
no 848
yes 294

set.seed(2020)
dtf_rose = dtf %>% recipe(target ~ .) %>% step_rose(target) %>% prep %>% juice
dtf_rose %>% count(target)

target n
no 854
yes 842

dtf_smote = dtf %>% recipe(target ~ .) %>% step_smote(target) %>% prep %>% juice
dtf_smote %>% count(target)

target n
no 848
yes 294

sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] themis_0.1.0.9000 recipes_0.1.9 dplyr_0.8.4

loaded via a namespace (and not attached):
[1] tidyr_1.0.2 splines_3.6.3 foreach_1.4.8 prodlim_2019.11.13 Formula_1.2-3
[6] moments_0.14 assertthat_0.2.1 unbalanced_2.0 latticeExtra_0.6-29 ipred_0.9-9
[11] pillar_1.4.3 backports_1.1.5 lattice_0.20-38 glue_1.3.1 digest_0.6.25
[16] RColorBrewer_1.1-2 checkmate_2.0.0 colorspace_1.4-1 mlr_2.17.0 htmltools_0.4.0
[21] Matrix_1.2-18 timeDate_3043.102 pkgconfig_2.0.3 purrr_0.3.3 scales_1.1.0
[26] parallelMap_1.4 RANN_2.6.1 jpeg_0.1-8.1 gower_0.2.1 lava_1.6.7
[31] tibble_2.1.3 htmlTable_1.13.3 generics_0.0.2 ggplot2_3.3.0 withr_2.1.2
[36] ROSE_0.0-3 nnet_7.3-12 cli_2.0.2 survival_3.1-8 magrittr_1.5
[41] crayon_1.3.4 fansi_0.4.1 doParallel_1.0.15 MASS_7.3-51.5 foreign_0.8-75
[46] class_7.3-15 FNN_1.1.3 tools_3.6.3 data.table_1.12.8 otvPlots_0.2.1
[51] lifecycle_0.2.0 BBmisc_1.11 stringr_1.4.0 munsell_0.5.0 cluster_2.1.0
[56] compiler_3.6.3 rlang_0.4.5 grid_3.6.3 iterators_1.0.12 rstudioapi_0.11
[61] htmlwidgets_1.5.1 base64enc_0.1-3 gtable_0.3.0 codetools_0.2-16 R6_2.4.1
[66] gridExtra_2.3 ParamHelpers_1.13 lubridate_1.7.4 knitr_1.28 utf8_1.1.4
[71] fastmatch_1.1-0 Hmisc_4.3-1 stringi_1.4.6 parallel_3.6.3 Rcpp_1.0.3
[76] vctrs_0.2.3 rpart_4.1-15 acepack_1.4.1 png_0.1-7 tidyselect_1.0.0
[81] xfun_0.12

Have a good day

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Mar 6, 2020
@EmilHvitfeldt
Member

Hello @sebastien-foulle!

Thank you for filing this issue! You are correct, this is not working as intended. For now, you can work around the problem by reversing the levels of the outcome with forcats::fct_rev, which gives the desired result:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(themis)
#> Loading required package: recipes
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
#> Registered S3 methods overwritten by 'themis':
#>   method               from   
#>   bake.step_downsample recipes
#>   bake.step_upsample   recipes
#>   prep.step_downsample recipes
#>   prep.step_upsample   recipes
#>   tidy.step_downsample recipes
#>   tidy.step_upsample   recipes
#> 
#> Attaching package: 'themis'
#> The following objects are masked from 'package:recipes':
#> 
#>     step_downsample, step_upsample, tunable.step_downsample,
#>     tunable.step_upsample
dtf = otvPlots::bankData %>% 
  filter(previous == 3) %>% 
  select(campaign, target = y)

dtf %>% count(target)
#> # A tibble: 2 x 2
#>   target     n
#>   <chr>  <int>
#> 1 no       848
#> 2 yes      294

dtf_smote = dtf %>% 
  mutate(target = forcats::fct_rev(target)) %>%
  recipe(target ~ .) %>% 
  step_smote(target) %>% 
  prep %>% 
  juice

dtf_smote %>% count(target)
#> # A tibble: 2 x 2
#>   target     n
#>   <fct>  <int>
#> 1 yes      848
#> 2 no       848

Created on 2020-03-06 by the reprex package (v0.3.0)
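
As a small aside on the workaround above: forcats::fct_rev reverses the order of a factor's levels without changing the values themselves. The sketch below reproduces that effect in base R on a made-up vector (the `target` values are illustrative, not the bankData column), so the level order can be inspected before and after. It only shows what the reversal does; it makes no claim about why step_smote depended on the level order.

```r
# Toy outcome vector (hypothetical values, not taken from otvPlots::bankData)
target <- c("no", "yes", "no", "no")

# Default factor(): levels are sorted alphabetically
f <- factor(target)
levels(f)

# Base-R equivalent of forcats::fct_rev(): same values, reversed level order
f_rev <- factor(f, levels = rev(levels(f)))
levels(f_rev)

# Class counts are unchanged; only the order in which they are listed differs
table(f)
table(f_rev)
```

Checking levels() on the outcome before and after a recipe step is a quick way to confirm which class is listed first.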

@EmilHvitfeldt
Member

Alternatively, you can install the development version, where the issue has been fixed:

# install.packages("devtools")  # run once if devtools is not installed
devtools::install_github("tidymodels/themis")

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 12, 2021