initial_split() with strata argument behavior #217

@issactoast

Description

The problem

I expected initial_split() with the strata argument to return a training set with the same number of rows as group_by() followed by sample_frac(), but the two sizes differ.
The original airquality data has 153 rows, so with prop = 0.1 I would expect airquality_frac2 to have 15 or 16 rows. Instead it has 18.

Reproducible example

#sample_frac and sample_n
library(dplyr)
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(2021)
dim(airquality)
#> [1] 153   6

#sample 10% from each month
airquality_frac <- airquality %>% 
    group_by(Month) %>%  
    sample_frac(size = 0.1, replace=FALSE)

library(rsample)

airquality_split <- initial_split(airquality, prop = 0.1,
              strata = "Month")

airquality_frac2 <- training(airquality_split)

dim(airquality_frac)
#> [1] 15  6
dim(airquality_frac2)
#> [1] 18  6
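
A possible way to see where the three extra rows could come from is to look at the per-month arithmetic. The sketch below uses only base R and the built-in airquality data. It does not inspect rsample's internals, so treating "round up within each month" as the explanation is an assumption; it only shows that this rule reproduces the 18 rows, while rounding to the nearest integer reproduces the 15 rows. The names `prop`, `n_per_month`, `total_rounded` and `total_ceiling` are made up for this illustration.

```r
# Rows per month in airquality: 31, 30, 31, 31, 30 (153 in total)
n_per_month <- table(airquality$Month)
prop <- 0.1

# 10% of each month is 3.1 or 3.0 rows, so the result depends on the rounding rule.
# Rounding to the nearest integer gives 3 rows for every month.
total_rounded <- sum(round(prop * n_per_month))

# Rounding up gives 4 rows for the 31-day months and 3 for the 30-day months.
total_ceiling <- sum(ceiling(prop * n_per_month))

total_rounded
#> [1] 15
total_ceiling
#> [1] 18
```

Running table(airquality_frac2$Month) on the training set from the example above would show whether the 31-day months actually contribute 4 rows each.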
