Unexpected size of data frame after initial_split() with strata argument compared to sample_frac()
The problem
I was expecting initial_split() with the strata argument to return a training set with the same dimensions as the data frame produced by a grouped sample_frac().
The original airquality data has 153 rows, so I would expect airquality_frac2 to have 15 or 16 rows, not 18.
Reproducible example
#sample_frac and sample_n
library(dplyr)
#>
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
set.seed(2021)
dim(airquality)
#> [1] 153 6
# sample 10% from each month
airquality_frac <- airquality %>%
  group_by(Month) %>%
  sample_frac(size = 0.1, replace = FALSE)
library(rsample)
airquality_split <- initial_split(airquality, prop = 0.1,
                                  strata = "Month")
airquality_frac2 <- training(airquality_split)
dim(airquality_frac)
#> [1] 15 6
dim(airquality_frac2)
#> [1] 18 6
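To make the arithmetic behind the two sizes easier to check, here is a minimal base R sketch that counts the rows per month and applies 10% to each month separately. The names month_counts, rounded_nearest and rounded_up are only illustrative. The two rounding rules are assumptions chosen to match the observed totals, not a confirmed description of what sample_frac() or initial_split() do internally.

```r
# Rows per month in the built-in airquality data (May to September)
month_counts <- table(airquality$Month)
month_counts

# 10% of each month, rounded to the nearest whole row (assumed rule)
rounded_nearest <- round(month_counts / 10)
sum(rounded_nearest)
#> [1] 15

# 10% of each month, rounded up to the next whole row (assumed rule)
rounded_up <- ceiling(month_counts / 10)
sum(rounded_up)
#> [1] 18
```

The months have 31, 30, 31, 31 and 30 rows, so rounding to the nearest row gives 3 per month (15 in total), while rounding up gives 4 + 3 + 4 + 4 + 3 = 18. Running table(airquality_frac2$Month) on the object from the example above would show whether the per-month counts actually follow that pattern.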