How to pass a variable to group_vfold_cv's partition number argument #81
Comments
Can you dummy up a small data set and provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.
Yes, sorry about that, here is the reprex:
library(rsample)
#> Loading required package: tidyr
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))
  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> Error in group_vfold_splits(data = data, group = group, v = v): object 'outer_cv' not found
Created on 2019-02-01 by the reprex package (v0.2.1)
In theory you should be able to do this with no problem, so I'd call it a bug. I think the environment could be captured. Alternatively, it would probably be beneficial (and not too bad) to rewrite using quosures so we won't have to worry about the environments at all. The only weird thing would be inserting the data into the call.
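For readers unfamiliar with the scoping issue discussed above, here is a minimal base-R sketch. It is not rsample's actual implementation; the helpers run_captured_wrong, run_captured_fixed, and caller are hypothetical names made up for illustration. It assumes the failure comes from a captured call being evaluated in an environment that cannot see the caller's local variables, and shows how evaluating in the caller's environment avoids that.

```r
# Hypothetical helper: captures the unevaluated call and evaluates it in
# its own frame, which cannot see variables local to the calling function.
run_captured_wrong <- function(call_arg) {
  cl <- substitute(call_arg)
  eval(cl, envir = environment())
}

# Hypothetical fix: evaluate the captured call in the caller's environment.
run_captured_fixed <- function(call_arg) {
  cl <- substitute(call_arg)
  eval(cl, envir = parent.frame())
}

caller <- function(runner) {
  n_folds <- 5  # local variable, playing the role of outer_cv in the reprex
  runner(sum(n_folds, 1))
}

# The first version fails because n_folds is not visible where the call is
# evaluated; the error message is kept as a string for inspection.
wrong <- tryCatch(caller(run_captured_wrong),
                  error = function(e) conditionMessage(e))

# The second version finds n_folds in the caller's frame and returns 6.
fixed <- caller(run_captured_fixed)
```

Quosures, as suggested in the comment above, bundle the expression with its environment, so this bookkeeping would not have to be done by hand.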
In the meantime, if you want to program around it, you can do:
library(rsample)
#> Loading required package: tidyr
#>
#> Attaching package: 'rsample'
#> The following object is masked from 'package:tidyr':
#>
#> fill
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep")
    )
  )
  sampling2_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep2"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep2")
    )
  )
  sampling1 <- rlang::eval_tidy(sampling1_call)
  sampling2 <- rlang::eval_tidy(sampling2_call)
  list(sampling1, sampling2)
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> [[1]]
#> [1] "nested_cv" "group_vfold_cv" "rset" "tbl_df"
#> [5] "tbl" "data.frame"
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>
#>
#> [[2]]
#> [1] "nested_cv" "group_vfold_cv" "rset" "tbl_df"
#> [5] "tbl" "data.frame"
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>
Created on 2019-02-01 by the reprex package (v0.2.1.9000)
What's the tidyeval equivalent of The actual value of Lines 56 to 58 in 775ac55
So you can't evaluate the Line 72 in 775ac55
and Lines 96 to 99 in 775ac55
@fbchow The weirdness for this example is that we are going to have to modify the expression of the quosure using something like:
library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))
outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env: global
outside_modified <- rlang::call_modify(outside, data = dat)
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = <data.frame>)
#> env: global
eval_tidy(outside_modified)
#> # Bootstrap sampling
#> # A tibble: 5 x 2
#> splits id
#> <list> <chr>
#> 1 <split [2/1]> Bootstrap1
#> 2 <split [2/1]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/1]> Bootstrap4
#> 5 <split [2/1]> Bootstrap5
Created on 2019-02-11 by the reprex package (v0.2.1.9000)
You can also use:
library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))
outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env: global
outside_modified <- rlang::call_modify(outside, data = rlang::expr(dat))
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = dat)
#> env: global
eval_tidy(outside_modified)
#> # Bootstrap sampling
#> # A tibble: 5 x 2
#> splits id
#> <list> <chr>
#> 1 <split [2/0]> Bootstrap1
#> 2 <split [2/0]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/0]> Bootstrap4
#> 5 <split [2/0]> Bootstrap5
Created on 2019-02-11 by the reprex package (v0.2.1.9000)
At long last, this is now fixed:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rsample)
run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))
  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
  list(sampling1, sampling2)
}
all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame()
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(tibble(all_dataset))
#> [[1]]
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>
#>
#> [[2]]
#> # Nested resampling:
#> # outer: Group 5-fold cross-validation
#> # inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#> splits id inner_resamples
#> <list> <chr> <list>
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>
Created on 2021-11-18 by the reprex package (v2.0.1)
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi,
I would like to create a few nested_cv objects based on the same partition configuration. However, I am getting an error, which seems to be a scoping problem for the group_vfold_cv function.
Do you have any recommendations? Does tidy evaluation help in this case?