How to pass a variable to group_vfold_cv's partition number argument #81

Closed
htlin opened this issue Jan 31, 2019 · 9 comments · Fixed by #270
Labels
bug an unexpected problem or unintended behavior

Comments

@htlin

htlin commented Jan 31, 2019

Hi,
I'd like to create several nested_cv objects that share the same partition configuration, as follows:

outer_cv <- 5
inner_cv <- 4
sampling1 <- nested_cv(all_dataset,
                      outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                      inside = group_vfold_cv(v = inner_cv, group = "Rep"))
sampling2 <- ...
...

However, I get this error:

object 'outer_cv' not found

It seems outer_cv is out of scope inside the group_vfold_cv function.
Do you have any recommendations? Would tidy evaluation help in this case?

@topepo
Member

topepo commented Jan 31, 2019

Can you dummy up a small data set and provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.

@htlin
Author

htlin commented Feb 1, 2019

Yes, sorry about that. Here is the reprex:

library(rsample)
#> Loading required package: tidyr

run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))

  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
}

all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> Error in group_vfold_splits(data = data, group = group, v = v): object 'outer_cv' not found

Created on 2019-02-01 by the reprex package (v0.2.1)

@DavisVaughan
Member

In theory you should be able to do this with no problem, so I'd call it a bug. I think the environment could be captured (maybe with parent.frame()?) and then the eval() call could specify that as the environment.
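
A minimal base R sketch of that idea, using hypothetical stand-ins (toy_folds, wrap_broken, wrap_fixed) rather than the actual rsample internals: the broken wrapper evaluates the captured call in its own frame, while the fixed one captures the caller's environment with parent.frame() and evaluates there.

```r
# Hypothetical stand-in for a resampling function (not rsample code).
toy_folds <- function(data, v) rep_len(seq_len(v), nrow(data))

# Mimics the failing pattern: the call is captured unevaluated and then
# evaluated in the wrapper's own frame, where the caller's locals are invisible.
wrap_broken <- function(data, resample) {
  cl <- match.call()$resample
  cl$data <- quote(data)
  eval(cl)
}

# Captures the caller's environment and evaluates the call in a child of it,
# so both the caller's locals and the data can be found.
wrap_fixed <- function(data, resample) {
  caller_env <- parent.frame()
  cl <- match.call()$resample
  eval_env <- new.env(parent = caller_env)
  eval_env$.data <- data          # make the data visible under a private name
  cl$data <- quote(.data)
  eval(cl, envir = eval_env)
}

run_experiment <- function(d) {
  n_folds <- 5                    # local variable, as in the reprex
  wrap_fixed(d, toy_folds(v = n_folds))
}

run_experiment(data.frame(x = 1:10))
#> [1] 1 2 3 4 5 1 2 3 4 5
```

Swapping wrap_fixed for wrap_broken inside run_experiment should reproduce an "object 'n_folds' not found" error analogous to the one reported above.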

Alternatively, it would probably be beneficial (and not too difficult) to rewrite this using quosures, so we wouldn't have to worry about the environments at all. The only awkward part would be inserting the data into the call.
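
As a rough sketch of the quosure approach (toy_folds and toy_nested are hypothetical names, and this is not necessarily how rsample implements it): enquo() captures the expression together with the caller's environment, the data is inserted into the call, and eval_tidy() evaluates the result where the quosure was created.

```r
library(rlang)

# Hypothetical stand-in for a resampling function (not rsample code).
toy_folds <- function(data, v) rep_len(seq_len(v), nrow(data))

toy_nested <- function(data, resample) {
  # Capture the expression plus the environment it was written in.
  resample <- enquo(resample)
  # Insert the data into the captured call, keeping the quosure's environment.
  expr <- call_modify(quo_get_expr(resample), data = data)
  resample <- quo_set_expr(resample, expr)
  # Evaluated in the caller's environment, so local variables resolve.
  eval_tidy(resample)
}

run_experiment <- function(d) {
  n_folds <- 4
  toy_nested(d, toy_folds(v = n_folds))
}

run_experiment(data.frame(x = 1:8))
#> [1] 1 2 3 4 1 2 3 4
```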

@DavisVaughan
Member

In the meantime, if you want to program around it, you can do:

library(rsample)
#> Loading required package: tidyr
#> 
#> Attaching package: 'rsample'
#> The following object is masked from 'package:tidyr':
#> 
#>     fill

run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  
  sampling1_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep")
    )
  )
  
  sampling2_call <- rlang::expr(
    nested_cv(
      all_dataset,
      outside = group_vfold_cv(v = !!outer_cv, group = "Rep2"),
      inside = group_vfold_cv(v = !!inner_cv, group = "Rep2")
    )
  )
  
  sampling1 <- rlang::eval_tidy(sampling1_call)
  sampling2 <- rlang::eval_tidy(sampling2_call)
  
  list(sampling1, sampling2)
}

all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(all_dataset)
#> [[1]]
#> [1] "nested_cv"      "group_vfold_cv" "rset"           "tbl_df"        
#> [5] "tbl"            "data.frame"    
#> # Nested resampling:
#> #  outer: Group 5-fold cross-validation
#> #  inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#>   splits          id        inner_resamples 
#>   <list>          <chr>     <list>          
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>
#> 
#> [[2]]
#> [1] "nested_cv"      "group_vfold_cv" "rset"           "tbl_df"        
#> [5] "tbl"            "data.frame"    
#> # Nested resampling:
#> #  outer: Group 5-fold cross-validation
#> #  inner: Group 4-fold cross-validation
#> # A tibble: 5 x 3
#>   splits          id        inner_resamples 
#>   <list>          <chr>     <list>          
#> 1 <split [40/10]> Resample1 <tibble [4 × 2]>
#> 2 <split [40/10]> Resample2 <tibble [4 × 2]>
#> 3 <split [40/10]> Resample3 <tibble [4 × 2]>
#> 4 <split [40/10]> Resample4 <tibble [4 × 2]>
#> 5 <split [40/10]> Resample5 <tibble [4 × 2]>

Created on 2019-02-01 by the reprex package (v0.2.1.9000)
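
The same trick can be sketched without rlang, using base R's bquote() in place of expr() and !!. This is a swapped-in alternative, not the workaround above, and toy_folds and toy_wrap are hypothetical stand-ins for the rsample functions. The point is that the value is spliced into the call before the wrapper ever sees it, so no variable lookup is needed later.

```r
# Hypothetical stand-ins (not rsample code).
toy_folds <- function(data, v) rep_len(seq_len(v), nrow(data))

# Evaluates the captured call in its own frame, so a bare variable name
# from the caller would not be found here.
toy_wrap <- function(data, resample) {
  cl <- match.call()$resample
  cl$data <- quote(data)
  eval(cl)
}

run_experiment <- function(d) {
  n_folds <- 3
  # .() splices the current value of n_folds into the expression, so the
  # wrapper receives toy_folds(v = 3) rather than toy_folds(v = n_folds).
  wrapped <- bquote(toy_wrap(d, toy_folds(v = .(n_folds))))
  eval(wrapped)
}

run_experiment(data.frame(x = 1:6))
#> [1] 1 2 3 1 2 3
```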

@fbchow
Collaborator

fbchow commented Feb 1, 2019

What's the tidyeval equivalent of match.call()? I tried using quo and eval_tidy, but I'm not sure how to find the environment's parents.

The actual values of outer_cv and inner_cv are not picked up by match.call().

rsample/R/nest.R

Lines 56 to 58 in 775ac55

nested_cv <- function(data, outside, inside) {
  nest_args <- formalArgs(nested_cv)
  cl <- match.call()

So you can't evaluate the outside

outside <- eval(outer_cl)

and inside

rsample/R/nest.R

Lines 96 to 99 in 775ac55

inside_resample <- function(src, cl) {
  cl$data <- quote(as.data.frame(src))
  eval(cl)
}
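
To see what match.call() actually records, here is a small base R illustration (capture_call and some_resampler are made-up names): the argument is stored as an unevaluated expression, so v holds the symbol outer_cv rather than the number 5, and no environment travels with it.

```r
# Hypothetical helper: returns the call exactly as match.call() records it.
capture_call <- function(data, outside) match.call()

run_experiment <- function(d) {
  outer_cv <- 5
  # some_resampler does not need to exist: the argument is never evaluated.
  capture_call(d, outside = some_resampler(v = outer_cv))
}

cl <- run_experiment(NULL)
print(cl$outside)
#> some_resampler(v = outer_cv)
is.symbol(cl$outside$v)
#> [1] TRUE
```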

@DavisVaughan
Member

@fbchow it will probably use enquo() and eval_tidy(), as you say. eval_tidy() evaluates a quosure in the environment where it was created (which I think is the parent you are referring to).

The weirdness for this example is that we are going to have to modify the expression of the quosure using something like rlang::call_modify() before evaluating it. It will likely look something like this:

library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))

outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env:  global

outside_modified <- rlang::call_modify(outside, data = dat)
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = <data.frame>)
#> env:  global

eval_tidy(outside_modified)
#> # Bootstrap sampling 
#> # A tibble: 5 x 2
#>   splits        id        
#>   <list>        <chr>     
#> 1 <split [2/1]> Bootstrap1
#> 2 <split [2/1]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/1]> Bootstrap4
#> 5 <split [2/1]> Bootstrap5

Created on 2019-02-11 by the reprex package (v0.2.1.9000)

@DavisVaughan
Member

You can also use data = expr(dat) rather than data = dat, which embeds the name dat in the call rather than the entire data frame. It shouldn't make a big difference for this example though.

library(rlang)
library(rsample)
#> Warning: package 'rsample' was built under R version 3.5.2
#> Loading required package: tidyr
dat <- data.frame(x = c(1, 2))

outside <- rlang::quo(bootstraps(times = 5))
outside
#> <quosure>
#> expr: ^bootstraps(times = 5)
#> env:  global

outside_modified <- rlang::call_modify(outside, data = rlang::expr(dat))
outside_modified
#> <quosure>
#> expr: ^bootstraps(times = 5, data = dat)
#> env:  global

eval_tidy(outside_modified)
#> # Bootstrap sampling 
#> # A tibble: 5 x 2
#>   splits        id        
#>   <list>        <chr>     
#> 1 <split [2/0]> Bootstrap1
#> 2 <split [2/0]> Bootstrap2
#> 3 <split [2/0]> Bootstrap3
#> 4 <split [2/0]> Bootstrap4
#> 5 <split [2/0]> Bootstrap5

Created on 2019-02-11 by the reprex package (v0.2.1.9000)
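
One place where the two forms can differ, sketched here with plain calls to nrow() instead of quosures (an illustrative assumption, not taken from rsample): an embedded data frame is a snapshot taken when the call is built, whereas an embedded name is looked up again when the call is evaluated.

```r
library(rlang)

dat <- data.frame(x = c(1, 2))

# Embed the data frame itself vs. only the name `dat`.
by_value <- call_modify(quote(nrow()), x = dat)
by_name  <- call_modify(quote(nrow()), x = expr(dat))

is.data.frame(by_value$x)
#> [1] TRUE
is.symbol(by_name$x)
#> [1] TRUE

# Rebinding `dat` afterwards only affects the call that stores the name.
dat <- data.frame(x = 1:10)
eval(by_value)
#> [1] 2
eval(by_name)
#> [1] 10
```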

@juliasilge juliasilge added the bug an unexpected problem or unintended behavior label May 1, 2020
@juliasilge
Member

At long last, this is now fixed:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rsample)

run_experiment <- function(all_dataset) {
  outer_cv <- 5
  inner_cv <- 4
  sampling1 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep"))
  
  sampling2 <- nested_cv(all_dataset,
                         outside = group_vfold_cv(v = outer_cv, group = "Rep2"),
                         inside = group_vfold_cv(v = inner_cv, group = "Rep2"))
  
  list(sampling1, sampling2)
}

all_dataset <- matrix(nrow = 50, ncol = 5, 0) %>% as.data.frame()
all_dataset$Rep <- 1:5
all_dataset$Rep2 <- 5:1
run_experiment(tibble(all_dataset))
#> [[1]]
#> # Nested resampling:
#> #  outer: Group 5-fold cross-validation
#> #  inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#>   splits          id        inner_resamples         
#>   <list>          <chr>     <list>                  
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>
#> 
#> [[2]]
#> # Nested resampling:
#> #  outer: Group 5-fold cross-validation
#> #  inner: Group 4-fold cross-validation
#> # A tibble: 5 × 3
#>   splits          id        inner_resamples         
#>   <list>          <chr>     <list>                  
#> 1 <split [40/10]> Resample1 <group_vfold_cv [4 × 2]>
#> 2 <split [40/10]> Resample2 <group_vfold_cv [4 × 2]>
#> 3 <split [40/10]> Resample3 <group_vfold_cv [4 × 2]>
#> 4 <split [40/10]> Resample4 <group_vfold_cv [4 × 2]>
#> 5 <split [40/10]> Resample5 <group_vfold_cv [4 × 2]>

Created on 2021-11-18 by the reprex package (v2.0.1)

@github-actions

github-actions bot commented Dec 3, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2021