Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reverse_splits #319

Merged
merged 4 commits into from
Jun 29, 2022
Merged

Add reverse_splits #319

merged 4 commits into from
Jun 29, 2022

Conversation

mikemahoney218
Copy link
Member

@mikemahoney218 mikemahoney218 commented Jun 29, 2022

This PR fixes #284 by adding a new utility function to "reverse" the analysis and assessment splits.

library(rsample)

(starting_splits <- vfold_cv(mtcars, v = 3))
#> #  3-fold cross-validation 
#> # A tibble: 3 × 2
#>   splits          id   
#>   <list>          <chr>
#> 1 <split [21/11]> Fold1
#> 2 <split [21/11]> Fold2
#> 3 <split [22/10]> Fold3

reverse_splits(starting_splits)
#> #  3-fold cross-validation 
#> # A tibble: 3 × 2
#>   splits          id   
#>   <list>          <chr>
#> 1 <split [11/21]> Fold1
#> 2 <split [11/21]> Fold2
#> 3 <split [10/22]> Fold3

reverse_splits(starting_splits$splits[[1]])
#> <Analysis/Assess/Total>
#> <11/21/32>

reverse_splits(1)
#> Error in `reverse_splits()`:
#> ! `x` must be either an `rsplit` or an `rset` object

reverse_splits(starting_splits, "no dots allowed!")
#> Error in `reverse_splits()`:
#> ! `...` must be empty.
#> ✖ Problematic argument:
#> • ..1 = "no dots allowed!"
#> ℹ Did you forget to name an argument?

Created on 2022-06-29 by the reprex package (v2.0.1)

@@ -109,7 +109,7 @@ rset_reconstructable <- function(x, to) {
test_data <- function() {
data.frame(
x = 1:50,
y = rep(c(1, 2), each = 25),
y = rep(seq.int(10), each = 5),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only having two groups here made group_mc_cv() error due to having 0 data in the assessment set. I also did a version of this in #316 to deal with the same issue there (to get both PRs to pass CI).

@mikemahoney218 mikemahoney218 marked this pull request as ready for review June 29, 2022 19:20
@juliasilge
Copy link
Member

@mattwarkentin would you like to take a look at this?

R/misc.R Outdated Show resolved Hide resolved
@mattwarkentin
Copy link
Contributor

mattwarkentin commented Jun 29, 2022

I made a couple minor comments but otherwise LGTM! Thanks for addressing this so quickly, @mikemahoney218.

@mattwarkentin
Copy link
Contributor

mattwarkentin commented Jun 29, 2022

Oh, one more thought. Have you tested this function with an object from permutations()? Splits for a permutations object have no assessment set by design, so not sure how this should be handled. Maybe trying to reverse the splits for this object should throw an error? Calling assessment() on a permutations object throws an error, so this would be consistent behaviour, I think.

@mikemahoney218
Copy link
Member Author

mikemahoney218 commented Jun 29, 2022

So this does currently error, though we could possibly provide a friendlier message than what's currently thrown:

library(rsample)

permutations(mtcars, cyl)
#> # Permutation sampling 
#> # Permuted columns: [cyl] 
#> # A tibble: 25 × 2
#>    splits         id            
#>    <list>         <chr>         
#>  1 <split [32/0]> Permutations01
#>  2 <split [32/0]> Permutations02
#>  3 <split [32/0]> Permutations03
#>  4 <split [32/0]> Permutations04
#>  5 <split [32/0]> Permutations05
#>  6 <split [32/0]> Permutations06
#>  7 <split [32/0]> Permutations07
#>  8 <split [32/0]> Permutations08
#>  9 <split [32/0]> Permutations09
#> 10 <split [32/0]> Permutations10
#> # … with 15 more rows

permutations(mtcars, cyl) |> 
  reverse_splits()
#> Error in `rsplit()` at rsample/R/misc.R:23:2:
#> ! At least one row should be selected for the analysis set.

Created on 2022-06-29 by the reprex package (v2.0.1)

(Which, as a sidenote @juliasilge , permutations() isn't in rset_subclasses in compat-vctrs-helpers, otherwise this would cause my test to fail -- should it be added to that list?)

@mattwarkentin
Copy link
Contributor

mattwarkentin commented Jun 29, 2022

Related: I am now seeing that the error thrown when calling assessment() on a permutations object is not correct:

library(rsample)

p <- permutations(mtcars, cyl)

assessment(p$splits[[1]])
#> Error in `as.data.frame()`:
#> ! There is no assessment data set for an `rsplit` object with class `rsplit`.

It should read something like:

#> ! There is no assessment data set for an rsplit object from a permutations set.

Or something along those lines...

@mikemahoney218
Copy link
Member Author

Related: I am now seeing that the error thrown when calling assessment() on a permutations object is not correct...

Opened #321 to track that one 😄

Copy link
Member

@juliasilge juliasilge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much! 🙌

@juliasilge juliasilge merged commit c8c4a05 into main Jun 29, 2022
@juliasilge juliasilge deleted the mike/rev_splits branch June 29, 2022 22:54
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reversing analysis and assessment splits
3 participants