Implement nest(.by = ) and revive .key #1461

Merged
merged 6 commits into from
Jan 13, 2023

@DavisVaughan DavisVaughan commented Jan 12, 2023

Closes #1458

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# Can now do this
nest(df, .by = x)
#> # A tibble: 3 × 2
#>       x data            
#>   <dbl> <list>          
#> 1     1 <tibble [3 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 2]>

# Rather than either of these:
# nest(df, data = -x)
# nest(df, data = c(y, z))

Also has these two cool new features:

# Implicit `select(df, -z)`
nest(df, data = y, .by = x)
#> # A tibble: 3 × 2
#>       x data            
#>   <dbl> <list>          
#> 1     1 <tibble [3 × 1]>
#> 2     2 <tibble [2 × 1]>
#> 3     3 <tibble [1 × 1]>

# Include `x` in the inner nesting as well
nest(df, data = everything(), .by = x)
#> # A tibble: 3 × 2
#>       x data            
#>   <dbl> <list>          
#> 1     1 <tibble [3 × 3]>
#> 2     2 <tibble [2 × 3]>
#> 3     3 <tibble [1 × 3]>

This also revives the .key argument, with a default of NULL. This is a bit tricky because the default used to be deprecated(): if S3 method authors updated their methods to also use .key = deprecated() as the default, their code may break. I've handled this by silently converting deprecated() to NULL.

.key is used whenever ... isn't specified. This happens in 3 ways:

  • nest(df) where we nest everything
  • nest(df, .by = x) where we specify columns to nest by
  • df %>% group_by(x) %>% nest() where columns to nest by are implied by the groups

If both ... and .key are specified, a warning is thrown saying .key is ignored.
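As a rough illustration of the three cases above, here is a minimal sketch that assumes a tidyr version that includes this PR, plus dplyr for group_by(). The column name "rows" and the variable names are just placeholders for the example.

```r
library(tidyr)
library(dplyr)

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# 1. No `...` and no `.by`: everything is nested into one column named by `.key`
nest_all <- nest(df, .key = "rows")

# 2. `.by` picks the outer columns; the remaining columns go into `.key`
nest_by_x <- nest(df, .by = x, .key = "rows")

# 3. Grouped data frame: the groups imply the columns to nest by
nest_grouped <- df %>% group_by(x) %>% nest(.key = "rows")

names(nest_all)
names(nest_by_x)
names(nest_grouped)

# Supplying both `...` and `.key`: per the description above, `.key` should be
# ignored with a warning. This only records whether a warning was signalled.
warned <- tryCatch(
  {
    nest(df, data = y, .key = "rows")
    FALSE
  },
  warning = function(w) TRUE
)
```

When `...` is named explicitly (for example data = y), the name on the left-hand side is what names the nested column, which is why .key has nothing to do in that case.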


The only revdep that breaks because of these changes is panelr, because they have sticky [ methods and we now use df[cols] in nest(). I can send them a PR after we merge this.

R/nest.R
@@ -106,39 +156,52 @@ nest <- function(.data, ..., .names_sep = NULL, .key = deprecated()) {
i = "Did you want `{(.key)} = {cols_fixed_label}`?"
))

- return(nest(.data, !!!cols))
+ return(nest(.data, !!!cols, .by = {{ .by }}))

This supports df %>% nest(y, .by = x): we warn about the forgotten data = y, but then continue as expected.

Comment on lines +203 to +196
out <- vec_cbind(out, inner, .name_repair = "check_unique", .error_call = error_call)
out <- reconstruct_tibble(.data, out)

vec_cbind() with a grouped data frame will drop the <grouped_df> class and return a bare tibble. So we use reconstruct_tibble() after that to restore the grouped class if possible. There are tests for this.

Both pack() and chop() also do reconstruct_tibble() on the way out so we don't need it after them

Comment on lines -159 to +223
- NextMethod()
+ nest.tbl_df(.data, ..., .key = .key, .names_sep = .names_sep)
@DavisVaughan DavisVaughan Jan 12, 2023


This was a bit unfortunate. Adding a tidy-select .by argument means that you can't call NextMethod() anymore; you have to call the generic again (or call the method directly, as is done here).

This did not break any S3 methods out there as far as I can tell.

Comment on lines +148 to 156
test_that("catches when `...` overwrites an existing column", {
  df <- tibble(x = 1, y = 2)

  # Hardcoded as an error.
  # Name repair would likely break internal usage of `chop()`.
  expect_snapshot(error = TRUE, {
    nest(df, x = y)
  })
})

This was previously not being caught, and was giving an even more obscure error from chop()

Comment on lines +309 to 310
if (identical(maybe_missing(key), deprecated())) {
  # Temporary support for S3 method authors that set `.key = deprecated()`.
  # Remove this entire helper once all methods have been updated.
  key <- NULL
}

My "hack" to catch .key = deprecated() from other S3 methods
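For readers unfamiliar with the sentinel, here is a small standalone sketch of the same idea outside of tidyr. It assumes the rlang and lifecycle packages are installed; normalise_key() and old_style_method() are hypothetical names used only for illustration and are not the helpers in this PR.

```r
library(rlang)
library(lifecycle)

# Hypothetical helper: treat the `deprecated()` sentinel as "not supplied"
normalise_key <- function(key = NULL) {
  # `maybe_missing()` lets us inspect `key` even when it holds the missing
  # argument, which is what `deprecated()` returns
  if (identical(maybe_missing(key), deprecated())) {
    key <- NULL
  }
  key
}

# Hypothetical downstream S3 method that still uses the old default
old_style_method <- function(.key = deprecated()) {
  normalise_key(.key)
}

from_sentinel <- normalise_key(deprecated())
from_string <- normalise_key("rows")
from_old_default <- old_style_method()
```

The point of the shim is that methods written against the old signature keep working without change, while an explicitly supplied key passes through untouched.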

@hadley hadley left a comment


Maybe we've finally gotten the nesting interface right!

R/nest.R
@@ -106,39 +151,52 @@ nest <- function(.data, ..., .names_sep = NULL, .key = deprecated()) {
i = "Did you want `{(.key)} = {cols_fixed_label}`?"

Should we switch this to lifecycle::deprecate_warn(always = TRUE)? Or maybe even make it defunct? It was deprecated in tidyr 1.0.0 (August 2019), so it's been >3 years with a warning on every use.


Hmm I'll fiddle with that in a follow up PR

@DavisVaughan DavisVaughan merged commit c5b189e into tidyverse:main Jan 13, 2023
@DavisVaughan DavisVaughan deleted the feature/nest-by branch January 13, 2023 15:57
Development

Successfully merging this pull request may close these issues.

Can we somehow have nest(.by = )?