-
Notifications
You must be signed in to change notification settings - Fork 416
Implement keep_empty
for unnest_longer()
#1442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement keep_empty
for unnest_longer()
#1442
Conversation
dab3470
to
c3b0e34
Compare
R/unnest-longer.R
Outdated
if (is.integer(index)) { | ||
index <- 1L | ||
} else { | ||
index <- NA_character_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am slightly questioning our decision to use NA
when there aren't any names over ""
, like what names2()
would return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that there are no names here, it's that there are no values. So I think NA
is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really referring to this
tibble(
x = list(1:2, c(a = 3, b = 4))
) %>%
unnest_longer(x)
#> # A tibble: 4 × 2
#> x x_id
#> <dbl> <chr>
#> 1 1 <NA>
#> 2 2 <NA>
#> 3 3 a
#> 4 4 b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could change ^ to use ""
, and then consistently use NA_integer_
or NA_character_
when we promote empty vectors to size 1? So its like:
No names anywhere:
- Positional names for existing values
NA_integer_
if a zero length vector was promoted to size 1
At least one name somewhere:
- Elements with names use their names
- Elements without names use
""
NA_character_
if a zero length vector was promoted to size 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ""
for values without names is probably more consistent in this case
tibble(
x = list(c(a = 1, 2), c(3, 4))
) %>%
unnest_longer(x)
#> # A tibble: 4 × 2
#> x x_id
#> <dbl> <chr>
#> 1 1 "a"
#> 2 2 ""
#> 3 3 <NA> # <- better as ""
#> 4 4 <NA> # <- better as ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like using ""
for unnamed elements and NA
for elements that didn't exist. But I can see how that might be slicing things too fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what I'm suggesting? 3
and 4
above are "unnamed elements" to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this case:
library(tidyr)
tibble(x = list(c(a = 1, 2), c(3, 4), NULL)) %>%
unnest_longer(x, keep_empty = TRUE)
#> # A tibble: 5 × 2
#> x x_id
#> <dbl> <chr>
#> 1 1 "a"
#> 2 2 ""
#> 3 3 <NA> # should be ""
#> 4 4 <NA> # should be ""
#> 5 NA <NA> # should be NA
Created on 2023-01-05 with reprex v2.0.2
R/unnest-longer.R
Outdated
if (is.integer(index)) { | ||
index <- 1L | ||
} else { | ||
index <- NA_character_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that there are no names here, it's that there are no values. So I think NA
is correct.
And change to: - `""` index names for unnamed vectors - `NA` index names for empty vectors that are "kept"
This PR grew a bit, but I realized that I could vectorize
We already have After this PR I'll go back and remove I've also implemented the changes to the name handling we discussed here #1442 (comment). It was heavily tied to this so it was easiest to change it all at once. |
if (is_list_of(x)) { | ||
cli::cli_abort("`x` can't be a list-of. Unclass first and provide `ptype`.", .internal = TRUE) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The restriction of "no list-of" isn't technically necessary, in particular the previous list_init_empty()
helper tried to allow list-ofs and use their information. But this is part of what caused the failure in #1327, and it is cleaner to extract out the ptype
of list-ofs ahead of time, strip them down to bare lists, and pass all that through instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me because list_of()
can't contain NULL
s, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can actually (that's the missing value) but can't contain unspecified()
(although that branch probably wouldn't be hit for list-ofs because you'd typically also be supplying ptype
)
if (is_list_of(x)) { | ||
cli::cli_abort("`x` can't be a list-of. Unclass first and provide `ptype`.", .internal = TRUE) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me because list_of()
can't contain NULL
s, right?
|
||
if (vec_any_missing(x)) { | ||
null <- vec_detect_missing(x) | ||
null <- which(null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you assume that vec_assign()
will do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely will. But since I use null
in two separate assignments below, it seemed slightly more efficient to which()
it once here
Closes #1363
Closes #1339
We are going to use
keep_empty = FALSE
as the default to be consistent withunnest()
,unchop()
,separate_longer_position()
, etc. This is a breaking change as previouslyNULL
elements implicitly hadkeep_empty = TRUE
applied to them. However, we are now consistent with how we handleNULL
and empty vectors likeinteger()
, which I think is important.No changes to worse in revdeps, so I feel good about this!
"322b4064-9f31-474f-9750-36e8c7c2d8df"