Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-row list-column tibble names are not propogated through unnest() #723

Closed
paleolimbot opened this issue Aug 29, 2019 · 13 comments
Closed
Labels
bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames
Milestone

Comments

@paleolimbot
Copy link
Member

I don't know if this is is a problem per se, but is a change in behaviour from tidyr 0.8.3, where all column names from a list-column are included in the output, even if all of them have zero rows. rclimateca actively uses this behaviour (paleolimbot/rclimateca#17) but could be rewritten not to.

Using the development version of tidyr:

library(tidyr)
library(tibble)

df <- tibble(x = "string", data = list(tibble(y = character(0))))
colnames(unnest(df, data))
#> [1] "x"

Using the CRAN version:

library(tidyr)
library(tibble)

df <- tibble(x = "string", data = list(tibble(y = character(0))))
colnames(unnest(df, data))
#> [1] "x" "y"

Created on 2019-08-29 by the reprex package (v0.2.1)

@hadley
Copy link
Member

hadley commented Aug 29, 2019

That's definitely not desired. (I could've sworn I already fixed this though)

@paleolimbot
Copy link
Member Author

I looked at #650 and #483 before posting this, but those seem to result from list-columns of non-data frames.

@hadley
Copy link
Member

hadley commented Sep 7, 2019

Ok, so you can request it:

library(tidyr)
library(tibble)

df <- tibble(x = "string", data = list(tibble(y = character(0))))
colnames(unnest(df, data, keep_empty = TRUE))
#> [1] "x" "y"

Created on 2019-09-07 by the reprex package (v0.3.0)

So the question is: should this be the default for unnest()?

@hadley
Copy link
Member

hadley commented Sep 7, 2019

(And regardless of what the default should be, keep_empty needs its own documentation in unnest(); inheriting from unchop is confusing)

Or does it even need to be an option? unnest_longer() always calls chop() with keep_empty = TRUE?

@hadley hadley added bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames labels Sep 7, 2019
@hadley hadley added this to the v1.0.0 milestone Sep 7, 2019
@jennybc
Copy link
Member

jennybc commented Sep 7, 2019

Or does it even need to be an option? unnest_longer() always calls chop() with keep_empty = TRUE?

This feels correct to me. Because otherwise it's creeping into sapply() territory, where the shape of the return value depends on the data.

@hadley
Copy link
Member

hadley commented Sep 9, 2019

Changing this default also changes the result of these:

  df1 <- tibble(x = 1:2, y = list(NULL, tibble(z = 1)))
  out <- unnest(df1, y)

  df2 <- tibble(x = 1:2, y = list(NULL, 1))
  out <- unnest(df2, y)

With keep_empty = FALSE they return a single row; with keep_empty = TRUE they return two rows.

@hadley
Copy link
Member

hadley commented Sep 9, 2019

OTOH those are both errors in the tidyr 0.8.3:

library(tidyr)
library(tibble)
df1 <- tibble(x = 1:2, y = list(NULL, tibble(z = 1)))
out <- unnest(df1, y)
#> Each column must either be a list of vectors or a list of data frames [y]

df2 <- tibble(x = 1:2, y = list(NULL, 1))
out <- unnest(df2, y)
#> Each column must either be a list of vectors or a list of data frames [y]

So I think it should be safe to change, even this late in the game. But I'll still need to re-run the revdeps which means I'll have to delay release until tomorrow; I think that's a worthwhile tradeoff.

@hadley hadley closed this as completed in ba1389c Sep 9, 2019
@paleolimbot
Copy link
Member Author

In re-fixing rclimateca, I noticed that the development version after this was fixed still has slightly different behaviour: zero-row data frames become rows of NAs instead of having their rows removed (but columns preserved).

library(tidyr)
library(tibble)

df <- tibble(x = "string", data = list(tibble(y = character(0))))

# CRAN version
unnest(df, data)
#> # A tibble: 0 x 2
#> # … with 2 variables: x <chr>, y <chr>

# dev version
unnest(df, data)
#> # A tibble: 1 x 2
#>   x      y    
#>   <chr>  <chr>
#> 1 string <NA>

Created on 2019-09-09 by the reprex package (v0.2.1)

I will update rclimateca not to depend on this, but I think preserving the columns of zero-row tibbles was a nice feature. Maybe keep_empty should still judge "emptiness" based on length() instead of vec_size()? This is just a guess based on the NULL handling from the reprex above.

@hadley
Copy link
Member

hadley commented Sep 9, 2019

keep_empty was original added for #358, so I think it probably does need to be kept for backward compatibility so I probably should stick with it (i.e. revert ba1389c)

So maybe I should do back and fix your specific original request.

@hadley hadley reopened this Sep 9, 2019
@hadley
Copy link
Member

hadley commented Sep 9, 2019

Another useful case to consider:

df <- tibble(
  x = 1:2, 
  data = list(
    tibble(y = character(0)),
    tibble(z = integer(0))
  )
)
unnest(df, data)

@hadley
Copy link
Member

hadley commented Sep 9, 2019

Some more thinking revealed an (obvious-in-hindsight) problem in unchop(). Hopefully this doesn't reveal another problem elsewhere 😬

@hadley hadley closed this as completed in 954abca Sep 9, 2019
@hadley
Copy link
Member

hadley commented Sep 9, 2019

So much thinking for a four letter change 😐

@hadley hadley mentioned this issue Sep 9, 2019
20 tasks
@paleolimbot
Copy link
Member Author

Sorry 😬! I can confirm this fixes rclimateca!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames
Projects
None yet
Development

No branches or pull requests

3 participants