pivot_wider() should drop values_from with zero row data frames #1249

Closed
DavisVaughan opened this issue Dec 2, 2021 · 2 comments · Fixed by #1255
Labels
bug an unexpected problem or unintended behavior pivoting ♻️ pivot rectangular data to different "shapes"

Comments

DavisVaughan commented Dec 2, 2021

library(tidyr)

pivot_wider(fish_encounters[1,], names_from = station, values_from = seen)
#> # A tibble: 1 × 2
#>   fish  Release
#>   <fct>   <int>
#> 1 4842        1

# Doesn't drop `seen`, but dropped `station`
pivot_wider(fish_encounters[0,], names_from = station, values_from = seen)
#> # A tibble: 0 × 2
#> # … with 2 variables: fish <fct>, seen <int>

# Works if you explicitly state the `id_cols`
pivot_wider(fish_encounters[0,], id_cols = fish, names_from = station, values_from = seen)
#> # A tibble: 0 × 1
#> # … with 1 variable: fish <fct>

I would expect just an empty fish column, as in the last example.


DavisVaughan commented Dec 3, 2021

This mainly happens because we lose information when building the spec from a data frame with 0 rows, which in turn affects how id_cols is built automatically:

library(tidyr)

# - We know that `station` is not a key because of the column name
# - We know that `seen` is not a key because of the .value column
build_wider_spec(fish_encounters[1,], names_from = station, values_from = seen)
#> # A tibble: 1 × 3
#>   .name   .value station
#>   <chr>   <chr>  <fct>  
#> 1 Release seen   Release

# - We know that `station` is not a key because of the column name
# - We DON'T know that `seen` should not be considered a key because there is no data
build_wider_spec(fish_encounters[0,], names_from = station, values_from = seen)
#> # A tibble: 0 × 3
#> # … with 3 variables: .name <chr>, .value <chr>, station <fct>

The non-.name/.value column names in the spec, combined with the unique values in .value, are used to compute id_cols when they are not explicitly supplied. But this isn't a foolproof way to build id_cols, since we can lose information when creating the spec (as seen above).
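To make the information loss concrete, here is a rough base R sketch of that computation. It is not tidyr's actual code: `infer_id_cols_from_spec()` is a hypothetical name, and the spec data frames are built by hand to mirror the output shown above.

```r
# Hypothetical stand-in for the automatic id_cols computation:
# every data column that is neither a names_from column (the extra
# spec columns) nor listed in the spec's .value column is a key.
infer_id_cols_from_spec <- function(data, spec) {
  names_from_cols <- setdiff(names(spec), c(".name", ".value"))
  values_from_cols <- unique(spec$.value)
  setdiff(names(data), c(names_from_cols, values_from_cols))
}

# Only the column names of the data matter for this sketch.
data_cols <- data.frame(
  fish = character(),
  station = character(),
  seen = integer(),
  stringsAsFactors = FALSE
)

# Hand-built specs mirroring the 1-row and 0-row cases.
spec_one_row <- data.frame(
  .name = "Release",
  .value = "seen",
  station = "Release",
  stringsAsFactors = FALSE
)
spec_zero_row <- spec_one_row[0, ]

ids_one_row <- infer_id_cols_from_spec(data_cols, spec_one_row)
ids_zero_row <- infer_id_cols_from_spec(data_cols, spec_zero_row)

ids_one_row
#> [1] "fish"
ids_zero_row
#> [1] "fish" "seen"
```

With the 0-row spec, `.value` is empty, so nothing marks `seen` as a value column and it is treated as a key.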

Maybe we need build_wider_id_cols(data, id_cols = NULL, names_from = name, values_from = value), which would essentially be setdiff(names(data), c(names_from, values_from)) if id_cols is NULL.
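A minimal sketch of that idea, under simplifying assumptions: `build_wider_id_cols_sketch()` is a hypothetical name, and it takes column names as character vectors rather than tidyselect expressions, so it is not the function that ended up in tidyr.

```r
# Hypothetical, simplified helper: compute id_cols directly from the
# data's column names, so the result does not depend on the spec.
build_wider_id_cols_sketch <- function(data,
                                       id_cols = NULL,
                                       names_from = "name",
                                       values_from = "value") {
  # Respect an explicit selection when one is supplied.
  if (!is.null(id_cols)) {
    return(id_cols)
  }
  # Otherwise, every column not used for names or values is a key.
  setdiff(names(data), c(names_from, values_from))
}

# A 0-row data frame with the same columns as fish_encounters.
fish_zero_row <- data.frame(
  fish = factor(character()),
  station = factor(character()),
  seen = integer()
)

id_cols_zero_row <- build_wider_id_cols_sketch(
  fish_zero_row,
  names_from = "station",
  values_from = "seen"
)

id_cols_zero_row
#> [1] "fish"
```

Because only the column names are consulted, the 0-row case gives the same answer as any other input.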

Then we could make id_cols mandatory in pivot_wider_spec()? This would be a breaking change. But it looks like it would break very little? And it seems reasonable to say that if we aren't building the spec automatically, then you should also have to supply the id cols?

DavisVaughan commented Dec 3, 2021

  • Add internal build_wider_id_cols()
  • Use it in pivot_wider() so we can at least fix this there
  • Silently keep the fallback behavior in pivot_wider_spec() when id_cols = NULL, because we are a bit worried about breaking existing code for this small edge case. This is the best we can do here, and we will add some tests documenting this behavior.
