Better naming options for pack() #795

Closed
mgirlich opened this issue Oct 30, 2019 · 2 comments
Labels
df-col 👜 feature

@mgirlich
Contributor

mgirlich commented Oct 30, 2019

I like the idea of pack() and unpack(), especially as they sometimes come quite naturally when working with JSON data from APIs. In such a case I might want to unpack a tibble column, manipulate it, and then pack it again. E.g.:

library(tidyr)
api_data <- tibble(
  Sepal = tibble(
    Length = c(5.1, 4.7),
    Width = c(3.5, 3.2)
  ),
  Petal = tibble(
    Length = c(1.4, 1.3),
    Width = c(0.2, 0.2)
  )
)

data_to_work_with <- api_data %>% 
  unpack(c(Sepal, Petal), names_sep = ".")

# some data manipulation

data_to_work_with %>% 
  pack(Sepal = starts_with("Sepal"))
#> # A tibble: 2 x 3
#>   Petal.Length Petal.Width Sepal$Sepal.Length $Sepal.Width
#>          <dbl>       <dbl>              <dbl>        <dbl>
#> 1          1.4         0.2                5.1          3.5
#> 2          1.3         0.2                4.7          3.2

Created on 2019-10-30 by the reprex package (v0.3.0)

Unfortunately, this repacking step isn't supported very well, as I cannot change the names while packing. I would imagine something along the following lines being possible:

data_to_work_with %>% 
  pack(Sepal = c(Length = Sepal.Length, Width = Sepal.Width))

data_to_work_with %>% 
  pack(Sepal = c(Sepal.Length, Width = Width), .prefix = "Sepal.")

The first case is especially confusing, as the new names are simply ignored, with no warning.

@hadley
Member

hadley commented Nov 24, 2019

Slightly more straightforward reprex:

library(tidyr)

iris %>% 
  as_tibble() %>% 
  pack(Sepal = starts_with("Sepal"), Petal = starts_with("Petal"))
#> # A tibble: 150 x 3
#>    Species Sepal$Sepal.Length $Sepal.Width Petal$Petal.Length $Petal.Width
#>    <fct>                <dbl>        <dbl>              <dbl>        <dbl>
#>  1 setosa                 5.1          3.5                1.4          0.2
#>  2 setosa                 4.9          3                  1.4          0.2
#>  3 setosa                 4.7          3.2                1.3          0.2
#>  4 setosa                 4.6          3.1                1.5          0.2
#>  5 setosa                 5            3.6                1.4          0.2
#>  6 setosa                 5.4          3.9                1.7          0.4
#>  7 setosa                 4.6          3.4                1.4          0.3
#>  8 setosa                 5            3.4                1.5          0.2
#>  9 setosa                 4.4          2.9                1.4          0.2
#> 10 setosa                 4.9          3.1                1.5          0.1
#> # … with 140 more rows

Created on 2019-11-24 by the reprex package (v0.3.0)

Maybe this could be achieved with a .names_sep argument:

iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  )

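The separator is needed because you'd still want some way to strip the ".". names_sep = NULL would remain the default, but a non-NULL names_sep would just remove the outer name and the separator from the start of each inner name (e.g. Sepal.Length becomes Length).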
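To make the proposed rule concrete, here is a minimal base R sketch of the name stripping described above. strip_outer_name() is a hypothetical helper written only for illustration, not a tidyr function; it assumes that inner names which don't start with the outer name plus the separator should be left unchanged.

```r
# Hypothetical helper (not part of tidyr): strip "<outer><sep>" from the
# start of each inner name; names without that prefix are left as is.
strip_outer_name <- function(inner, outer, sep = NULL) {
  if (is.null(sep)) {
    return(inner)  # sep = NULL: names are left untouched (the default)
  }
  prefix <- paste0(outer, sep)
  has_prefix <- startsWith(inner, prefix)
  inner[has_prefix] <- substring(inner[has_prefix], nchar(prefix) + 1L)
  inner
}

strip_outer_name(c("Sepal.Length", "Sepal.Width"), "Sepal", ".")
# returns c("Length", "Width")
strip_outer_name(c("Sepal.Length", "Sepal.Width"), "Sepal")
# returns c("Sepal.Length", "Sepal.Width")
```

Because the prefix is matched with startsWith() rather than a regular expression, a "." in the separator is treated literally.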

@hadley hadley added df-col 👜 feature labels Nov 24, 2019
@dah33

dah33 commented Nov 25, 2019

One suggestion is a .names_prefix argument, similar to pivot_longer():

iris %>% 
  as_tibble() %>% 
  pack(
    Sepal = starts_with("Sepal"), 
    Petal = starts_with("Petal"), 
    .names_prefix = "(Sepal|Petal)\\."
  )

However, I note a couple of issues with this:

  • Using a regexp for the prefix pattern could be confusing, as "." is a common separator but has a non-literal meaning in a regexp (it matches any character).
  • pack() could be more helpful here, since it has already been told that the variable names start with Sepal and Petal.
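The first point is easy to demonstrate with base R's sub(): an unescaped "." in a prefix pattern matches any character, so a pattern meant to strip "Sepal." also mangles a name like SepalXLength (a made-up column name used only for illustration).

```r
# "SepalXLength" is a made-up name that should NOT lose its prefix
names_in <- c("Sepal.Length", "SepalXLength")

# Unescaped ".": matches any character, so both names are stripped
sub("^Sepal.", "", names_in)
# returns c("Length", "Length")

# Escaped "\\.": only a literal "." matches
sub("^Sepal\\.", "", names_in)
# returns c("Length", "SepalXLength")
```

Escaping works, but users have to remember to do it, which is what makes a regexp-based prefix easy to get wrong.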

Another approach would extend Hadley's proposal from this:

iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  )

To this:

iris %>% 
  as_tibble() %>% 
  pack(c(Sepal, Petal), names_sep = ".")

That way the starts_with() selections are inferred, and pack() would work as a natural complement to unpack():

iris %>% 
  as_tibble() %>% 
  pack(c(Sepal, Petal), names_sep = ".") %>%
  unpack(c(Sepal, Petal), names_sep = ".")

The pack() step above would give this result:

tibble(
  Species = iris$Species, 
  Sepal = tibble(Length = iris$Sepal.Length, Width = iris$Sepal.Width),
  Petal = tibble(Length = iris$Petal.Length, Width = iris$Petal.Width)
) # %>% unpack(c(Sepal, Petal), names_sep = ".")
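The inference proposed here can be sketched in base R without tidyr. pack_by_prefix() below is hypothetical and only illustrates the suggested behaviour: for each outer name it collects the columns whose names start with the outer name plus the separator, strips that prefix, and stores them as a single data frame column. It does no error checking.

```r
# Hypothetical helper (not part of tidyr): for each outer name, move the
# columns named "<outer><sep>..." into a data frame column called <outer>,
# stripping "<outer><sep>" from the inner names. No error checking.
pack_by_prefix <- function(df, outers, sep = ".") {
  for (outer in outers) {
    prefix <- paste0(outer, sep)
    cols <- names(df)[startsWith(names(df), prefix)]
    inner <- df[cols]
    names(inner) <- substring(cols, nchar(prefix) + 1L)
    df[cols] <- NULL      # drop the flat columns
    df[[outer]] <- inner  # add them back as one data frame column
  }
  df
}

packed <- pack_by_prefix(iris, c("Sepal", "Petal"))
names(packed)        # "Species" "Sepal" "Petal"
names(packed$Sepal)  # "Length" "Width"
```

The nesting matches the tibble() shown above, though the result here is a plain data.frame rather than a tibble.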

@hadley hadley added this to the v1.1.0 milestone Nov 28, 2019
@hadley hadley closed this as completed in 557fd41 Nov 28, 2019