pivot doesn't preserve attributes #1379

Closed
ilikegitlab opened this issue Jul 17, 2022 · 6 comments

Comments

@ilikegitlab

Most dplyr functions (I tested filter, select, left_join, mutate, and head) carry custom metadata attributes over to their result. The pivot functions do not:

new_tibble(tibble(a=3,b=4),metadata="test") %>% validate_tibble() %>% attr("metadata")
[1] "test"
but:
new_tibble(tibble(a=3,b=4),metadata="test") %>% validate_tibble() %>% pivot_longer(cols=c(a,b)) %>% attr("metadata")
NULL

This may be related to the discussions about extending tibbles, but redefining the class seems like a lot of complexity for a simple user attribute. (I honestly couldn't make much sense of github.com/tidyverse/tibble/issues/275, and since it is still open I assume the question is unresolved.)

@hadley
Member

hadley commented Oct 10, 2022

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

@hadley hadley added the reprex needs a minimal reproducible example label Oct 10, 2022
@ilikegitlab
Author

library(tidyverse)

new_tibble(tibble(a = 3, b = 4), metadata = "test") %>%
  validate_tibble() %>%
  attr("metadata")
#> [1] "test"

new_tibble(tibble(a = 3, b = 4), metadata = "test") %>%
  validate_tibble() %>%
  pivot_longer(cols = c(a, b)) %>%
  attr("metadata")
#> NULL

@hadley
Member

hadley commented Oct 11, 2022

Somewhat more minimal reprex:

library(tidyr)

df <- tibble(a = 1)
attr(df, "metadata") <- "test"

df |>
  pivot_longer(a) |>
  attr("metadata")
#> NULL

Created on 2022-10-11 with reprex v2.0.2

@hadley hadley removed the reprex needs a minimal reproducible example label Oct 11, 2022
@hadley
Member

hadley commented Oct 18, 2022

It's not clear to me that pivot_ functions should preserve attributes — I think it's reasonable to argue that they create new data frames in a way similar to dplyr::summarise(), rather than modifying an existing data frame like dplyr::mutate().
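For readers who need the metadata to survive a pivot, one possible workaround is to copy the attributes back afterwards. The sketch below assumes a hypothetical helper, `keep_attrs()`, which is not part of tidyr or dplyr; it only restores data-frame-level attributes that the transformation dropped and leaves `names`, `row.names`, and `class` to the new result.

```r
library(tidyr)

# Hypothetical helper (an assumption for illustration, not a tidyr function):
# apply `f` to `df`, then copy back any data-frame-level attributes that
# `f` did not carry over.
keep_attrs <- function(df, f) {
  out <- f(df)
  # Structural attributes describe the new result and must not be overwritten.
  structural <- c("names", "row.names", "class")
  old <- attributes(df)
  lost <- setdiff(names(old), c(structural, names(attributes(out))))
  for (nm in lost) {
    attr(out, nm) <- old[[nm]]
  }
  out
}

df <- tibble(a = 1, b = 2)
attr(df, "metadata") <- "test"

long <- keep_attrs(df, function(d) pivot_longer(d, c(a, b)))
attr(long, "metadata")
#> [1] "test"
```

Whether the restored metadata still describes the reshaped data is up to the caller, which is essentially the design question raised above.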

@hadley hadley closed this as completed Oct 18, 2022
@ilikegitlab
Author

Why not label this as a design decision (attributes are not supported with dplyr)? At least that would be clear.

I don't think it's reasonable that in a pipeline I'm losing metadata depending on which functions I'm using. I cannot really follow your argument, as here I'm just reshaping the same numbers after all (but then, I would even make a case for summarize). But when I join data together, suddenly the attributes of one of the inputs are there? Maybe it makes sense to you.

@helge-baumann

helge-baumann commented Oct 26, 2022

I'm not sure whether this addresses the same problem, but for me pivot_longer() does an amazing job at preserving attributes of variables. However, I think preserving the attributes depends on supplying a class attribute when several columns are combined into one.

I'm afraid the following example is not fully minimal, but I'll do my best:

library(tidyr)

wide <-
  structure(
    list(
      x_1 = structure(c(1), label = "X"),
      y_1 = structure(c(2), label = "Y", labels = c(A = 1), class = c("haven_labelled", "vctrs_vctr", "double")),
      y_2 = structure(c(3), label = "Y", labels = c(A = 1), class = c("haven_labelled", "vctrs_vctr", "double")),
      z_1 = structure(c(2), label = "Z"),
      z_3 = structure(c(3), label = "Z")
    ),
    row.names = c(NA, -1L),
    class = c("tbl_df", "tbl", "data.frame")
  )

long <- 
  wide %>% 
  pivot_longer(everything(), names_sep = "_", names_to=c('.value', 'wave'))

attributes(long$x)
#> $label
#> [1] "X"
attributes(long$y)
#> $label
#> [1] "Y"
#> 
#> $labels
#> A 
#> 1 
#> 
#> $class
#> [1] "haven_labelled" "vctrs_vctr"     "double"
attributes(long$z)
#> NULL

Attributes of "x" are preserved, because there is only one occurrence.

Attributes of "y" are preserved: both variables are merged into one, with a class provided.

Attributes of "z" are lost: both variables are merged into one, without a class provided.
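If the "z" case matters in practice, one option is to record the label before pivoting and reapply it afterwards. This is only a sketch under assumptions: it reuses the `label` attribute name from the example above, takes the label from the first of the merged columns, and assumes all merged columns share the same label.

```r
library(tidyr)

# Two bare numeric columns with a "label" attribute but no class,
# mirroring z_1 and z_3 in the example above.
wide <- structure(
  list(
    z_1 = structure(2, label = "Z"),
    z_3 = structure(3, label = "Z")
  ),
  row.names = c(NA, -1L),
  class = c("tbl_df", "tbl", "data.frame")
)

# Record the label per output column before reshaping
# (assumption: every z_* column carries the same label).
labels <- c(z = attr(wide$z_1, "label"))

long <- pivot_longer(
  wide,
  everything(),
  names_sep = "_",
  names_to = c(".value", "wave")
)

# Reapply the recorded labels to the merged columns.
for (nm in names(labels)) {
  attr(long[[nm]], "label") <- labels[[nm]]
}

attr(long$z, "label")
#> [1] "Z"
```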
