
pivot_longer: what if .value needs to be described by two distinct parenthesized groups? #619

@MatthieuStigler


Let us say my column names are x_1_mean and x_1_sd. I want to make the data longer by pivoting the 1 in the middle, i.e. I want the result to have columns x_mean and x_sd.

The .value feature is great, but it assumes that the part of the regex corresponding to .value is a single parenthesized group "()". I.e., if the columns were x_mean_1 and x_sd_1, one could just use

names_to = c(".value","time"),
names_pattern = "(.+)_([0-9])$"
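For reference, here is the single-group case as a complete call. It is a minimal sketch that assumes tidyr >= 1.0.0, where pivot_longer() is exported, and it uses a small made-up tibble called wide:

```r
library(tidyr)
library(tibble)

# Made-up data where .value is one contiguous group: <measure>_<time>
wide <- tibble(
  unit = 1:2,
  x_mean_1 = c(1, 2),
  x_sd_1 = c(0, 1),
  x_mean_2 = c(1, 1),
  x_sd_2 = c(0.5, 0.2)
)

# Group 1 ("x_mean" / "x_sd") names the output columns,
# group 2 (the trailing digit) becomes the new 'time' column
long <- pivot_longer(
  wide,
  cols = -unit,
  names_to = c(".value", "time"),
  names_pattern = "(.+)_([0-9])$"
)
long
```

This gives one row per unit and time, with columns x_mean and x_sd.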

But if .value needs to be described by two distinct parenthesized groups (here (x|y) and (mean|sd), with the time component in between), this creates a problem. What is the suggested solution? Obviously, I could pivot everything longer and then selectively widen again, or use the *_spec functions, but maybe there could be a neater, direct solution?
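Here is a sketch of the *_spec route for data shaped like pnl in the reprex below. It assumes tidyr >= 1.0.0, where pivot_longer_spec() takes a spec data frame containing a .name column (the existing column names), a .value column (the output column each one feeds into) and one extra column per new key. The spec is built by hand with base R sub(), which makes it possible to glue the two groups into one .value. The rnorm(4) columns are replaced by fixed numbers so that the block is deterministic:

```r
library(tidyr)
library(tibble)

# Same layout as the reprex, with fixed numbers instead of rnorm(4)
pnl <- tibble(
  x_1_mean = 1:4, x_2_mean = c(1, 1, 0, 0),
  x_1_sd = c(0, 1, 1, 1), x_2_sd = c(0.5, 0.2, 0.1, 0.3),
  y_1_mean = 1:4, y_2_mean = c(1, 1, 0, 0),
  y_1_sd = c(0, 1, 1, 1), y_2_sd = c(0.4, 0.6, 0.8, 0.9),
  unit = 1:4
)

# Columns to pivot: everything except the id column
cols <- setdiff(names(pnl), "unit")
# Three components: variable, time, statistic
pattern <- "^(x|y)_([0-9])_(mean|sd)$"

spec <- tibble(
  .name  = cols,
  # .value glues groups 1 and 3 together, e.g. "x_1_mean" -> "x_mean"
  .value = sub(pattern, "\\1_\\3", cols),
  # group 2 becomes the new 'time' key column
  time   = sub(pattern, "\\2", cols)
)

long <- pivot_longer_spec(pnl, spec)
long
```

The result should have the same shape as the pivot_longer() + pivot_wider() output in the reprex (one row per unit and time, with columns x_mean, x_sd, y_mean and y_sd), but it is produced in one step. The cost is that the spec has to be written by hand.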

Possible solutions:

  • allow for a .repeated entry in names_to, indicating that this group is also part of .value?
  • allow repeated .value entries? Currently, a repeated .value just turns the second .value group into a column named after its position (V3 below), which is quite confusing.
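Until something along these lines exists, you can approximate the effect of a repeated .value (both groups glued into one output name). Rewrite the column names first so that the .value parts are contiguous, then use the ordinary single-group pattern. This is a sketch that assumes tidyr >= 1.0.0 and uses a small made-up tibble:

```r
library(tidyr)
library(tibble)

# Made-up data with the problematic layout: <variable>_<time>_<statistic>
wide <- tibble(
  unit = 1:2,
  x_1_mean = c(1, 2), x_2_mean = c(3, 4),
  x_1_sd = c(0.1, 0.2), x_2_sd = c(0.3, 0.4)
)

# Move the time component to the end (x_1_mean -> x_mean_1) so that the two
# groups forming .value sit next to each other; 'unit' does not match the
# pattern, so sub() leaves it unchanged
names(wide) <- sub("^(x|y)_([0-9])_(mean|sd)$", "\\1_\\3_\\2", names(wide))

long <- pivot_longer(
  wide,
  cols = -unit,
  names_to = c(".value", "time"),
  names_pattern = "(.+)_([0-9])$"
)
long
```

Renaming leaves the pivot_longer() call itself unchanged. The catch is that the separator between the two glued parts ("_" here) comes from the replacement string, not from the original names.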

Thanks!

library(tidyverse)

pnl <- tibble(
  x_1_mean = 1:4,
  x_2_mean = c(1, 1, 0, 0),
  x_1_sd = c(0, 1, 1, 1),
  x_2_sd = rnorm(4),
  y_1_mean = 1:4,
  y_2_mean = c(1, 1, 0, 0),
  y_1_sd = c(0, 1, 1, 1),
  y_2_sd = rnorm(4),
  unit = 1:4,
)

## current behaviour: the second .value is turned into an ordinary column, V3
pnl %>% pivot_longer(
  cols = -unit, 
  names_to = c(".value","time", ".value"),
  names_pattern = "(x|y)_([0-9])_(mean|sd)")
#> # A tibble: 16 x 5
#>     unit time  V3          x      y
#>    <int> <chr> <chr>   <dbl>  <dbl>
#>  1     1 1     mean   1       1    
#>  2     1 2     mean   1       1    
#>  3     1 1     sd     0       0    
#>  4     1 2     sd    -0.579   0.989
#>  5     2 1     mean   2       2    
#>  6     2 2     mean   1       1    
#>  7     2 1     sd     1       1    
#>  8     2 2     sd    -0.389  -1.06 
#>  9     3 1     mean   3       3    
#> 10     3 2     mean   0       0    
#> 11     3 1     sd     1       1    
#> 12     3 2     sd     0.0427 -0.725
#> 13     4 1     mean   4       4    
#> 14     4 2     mean   0       0    
#> 15     4 1     sd     1       1    
#> 16     4 2     sd    -0.163   0.249


## desired result, obtained by following up with pivot_wider()
pnl %>% pivot_longer(
  cols = -unit, 
  names_to = c(".value","time", ".value"),
  names_pattern = "(x|y)_([0-9])_(mean|sd)") %>% 
  pivot_wider(names_from = V3,
              values_from = c(x, y))
#> # A tibble: 8 x 6
#>    unit time  x_mean    x_sd y_mean   y_sd
#>   <int> <chr>  <dbl>   <dbl>  <dbl>  <dbl>
#> 1     1 1          1  0           1  0    
#> 2     1 2          1 -0.579       1  0.989
#> 3     2 1          2  1           2  1    
#> 4     2 2          1 -0.389       1 -1.06 
#> 5     3 1          3  1           3  1    
#> 6     3 2          0  0.0427      0 -0.725
#> 7     4 1          4  1           4  1    
#> 8     4 2          0 -0.163       0  0.249

Created on 2019-04-26 by the reprex package (v0.2.1)


Labels: feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")
