Pivot with duplicate column names #472

@GillesSanMartin

More and more often I'm confronted with messy spreadsheets that have duplicate column names and need to be tidied up, as for example in this SO question.

tidyr::gather simply refuses to do it (it throws an error), while reshape::melt and reshape2::melt return the wrong numbers without any warning. The data.table version of melt works as intended, but only when the input is a data.table rather than a data.frame.

Here is a minimal reprex:
# Reprex
d <- data.frame(Group = c("A", "B"), 
                rbind(c(0, 0, 5, 5), 
                      c(0, 0, 10, 10)))
colnames(d) <- c("Group", "Var1", "Var2", "Var1", "Var2")

# Dataframe with duplicate column names --> quite frequent situation in messy spreadsheets...
d
#>   Group Var1 Var2 Var1 Var2
#> 1     A    0    0    5    5
#> 2     B    0    0   10   10

# With tidyr we get an error message: definitely better than getting the
# wrong numbers...
tidyr::gather(d,,,-1)
#> Error: Can't bind data because some arguments have the same name

# With reshape and reshape2: wrong results (0 everywhere, the 5 and 10 values have disappeared)
reshape::melt(d, id.vars = 1)
#>   Group variable value
#> 1     A     Var1     0
#> 2     B     Var1     0
#> 3     A     Var2     0
#> 4     B     Var2     0
#> 5     A     Var1     0
#> 6     B     Var1     0
#> 7     A     Var2     0
#> 8     B     Var2     0
reshape2::melt(d, id.vars = 1)
#>   Group variable value
#> 1     A     Var1     0
#> 2     B     Var1     0
#> 3     A     Var2     0
#> 4     B     Var2     0

# data.table::melt fails in the same way when given a data.frame,
# but returns exactly the intended result when given a data.table
data.table::melt(d, id.vars = 1)
#>   Group variable value
#> 1     A     Var1     0
#> 2     B     Var1     0
#> 3     A     Var2     0
#> 4     B     Var2     0
data.table::melt(data.table::as.data.table(d), id.vars = 1)
#>    Group variable value
#> 1:     A     Var1     0
#> 2:     B     Var1     0
#> 3:     A     Var2     0
#> 4:     B     Var2     0
#> 5:     A     Var1     5
#> 6:     B     Var1    10
#> 7:     A     Var2     5
#> 8:     B     Var2    10

# base::stack returns the right values, but the id column (Group) is lost
# and the duplicated names come back with suffixes (Var1.1, Var2.1)
stack(d[,-1])
#>   values    ind
#> 1      0   Var1
#> 2      0   Var1
#> 3      0   Var2
#> 4      0   Var2
#> 5      5 Var1.1
#> 6     10 Var1.1
#> 7      5 Var2.1
#> 8     10 Var2.1

Created on 2018-06-25 by the reprex package (v0.2.0).
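
In the meantime, one possible stop-gap is to melt by column position instead of by column name, so the duplicated names are never looked up. The sketch below uses base R only. `melt_by_position` is a hypothetical helper written for this illustration, not a function from tidyr, reshape2 or data.table, and it assumes a single id column and value columns that all share one type.

```r
# Same data as in the reprex above
d <- data.frame(Group = c("A", "B"),
                rbind(c(0, 0, 5, 5),
                      c(0, 0, 10, 10)))
colnames(d) <- c("Group", "Var1", "Var2", "Var1", "Var2")

# Hypothetical helper (not part of any package): build one long piece per
# value column, selecting each column by its integer position so that
# duplicated names are never resolved by name.
melt_by_position <- function(data, id_col = 1) {
  value_cols <- setdiff(seq_along(data), id_col)
  pieces <- lapply(value_cols, function(j) {
    data.frame(id = data[[id_col]],
               variable = names(data)[j],  # keep the original (duplicated) name
               value = data[[j]],
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)
  names(out)[1] <- names(data)[id_col]     # restore the id column's name
  out
}

long <- melt_by_position(d)
long
```

With the data above this should give the same eight rows as the data.table call, with the 5 and 10 values preserved under the repeated Var1 and Var2 labels.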
