spread overwrites existing column with identical key value #478

Closed
SportsTribution opened this issue Jul 23, 2018 · 3 comments
Labels: bug (an unexpected problem or unintended behavior), pivoting ♻️ (pivot rectangular data to different "shapes")

Comments


SportsTribution commented Jul 23, 2018

I was trying to do some data analysis on (table, column) pairs from a database:

library(tidyr)
DF <- data.frame(TABLE = c('TA', 'TB'), COLUMN = c('ID', 'TABLE'), value = c(1, 1))
spread(DF, COLUMN, value)

gives

  ID TABLE
1  1    NA
2 NA     1

The same happens with a tibble. The key value TABLE silently replaces the existing TABLE column.

In my opinion this should be an ERROR, or at least a WARNING.

(Hope I didn't make any mistakes)

Contributor

markdly commented Jul 23, 2018

@SportsTribution, as another tidyr user, a warning in this situation sounds reasonable to me. I didn't know this was possible, but I will be on the lookout for it now! 😄

I imagine the maintainers will want the example as a reprex, so I've added one below, based on yours, to help keep the issue moving along.

Issue:
If spread generates a new column with the same name as an existing column, the existing column is 'overwritten' without warning:

library(tidyverse)
df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
) 
df %>% spread(key, value)
#> # A tibble: 1 x 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

Also, the number of rows after spreading appears to be based on the original values of the column that gets overwritten:

df %>% 
  mutate(a = c(8, 7)) %>% 
  spread(key, value)
#> # A tibble: 2 x 2
#>       a     b
#>   <dbl> <dbl>
#> 1    NA     2
#> 2     1    NA
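Until spread() itself checks for this, the clash can be detected before spreading. The sketch below is a minimal illustration and is not part of tidyr: spread_checked() is a hypothetical helper that takes the key and value column names as strings, warns if any key value matches the name of one of the remaining (id) columns, and then calls spread() unchanged.

```r
library(tidyr)
library(tibble)

# Hypothetical helper (not a tidyr function): warn before spread() silently
# overwrites an existing column. `key` and `value` are column names as strings.
spread_checked <- function(data, key, value) {
  # Columns that spread() keeps as row identifiers
  id_cols <- setdiff(names(data), c(key, value))
  # Column names that spread() will create from the key values
  new_cols <- unique(as.character(data[[key]]))
  clashes <- intersect(new_cols, id_cols)
  if (length(clashes) > 0) {
    warning(
      "spread() would overwrite existing column(s): ",
      paste(clashes, collapse = ", "),
      call. = FALSE
    )
  }
  # Inject the strings so spread() selects the named columns
  spread(data, !!key, !!value)
}

df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
)
spread_checked(df, "key", "value")  # warns about column `a`, then spreads
```

The check only compares names, so it adds no cost proportional to the data beyond collecting the unique key values.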


hadley added the bug and pivoting ♻️ labels Jan 4, 2019
Member

hadley commented Mar 3, 2019

This is resolved by pivot(), which follows the new tidyverse standard rules for repairing duplicated column names:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
) 
spec <- pivot_spec_wide(df, key, value)
spec
#> # A tibble: 2 x 3
#>   col_name measure key  
#>   <chr>    <chr>   <chr>
#> 1 a        value   a    
#> 2 b        value   b

df %>% pivot(spec)
#> New names:
#> a -> a..1
#> a -> a..2
#> # A tibble: 1 x 3
#>    a..1  a..2     b
#>   <dbl> <dbl> <dbl>
#> 1     7     1     2
  
df %>% 
  mutate(a = c(8, 7)) %>% 
  pivot(spec)
#> New names:
#> a -> a..1
#> a -> a..2
#> # A tibble: 2 x 3
#>    a..1  a..2     b
#>   <dbl> <dbl> <dbl>
#> 1     8     1    NA
#> 2     7    NA     2

Created on 2019-03-03 by the reprex package (v0.2.1.9000)

Need to turn this into a test to make sure the problem does not recur if the code is refactored in the future.
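The pivot() and pivot_spec_wide() calls above come from the development version of tidyr at that time. In the tidyr 1.0.0 release the wide direction is exposed as pivot_wider() (with build_wider_spec() and pivot_wider_spec() for explicit specs). The following is a minimal sketch of the same example against a current release, assuming its names_repair argument behaves as documented: the default "check_unique" is expected to turn the clash into an error, while names_repair = "unique" de-duplicates the names much like the a..1 / a..2 output above (current vctrs spells them a...1 / a...2).

```r
library(tidyr)
library(tibble)

df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
)

# Default names_repair = "check_unique": the id column `a` and the new
# column `a` collide, so this is expected to signal an error.
res_default <- tryCatch(
  pivot_wider(df, names_from = key, values_from = value),
  error = function(e) e
)
inherits(res_default, "error")

# Opting into "unique" renames the duplicates instead of failing
# (a "New names:" message reports the renaming).
res_unique <- pivot_wider(
  df,
  names_from = key,
  values_from = value,
  names_repair = "unique"
)
res_unique
```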

@hadley hadley added this to the v1.0.0 milestone Mar 3, 2019
@hadley hadley closed this as completed in 29cab54 Mar 3, 2019