spread overwrites existing column with identical key value #478

Closed
SportsTribution opened this issue Jul 23, 2018 · 3 comments


@SportsTribution

commented Jul 23, 2018

I was trying to do some data analysis on database table column pairs

library(tidyr)
DF <- data.frame(TABLE = c('TA', 'TB'), COLUMN = c('ID', 'TABLE'), value = c(1, 1))
spread(DF, COLUMN, value)

gives

  ID TABLE
1  1    NA
2 NA     1

The same happens with a tibble.

The original TABLE column (TA, TB) is silently replaced. In my opinion this should be an ERROR or at least a WARNING.

(Hope I didn't make any mistakes)
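
Until spread() reports this itself, one way to catch the clash beforehand is to compare the values of the key column against the names of the columns that remain after spreading. A minimal base R sketch; `find_key_collisions` is a hypothetical helper written for illustration, not part of tidyr:

```r
# Hypothetical helper (not part of tidyr): returns the key values that
# share a name with a column that is neither the key nor the value.
find_key_collisions <- function(data, key, value) {
  kept_cols <- setdiff(names(data), c(key, value))
  intersect(unique(as.character(data[[key]])), kept_cols)
}

DF <- data.frame(
  TABLE = c("TA", "TB"),
  COLUMN = c("ID", "TABLE"),
  value = c(1, 1)
)

collisions <- find_key_collisions(DF, "COLUMN", "value")
if (length(collisions) > 0) {
  message("key values clash with existing columns: ",
          paste(collisions, collapse = ", "))
}
```

Here the key value 'TABLE' matches the existing TABLE column, so the check flags it before any data is lost.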

@markdly

Contributor

commented Jul 23, 2018

@SportsTribution, a warning in this situation sounds reasonable to me as another tidyr user - I didn't know this was possible but will be on the lookout for it now! 😄

I imagine the maintainers / owners will probably want an example to be included as a reprex so I've added one below based on your example to try and help keep the issue moving along.

Issue:
If spread generates a new column with the same name as an existing column, the existing column is 'overwritten' without warning:

library(tidyverse)
df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
) 
df %>% spread(key, value)
#> # A tibble: 1 x 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     2

Also, the number of rows after spreading appears to be based on the values of the original column, even though that column is then overwritten:

df %>% 
  mutate(a = c(8, 7)) %>% 
  spread(key, value)
#> # A tibble: 2 x 2
#>       a     b
#>   <dbl> <dbl>
#> 1    NA     2
#> 2     1    NA
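
The row counts are consistent with spread() treating every column other than key and value as a row identifier, so the distinct values of the original a decide how many rows come out before a is overwritten. A small base R sketch of that counting; `count_id_rows` is a hypothetical name used only for illustration:

```r
# Hypothetical helper: number of distinct combinations of the columns
# that are neither the key nor the value column.
count_id_rows <- function(data, key, value) {
  id_cols <- setdiff(names(data), c(key, value))
  nrow(unique(data[id_cols]))
}

df_same <- data.frame(a = c(7, 7), key = c("a", "b"), value = c(1, 2))
df_diff <- data.frame(a = c(8, 7), key = c("a", "b"), value = c(1, 2))

count_id_rows(df_same, "key", "value")  # 1, matching the 1-row result
count_id_rows(df_diff, "key", "value")  # 2, matching the 2-row result
```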
@hadley

Member

commented Jan 4, 2019

@markdly thanks for generating a reprex here, it's much appreciated!

@hadley

Member

commented Mar 3, 2019

This is resolved by pivot(), which uses the new tidyverse standard rules for duplicated column names:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  a = c(7, 7),
  key = c("a", "b"),
  value = c(1, 2)
) 
spec <- pivot_spec_wide(df, key, value)
spec
#> # A tibble: 2 x 3
#>   col_name measure key  
#>   <chr>    <chr>   <chr>
#> 1 a        value   a    
#> 2 b        value   b

df %>% pivot(spec)
#> New names:
#> a -> a..1
#> a -> a..2
#> # A tibble: 1 x 3
#>    a..1  a..2     b
#>   <dbl> <dbl> <dbl>
#> 1     7     1     2
  
df %>% 
  mutate(a = c(8, 7)) %>% 
  pivot(spec)
#> New names:
#> a -> a..1
#> a -> a..2
#> # A tibble: 2 x 3
#>    a..1  a..2     b
#>   <dbl> <dbl> <dbl>
#> 1     8     1    NA
#> 2     7    NA     2

Created on 2019-03-03 by the reprex package (v0.2.1.9000)

Need to turn this into a test to make sure the problem does not re-occur if the code is refactored in the future.
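
A regression test along these lines might look like the sketch below. It is not the test that was committed; it assumes testthat is installed and uses the released tidyr (>= 1.0.0) API, where the development-time pivot() became pivot_wider(). The exact repaired names depend on the installed package versions (a..1 in the output above), so the sketch only checks that three uniquely named columns come back.

```r
library(testthat)
library(tidyr)

test_that("pivot_wider() does not silently overwrite an existing column", {
  df <- tibble(
    a = c(7, 7),
    key = c("a", "b"),
    value = c(1, 2)
  )

  # The default names_repair = "check_unique" refuses duplicated names.
  expect_error(pivot_wider(df, names_from = key, values_from = value))

  # With names_repair = "unique" both `a` columns survive under new names.
  out <- suppressMessages(
    pivot_wider(df, names_from = key, values_from = value,
                names_repair = "unique")
  )
  expect_equal(ncol(out), 3)
  expect_equal(anyDuplicated(names(out)), 0)
  expect_equal(out$b, 2)
})
```

Checking for an error under the default and for unique names under explicit repair covers both ways the original overwrite could reappear.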

@hadley hadley added this to the v1.0.0 milestone Mar 3, 2019
@hadley hadley closed this in 29cab54 Mar 3, 2019