Repeated joining can cause duplicated column names #1460

Closed
huftis opened this issue Oct 19, 2015 · 5 comments


huftis commented Oct 19, 2015

Join operations add suffixes to avoid duplicated column names for (non-joining) columns. However, for repeated joining of similar data frames, duplicated column names can still occur. Example:

library(dplyr)

d1 = data.frame(id=1:5, foo=rnorm(5))
d2 = data.frame(id=1:5, foo=rnorm(5))
d3 = data.frame(id=1:5, foo=rnorm(5))

d = d1 %>% left_join(d1, by="id") %>%
  left_join(d2, by="id") %>%
  left_join(d3, by="id")

This results in duplicated column names:

 id foo.x foo.y foo.x foo.y
1  1  0.51  0.51  1.70 -1.99
2  2  0.48  0.48  1.78 -0.40
3  3 -0.83 -0.83  0.93  0.67
4  4 -2.36 -2.36 -0.02  2.11
5  5  0.90  0.90 -0.77 -1.31

which results in an error when applying subsequent operations:

> d %>% select(id)
Error: found duplicated column name: foo.x, foo.y

Session info:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C                             
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.1.1       assertthat_0.1 parallel_3.2.1 tools_3.2.1   
[6] DBI_0.3.1      Rcpp_0.12.1   
@romainfrancois
Member

I'm not sure there's much we can really do about that, and it is easy to slide in a few rename calls to get control of the names.
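As a rough illustration of the rename workaround mentioned above, here is a minimal sketch that gives each `foo` column a distinct name before joining, so no suffixes are needed at all. The names `foo_a` to `foo_d` are made up for this example and are not part of the original report.

```r
library(dplyr)

# Same shape of data as in the report: three frames sharing "id" and "foo".
d1 <- data.frame(id = 1:5, foo = rnorm(5))
d2 <- data.frame(id = 1:5, foo = rnorm(5))
d3 <- data.frame(id = 1:5, foo = rnorm(5))

# Rename the non-joining column in every input (hypothetical names),
# so each join sees no clashing column names.
d <- d1 %>%
  rename(foo_a = foo) %>%
  left_join(rename(d1, foo_b = foo), by = "id") %>%
  left_join(rename(d2, foo_c = foo), by = "id") %>%
  left_join(rename(d3, foo_d = foo), by = "id")

names(d)
```

Because every column name is unique, later calls such as `d %>% select(id)` should not hit the duplicated-name error.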

Member

hadley commented Oct 21, 2015

@romainfrancois hmmm - I think the suffixes should be growing - i.e. you should get foo.x.x etc

hadley added the bug and data frame labels Oct 21, 2015
hadley added this to the 0.6 milestone Oct 21, 2015
romainfrancois self-assigned this Oct 28, 2015
@romainfrancois
Member

I made an attempt at this in the join_names branch. I get:

> d1 %>% left_join(d1, by="id")
  id      foo.x      foo.y
1  1 -0.8497158 -0.8497158
2  2  0.5466572  0.5466572
3  3 -0.2568382 -0.2568382
4  4 -0.2244100 -0.2244100
5  5  1.6997032  1.6997032
> d1 %>% left_join(d1, by="id") %>% left_join(d2, by = "id" )
  id      foo.x      foo.y        foo
1  1 -0.8497158 -0.8497158 -0.4369988
2  2  0.5466572  0.5466572 -0.3358606
3  3 -0.2568382 -0.2568382 -0.8941454
4  4 -0.2244100 -0.2244100  0.9723835
5  5  1.6997032  1.6997032  2.6066989
> d1 %>% left_join(d1, by="id") %>% left_join(d2, by = "id" ) %>% left_join(d3, by = "id" )
  id      foo.x      foo.y    foo.x.x    foo.y.y
1  1 -0.8497158 -0.8497158 -0.4369988 -0.2919877
2  2  0.5466572  0.5466572 -0.3358606  1.5299791
3  3 -0.2568382 -0.2568382 -0.8941454  0.7805142
4  4 -0.2244100 -0.2244100  0.9723835  1.9929466
5  5  1.6997032  1.6997032  2.6066989 -0.1751198

@hadley is this what you had in mind?

Member

hadley commented Oct 28, 2015

Yes, exactly.
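For readers who want to see the "growing suffix" idea in isolation, here is a small base R sketch that works on column names only. It is not the code from the join_names branch. The helpers `grow_suffix()` and `sketch_join_names()` are hypothetical, and the sketch assumes each input already has unique names. It keeps appending the suffix until the candidate name no longer collides with any other output name.

```r
# Append `suffix` to `name` (at least once) until the result is not in `taken`.
grow_suffix <- function(name, suffix, taken) {
  repeat {
    name <- paste0(name, suffix)
    if (!name %in% taken) return(name)
  }
}

# Compute output column names for a join of x and y on the `by` columns.
sketch_join_names <- function(x_names, y_names, by, suffix = c(".x", ".y")) {
  x_out <- setdiff(x_names, by)  # non-joining columns from the left
  y_out <- setdiff(y_names, by)  # non-joining columns from the right
  for (nm in intersect(x_out, y_out)) {
    # Left copy gets the first suffix, repeated if the name is already used.
    x_out[x_out == nm] <- grow_suffix(nm, suffix[1], c(by, x_out, y_out))
    # Right copy gets the second suffix, with the same collision check.
    y_out[y_out == nm] <- grow_suffix(nm, suffix[2], c(by, x_out, y_out))
  }
  c(by, x_out, y_out)
}

step1 <- sketch_join_names(c("id", "foo"), c("id", "foo"), by = "id")
step2 <- sketch_join_names(step1, c("id", "foo"), by = "id")
step3 <- sketch_join_names(step2, c("id", "foo"), by = "id")
step3
```

Under these assumptions the three steps give the same column names as the three outputs posted above, ending with `id`, `foo.x`, `foo.y`, `foo.x.x`, `foo.y.y`.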

@romainfrancois
Member

Great, I'll merge after adding some tests then.

hadley modified the milestones: future, 0.5 May 26, 2016
lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018