Repeated joining can cause duplicated column names #1460

Closed
huftis opened this issue Oct 19, 2015 · 5 comments

huftis commented Oct 19, 2015

Join operations add suffixes to avoid duplicated column names for (non-joining) columns. However, for repeated joining of similar data frames, duplicated column names can still occur. Example:

library(dplyr)

d1 = data.frame(id=1:5, foo=rnorm(5))
d2 = data.frame(id=1:5, foo=rnorm(5))
d3 = data.frame(id=1:5, foo=rnorm(5))

# Note: the first join is d1 with itself, which already yields foo.x and foo.y
d = d1 %>% left_join(d1, by="id") %>%
  left_join(d2, by="id") %>%
  left_join(d3, by="id")

This results in duplicated column names:

  id foo.x foo.y foo.x foo.y
1  1  0.51  0.51  1.70 -1.99
2  2  0.48  0.48  1.78 -0.40
3  3 -0.83 -0.83  0.93  0.67
4  4 -2.36 -2.36 -0.02  2.11
5  5  0.90  0.90 -0.77 -1.31

which results in an error when applying subsequent operations:

> d %>% select(id)
Error: found duplicated column name: foo.x, foo.y
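
A possible way to confirm and repair the clash after the fact, using only base R. This is a minimal sketch: the data frame below is a hypothetical stand-in built to carry the same names as the joined result above, and `make.unique()` is a generic base R fix, not something the join does for you.

```r
# Hypothetical stand-in for the joined result: four value columns whose
# names repeat exactly as in the output above.
d <- data.frame(id = 1:5, a = rnorm(5), b = rnorm(5), c = rnorm(5), e = rnorm(5))
names(d) <- c("id", "foo.x", "foo.y", "foo.x", "foo.y")

# Which names occur more than once?
dup <- unique(names(d)[duplicated(names(d))])
print(dup)

# make.unique() leaves the first occurrence alone and appends .1, .2, ...
# to later repeats, giving foo.x.1 and foo.y.1 here.
names(d) <- make.unique(names(d))
print(names(d))
```

Once the names are unique, column selection by name is unambiguous again.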

Session info:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C                             
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.1.1       assertthat_0.1 parallel_3.2.1 tools_3.2.1   
[6] DBI_0.3.1      Rcpp_0.12.1   

romainfrancois (Member) commented Oct 20, 2015

Not sure there's much we can really do about that, and it is easy to slide in a few rename calls to get control of the names.
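
The rename approach could look roughly like the sketch below. It simplifies the original example to three distinct data frames, and the names `foo_1`, `foo_2` and `foo_3` are illustrative assumptions. Because every value column already has a distinct name before the join, no suffix is ever added.

```r
library(dplyr)

d1 <- data.frame(id = 1:5, foo = rnorm(5))
d2 <- data.frame(id = 1:5, foo = rnorm(5))
d3 <- data.frame(id = 1:5, foo = rnorm(5))

# Rename foo in each input first, so the joins never see a name clash.
d <- d1 %>%
  rename(foo_1 = foo) %>%
  left_join(rename(d2, foo_2 = foo), by = "id") %>%
  left_join(rename(d3, foo_3 = foo), by = "id")

print(names(d))
```

Later dplyr releases also accept a `suffix` argument in the join functions, which offers another way to control the generated names; that argument was not available in the 0.4.3 version reported above.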

hadley (Member) commented Oct 21, 2015

@romainfrancois hmmm - I think the suffixes should be growing - i.e. you should get foo.x.x etc
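
One way to picture "growing" suffixes is the base R sketch below. It only computes output names for the non-key columns; `grow_suffix()` and `joined_names()` are hypothetical helpers written for illustration and are not dplyr's implementation.

```r
# Hypothetical helper: keep appending `suffix` to `name` until the
# result no longer collides with anything in `taken`.
grow_suffix <- function(name, suffix, taken) {
  while (name %in% taken) {
    name <- paste0(name, suffix)
  }
  name
}

# Hypothetical helper: output names when joining non-key columns `x`
# (left side) with `y` (right side).
joined_names <- function(x, y, suffix = c(".x", ".y")) {
  common <- intersect(x, y)
  out_x <- x
  for (i in which(x %in% common)) {
    # A clashing left-hand name gets the suffix, then grows until it
    # differs from every other left-hand name.
    out_x[i] <- grow_suffix(paste0(x[i], suffix[1]), suffix[1], x[-i])
  }
  out_y <- y
  for (j in seq_along(y)) {
    if (y[j] %in% common) out_y[j] <- paste0(y[j], suffix[2])
    # Right-hand names grow until they differ from all left-hand names.
    out_y[j] <- grow_suffix(out_y[j], suffix[2], out_x)
  }
  c(out_x, out_y)
}

step1 <- joined_names("foo", "foo")   # d1 joined with d1
step2 <- joined_names(step1, "foo")   # then d2
step3 <- joined_names(step2, "foo")   # then d3
print(step3)
```

Under these assumptions the third join produces foo.x.x and foo.y.y instead of repeating foo.x and foo.y.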

romainfrancois (Member) commented Oct 28, 2015

I made an attempt at this in the join_names branch. I get:

> d1 %>% left_join(d1, by="id")
  id      foo.x      foo.y
1  1 -0.8497158 -0.8497158
2  2  0.5466572  0.5466572
3  3 -0.2568382 -0.2568382
4  4 -0.2244100 -0.2244100
5  5  1.6997032  1.6997032
> d1 %>% left_join(d1, by="id") %>% left_join(d2, by = "id" )
  id      foo.x      foo.y        foo
1  1 -0.8497158 -0.8497158 -0.4369988
2  2  0.5466572  0.5466572 -0.3358606
3  3 -0.2568382 -0.2568382 -0.8941454
4  4 -0.2244100 -0.2244100  0.9723835
5  5  1.6997032  1.6997032  2.6066989
> d1 %>% left_join(d1, by="id") %>% left_join(d2, by = "id" ) %>% left_join(d3, by = "id" )
  id      foo.x      foo.y    foo.x.x    foo.y.y
1  1 -0.8497158 -0.8497158 -0.4369988 -0.2919877
2  2  0.5466572  0.5466572 -0.3358606  1.5299791
3  3 -0.2568382 -0.2568382 -0.8941454  0.7805142
4  4 -0.2244100 -0.2244100  0.9723835  1.9929466
5  5  1.6997032  1.6997032  2.6066989 -0.1751198

@hadley is this what you have in mind?

hadley (Member) commented Oct 28, 2015

Yes, exactly.

romainfrancois (Member) commented Oct 28, 2015

Great, I'll merge after adding some tests then.

hadley removed this from the future milestone May 26, 2016
hadley added this to the 0.5 milestone May 26, 2016
lock (bot) locked as resolved and limited conversation to collaborators Jun 9, 2018