Column Naming in join is Duplicated when Due to First Suffix but not when Due to Second Suffix #3266

billdenney opened this issue Dec 29, 2017 · 4 comments


@billdenney billdenney commented Dec 29, 2017

This is partly related to #3243.

When joining by specified column names, if a name would be duplicated because of the first suffix, duplicated names are generated. If a name would be duplicated because of the second suffix, unique names are generated. I would expect the latter behavior (the second pair of examples) and not the former (the first pair of examples).

library(dplyr, quietly = TRUE)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
d1 <- data.frame(A = 1, A.x = 2)
d2 <- data.frame(B = 3, A.x = 4, A = 5)
full_join(d1, d2, by = "A.x")
#>   A.x A.x  B A.y
#> 1   1   2 NA  NA
#> 2  NA   4  3   5
full_join(d2, d1, by = "A.x")
#>    B A.x A.x.x A.y
#> 1  3   4     5  NA
#> 2 NA   2    NA   1

d1 <- data.frame(A = 1, A.y = 2)
d2 <- data.frame(B = 3, A.y = 4, A = 5)
full_join(d1, d2, by = "A.y")
#>   A.x A.y  B A.y.y
#> 1   1   2 NA    NA
#> 2  NA   4  3     5
full_join(d2, d1, by = "A.y")
#>    B A.y A.x A.y.y
#> 1  3   4   5    NA
#> 2 NA   2  NA     1
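
Until the clash handling is settled, one possible workaround is to pick a suffix pair that cannot collide with any existing column name before calling the join. The sketch below is only an illustration: `safe_suffix()` and its `candidates` list are hypothetical names, not part of dplyr, and it assumes `by` is an unnamed character vector of key columns shared by both inputs. The only dplyr feature it relies on is the `suffix` argument of `full_join()`.

```r
# Hypothetical helper (not part of dplyr): return the first suffix pair for
# which no suffixed shared column would collide with a name that already
# exists in either input.
safe_suffix <- function(x, y, by,
                        candidates = list(c(".x", ".y"),
                                          c("_x", "_y"),
                                          c(".left", ".right"))) {
  # Non-key columns present in both inputs are the ones that get a suffix.
  shared <- intersect(setdiff(names(x), by), setdiff(names(y), by))
  existing <- union(names(x), names(y))
  for (s in candidates) {
    proposed <- c(paste0(shared, s[1]), paste0(shared, s[2]))
    if (!any(proposed %in% existing)) {
      return(s)
    }
  }
  stop("no candidate suffix pair avoids a name clash")
}

d1 <- data.frame(A = 1, A.x = 2)
d2 <- data.frame(B = 3, A.x = 4, A = 5)

# ".x"/".y" is rejected because "A.x" already exists, so the next pair is used.
sfx <- safe_suffix(d1, d2, by = "A.x")

# Only run the join if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  res <- dplyr::full_join(d1, d2, by = "A.x", suffix = sfx)
  stopifnot(!anyDuplicated(names(res)))
}
```

This sidesteps the question of how clashes should be resolved rather than answering it; the default-suffix behavior reported above is unchanged.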
krlmlr added a commit to krlmlr/dplyr that referenced this issue Dec 31, 2017
…arize-zero-columns', 'b-tidyverse#3266-join-clash' and 'b-tidyverse#3258-named' into r-0.7.5
@krlmlr krlmlr closed this in #3275 Jan 4, 2018

@krlmlr krlmlr commented Mar 15, 2018

@billdenney: The bugfix causes a more severe problem, #3307, which leads to downstream failures for at least three packages. I might have to revert it and fix resolution of name clashes differently (#3425) in a major release. What's your use case?

Contributor Author

@billdenney billdenney commented Mar 15, 2018

My use case was related to a dataset received from a client where they had many variants of the same column name (a long test with one column per response), and I had to merge slightly different variants of these where the same person responded.

It was a light use case: in my particular example, I just went about it another way (gather first, then merge). That is simply when I found the issue.

This bug was not a high priority for me, but I will probably experience it again in the future were the change reverted without an equivalent fix.

The short of it is: if you need to revert this for a while and replace it with a different fix later, that's OK with me. (And thanks for letting me know the reversion is coming.)
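
For readers who hit the same problem, the "gather first, then merge" route mentioned above might look roughly like the sketch below. Everything in it is assumed for illustration: the data frames `wave1` and `wave2`, the column names, and the helper `to_long()` are made up, and base R is used in place of `tidyr::gather()` and a dplyr join so that the example runs without extra packages. It also assumes all response columns share one type.

```r
# Hypothetical data: one row per respondent, one column per response variant.
wave1 <- data.frame(id = c(1, 2), Q1 = c("a", "b"), Q1.x = c("c", "d"),
                    stringsAsFactors = FALSE)
wave2 <- data.frame(id = c(2, 3), Q1 = c("e", "f"), Q1.x = c("g", "h"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: reshape to one row per (id, question) pair, so the
# troublesome column names become values instead of names.
to_long <- function(df, id_col, value_name) {
  value_cols <- setdiff(names(df), id_col)
  out <- data.frame(
    id = rep(df[[id_col]], times = length(value_cols)),
    question = rep(value_cols, each = nrow(df)),
    response = unlist(df[value_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out)[names(out) == "response"] <- value_name
  out
}

long1 <- to_long(wave1, "id", "response_1")
long2 <- to_long(wave2, "id", "response_2")

# Full outer merge on the keys; no non-key column is shared, so no suffixes
# are ever applied.
merged <- merge(long1, long2, by = c("id", "question"), all = TRUE)
```

Because the original column names are stored in `question`, names such as `Q1.x` can no longer collide with a join suffix.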

krlmlr added a commit that referenced this issue Mar 15, 2018
Closes #3307. Reopens #3266.

This reverts commit add0ccb.
@krlmlr krlmlr reopened this Mar 15, 2018
@krlmlr krlmlr added the bug label Mar 15, 2018

@krlmlr krlmlr commented Mar 15, 2018

Will be resolved as part of #3425.


@lock lock bot commented Sep 12, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue.

@lock lock bot locked and limited conversation to collaborators Sep 12, 2018