Incorrect group-by columns when joining grouped data frames with overlapping columns #2330

Closed
davidkretch opened this issue Dec 17, 2016 · 0 comments

A join renames overlapping columns (by default, y becomes y.x and y.y), but it does not update the corresponding group column names stored in the vars attribute. This becomes a problem when a group column is not one of the join keys: the resulting data frame still lists the old name as a grouping variable, and mutate fails on it.

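library(dplyr)

df1 <- data.frame(x = 1:10, y = 1:10)
df2 <- df1

# Group by both columns; y is a group column but will not be a join key
df1g <- df1 %>% group_by(x, y)

# y exists in both inputs, so the join renames it to y.x and y.y
df3 <- inner_join(df1g, df2, by = "x")

# Fails: the group columns recorded on df3 were not renamed
df3 %>%
  mutate(a = 1)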

Output:

Error in mutate_impl(.data, dots) : unknown column 'y.x' 
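
In the meantime, one possible workaround is to join the ungrouped data and apply the grouping afterwards, using the renamed columns. This is only a sketch under the assumption that the default suffixes (.x and .y) are in effect, so the first table's y ends up as y.x; it sidesteps the stale group names rather than fixing them.

```r
library(dplyr)

df1 <- data.frame(x = 1:10, y = 1:10)
df2 <- df1

# Join first, without any grouping, so no group metadata can go stale
joined <- inner_join(df1, df2, by = "x")

# Group on the post-join names; y from df1 is now called y.x
df3 <- joined %>% group_by(x, y.x)

# mutate now sees group columns that actually exist in the data
result <- df3 %>% mutate(a = 1)

# Compare the recorded group columns with the column names
groups(df3)
names(df3)
```

If the join is called with a custom suffix argument, the name passed to group_by would need to change accordingly.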

I have a fix and will submit a pull request.

krlmlr pushed a commit that referenced this issue Jan 26, 2017
…2334)

* Fix subset_join to update group column names in attribute
vars when they are duplicate column names.

* Add tests for appropriate group columns after join.

* Add test for group indices on expanding join with grouped
data frame.

* Fix build_index_cpp to report correct missing group
column name. Currently when a group column name does not
exist in the data frame, it reports a name from the
names vector (all columns) instead of the vars vector
(group columns).

* Add test for error message on non-existent group columns.

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018