Problems writing cross-join in dplyr #4206

JohnMount opened this issue Feb 20, 2019 · 2 comments · Fixed by #4741

feature tables 🧮



JohnMount commented Feb 20, 2019

Sometimes one wants a cross-join, but the obvious way to write it in dplyr (`by = character(0)`) does not work, although it does work with dbplyr.

```r
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.2
#> [1] ''
#> [1] '1.3.0'

d <- data.frame(g = c("a", "b"),
                stringsAsFactors = FALSE)

left_join(d, d, by = character(0))
#> Error: `by` must specify variables to join by

# In-memory SQLite database for the dbplyr comparison
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
d2 <- dplyr::copy_to(db, d, "d2")

left_join(d2, d2, by = character(0))
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.22.0 [:memory:]
#>   g.x   g.y  
#>   <chr> <chr>
#> 1 a     a    
#> 2 a     b    
#> 3 b     a    
#> 4 b     b
```


Created on 2019-02-20 by the reprex package (v0.2.1)
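With dplyr versions that reject an empty `by`, one common workaround is to give both tables the same constant key column and join on it, then drop the key. This is a minimal sketch, not an official dplyr recipe: `cross_join_via_key()` and the `.cross_key` column name are hypothetical. Because every row matches every row, dplyr 1.1.0 and later would warn about a many-to-many relationship, so the sketch passes `relationship = "many-to-many"` only on those versions. Base R's `merge()` with `by = NULL` also returns the Cartesian product.

```r
library(dplyr)

d <- data.frame(g = c("a", "b"), stringsAsFactors = FALSE)

# Hypothetical helper (not part of dplyr): emulate a cross join by adding the
# same constant key to both tables and joining on it.
cross_join_via_key <- function(x, y, key = ".cross_key") {
  x[[key]] <- 1L
  y[[key]] <- 1L
  out <- if (packageVersion("dplyr") >= "1.1.0") {
    # Declare the many-to-many match as intended to avoid a warning
    inner_join(x, y, by = key, relationship = "many-to-many")
  } else {
    inner_join(x, y, by = key)
  }
  out[[key]] <- NULL  # drop the helper column again
  out
}

res <- cross_join_via_key(d, d)
res

# Base R alternative: merge() with by = NULL returns the Cartesian product
merge(d, d, by = NULL)
```

Shared column names get the usual `.x`/`.y` suffixes, so the result has columns `g.x` and `g.y`, matching the dbplyr output above.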

romainfrancois added the tables 🧮 and feature labels on Mar 4, 2019

romainfrancois commented Mar 4, 2019

Joins are not part of our immediate focus for the next series. This is probably something that can be incubated in another package for the time being.


hadley commented Jan 11, 2020

In the join refactoring I'm currently working on, this turns out to be a trivial fix, so it's likely to make it into 1.0.0.
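For reference, a sketch of how the cross-join can be written once the fix shipped, assuming dplyr 1.0.0 or later is installed: 1.0.0 treats `by = character()` in a mutating join as a cross-join, and 1.1.0 added an explicit `cross_join()` and soft-deprecated the empty-`by` spelling. The version check simply picks whichever form the installed dplyr expects.

```r
library(dplyr)

d <- data.frame(g = c("a", "b"), stringsAsFactors = FALSE)

# dplyr >= 1.1.0: explicit cross_join(); dplyr 1.0.x: empty `by` means cross-join
if (packageVersion("dplyr") >= "1.1.0") {
  res <- cross_join(d, d)
} else {
  res <- left_join(d, d, by = character())
}
res
```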

hadley added a commit that referenced this issue Jan 12, 2020
hadley added a commit that referenced this issue Jan 13, 2020
Joins are now based on a common approach of modifying x rather than creating a new data frame. New `join_rows()` and `join_cols()` provide a common toolset for generating row and column indices. New `filter_join()` and `mutate_join()` reduce duplication in the joining code.

Tests have been rewritten from scratch to focus on the concepts that have become clearer: preserving x's type, row order, and column order.

The test for both empty suffixes has been removed. This mildly weakens the guarantees offered by the join functions, but makes `nest_join()` work as it used to. I don't think adding extra logic to make it error in regular joins is worth it.

* Fixes #4206: refactoring revealed a trivial implementation (mostly just removing an existing error message)
* Fixes #4225: the data frame method now does all the work
* Fixes #4589: refactoring revealed that `nest_join(keep = T)` could share code with `full_join(keep = T)`
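A toy illustration of why the #4206 fix can be trivial once a join first computes matching row indices and then subsets x and y: with an empty `by`, every row gets the same (empty) key, so every x row matches every y row. `toy_join_rows()` is a hypothetical simplification written for this note, not dplyr's internal `join_rows()`, and it only performs inner-join matching.

```r
# Hypothetical toy helper, NOT dplyr's internal join_rows().
# Returns parallel vectors of matching row indices into x and y.
toy_join_rows <- function(x, y, by) {
  # One key string per row; with no key columns every row gets the same key ""
  row_key <- function(df) {
    if (length(by) == 0) return(rep("", nrow(df)))
    do.call(paste, c(unname(as.list(df[by])), sep = "\r"))
  }
  kx <- row_key(x)
  ky <- row_key(y)
  # Inner-join matching: pair each x row with every y row sharing its key
  matches <- lapply(seq_along(kx), function(i) which(ky == kx[i]))
  list(
    x = rep(seq_along(kx), lengths(matches)),
    y = as.integer(unlist(matches, use.names = FALSE))
  )
}

d <- data.frame(g = c("a", "b"), stringsAsFactors = FALSE)

toy_join_rows(d, d, by = "g")          # matched pairs only: x = 1 2, y = 1 2
rows <- toy_join_rows(d, d, by = character(0))
rows                                   # every pair: x = 1 1 2 2, y = 1 2 1 2

# Subsetting with the indices yields the cross-join
data.frame(g.x = d$g[rows$x], g.y = d$g[rows$y])
```

No cross-join-specific code is needed: the empty key simply makes every pair match, which is why removing the error message was most of the fix.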