Join fails to record "suffixed" vars #128

Closed
pnacht opened this issue Nov 21, 2019 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@pnacht

pnacht commented Nov 21, 2019

If I run the following trivial example, I get the expected output:

library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table()
>     b a.x a.y
> 1:  6   1   a
> 2:  7   2   b
> 3:  8   3   c
> 4:  9   4   d
> 5: 10   5   e

Note that the conflicting columns a are properly managed, using the standard dplyr format of adding .x and .y suffixes.

However, if I now try to drop one of the columns:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a.y
  ) %>%
  as.data.table()
> Error in is_character(x) : object 'a.y' not found

Interestingly, selecting one of the suffixed columns (select(a.x)) gives the same error. But if I instead try select(a), a column which shouldn't exist anymore, I get the following output:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    a
  ) %>%
  as.data.table()
>    a.b
> 1:   1
> 2:   2
> 3:   3
> 4:   4
> 5:   5

Here the selected column is clearly dt1$a, but for some reason it is named a.b. (If I try select(a.b), I get the same "object not found" error.)

Meanwhile, if I try to drop a, both a columns are dropped:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a
  ) %>%
  as.data.table()
>     b
> 1:  6
> 2:  7
> 3:  8
> 4:  9
> 5: 10

So, how can I use select after a join where the tables have conflicting columns that are not join keys?

EDIT:

As mentioned in some answers, I can obviously execute the lazy evaluation before the select, which works. However, it throws a warning (since I'd like to keep my object as a data.table, not a data.frame) so it doesn't seem to be the intended method:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table() %>%
  select(
    -a.x
  )
>     b a.y
> 1:  6   a
> 2:  7   b
> 3:  8   c
> 4:  9   d
> 5: 10   e
> Warning message:
> You are using a dplyr method on a raw data.table, which will call the data 
> frame implementation, and is likely to be inefficient.
> * 
> * To suppress this message, either generate a data.table translation with
> * `lazy_dt()` or convert to a data frame or tibble with
> * `as.data.frame()`/`as_tibble()`.
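
One possible way to avoid the warning above, sketched here as an assumption rather than as a recommended dtplyr idiom: once the join has been materialised with as.data.table(), the column can be dropped with data.table's own syntax instead of dplyr's select(), so no dplyr method is called on the raw data.table. The name `joined` is introduced only for this illustration.

```r
library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

# Materialise the lazy join first; the suffixed names exist in the result.
joined <- dt1 %>%
  left_join(dt2, by = "b") %>%
  as.data.table()

# Drop the unwanted column using data.table syntax rather than select(),
# so the result stays a data.table and no dplyr method is dispatched.
joined[, !"a.x"]
```

This still forces evaluation before the column is dropped, so it works around the bug rather than fixing it.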
@hadley hadley added the reprex needs a minimal reproducible example label Nov 22, 2019

@hadley
Member

hadley commented Dec 24, 2019

Looks like the problem is in step_join(): it doesn't update vars correctly.

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

dt1 <- lazy_dt(data.frame(a = 1:2, b = 6:7))
dt2 <- lazy_dt(data.frame(a = letters[1:2], b = 6:7))

dt <- dt1 %>% left_join(dt2, by = "b") 

dt %>% collect()
#> # A tibble: 2 x 3
#>       b   a.x a.y  
#>   <int> <int> <fct>
#> 1     6     1 a    
#> 2     7     2 b
dt %>% tbl_vars()
#> <dplyr:::vars>
#> [1] "a" "b"

Created on 2019-12-24 by the reprex package (v0.3.0)
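
To illustrate what the reprex above implies the recorded vars should be, here is a minimal base R sketch. `join_vars()` is a hypothetical helper written for this note; it is not part of dtplyr and is not the change made in the fixing commit. It assumes key columns come first and that clashing non-key columns receive the `.x` and `.y` suffixes, matching the output shown in the issue.

```r
# Hypothetical helper (not dtplyr's API): compute the column names a join
# should report when non-key columns share a name.
join_vars <- function(x_vars, y_vars, by, suffix = c(".x", ".y")) {
  # Non-key columns on each side
  x_rest <- setdiff(x_vars, by)
  y_rest <- setdiff(y_vars, by)

  # Names present on both sides need a suffix
  clash <- intersect(x_rest, y_rest)
  x_clash <- x_rest %in% clash
  y_clash <- y_rest %in% clash
  x_rest[x_clash] <- paste0(x_rest[x_clash], suffix[1])
  y_rest[y_clash] <- paste0(y_rest[y_clash], suffix[2])

  # Key columns first, then left-hand and right-hand columns
  c(by, x_rest, y_rest)
}

join_vars(c("a", "b"), c("a", "b"), by = "b")
#> [1] "b"   "a.x" "a.y"
```

With vars recorded like this, a later select(-a.y) would have a name to resolve against, whereas the tbl_vars() output above only lists "a" and "b".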

@hadley hadley added bug an unexpected problem or unintended behavior and removed reprex needs a minimal reproducible example labels Dec 24, 2019
@hadley hadley changed the title from 'Select after a join with conflicting columns gets confused with "suffixed" names' to 'Join fails to record "suffixed" vars' Dec 24, 2019
@hadley hadley closed this as completed in ef13ce3 Dec 24, 2019