Skip to content

Join fails to record "suffixed" vars #128

@pnacht

Description

@pnacht

If I run the following trivial example, I get the expected output:

library(dplyr)
library(dtplyr)
library(data.table)

dt1 <- lazy_dt(data.table(a = 1:5, b = 6:10))
dt2 <- lazy_dt(data.table(a = letters[1:5], b = 6:10))

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table()
>     b a.x a.y
> 1:  6   1   a
> 2:  7   2   b
> 3:  8   3   c
> 4:  9   4   d
> 5: 10   5   e

Note that the conflicting columns a are properly managed, using the standard dplyr format of adding .x and .y suffixes.

However, if I now try to drop one of the columns:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a.y
  ) %>%
  as.data.table()
> Error in is_character(x) : object 'a.y' not found
Details

Interestingly, if I try to select one of the a columns (select(a.x)), I get the same error, but... if I instead try select(a) (selecting a column which shouldn't really exist anymore), I get the following output:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    a
  ) %>%
  as.data.table()
>    a.b
> 1:   1
> 2:   2
> 3:   3
> 4:   4
> 5:   5

where the selected column is clearly dt1$a, but for some reason the given column name is a.b. (if I try select(a.b), I get the same object not found error).

Meanwhile, if I try to drop a, both a columns are dropped:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  select(
    -a
  ) %>%
  as.data.table()
>     b
> 1:  6
> 2:  7
> 3:  8
> 4:  9
> 5: 10

So, how can I use select with joins where the tables have conflicting (and not joined-by) columns?

EDIT:

As mentioned in some answers, I can obviously execute the lazy evaluation before the select, which works. However, it throws a warning (since I'd like to keep my object as a data.table, not a data.frame) so it doesn't seem to be the intended method:

dt1 %>%
  left_join(
    dt2,
    by = "b"
  ) %>%
  as.data.table() %>%
  select(
    -a.x
  )
>     b a.y
> 1:  6   a
> 2:  7   b
> 3:  8   c
> 4:  9   d
> 5: 10   e
> Warning message:
> You are using a dplyr method on a raw data.table, which will call the data 
> frame implementation, and is likely to be inefficient.
> * 
> * To suppress this message, either generate a data.table translation with
> * `lazy_dt()` or convert to a data frame or tibble with
> * `as.data.frame()`/`as_tibble()`.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugan unexpected problem or unintended behavior

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions