Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering with external vector and max gives lazy_select_query error #1048

Closed
chwpearse opened this issue Nov 18, 2022 · 2 comments · Fixed by #1062
Closed

Filtering with external vector and max gives lazy_select_query error #1048

chwpearse opened this issue Nov 18, 2022 · 2 comments · Fixed by #1062
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Milestone

Comments

@chwpearse
Copy link

Filtering a dbplyr table with col %in% ext_vector & col == max(col) causes an error. This works in version 2.1.1, but not 2.2.0

Either of these statements by themselves, or split into seperate filter functions run correctly.

library(tidyverse)
library(dbplyr)
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

lf <- lazy_frame(x = 1L)

to_filter <- c('a', 'b')

lf %>% filter(x == max(x, na.rm = T))
#> <SQL>
#> SELECT `x`
#> FROM (
#>   SELECT *, MAX(`x`) OVER () AS `q01`
#>   FROM `df`
#> ) `q01`
#> WHERE (`x` = `q01`)
lf %>% filter(x %in% to_filter)
#> <SQL>
#> SELECT *
#> FROM `df`
#> WHERE (`x` IN ('a', 'b'))
lf %>% filter(x == max(x, na.rm = T)) %>% filter(x %in% to_filter)
#> <SQL>
#> SELECT *
#> FROM (
#>   SELECT `x`
#>   FROM (
#>     SELECT *, MAX(`x`) OVER () AS `q01`
#>     FROM `df`
#>   ) `q01`
#>   WHERE (`x` = `q01`)
#> ) `q02`
#> WHERE (`x` IN ('a', 'b'))

lf %>% filter(x == max(x, na.rm = T), x %in% to_filter)
#> Error in lazy_select_query(x = mutated$lazy_query, last_op = "filter", : is_lazy_sql_part(where) is not TRUE

Created on 2022-11-18 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Dec 5, 2022

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

lf <- lazy_frame(x = 1L)
to_filter <- c('a', 'b')

lf %>% filter(x == max(x, na.rm = TRUE), x %in% to_filter)
#> Error in lazy_select_query(x = mutated$lazy_query, last_op = "filter", : is_lazy_sql_part(where) is not TRUE

Created on 2022-12-04 with reprex v2.0.2

@mgirlich The problem is that is_lazy_sql_part() is returning false because the second query element (x %in% to_filter) is not an expression because by that point the value of to_filter has been inlined into the call. If I replace is_expression() with is_call() this query works and none of the test fails, so maybe you just misunderstood what the (confusingly named) is_expression() does?

@hadley hadley added bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL labels Dec 5, 2022
@hadley hadley added this to the 2.3.0 milestone Dec 5, 2022
@hadley hadley changed the title Filtering with external vector and max gives lazy_select_query error since version 2.2.0 Filtering with external vector and max gives lazy_select_query error Dec 5, 2022
@mgirlich
Copy link
Collaborator

mgirlich commented Dec 5, 2022

I already had a PR prepared for this (though not on Github) and considered using is_symbolic() instead of the incorrect is_symbol() | is_expression(). But maybe is_symbol() | is_call() is easier to understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants