filter(var = var), unexpected behaviour #4443

Closed
michaelhogersosis opened this issue Jun 26, 2019 · 4 comments
Comments

@michaelhogersosis

michaelhogersosis commented Jun 26, 2019

Hi there,

Currently using dplyr 0.8.0.1 in combination with a postgres backend.

Suppose I run the following:
var <- '123'
tbl(dbCon, 'exampleTable') %>% dplyr::filter(var == var)
where the first var in filter is a database column and the second var is supposed to be the var I assigned in my R script. This then results in selecting the entire table as var = var is seen as true for every row.
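One way to see what happens is to inspect the SQL that dbplyr generates. The following is a minimal sketch, assuming a recent dbplyr; the table is a hypothetical stand-in built with lazy_frame() and a simulated Postgres connection, so no real database is involved.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for 'exampleTable'; lazy_frame() needs no live connection.
example_tbl <- lazy_frame(var = c("123", "456"), con = simulate_postgres())

var <- "123"

# Both sides of == resolve to the column, so the local value never reaches the SQL.
query <- example_tbl %>% filter(var == var)
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

The rendered WHERE clause should compare the column with itself, and the local value '123' should not appear anywhere in the query.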

I understand the logic behind the way it is executed, but perhaps a warning message would be nice, or preventing duplicate variables from being used at all within filter. As it stands, it may only produce erroneous behaviour, as one would never use filter to select an entire table.

Best regards,

Michael

@romatik
Contributor

romatik commented Jun 27, 2019

First of all, you probably meant var == var, not var = var.
Second of all, recent changes in dbplyr (second point here https://dbplyr.tidyverse.org/news/index.html#breaking-changes) mean that it is now up to you to control where things are evaluated. Here is an example of what I mean:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
iris2 <- dbplyr::tbl_memdb(iris) 
iris2 %>% dplyr::filter(Species == Species) %>% dplyr::tally()
#> # Source:   lazy query [?? x 1]
#> # Database: sqlite 3.22.0 [:memory:]
#>       n
#>   <int>
#> 1   150

Species <- "setosa"
iris2 %>% dplyr::filter(Species == !!Species) %>% dplyr::tally()
#> # Source:   lazy query [?? x 1]
#> # Database: sqlite 3.22.0 [:memory:]
#>       n
#>   <int>
#> 1    50

Created on 2019-06-27 by the reprex package (v0.3.0)

There are more details and examples here - https://community.rstudio.com/t/dbplyr-dplyr-filter-worked-fine-in-r-3-5-1-but-fails-in-r-3-6-0-using-in-in-database-query/31664

You are right that most of the time it's a mistake to have code that is var == var. However, I'm not sure it's always a mistake, so maybe it won't be fixed directly anytime soon.
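Until something like a warning exists, a check on the user side can catch the name clash before the query is built. This is only a sketch: clashing_names() is a hypothetical helper, not part of dplyr or dbplyr, and the column names are typed in by hand for illustration.

```r
# Hypothetical helper (not part of dplyr or dbplyr): report which column names
# are also defined as variables directly in `env`.
clashing_names <- function(cols, env = parent.frame()) {
  # inherits = FALSE looks only at `env` itself, so functions from attached
  # packages that happen to share a name (e.g. stats::var) are not counted.
  is_defined <- vapply(cols, exists, logical(1), envir = env, inherits = FALSE)
  unname(cols[is_defined])
}

# Usage: the column names are written out by hand here; for a real table
# they would have to be looked up first.
script_env <- new.env()
assign("var", "123", envir = script_env)
clashes <- clashing_names(c("id", "var"), env = script_env)
if (length(clashes) > 0) {
  message("Local variables share a name with columns: ",
          paste(clashes, collapse = ", "))
}
```

Any name reported this way is a candidate for renaming the local variable or for unquoting it with !! inside filter().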

@michaelhogersosis
Author

Thank you for the possible solution. Changing the behavior might indeed have an unexpected effect on existing code.

I edited the post; it should now show == as you mentioned.

@romainfrancois
Member

Duplicate of #3937

@lock

lock bot commented Dec 26, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 26, 2019