Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R expression to SQL expression translation bug #634

Closed
nassuphis opened this issue Apr 11, 2021 · 7 comments · Fixed by #674
Closed

R expression to SQL expression translation bug #634

nassuphis opened this issue Apr 11, 2021 · 7 comments · Fixed by #674
Labels
bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL

Comments

@nassuphis
Copy link

the expression ((1/(1 + abs(x))) - 1) * sign(x) does not seem to translate accurately into SQL under
certain conditions

in the below example, the expressions bug1 and bug2 are the same, the equality test returns TRUE.
expression bug1 produces different results in R and SQL
expression bug2 produces the same results in R and SQL

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
require(dbplyr)
#> Loading required package: dbplyr
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

bug_expr<-function(v)
{
  x <- enexpr(v)
  k  <- expr((1 / ( 1 + abs(!!x)))-1)
  expr(!!k*sign(!!x))
}

bug1<- bug_expr(x)
bug2<- expr(((1/(1 + abs(x))) - 1) * sign(x))

bug1==bug2
#> [1] TRUE

res <-
  memdb_frame(x=0:-3) %>%
  mutate(v_db=!!bug_expr(x)) %>%
  mutate(v1_db=!!bug1) %>%
  mutate(v2_db=!!bug2) %>%
  collect() %>% 
  mutate(v_r=!!bug_expr(x)) %>%
  mutate(v1_r=!!bug1) %>%
  mutate(v2_r=!!bug2)
  
all(res$v_db==res$v_r)
#> [1] FALSE
all(res$v1_db==res$v1_r)
#> [1] FALSE
all(res$v2_db==res$v2_r)
#> [1] TRUE
@nassuphis
Copy link
Author

we can see that expressions bug1 and bug2 are identical as far as R is concerned, but they
produce different SQL:

> bug1==bug2
[1] TRUE
> 
> bug1
((1/(1 + abs(x))) - 1) * sign(x)
> bug2
((1/(1 + abs(x))) - 1) * sign(x)
> 
> memdb_frame(x=0:-3) %>%
+   mutate(v1_db=!!bug1) %>%
+   show_query()
<SQL>
SELECT `x`, (1.0 / (1.0 + ABS(`x`))) - 1.0 * SIGN(`x`) AS `v1_db`
FROM `dbplyr_026`
> 
> memdb_frame(x=0:-3) %>%
+   mutate(v2_db=!!bug2) %>%
+   show_query()
<SQL>
SELECT `x`, ((1.0 / (1.0 + ABS(`x`))) - 1.0) * SIGN(`x`) AS `v2_db`
FROM `dbplyr_027`
> 

@nassuphis
Copy link
Author

the difference is how they are generated.
one is a constant, the other is made from combining expressions in the script
adding some extra parentheses makes the problem go away

@nassuphis
Copy link
Author

it seems that the SQL translation is correct while the R version inserts a parenthesis for some reason

@nassuphis
Copy link
Author

nassuphis commented Apr 11, 2021

the expressions:
image

@nassuphis
Copy link
Author

nassuphis commented Apr 11, 2021

bug1 should not evaluate to the same value as bug2 but it does in R.
the SQL evaluation is correct, while the expression in the header is wrong.

> eval(bug1,list(x=-3))
[1] 0.75
> eval(bug2,list(x=-3))
[1] 0.75
> mutate(memdb_frame(x=-3),!!bug1)
# Source:   lazy query [?? x 2]
# Database: sqlite 3.35.2 [:memory:]
      x `((1/(1 + abs(x))) - 1) * sign(x)`
  <dbl>                              <dbl>
1    -3                               1.25
> mutate(memdb_frame(x=-3),!!bug2)
# Source:   lazy query [?? x 2]
# Database: sqlite 3.35.2 [:memory:]
      x `((1/(1 + abs(x))) - 1) * sign(x)`
  <dbl>                              <dbl>
1    -3                               0.75
> 

@nassuphis
Copy link
Author

require(rlang)
#> Loading required package: rlang
require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
require(dbplyr)
#> Loading required package: dbplyr
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
e<-expr(!!expr((1 / ( 1 + abs(x)))-1)*sign(x))
mutate(memdb_frame(x=-3),!!e) 
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.35.2 [:memory:]
#>       x `((1/(1 + abs(x))) - 1) * sign(x)`
#>   <dbl>                              <dbl>
#> 1    -3                               1.25
mutate(memdb_frame(x=-3),!!parse_expr(deparse(e))) 
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.35.2 [:memory:]
#>       x `((1/(1 + abs(x))) - 1) * sign(x)`
#>   <dbl>                              <dbl>
#> 1    -3                               0.75

Created on 2021-04-11 by the reprex package (v2.0.0)

@mgirlich
Copy link
Collaborator

I simplified the reprex

library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

translate_sql(!!expr(2 - 1) * x)
#> <SQL> 2.0 - 1.0 * `x`
translate_sql(!!expr((2 - 1) * x))
#> <SQL> (2.0 - 1.0) * `x`

Created on 2021-04-12 by the reprex package (v2.0.0)

In sql_infix() the SQL code is constructed with

build_sql(x, sql(f), y)

where x evaluates to sql("2.0 - 1.0"). The simplest solution would be to just add extra parentheses around x and y but they would often be unnecessary. Otherwise, it would be necessary to somehow pass around the information that x needs to be in parentheses.

@hadley hadley added bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL labels Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants