dbplyr does not translate base R integer division %/% #3057
Comments
Hi @emilyriederer, does regular division using a forward slash by itself work in your environment?
Hi @edgararuiz, unfortunately not. It might happen to work because of integer division, but the SQL translation also turns 5 into a floating-point value:
SELECT "x", "x" / 5.0 AS "z" FROM "data_db"
Ok, can we try appending an "L" to the right of 5?
Minimal reprex:
dbplyr::translate_sql(x %/% 5)
#> <SQL> "x" / 5.0
To make equivalent to
But this needs to be thought through carefully: I have a vague recollection that negative values might cause issues.
sql_int_div <- function() {
  function(x, m) {
    build_sql("((", x, " - (", x, " % ", m, ")) / ", m, ")")
  }
}
That definition gets us pretty close, but it turns out that R and SQLite disagree when the operands have different signs:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
df <- tibble(
  x = c(10, 10, -10, -10),
  y = c(3, -3, 3, -3)
)
df %>% mutate(x %% y, x %/% y)
#> # A tibble: 4 x 4
#>       x     y `x%%y` `x%/%y`
#>   <dbl> <dbl>  <dbl>   <dbl>
#> 1    10     3      1       3
#> 2    10    -3     -2      -4
#> 3   -10     3      2      -4
#> 4   -10    -3     -1       3
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- con %>% copy_to(df)
mf %>% mutate(x %% y, x %/% y)
#> # Source: lazy query [?? x 4]
#> # Database: sqlite 3.19.3 [:memory:]
#>       x     y `x%%y` `x%/%y`
#>   <dbl> <dbl>  <dbl>   <dbl>
#> 1    10     3      1       3
#> 2    10    -3      1      -3
#> 3   -10     3     -1      -3
#> 4   -10    -3     -1       3
I'm not sure how to handle this :(
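The SQLite column above can be reproduced without a database. The following is a minimal base R sketch, not dbplyr code: `trunc_mod()` is a hypothetical helper that emulates a `%` operator whose result takes the sign of the dividend, which is an assumption about the backend that happens to match the SQLite output shown in this comment.

```r
# Hypothetical helper (not part of dplyr/dbplyr): emulates a truncating `%`,
# where the remainder takes the sign of the dividend.
trunc_mod <- function(x, m) x - m * trunc(x / m)

x <- c(10, 10, -10, -10)
y <- c(3, -3, 3, -3)

# The (x - (x % m)) / m translation, evaluated with the truncating remainder
proposed <- (x - trunc_mod(x, y)) / y

proposed   # 3 -3 -3 3, the same values as trunc(x / y)
x %/% y    # 3 -4 -4 3, R's floored result
```

Under that assumption the two only differ in the rows where `x` and `y` have opposite signs and the division is inexact.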
This is a result of C99 and later (and by extension SQLite) using truncated division, where the result of the modulo operator takes the sign of the dividend, while R uses the mathematically preferred floored division, where the result of the modulo operator takes the sign of the divisor. Quite frankly, C (SQLite) and R are doing fundamentally different arithmetic. There's some fascinating reading on the subject here and of course an abridged version on Wikipedia.
This will likely also vary across SQL dialects, which makes it more difficult to pin down in a unified way. Given the above, I'm not sure it is reasonable to expect equivalent output in every language. Python covered some of the complexities of this discussion in PEP-228: Reworking Python's Numeric Model and PEP-238: Changing the Division Operator, highlighting that this is not just a "dplyr issue" but a significantly larger architectural decision in R and in computer arithmetic itself.
@emilyriederer, in the case of Redshift, a Python UDF could be constructed in your database, leveraging numpy, to replicate the output from R.
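The relationship between the two conventions can be sketched in base R. `floor_divmod()` is a hypothetical helper written for illustration, not something dplyr or dbplyr provides: it starts from the quotient and remainder a truncating engine would return and adjusts them to the floored results R produces.

```r
# Hypothetical helper: converts truncated quotient/remainder to floored ones.
floor_divmod <- function(x, y) {
  q <- trunc(x / y)   # truncated quotient (rounds toward zero)
  r <- x - y * q      # truncated remainder (sign of the dividend)
  # Adjust only when the remainder is non-zero and its sign differs
  # from the sign of the divisor.
  adjust <- r != 0 & (r < 0) != (y < 0)
  list(quotient = q - adjust, remainder = r + adjust * y)
}

x <- c(10, 10, -10, -10)
y <- c(3, -3, 3, -3)

res <- floor_divmod(x, y)
res$quotient    # 3 -4 -4 3, matches x %/% y
res$remainder   # 1 -2 2 -1, matches x %% y
```

The same adjustment could in principle be written as a SQL `CASE` expression, but whether it is needed depends on how each dialect defines `/` and `%`.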
@alex-gable thanks for that awesome summary of the problem!
I think the best way to handle this is simply to document it.
Wow, thank you all for the very helpful, detailed responses. All of the context here is fascinating. I'm embarrassed to discover that I completely "went dark" on this thread. Somehow, I'm not getting notifications but luckily spotted this atop the new GitHub feed. Thanks again!
This issue was moved by krlmlr to tidyverse/dbplyr/issues/108.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
Base R integer division (%/%) is translated to normal division (/) in SQL.
Apologies for the lack of a reprex. I couldn't think of a way around the fact that a user must establish a connection to run this. When run against an AWS Redshift DB, data$z contains the values 0, 1, and 2, while data_db$z contains floating-point values.
The cause is that %/% is translated as / without accounting for the integer result. One correct SQL translation (at least for Redshift) would be CAST("x"/5 AS INTEGER) or FLOOR("x"/5).