Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency between ifelse and case_when using dbplyr #3774

Closed
simon-anasta opened this issue Aug 27, 2018 · 5 comments
Closed

Inconsistency between ifelse and case_when using dbplyr #3774

simon-anasta opened this issue Aug 27, 2018 · 5 comments

Comments

@simon-anasta
Copy link

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.


When writing queries using dbplyr for SQL server, ifelse and case_when produce different SQL code. The SQL code from case_when results in an exception. See also this question

Sorry the below is not a perfect reprex. Security restrictions prevent me from including business specific details in my example.

# libraries
library(DBI)
library(dplyr)
library(dbplyr)

# establish connection to database table
connection_string = "database.specific.string"
# mine looks something like "DRIVER=...; Trusted_Connection=...; DATABASE=...' SERVER=..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
my_table = tbl(db_connection, from = my_table_name)

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               TRUE ~ "b"))

# check SQL code for query
show_query(tmp)

# compared to
tmp = my_table %>%
    mutate(new_col = ifelse(col1 == col2, "a", "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query from the case_when command takes the form:

SELECT 
    col1, col2,
    CASE
       WHEN CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) THEN ('a')
       WHEN (TRUE) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

This produces an exception: "An expression of non-boolean type specified in a context where a condition is expected, near 'THEN'"

The resulting SQL query from the ifelse command takes the form:

SELECT 
    col1, col2,
    CASE
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0)))) = TRUE THEN ('a')
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0)))) = FALSE THEN ('b')
    END AS new_col
FROM my_database.my_table_name

This works as expected.

As best I can determine, this is a bug. I suspect the SQL syntax from case_when should be more similar to the SQL syntax from ifelse. In particular I suspect the additional brackets and the "= TRUE" in the ifelse version should also appear in the case_when version.

@cderv
Copy link
Contributor

cderv commented Aug 27, 2018

First, this is a {dbplyr} issue. When you have one, please fill them in dbplyr GH issue.

This is why it is related to some dbplyr issues: tidyverse/dbplyr#112, tidyverse/dbplyr#102 and the PR tidyverse/dbplyr#142 that should solve those. I think it should also solve your issue.

This PR replaces the last part with when(TRUE) by a ELSE clause. You can try by intalling the branch (devtools::install_github("tidyverse/dbplyr#142")) then the translation for case_when will use a ELSE clause.

# to get the PR
# devtools::install_github("tidyverse/dbplyr#142")
library(dbplyr)
# The translation as a ELSE clause
translate_sql(
  case_when(
    col1 == col2 ~ "a",
    TRUE ~ "b")
)
#> <SQL> CASE
#> WHEN ("col1" = "col2") THEN ('a')
#> ELSE ('b')
#> END

# IF ELSE clause is different
translate_sql(
  if_else(col1 == col2, "a","b")
)
#> <SQL> CASE WHEN ("col1" = "col2") THEN ('a') WHEN NOT("col1" = "col2") THEN ('b') END

# but you can recreate not using `TRUE~"b`
translate_sql(
  case_when(
    col1 == col2 ~ "a",
    !col1 == col2 ~ "b")
)
#> <SQL> CASE
#> WHEN ("col1" = "col2") THEN ('a')
#> WHEN (NOT("col1" = "col2")) THEN ('b')
#> END

Created on 2018-08-27 by the reprex package (v0.2.0).

You could try if this fix also deals with your exception that you linked to.

@ghost
Copy link

ghost commented Aug 27, 2018

This issue was moved by romainfrancois to tidyverse/dbplyr#148.

@ghost ghost closed this as completed Aug 27, 2018
@simon-anasta
Copy link
Author

Regarding this issue being a dbplyr issue: The cran dbplyr page specifies to report dbplyr bugs on the dplyr GitHub, so I followed instructions. Perhaps the cran page needs updating?

Thank you @cderv for your prompt and detailed response.

@cderv
Copy link
Contributor

cderv commented Aug 27, 2018

Perhaps the cran page needs updating?

It already has been in tidyverse/dbplyr#137 but not released to CRAN yet. Will be fixed in next release.

@lock
Copy link

lock bot commented Feb 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Feb 23, 2019
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants