New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbplyr - MS SQL - filter for is.na not working #2940

Closed
edgararuiz opened this Issue Jul 4, 2017 · 14 comments

Comments

Projects
None yet
6 participants
@edgararuiz
Collaborator

edgararuiz commented Jul 4, 2017

The translation for is.na() is not working when used in filter() for the MS SQL translation. The reason is because we're using the same translation used in mutate().

show_query({
  flights %>%
    filter(!is.na(dep_time))
})

Results in:

<SQL>
SELECT *
FROM "flights"
WHERE (NOT(CASE WHEN "dep_time" IS NULL THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END))

filter() should translate is.null() to x IS NULL and !is.null() to x IS NOT NULL

So the correct translation of the code above should be:

<SQL>
Select * from flights where dep_time is not null
@krlmlr

This comment has been minimized.

Show comment
Hide comment
@krlmlr

krlmlr Jul 12, 2017

Member

Thanks. How does filter(is.na(...)) translate?

Member

krlmlr commented Jul 12, 2017

Thanks. How does filter(is.na(...)) translate?

@edgararuiz

This comment has been minimized.

Show comment
Hide comment
@edgararuiz

edgararuiz Jul 12, 2017

Collaborator

Hi @krlmlr , it translates to:

<SQL>
SELECT *
FROM "flights"
WHERE (NOT(CASE WHEN "dep_time" IS NULL THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END))

I plan to work on this issue when I get back from vacation, I think I know what I need to update. I'll be ok with this issue be assigned to me if you like.

Thanks

Collaborator

edgararuiz commented Jul 12, 2017

Hi @krlmlr , it translates to:

<SQL>
SELECT *
FROM "flights"
WHERE (NOT(CASE WHEN "dep_time" IS NULL THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END))

I plan to work on this issue when I get back from vacation, I think I know what I need to update. I'll be ok with this issue be assigned to me if you like.

Thanks

@krlmlr

This comment has been minimized.

Show comment
Hide comment
@krlmlr

krlmlr Jul 12, 2017

Member

Isn't this the translation of filter(!is.na(...))? I asked about filter(is.na(...)).

We'll be happy to review a pull request.

Member

krlmlr commented Jul 12, 2017

Isn't this the translation of filter(!is.na(...))? I asked about filter(is.na(...)).

We'll be happy to review a pull request.

@jschelbert

This comment has been minimized.

Show comment
Hide comment
@jschelbert

jschelbert Jul 24, 2017

Bump... Here is the output you requested for

show_query({
    flights %>%
        filter(is.na(dep_time))
})

just missing the NOT compared to before:

<SQL>
SELECT *
FROM "flights"
WHERE (CASE WHEN "dep_time" IS NULL THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END)

Seems like we have to redefine the is.na (and probably also the is.null) in the file /R/db-odbc-mssql.R BUT then handle the previous behavior which is wanted for mutate(foo = is.na(bar)).

jschelbert commented Jul 24, 2017

Bump... Here is the output you requested for

show_query({
    flights %>%
        filter(is.na(dep_time))
})

just missing the NOT compared to before:

<SQL>
SELECT *
FROM "flights"
WHERE (CASE WHEN "dep_time" IS NULL THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END)

Seems like we have to redefine the is.na (and probably also the is.null) in the file /R/db-odbc-mssql.R BUT then handle the previous behavior which is wanted for mutate(foo = is.na(bar)).

@juliangkr

This comment has been minimized.

Show comment
Hide comment
@juliangkr

juliangkr Sep 12, 2017

Hello,

was this issue fixed? I connected to a oracle database using DBI and ROracle. Everything works fine, but when I want to filter by !is.na I receive the following error:

Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  : 
  ORA-00920: invalid relational operator

My query is:

tbl.obj = dplyr::tbl(con, table_name)
    tbl.obj %>% dplyr::filter(!is.na(col))

I tested it with an installation from CRAN and from github

juliangkr commented Sep 12, 2017

Hello,

was this issue fixed? I connected to a oracle database using DBI and ROracle. Everything works fine, but when I want to filter by !is.na I receive the following error:

Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  : 
  ORA-00920: invalid relational operator

My query is:

tbl.obj = dplyr::tbl(con, table_name)
    tbl.obj %>% dplyr::filter(!is.na(col))

I tested it with an installation from CRAN and from github

@edgararuiz

This comment has been minimized.

Show comment
Hide comment
@edgararuiz

edgararuiz Sep 12, 2017

Collaborator

Hi @juliangkr , this fix was meant for MS SQL so your issue may be separate. Would you mind sharing with me the code you used to connect, including the code used to get ROracle working with dplyr. Also, would you mind posting the result of: tbl.obj %>% dplyr::filter(!is.na(col)) %>% show_query()? This may required a new Issue to be opened, but please allow me to take a look at what I requested to make a determination. Thanks.

Collaborator

edgararuiz commented Sep 12, 2017

Hi @juliangkr , this fix was meant for MS SQL so your issue may be separate. Would you mind sharing with me the code you used to connect, including the code used to get ROracle working with dplyr. Also, would you mind posting the result of: tbl.obj %>% dplyr::filter(!is.na(col)) %>% show_query()? This may required a new Issue to be opened, but please allow me to take a look at what I requested to make a determination. Thanks.

@juliangkr

This comment has been minimized.

Show comment
Hide comment
@juliangkr

juliangkr Sep 13, 2017

The complete code is:

connection_str = paste0('(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=', host, ')(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=', service_name, ')))')

 drv = dbDriver("Oracle") 
 con = DBI::dbConnect(drv, 
                      user=username,
                      password=password,
                      dbname=connection_str)

tab = dplyr::tbl(con, "Samples") %>%
    dplyr::filter(!is.na(name))

class(con) returns:

class(con)
[1] "OraConnection"
attr(,"package")
[1] "ROracle"

EDIT:

As suggested in other posts I added:

  sql_translate_env.OraConnection <- dbplyr:::sql_translate_env.Oracle
  sql_select.OraConnection <- dbplyr:::sql_select.Oracle
  sql_subquery.OraConnection <- dbplyr:::sql_subquery.Oracle

I keep getting the error.
Thank you

** EDIT:**
@edgararuiz sorry I forgot to add the requested query:

tbl.obj %>% 
    dplyr::select(TEST) %>% 
    dplyr::filter(!is.null(TEST))
Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  :
  ORA-00920: invalid relational operator


tbl.obj %>% 
    dplyr::select(TEST) %>% 
    dplyr::filter(!is.null(TEST)) %>%
    dplyr::show_query()
<SQL>
SELECT *
FROM (SELECT "TEST" AS "TEST"
FROM (MY_DB.TEST_TABLE) "cfukeouggs") "knllrwlijx"
WHERE (NOT(CASE WHEN"TEST" IS NULL THEN 1 ELSE 0 END ))

juliangkr commented Sep 13, 2017

The complete code is:

connection_str = paste0('(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=', host, ')(PORT=1521)))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=', service_name, ')))')

 drv = dbDriver("Oracle") 
 con = DBI::dbConnect(drv, 
                      user=username,
                      password=password,
                      dbname=connection_str)

tab = dplyr::tbl(con, "Samples") %>%
    dplyr::filter(!is.na(name))

class(con) returns:

class(con)
[1] "OraConnection"
attr(,"package")
[1] "ROracle"

EDIT:

As suggested in other posts I added:

  sql_translate_env.OraConnection <- dbplyr:::sql_translate_env.Oracle
  sql_select.OraConnection <- dbplyr:::sql_select.Oracle
  sql_subquery.OraConnection <- dbplyr:::sql_subquery.Oracle

I keep getting the error.
Thank you

** EDIT:**
@edgararuiz sorry I forgot to add the requested query:

tbl.obj %>% 
    dplyr::select(TEST) %>% 
    dplyr::filter(!is.null(TEST))
Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  :
  ORA-00920: invalid relational operator


tbl.obj %>% 
    dplyr::select(TEST) %>% 
    dplyr::filter(!is.null(TEST)) %>%
    dplyr::show_query()
<SQL>
SELECT *
FROM (SELECT "TEST" AS "TEST"
FROM (MY_DB.TEST_TABLE) "cfukeouggs") "knllrwlijx"
WHERE (NOT(CASE WHEN"TEST" IS NULL THEN 1 ELSE 0 END ))
@juliangkr

This comment has been minimized.

Show comment
Hide comment
@juliangkr

juliangkr Oct 16, 2017

Hello @edgararuiz, I still was not able to fix this issue. Could you please have a second look on my problem? Additionally, I sometimes get an error, that my query is not a correct SQL statement. I think my queries can be quiet long, but with dplyr in combination with dplyrOracle there was no problem with the long queries.

juliangkr commented Oct 16, 2017

Hello @edgararuiz, I still was not able to fix this issue. Could you please have a second look on my problem? Additionally, I sometimes get an error, that my query is not a correct SQL statement. I think my queries can be quiet long, but with dplyr in combination with dplyrOracle there was no problem with the long queries.

@edgararuiz

This comment has been minimized.

Show comment
Hide comment
@edgararuiz

edgararuiz Oct 16, 2017

Collaborator

Hi @juliangkr , is the issue still there after using the version of dbplyr in the PR? devtools::install_github("tidyverse/dbplyr", ref = devtools::github_pull(34))

For your second problem, I wonder if you're experiencing the same problem I'm having that prevents the sql_optimiser from running when a print action happens: #3084 . Can you try a collect() in the code and see if it fails as well?

Collaborator

edgararuiz commented Oct 16, 2017

Hi @juliangkr , is the issue still there after using the version of dbplyr in the PR? devtools::install_github("tidyverse/dbplyr", ref = devtools::github_pull(34))

For your second problem, I wonder if you're experiencing the same problem I'm having that prevents the sql_optimiser from running when a print action happens: #3084 . Can you try a collect() in the code and see if it fails as well?

@juliangkr

This comment has been minimized.

Show comment
Hide comment
@juliangkr

juliangkr Oct 16, 2017

Hi @edgararuiz, thank you for your quick answer! With the PR version it works perfectly, thank you!

The second issue seems to be the same as in #3084, if i execute a command without collect() I get the error message ORA-00933: SQL command not properly ended otherwise not.

juliangkr commented Oct 16, 2017

Hi @edgararuiz, thank you for your quick answer! With the PR version it works perfectly, thank you!

The second issue seems to be the same as in #3084, if i execute a command without collect() I get the error message ORA-00933: SQL command not properly ended otherwise not.

@edgararuiz

This comment has been minimized.

Show comment
Hide comment
@edgararuiz

edgararuiz Oct 16, 2017

Collaborator

That's great to hear!

Also, thank you for testing and letting me know that the issues are related. I think I know where the fix needs to be done, but I haven't put together a PR yet, I'll follow up with you when I have something for that.

Collaborator

edgararuiz commented Oct 16, 2017

That's great to hear!

Also, thank you for testing and letting me know that the issues are related. I think I know where the fix needs to be done, but I haven't put together a PR yet, I'll follow up with you when I have something for that.

@Prometheus77

This comment has been minimized.

Show comment
Hide comment
@Prometheus77

Prometheus77 May 24, 2018

This issue doesn't appear to be fixed in the CRAN version as of yet, as I'm experiencing the same problem. Any update on when we can expect this to be included?

Prometheus77 commented May 24, 2018

This issue doesn't appear to be fixed in the CRAN version as of yet, as I'm experiencing the same problem. Any update on when we can expect this to be included?

@edgararuiz

This comment has been minimized.

Show comment
Hide comment
@edgararuiz

edgararuiz May 24, 2018

Collaborator

Hi @Prometheus77 , would you mind opening a new issue, and include a reprex? If represe is not possible, just the code (feel free to redact any private info), the results of the code, and also the output of your sessionInfo() please?

Collaborator

edgararuiz commented May 24, 2018

Hi @Prometheus77 , would you mind opening a new issue, and include a reprex? If represe is not possible, just the code (feel free to redact any private info), the results of the code, and also the output of your sessionInfo() please?

@Prometheus77

This comment has been minimized.

Show comment
Hide comment
@Prometheus77

Prometheus77 May 24, 2018

I just installed the utf8 package and that fixed the issue for me. The underlying query is still using the NOT(("X") IS NULL) syntax instead of "X" IS NOT NULL, but it appears to be working properly now.

Here's a snippet:

library(tidyverse)
library(DBI)
library(dbplyr)

con <- dbConnect(odbc::odbc(), .connection_string = paste0("driver={SQL Server Native Client 11.0};server=MYSERVER;database=MYDB;trusted_connection=yes"))

con %>% tbl(ident_q("MYDB.dbo.mytable")) %>%
  filter(!is.na(mycolumn))

Prometheus77 commented May 24, 2018

I just installed the utf8 package and that fixed the issue for me. The underlying query is still using the NOT(("X") IS NULL) syntax instead of "X" IS NOT NULL, but it appears to be working properly now.

Here's a snippet:

library(tidyverse)
library(DBI)
library(dbplyr)

con <- dbConnect(odbc::odbc(), .connection_string = paste0("driver={SQL Server Native Client 11.0};server=MYSERVER;database=MYDB;trusted_connection=yes"))

con %>% tbl(ident_q("MYDB.dbo.mytable")) %>%
  filter(!is.na(mycolumn))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment