Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use **try_cast** instead of **cast** with Microsoft SQL (2012 and later) #380

Closed
DavidPatShuiFong opened this issue Nov 30, 2019 · 4 comments
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL help wanted ❤️ we'd love your help!

Comments

@DavidPatShuiFong
Copy link
Contributor

DavidPatShuiFong commented Nov 30, 2019

A failure to cast (e.g. as.numeric or as.double, when the string contains a non-numeric character) results in an error:

> library(magrittr)
> x <- mtcars %>% dplyr::mutate(wt = as.character(wt))
> x[1,6] <- "<2.62" # adds a non-numeric character '<' to an element
> DBI::dbWriteTable(con, "##mtcars", x) # where 'con' is an already defined connection to a MSSQL database
> con %>% dplyr::tbl("##mtcars") %>% dplyr::mutate(wt = as.numeric(wt))
Error: <SQL> 'SELECT TOP(11) "mpg", "cyl", "disp", "hp", "drat", CAST("wt" AS NUMERIC) AS "wt", "qsec", "vs", "am", "gear", "carb"
FROM "##mtcars"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type varchar to numeric. 

For Microsoft SQL (2012 and later) try_cast allows a more graceful return of NA. In dbplyr 1.4.2, this can be done by changing lines 225-232 in translate-sql-helpers.R (but this will probably only work for Microsoft SQL 2012 and later):

#' @rdname sql_variant
#' @export
sql_cast <- function(type) {
  type <- sql(type)
  function(x) {
    sql_expr(try_cast(!!x %as% !!type))
  }
}

...which results in a more graceful failure (returns NA):

> con %>% dplyr::tbl("##mtcars") %>% dplyr::mutate(wt = as.numeric(wt))
# Source:   lazy query [?? x 11]
# Database: Microsoft SQL Server 12.00.2000[bpsrawdata@DAVIDLENOVOPC\BPSINSTANCE/BPSSAMPLES]
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9     NA  16.5     0     1     4     4
 2  21       6  160    110  3.9      3  17.0     0     1     4     4
 3  22.8     4  108     93  3.85     2  18.6     1     1     4     1
 4  21.4     6  258    110  3.08     3  19.4     1     0     3     1
 5  18.7     8  360    175  3.15     3  17.0     0     0     3     2
 6  18.1     6  225    105  2.76     3  20.2     1     0     3     1
 7  14.3     8  360    245  3.21     4  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69     3  20       1     0     4     2
 9  22.8     4  141.    95  3.92     3  22.9     1     0     4     2
10  19.2     6  168.   123  3.92     3  18.3     1     0     4     4
# ... with more rows

Is it possible to re-define sql_cast for Microsoft SQL backends only?

@hadley hadley added feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL help wanted ❤️ we'd love your help! labels Dec 13, 2019
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Dec 14, 2019
Addresses issue tidyverse#380
Preferable for MSSQL 2012 and above, as failure to CAST results in an error. By contrast TRY_CAST (mostly) returns a NA in circumstances where the cast fails. However, this is a MSSQL (2012+) feature, and could break when used with other database backends.
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Dec 14, 2019
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error).
Address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Dec 14, 2019
in MSSQL casting to NUMERIC casts to integer, and is actually more flexible/graceful, since it returns NA if unable to convert individual rows. Whereas casting to  INTEGER/INT results in the entire column becoming NA if a single row cannot be converted to integer.
Address issue tidyverse#380
@DavidPatShuiFong
Copy link
Contributor Author

DavidPatShuiFong commented Dec 14, 2019

I've added commits DavidPatShuiFong@30d079c, DavidPatShuiFong@34936b1 and DavidPatShuiFong@4933037 which adds sql_try_cast to R/translate-sql-helpers.R. Then, within R/backend-mssql.R, calls are made to sql_try_cast instead of sql_cast.

In DavidPatShuiFong@34936b1, sql_try_cast is used only if MSSQL is version 11 (equivalent to MSSQL 2012) or greater.

In DavidPatShuiFong@4933037 I've changed the casting of as.integer from INT/INTEGER to NUMERIC. Casting to NUMERIC also returns an integer and is more flexible because if a single row contains a string which can't be converted, try_casting to NUMERIC returns a NA for just that row, whereas INT/INTEGER returns NA for the entire column!

Following is some additional test code.

With this test code trying to use as.character and as.POSIXct to convert the string version of POSIXct datetimes causes the con to 'hang' and become unuseable with the current master branch of dbplyr 1.4.2. The commits I have added don't change that behaviour.

library(magrittr)
library(lubridate)
library(tidyr)
library(dplyr)

df <- data.frame(China = c(5.72, 3.86, 2.6, 2.65, 2.31, 1.66, 1.60, 1.61, 1.63),
                Taiwan = c(3.83, 2.99, 2.48, 1.96, 1.74, 1.74, 1.5, 1.16, 1.05),
                Vietnam = c(6.46, 5.97, 5.05, 4.20, 3.55, 2.71, 2.01, 1.89, 1.94),
                ROK = c(4.34, 3.46, 2.54, 1.84, 1.58, 1.62, 1.35, 1.17, 1.21),
                Year = as.POSIXct(c(1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010), origin = "1970-01-01"),
                row.names = c("1970", "1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010"))

df <- df %>%
 gather(key = "Country", value = "NchildPerWoman", -Year) %>%
 rename(n = NchildPerWoman) %>%
 mutate(n = as.character(n), Year = as.character(Year)) # convert two columns to character

df[1,3] <- "5.72x" # creates invalid numeric value
df[2,1] <- "xyz" # create invalid date

# assumes 'con' is a connection to a MSSQL database

DBI::dbWriteTable(con, "##nchild", df)
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.character(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.integer64(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.integer(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.numeric(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.double(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.logical(n))


con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.Date(Year))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as_date(Year))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as_datetime(Year))

con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.character(Year)) # this causes hang on master
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.POSIXct(Year)) # this causes hang on master

@hadley
Copy link
Member

hadley commented Sep 17, 2020

Do you want to turn this into a PR?

@DavidPatShuiFong
Copy link
Contributor Author

Do you want to turn this into a PR?

Yes, thanks. will change into a PR over the next few eays.

DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 19, 2020
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error).
Address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 19, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 19, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 20, 2020
code to handle invalid DBIObject (as is the case during 'testthat' execution)
change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate
address issue tidyverse#380
@DavidPatShuiFong
Copy link
Contributor Author

Do you want to turn this into a PR?

Turned into pull request #496

DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error).
Address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
code to handle invalid DBIObject (as is the case during 'testthat' execution)
change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()'
add NEWS.md entry
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 21, 2020
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()'
add NEWS.md entry
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error).
Address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
code to handle invalid DBIObject (as is the case during 'testthat' execution)
change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()'
add NEWS.md entry
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
mssql_version/mssql_interpret_version return list of three (not four) as currently DBI::dbGetInfo(x)$db.version does not return the fourth number 'Build'
mssql_interpret_version returns a list of integers (not character)
mssql_interpret_version_string rename to mssql_interpret_version
address issue tidyverse#380
DavidPatShuiFong added a commit to DavidPatShuiFong/dbplyr that referenced this issue Sep 22, 2020
mssql_version/mssql_interpret_version return list of three (not four) as currently DBI::dbGetInfo(x)$db.version does not return the fourth number 'Build'
mssql_interpret_version returns a list of integers (not character)
mssql_interpret_version_string rename to mssql_interpret_version
address issue tidyverse#380
@hadley hadley closed this as completed in 8950018 Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL help wanted ❤️ we'd love your help!
Projects
None yet
Development

No branches or pull requests

2 participants