-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use **try_cast** instead of **cast** with Microsoft SQL (2012 and later) #380
Comments
Addresses issue tidyverse#380 Preferable for MSSQL 2012 and above, as failure to CAST results in an error. By contrast TRY_CAST (mostly) returns a NA in circumstances where the cast fails. However, this is a MSSQL (2012+) feature, and could break when used with other database backends.
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error). Address issue tidyverse#380
in MSSQL casting to NUMERIC casts to integer, and is actually more flexible/graceful, since it returns NA if unable to convert individual rows. Whereas casting to INTEGER/INT results in the entire column becoming NA if a single row cannot be converted to integer. Address issue tidyverse#380
I've added commits DavidPatShuiFong@30d079c, DavidPatShuiFong@34936b1 and DavidPatShuiFong@4933037 which adds sql_try_cast to In DavidPatShuiFong@34936b1, sql_try_cast is used only if MSSQL is version 11 (equivalent to MSSQL 2012) or greater. In DavidPatShuiFong@4933037 I've changed the casting of as.integer from INT/INTEGER to NUMERIC. Casting to NUMERIC also returns an integer and is more flexible because if a single row contains a string which can't be converted, try_casting to NUMERIC returns a NA for just that row, whereas INT/INTEGER returns NA for the entire column! Following is some additional test code. With this test code trying to use as.character and as.POSIXct to convert the string version of POSIXct datetimes causes the con to 'hang' and become unuseable with the current master branch of library(magrittr)
library(lubridate)
library(tidyr)
library(dplyr)
df <- data.frame(China = c(5.72, 3.86, 2.6, 2.65, 2.31, 1.66, 1.60, 1.61, 1.63),
Taiwan = c(3.83, 2.99, 2.48, 1.96, 1.74, 1.74, 1.5, 1.16, 1.05),
Vietnam = c(6.46, 5.97, 5.05, 4.20, 3.55, 2.71, 2.01, 1.89, 1.94),
ROK = c(4.34, 3.46, 2.54, 1.84, 1.58, 1.62, 1.35, 1.17, 1.21),
Year = as.POSIXct(c(1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010), origin = "1970-01-01"),
row.names = c("1970", "1975", "1980", "1985", "1990", "1995", "2000", "2005", "2010"))
df <- df %>%
gather(key = "Country", value = "NchildPerWoman", -Year) %>%
rename(n = NchildPerWoman) %>%
mutate(n = as.character(n), Year = as.character(Year)) # convert two columns to character
df[1,3] <- "5.72x" # creates invalid numeric value
df[2,1] <- "xyz" # create invalid date
# assumes 'con' is a connection to a MSSQL database
DBI::dbWriteTable(con, "##nchild", df)
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.character(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.integer64(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.integer(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.numeric(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.double(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(n = as.logical(n))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.Date(Year))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as_date(Year))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as_datetime(Year))
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.character(Year)) # this causes hang on master
con %>% dplyr::tbl("##nchild") %>% dplyr::mutate(Year = as.POSIXct(Year)) # this causes hang on master |
Do you want to turn this into a PR? |
Yes, thanks. will change into a PR over the next few eays. |
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error). Address issue tidyverse#380
Address issue tidyverse#380
Address issue tidyverse#380
code to handle invalid DBIObject (as is the case during 'testthat' execution) change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate address issue tidyverse#380
Turned into pull request #496 |
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error). Address issue tidyverse#380
Address issue tidyverse#380
Address issue tidyverse#380
code to handle invalid DBIObject (as is the case during 'testthat' execution) change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate address issue tidyverse#380
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()' add NEWS.md entry address issue tidyverse#380
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()' add NEWS.md entry address issue tidyverse#380
try_cast results in a more elegant failure (returns NA) if a cast is not possible ('cast' will throw an error). Address issue tidyverse#380
Address issue tidyverse#380
Address issue tidyverse#380
code to handle invalid DBIObject (as is the case during 'testthat' execution) change testthat expectations (in test-backend-mssql.R) to TRY_CAST as appropriate address issue tidyverse#380
'mssql_use_try_cast()' replaces 'sql_cast()' with 'sql_try_cast()' add NEWS.md entry address issue tidyverse#380
mssql_version/mssql_interpret_version return list of three (not four) as currently DBI::dbGetInfo(x)$db.version does not return the fourth number 'Build' mssql_interpret_version returns a list of integers (not character) mssql_interpret_version_string rename to mssql_interpret_version address issue tidyverse#380
mssql_version/mssql_interpret_version return list of three (not four) as currently DBI::dbGetInfo(x)$db.version does not return the fourth number 'Build' mssql_interpret_version returns a list of integers (not character) mssql_interpret_version_string rename to mssql_interpret_version address issue tidyverse#380
A failure to cast (e.g. as.numeric or as.double, when the string contains a non-numeric character) results in an error:
For Microsoft SQL (2012 and later) try_cast allows a more graceful return of NA. In
dbplyr 1.4.2
, this can be done by changing lines 225-232 intranslate-sql-helpers.R
(but this will probably only work for Microsoft SQL 2012 and later):...which results in a more graceful failure (returns NA):
Is it possible to re-define sql_cast for Microsoft SQL backends only?
The text was updated successfully, but these errors were encountered: