Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non standard handling of iso-date strings in PrestoDB/AWS Athena #384

Closed
OssiLehtinen opened this issue Dec 13, 2019 · 9 comments
Closed

Non standard handling of iso-date strings in PrestoDB/AWS Athena #384

OssiLehtinen opened this issue Dec 13, 2019 · 9 comments
Labels
feature func trans 🌍

Comments

@OssiLehtinen
Copy link
Contributor

@OssiLehtinen OssiLehtinen commented Dec 13, 2019

Most databases are able to interpret iso-date strings as dates in, e.g., "where" clauses, such as

select * from table where date_column = '2019-01-01'

This is not the case with PrestoDB/AWS Athena, however. Instead, the strings need to be cast as dates first like:
select * from table where date_column = date('2019-01-01'),
select * from table where date_column = date'2019-01-01' or
select * from table where date_column = cast('2019-01-01' as date)

However, when a date variable is used as a parameter id dplyr/dbplyr, is is translated to a iso-date string in the generated sql, so a command such as:

today <- Sys.Date()
tbl(AthenaCon, "table") %>% filter(date_column <= today)

produces and error, as the generated sql does not have the explicit cast to date of the iso-date string. One needs to manually cast the date variable as a date as in
... %>% filter(date_column <= as.Date(today)), but this feels unnecessary, as R knows today is a date already.

A DBI backend for AWS Athena is being developed at https://github.com/DyfanJones/RAthena, and I'm asking for some guidance on how to address this issue there.

My question is: is there a method for overriding the default behavior of dbplyr, where dates are translated to quoted strings and instead have them translated to, e.g., "date'2019-01-01'"?

I'm sorry if this seems like a trivial question, but I'm drawing a blank from reading the database specific source files and packages. All I can come up with is to use assignInNameSpace to replace the escape.Date -method, but this seems pretty brutal.

Thanks for any hints!

@hadley hadley transferred this issue from tidyverse/dplyr Dec 13, 2019
@hadley
Copy link
Member

@hadley hadley commented Dec 13, 2019

It's not super elegant, but you can use the technique below to work around the problem in the interim:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

athena_date <- function(x) sql(paste0("date('", x, "')"))

today <- Sys.Date()

tbl_lazy(mtcars) %>% 
  filter(date_column <= !!athena_date(today))
#> <SQL>
#> SELECT *
#> FROM `df`
#> WHERE (`date_column` <= date('2019-12-13'))

Created on 2019-12-13 by the reprex package (v0.3.0)

Otherwise, you have to wait for a custom backend to provide the correct escaping behaviour.

@OssiLehtinen
Copy link
Contributor Author

@OssiLehtinen OssiLehtinen commented Dec 13, 2019

Thanks for the response!

The question would still be how to implement the escaping in the custom backend "RAthena"? Any rough guidelines would be highly appreciated, to help the development of the package.

@hadley
Copy link
Member

@hadley hadley commented Dec 13, 2019

I don't have that loaded in my head right now, sorry. But it should be possible — you'll just need to track through the various escaping generics in dbplyr.

@OssiLehtinen
Copy link
Contributor Author

@OssiLehtinen OssiLehtinen commented Dec 13, 2019

I've been scratching my head trying to track the right generics, but a bit of a beginner with those, so no success so far. Well, I'll scratch some more, knowing doing this should be possible.

@hadley
Copy link
Member

@hadley hadley commented Dec 13, 2019

Oh hmmm, maybe it's not currently possible? It looks like we'd need a new sql_escape_date generic. Does date() work for both dates and date-times? Would you mind pointing me to the documentation?

@hadley hadley added feature func trans 🌍 labels Dec 13, 2019
@OssiLehtinen
Copy link
Contributor Author

@OssiLehtinen OssiLehtinen commented Dec 13, 2019

OK!

Here's some documentation on Presto's datatypes (Athena is Presto under the hood): https://prestosql.io/docs/current/language/types.html

In general, Presto doesn't automatically convert between varchar and other types, even for constants. So for a timestamp constant one would use "timestamp '2019-01-01 01:01:01.000'", for dates "date '2019-01-01'" and so on.

Edit: the official (pretty much identical) documentation, as prestosql is a fork of prestodb: https://prestodb.io/docs/current/language/types

@OssiLehtinen
Copy link
Contributor Author

@OssiLehtinen OssiLehtinen commented Dec 16, 2019

I tried my hand on implementing the required changes, which can be found in the related pull request. Pretty new to this process, so hopefully it's not a complete mess and didn't step on too many toes along the way.

Is it, btw, better to have the backend-athena.R type of content (the specific method implementations) here or in the custom backend package? In your book 'Advanced R' you point out that the owner of the generic should define the meththods too, but there seem to be exceptions to the rule with these backends.

@hadley
Copy link
Member

@hadley hadley commented Sep 22, 2020

Fixed in #391

@hadley hadley closed this as completed Sep 22, 2020
@hadley
Copy link
Member

@hadley hadley commented Sep 27, 2020

I wasn't paying attention and you're right, these should live in the backend packages. So I've removed from dbplyr and filed issues:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature func trans 🌍
Projects
None yet
Development

No branches or pull requests

2 participants