DBI sources can't provide sampling implementation #2993
Comments
Hi @hannesmuehleisen, this will probably have to be implemented for each individual database type. Which database are you working with at this time?
Well, I think this should probably be part of the SQL translation method. Indeed, the syntax is different for all the systems that I can see. A generic method would be a first step. I am the author of the MonetDBLite package, which includes a dplyr backend.
What's the syntax that MonetDB uses for sampling?
Seems like Alternatively,
Yes, both of these solutions seem fine to me! I like how dplyr relies more on a cooperative DBI driver nowadays.
Actually, this is more complicated than I expected. The right way to implement this is to:
I think this is better than adding a @hannesmuehleisen could you provide a few examples/links to sampling implementations that you've seen?
I assume you mean sample functionality in SQL? Here you go: Oracle: So indeed some variety: some systems use counts, some percentages, and others workarounds. A more crucial difference is whether the sample is taken of the base table or the query result (!). We are in the process of switching from the latter to the former; this gets interesting once filters, aggregations, or joins are involved.
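To make the base-table versus query-result distinction concrete, here is a minimal base R sketch on an in-memory data frame. The data frame `df` and its columns are made up for illustration and no database is involved; it only shows why the two orderings can return different numbers of rows once a filter is present.

```r
set.seed(1)

# Hypothetical table: 100 rows, half of which pass the filter.
df <- data.frame(id = 1:100, keep = rep(c(TRUE, FALSE), 50))

# Sample of the query result: filter first, then draw 10 rows.
filtered <- df[df$keep, ]
result_sample <- filtered[sample(nrow(filtered), 10), ]

# Sample of the base table: draw 10 rows first, then filter.
base_sample <- df[sample(nrow(df), 10), ]
base_then_filter <- base_sample[base_sample$keep, ]

nrow(result_sample)     # always 10
nrow(base_then_filter)  # at most 10, depends on which rows were drawn
```

Sampling the query result always yields the requested number of rows, while sampling the base table and filtering afterwards can yield fewer.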
What about grouped/stratified samples? Any widespread support for that?
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
When using src_sql, it is possible to attach a custom class to tbl objects. But the documentation for src_sql states that it is deprecated and that src_dbi should be used instead. When using src_dbi, no custom class can be set for tbl objects (at least not as far as I can tell).
This is mostly fine, since most generics dispatch on the connection class. Unfortunately, sample_n and sample_frac only dispatch on the tbl class, and there is no generic for db-backed tables that could be used to provide a custom implementation.
My suggestion would be to add a DBI generic for sample_* that calls a method defined on the connection if available.
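A minimal sketch of what such a connection-dispatched generic could look like, using only base R S3 dispatch. The generic name `db_sample_n`, the class `MyConnection`, and the argument names are hypothetical and not part of dplyr or DBI. The `ORDER BY RANDOM() LIMIT n` query is a workaround accepted by SQLite and PostgreSQL, used here only as a placeholder for whatever syntax a given backend supports.

```r
# Hypothetical generic: dispatches on the class of the connection,
# not on the class of the tbl.
db_sample_n <- function(con, from, size, ...) {
  UseMethod("db_sample_n")
}

# Default method: no sampling implementation is known for this backend.
db_sample_n.default <- function(con, from, size, ...) {
  stop("Sampling is not supported for connections of class ",
       class(con)[1], call. = FALSE)
}

# Hypothetical backend method that samples the query result by wrapping
# the incoming query in a subquery. Other systems need different syntax.
db_sample_n.MyConnection <- function(con, from, size, ...) {
  paste0("SELECT * FROM (", from, ") AS q ORDER BY RANDOM() LIMIT ",
         as.integer(size))
}

# Stand-in connection object for illustration only.
con <- structure(list(), class = "MyConnection")
sql <- db_sample_n(con, "SELECT * FROM flights", 10)
```

With this shape, a backend author would only need to define a method for their own connection class, and backends without a method would fall through to the default error.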