dbplyr (development version)
(To become dbplyr 2.0.0)
dplyr 1.0.0 compatibility
-
across()is now translated into individual SQL statements (#480). -
rename()andselect()support dplyr 1.0.0 tidyselect syntax (apart from predicate functions which can't easily work on computed queries) (#502). -
relocate()makes it easy to move columns (#494) andrename_with()makes it easy to rename columns programmatically (#502). -
slice_min(),slice_max(), andslice_order()are now supported.slice_head()andslice_tail()throw clear error messages (#394)
SQL generation
-
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
-
intersect(),union()andsetdiff()gain anallargument to add theALLargument (#414). -
Join functions gains a
na_matchesargument that allows you to control whether or notNA(NULL) values match otherNAvalues. The default is"never", which is the usual behaviour in databases. You can setna_matches = "na"to match R's usual join behaviour (#180). Additional arguments error (instead of being silently swallowed) (#382). -
Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
-
Subqueries no longer include an
ORDER BYclause. This is not part of the SQL spec, and has very limited support across databases. Now such queries generate a warning suggesting that you move yourarrange()call later in the pipeline (#276). (There's one exception:ORDER BYis still generated ifLIMITis present; this tends to affect the returns rows but not necessarily their order). -
Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
-
sql_optimise()now can partially optimise a pipeline; due to an unfortunate bug it previously gave up too easily. -
in_schema()quotes each input individually (#287) (usesql()to opt out of quoting, if needed). AndDBI::Id()should work anywhere thatin_schema()does.
SQL translation
-
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
-
All backends:
-
You can now use
::in translations, so that (e.g.)dbplyr::n()is translated tocount(*)(#207). -
%/%now generates a clear error message; previously it was translated to/which is not correct (#108). -
n()is translated tocount(*)instead ofcount()(#343). -
sub_str()translation is more consistent in edge cases (@ianmcook). -
All
median()(@lorenzwalthert, #483),pmin(),pmax()(#479),sd()andvar()functions have anna.rmargument that warns once when notTRUE. This makes them consistent withmean()andsum(). -
substring()is now translated the same way assubstr()(#378).
-
-
blob vectors can now be used with
!!and!!!operators, for example infilter()(@okhoma, #433) -
MySQL uses standard SQL for index creation.
-
MS SQL translation does better a distinguishing between bit and boolean (#377, #318).
ifandifelseonce again generateIIF, creating simpler expressions.as.*()function usesTRY_CAST()instead ofCAST()for version 11+ (2012+) (@DavidPatShuiFong, #380). -
odbc no longer translates
count(); this was an accidental inclusion. -
Oracle translation now depends on Oracle 12c, and uses a "row-limiting" clause for
head(). It gains translations fortoday()andnow(), and improvedas.Date()translation (@rlh1994, #267). -
PostgreSQL: new translations for lubridate period functions
years(),months(),days(), andfloor_date()(@bkkkk, #333) and stringr functionsstr_squish(),str_remove(), andstr_remove_all()(@shosaco). -
New RedShift translations when used with
RPostgres::Redshift().-
str_replace()errors since there's no Redshift translation, andstr_replace_all()usesREGEXP_REPLACE()(#446). -
paste()andpaste0()use||(#458). -
as.numeric()andas.double()cast toFLOAT(#408). -
substr()andstr_sub()useSUBSTRING()(#327).
-
-
SQLite gains translations for lubridate functions
today(),now(),year(),month(),day(),hour(),minute(),second(),yday()(#262), and correct translation formedian()(#357).
Extensibility
If you are the author of a dbplyr backend, please see vignette("backend-2") for details.
-
New
dbplyr_edition()generic allows you to opt-in to the 2nd edition of the dbplyr API. -
db_write_table()now callsDBI::dbWriteTable()instead of nine generics that formerly each did a small part:db_create_indexes(),db_begin(),db_rollback(),db_commit(),db_list_tables(),drop_drop_table(),db_has_table(),db_create_table(), anddb_data_types(). You can now delete the methods for these generics.db_query_rows()is no longer used; it appears that it hasn't been used for some time, so if you have a method, you can delete it. -
DBI::dbQuoteIdentifier()is now used instead ofsql_escape_ident()andDBI::dbQuoteString()instead ofsql_escape_string(). -
A number of
db_*generics have been replaced with new SQL generation generics:dplyr::db_analyze()->dbplyr::sql_table_analyze()dplyr::db_create_index()->dbplyr::sql_table_index()dplyr::db_explain()->dbplyr::sql_queriy_explain()dplyr::db_query_fields()->dbplyr::sql_query_fields()dplyr::db_save_query()->dbplyr::sql_query_save()
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
-
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
dplyr::sql_select()->dbplyr::sql_query_select()dplyr::sql_join()->dbplyr::sql_query_join()dplyr::sql_semi_join()->dbplyr::sql_query_semi_join()dplyr::sql_set_op()->dbplyr::sql_query_set_op()dplyr::sql_subquery()->dbplyr::sql_query_wrap()dplyr::db_desc()->dbplyr::db_connection_describe()
-
New
db_temporary_table()generic makes it easier to work with databases that require temporary tables to be specially named. -
New
sql_expr_matches()generic allows databases to use more efficient alternatives when determine if two values "match" (i.e. like equality but a pair ofNULLs will also match). For more details, see https://modern-sql.com/feature/is-distinct-from -
New
sql_join_suffix()allows backends to control the default suffixes used (#254).
Minor improvements and bug fixes
-
All old lazy eval shims have been removed. These have been deprecated for some time.
-
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
-
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
-
copy_lahman()andcopy_nycflights13()(and hencenycflights13_sqlite()) and friends now return DBI connections rather than the now deprecatedsrc_dbi()(#440). -
copy_to()can nowoverwritewhen table is specified with schema (#489), and gains anin_transactionargument used to optionally suppress the transaction wrapper (#368). -
distinct()no longer duplicates column if grouped (#354). -
transmute()now correctly tracks variables it needs when creating subqueries (#313). -
mutate()grouping variables no longer generates a downstream error (#396) -
mutate()correctly generates subqueries when you re-use the same variable three or more times (#412). -
window_order()overrides ordering, rather than appending to it.
dbplyr 1.4.4
-
Internally
DBI::dbExecute()now usesimmediate = TRUE; this improves support for session-scoped temporary tables in MS SQL (@krlmlr, #438). -
Subqueries with
ORDER BYuseTOP 9223372036854775807instead ofTOP 100 PERCENTon SQL Server for compatibility with Azure Data Warehouse (#337, @alexkyllo). -
escape()now supportsblobvectors using newsql_escape_raw()generic. It enables using blob variables in dplyr verbs, for example to filter nvarchar values by UTF-16 blobs (see https://github.com/r-dbi/DBI/issues/215#issuecomment-356376133). (@okhoma, #433) -
Added
setOldClass()calls for"ident"and"ident_q"classes for compatibility with dplyr 1.0.0 (#448, @krlmlr). -
Postgres
str_detect()translation uses same argument names as stringr, and gains anegateargument (#444). -
semi_join()andanti_join()now correctly support thesql_onargument (#443, @krlmlr).
dbplyr 1.4.3
-
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
-
The Date and POSIXt methods for
escape()now use exportedsql_escape_date()andsql_escape_datetime()generics to allow backend specific formatting of date and datetime literals. These are used to provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391). -
first(),last(),nth(),lead()andlag()now respect thewindow_frame()(@krlmlr, #366). -
SQL server: new translations for
str_flatten()(@PauloJhonny, #405). -
SQL server: temporary datasets are now session-local, not global (#401).
-
Postgres: correct
str_detect(),str_replace()andstr_replace_all()translation (@shosaco, #362).
dbplyr 1.4.2
-
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
-
Fixes for rlang and dpylr compatibility.
dbplyr 1.4.1
Minor improvements to SQL generation
-
x %in% ystrips names ofy(#269). -
Enhancements for scoped verbs (
mutate_all(),summarise_if(),filter_at()etc) (#296, #306). -
MS SQL use
TOP 100 PERCENTas stop-gap to allow subqueries withORDER BY(#277). -
Window functions now translated correctly for Hive (#293, @cderv).
dbplyr 1.4.0
Breaking changes
-
Error: `con` must not be NULL: If you see this error, it probably means that you have forgotten to passcondown to a dbplyr function. Previously, dbplyr defaulted to usingsimulate_dbi()which introduced subtle escaping bugs. (It's also possible I have forgotten to pass it somewhere that the dbplyr tests don't pick up, so if you can't figure it out, please let me know). -
Subsetting (
[[,$, and[) functions are no longer evaluated locally. This makes the translation more consistent and enables useful new idioms for modern databases (#200).
New features
-
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
-
Overall, dplyr generates many fewer subqueries:
-
Joins and semi-joins no longer add an unneeded subquery (#236). This is facilitated by the new
bare_identifier_okargument tosql_render(); the previous argument was calledrootand confused me. -
Many sequences of
select(),rename(),mutate(), andtransmute()can be collapsed into a single query, instead of always generating a subquery (#213).
-
-
New
vignette("sql")describes some advantages of dbplyr over SQL (#205) and gives some advice about writing literal SQL inside of dplyr, when you need to (#196). -
New
vignette("reprex")gives some hints on creating reprexes that work anywhere (#117). This is supported by a newtbl_memdb()that matches the existingtbl_lazy(). -
All
..._join()functions gain ansql_onargument that allows specifying arbitrary join predicates in SQL code (#146, @krlmlr).
SQL translations
-
New translations for some lubridate functions:
today(),now(),year(),month(),day(),hour(),minute(),second(),quarter(),yday()(@colearendt, @derekmorr). Also added new translation foras.POSIXct(). -
New translations for stringr functions:
str_c(),str_sub(),str_length(),str_to_upper(),str_to_lower(), andstr_to_title()(@colearendt). Non-translated stringr functions throw a clear error. -
New translations for bitwise operations:
bitwNot(),bitwAnd(),bitwOr(),bitwXor(),bitwShiftL(), andbitwShiftR(). Unlike the base R functions, the translations do not coerce arguments to integers (@davidchall, #235). -
New translation for
x[y]toCASE WHEN y THEN x END. This enablessum(a[b == 0])to work as you expect from R (#202).yneeds to be a logical expression; if not you will likely get a type error from your database. -
New translations for
x$yandx[["y"]]tox.y, enabling you to index into nested fields in databases that provide them (#158). -
The
.dataand.envpronouns of tidy evaluation are correctly translated (#132). -
New translation for
median()andquantile(). Works for all ANSI compliant databases (SQL Server, Postgres, MariaDB, Teradata) and has custom translations for Hive. Thanks to @edavidaja for researching the SQL variants! (#169) -
na_if()is correct translated toNULLIF()(rather thanNULL_IF) (#211). -
n_distinct()translation throws an error when given more than one argument. (#101, #133). -
New default translations for
paste(),paste0(), and the hyperbolic functions (these previously were only available for ODBC databases). -
Corrected translations of
pmin()andpmax()toLEAST()andGREATEST()for ANSI compliant databases (#118), toMIN()andMAX()for SQLite, and to an error for SQL server. -
New translation for
switch()to the simple form ofCASE WHEN(#192).
SQL simulation
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
-
SQL simulation has been overhauled. It now works reliably, is better documented, and always uses ANSI escaping (i.e.
`for field names and'for strings). -
tbl_lazy()now actually puts adbplyr::srcin the$srcfield. This shouldn't affect any downstream code unless you were previously working around this weird difference betweentbl_lazyandtbl_sqlclasses. It also includes thesrcclass in its class, and when printed, shows the generated SQL (#111).
Database specific improvements
-
MySQL/MariaDB
-
Translations also applied to connections via the odbc package (@colearendt, #238)
-
Basic support for regular expressions via
str_detect()and
str_replace_all()(@colearendt, #168). -
Improved translation for
as.logical(x)toIF(x, TRUE, FALSE).
-
-
Oracle
- New custom translation for
paste()andpaste0()(@cderv, #221)
- New custom translation for
-
Postgres
- Basic support for regular expressions via
str_detect()and
str_replace_all()(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
explain()translation now generatesEXPLAIN QUERY PLANwhich generates a higher-level, more human friendly explanation.
-
SQL server
-
Improved translation for
as.logical(x)toCAST(x as BIT)(#250). -
Translates
paste(),paste0(), andstr_c()to+. -
copy_to()method applies temporary table name transformation earlier so that you can now overwrite temporary tables (#258). -
db_write_table()method uses correct argument name for passing along field types (#251).
-
Minor improvements and bug fixes
-
Aggregation functions only warn once per session about the use of
na.rm = TRUE(#216). -
table names generated by
random_table_name()have the prefix "dbplyr_", which makes it easier to find them programmatically (@mattle24, #111) -
Functions that are only available in a windowed (
mutate()) query now throw an error when called in a aggregate (summarise()) query (#129) -
arrange()understands the.by_groupargument, making it possible sort by groups if desired. The default isFALSE(#115) -
distinct()now handles computed variables likedistinct(df, y = x + y)(#154). -
escape(),sql_expr()andbuild_sql()no longer acceptcon = NULLas a shortcut forcon = simulate_dbi(). This made it too easy to forget to passconalong, introducing extremely subtle escaping bugs.win_over()gains aconargument for the same reason. -
New
escape_ansi()always uses ANSI SQL 92 standard escaping (for use in examples and documentation). -
mutate(df, x = NULL)dropsxfrom the output, just like when working with local data frames (#194). -
partial_eval()processes inlined functions (including rlang lambda functions). This makes dbplyr work with more forms of scoped verbs likedf %>% summarise_all(~ mean(.)),df %>% summarise_all(list(mean))(#134). -
sql_aggregate()now takes an optional argumentf_rfor passing tocheck_na_rm(). This allows the warning to show the R function name rather than the SQL function name (@sverchkov, #153). -
sql_infix()gains apadargument for the rare operator that doesn't need to be surrounded by spaces. -
sql_prefix()no longer turns SQL functions into uppercase, allowing for correct translation of case-sensitive SQL functions (#181, @mtoto). -
summarise()gives a clear error message if you refer to a variable created in that samesummarise()(#114). -
New
sql_call2()which is torlang::call2()assql_expr()is torlang::expr(). -
show_query()andexplain()usecat()rather than message. -
union(),union_all(),setdiff()andintersect()do a better job of matching columns across backends (#183).
dbplyr 1.3.0
- Now supports for dplyr 0.8.0 (#190) and R 3.1.0
API changes
-
Calls of the form
dplyr::foo()are now evaluated in the database, rather than locally (#197). -
varsargument totbl_sql()has been formally deprecated; it hasn't actually done anything for a while (#3254). -
srcandtblobjects now include a class generated from the class of the underlying connection object. This makes it possible for dplyr backends to implement different behaviour at the dplyr level, when needed. (#2293)
SQL translation
-
x %in% yis now translated toFALSEifyis empty (@mgirlich, #160). -
New
as.integer64(x)translation toCAST(x AS BIGINT)(#3305) -
case_whennow translates with a ELSE clause if a formula of the formTRUE~<RHS>is provided . (@cderv, #112) -
cummean()now generatesAVG()notMEAN()(#157) -
str_detect()now uses correct parameter order (#3397) -
MS SQL
- Cumulative summary functions now work (#157)
ifelse()usesCASE WHENinstead ofIIF; this allows more complex operations, such as%in%, to work properly (#93)
-
Oracle
- Custom
db_drop_table()now only drops tables if they exist (#3306) - Custom
setdiff()translation (#3493) - Custom
db_explain()translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()/as.double()(@chris-park, #171).
- Correct translation for
-
Redshift
substr()translation improved (#3339)
Minor improvements and bug fixes
-
copy_to()will only remove existing table whenoverwrite = TRUEand the table already exists, eliminating a confusing "NOTICE" from PostgreSQL (#3197). -
partial_eval()handles unevaluated formulas (#184). -
pull.tbl_sql()now extracts correctly from grouped tables (#3562). -
sql_render.op()now correctly forwards theconargument (@kevinykuo, #73).
dbplyr 1.2.2
- R CMD check fixes
dbplyr 1.2.1
- Forward compatibility fixes for rlang 0.2.0
dbplyr 1.2.0
New top-level translations
-
New translations for
- MS Access (#2946) (@DavisVaughan)
- Oracle, via odbc or ROracle (#2928, #2732, @edgararuiz)
- Teradata.
- Redshift.
-
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
New features
-
copy_to()can now "copy" tbl_sql in the same src, providing another way to cache a query into a temporary table (#3064). You can alsocopy_totbl_sqls from another source, andcopy_to()will automatically collect then copy. -
Initial support for stringr functions:
str_length(),str_to_upper(),str_to_lower(),str_replace_all(),str_detect(),str_trim(). Regular expression support varies from database to database, but most simple regular expressions should be ok.
Tools for developers
-
db_compute()gains ananalyzeargument to matchdb_copy_to(). -
New
remote_name(),remote_con(),remote_src(),remote_query()andremote_query_plan()provide a standard API for get metadata about a remote tbl (#3130, #2923, #2824). -
New
sql_expr()is a more convenient building block for low-level SQL translation (#3169). -
New
sql_aggregate()andwin_aggregate()for generating SQL and windowed SQL functions for aggregates. These take one argument,x, and warn ifna.rmis notTRUE(#3155).win_recycled()is equivalent towin_aggregate()and has been soft-deprecated. -
db_write_tablenow needs to return the table name
Minor improvements and bug fixes
-
Multiple
head()calls in a row now collapse to a single call. This avoids a printing problem with MS SQL (#3084). -
escape()now works with integer64 values from the bit64 package (#3230) -
if,ifelse(), andif_else()now correctly scope the false condition so that it only applies to non-NULL conditions (#3157) -
ident()andident_q()handle 0-length inputs better, and should be easier to use with S3 (#3212) -
in_schema()should now work in more places, particularly incopy_to()(#3013, @baileych) -
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
-
mutate()has better logic for splitting a single mutate into multiple subqueries (#3095). -
Improved
paste()andpaste0()support in MySQL, PostgreSQL (#3168), and RSQLite (#3176). MySQL and PostgreSQL gain support forstr_flatten()which behaves likepaste(x, collapse = "-")(but for technical reasons can't be implemented as a straightforward translation ofpaste()). -
same_src.tbl_sql()now performs correct comparison instead of always returningTRUE. This means thatcopy = TRUEonce again allows you to perform cross-database joins (#3002). -
select()queries no longer alias column names unnecessarily (#2968, @DavisVaughan). -
select()andrename()are now powered by tidyselect, fixing a few renaming bugs (#3132, #2943, #2860). -
summarise()once again performs partial evaluation before database submission (#3148). -
test_src()makes it easier to access a single test source.
Database specific improvements
-
MS SQL
-
Better support for temporary tables (@Hong-Revo)
-
Different translations for filter/mutate contexts for:
NULLevaluation (is.na(),is.null()), logical operators (!,&,&&,|,||), and comparison operators (==,!=,<,>,>=,<=)
-
-
MySQL:
copy_to()(viadb_write_table()) correctly translates logical variables to integers (#3151). -
odbc: improved
n()translation in windowed context. -
SQLite: improved
na_iftranslation (@cwarden) -
PostgreSQL: translation for
grepl()added (@zozlak) -
Oracle: changed VARVHAR to VARCHAR2 datatype (@washcycle, #66)
dbplyr 1.1.0
New features
-
full_join()over non-overlapping columnsby = character()translated toCROSS JOIN(#2924). -
case_when()now translates to SQL "CASE WHEN" (#2894) -
x %in% c(1)now generates the same SQL asx %in% 1(#2898). -
New
window_order()andwindow_frame()give you finer control over the window functions that dplyr creates (#2874, #2593). -
Added SQL translations for Oracle (@edgararuiz).
Minor improvements and bug fixes
-
x %in% c(1)now generates the same SQL asx %in% 1(#2898). -
head(tbl, 0)is now supported (#2863). -
select()ing zero columns gives a more information error message (#2863). -
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
-
PostgreSQL gains a better translation for
round()(#60). -
Added custom
db_analyze_table()for MS SQL, Oracle, Hive and Impala (@edgararuiz) -
Added support for
sd()for aggregate and window functions (#2887) (@edgararuiz) -
You can now use the magrittr pipe within expressions, e.g.
mutate(mtcars, cyl %>% as.character()). -
If a translation was supplied for a summarise function, but not for the equivalent windowed variant, the expression would be translated to
NULLwith a warning. Nowsql_variant()checks that all aggregate functions have matching window functions so that correct translations or clean errors will be generated (#2887)
dbplyr 1.0.0
New features
-
tbl()andcopy_to()now work directly with DBI connections (#2423, #2576), so there is no longer a need to generate a dplyr src.library(dplyr) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars) mtcars2 <- tbl(con, "mtcars") mtcars2
-
glimpse()now works with remote tables (#2665) -
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
-
compute()andcollapse()now preserve the "ordering" of rows. This only affects the computation of window functions, as the rest of SQL does not care about row order (#2281). -
copy_to()gains anoverwriteargument which allows you to overwrite an existing table. Use with care! (#2296) -
New
in_schema()function makes it easy to refer to tables in schema:in_schema("my_schema_name", "my_table_name").
Deprecated and defunct
query()is no longer exported. It hasn't been useful for a while so this shouldn't break any code.
Verb-level SQL generation
-
Partial evaluation occurs immediately when you execute a verb (like
filter()ormutate()) rather than happening when the query is executed (#2370). -
mutate.tbl_sql()will now generate as many subqueries as necessary so that you can refer to variables that you just created (like in mutate with regular dataframes) (#2481, #2483). -
SQL joins have been improved:
-
SQL joins always use the
ON ...syntax, avoidingUSING ...even for natural joins. Improved handling of tables with columns of the same name (#1997, @javierluraschi). They now generate SQL more similar to what you'd write by hand, eliminating a layer or two of subqueries (#2333) -
[API] They now follow the same rules for including duplicated key variables that the data frame methods do, namely that key variables are only kept from
x, and never fromy(#2410) -
[API] The
sql_join()generic now gains avarsargument which lists the variables taken from the left and right sides of the join. If you have a customsql_join()method, you'll need to update how your code generates joins, following the template insql_join.generic(). -
full_join()throws a clear error when you attempt to use it with a MySQL backend (#2045) -
right_join()andfull_join()now return results consistent with local data frame sources when there are records in the right table with no match in the left table.right_join()returns values ofbycolumns from the right table.full_join()returns coalesced values ofbycolumns from the left and right tables (#2578, @ianmcook)
-
-
group_by()can now perform an inline mutate for database backends (#2422). -
The SQL generation set operations (
intersect(),setdiff(),union(), andunion_all()) have been considerably improved.By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
-
rename()andgroup_by()now combine correctly (#1962) -
tbl_lazy()andlazy_tbl()have been exported. These help you test generated SQL with out an active database connection. -
ungroup()correctly resets grouping variables (#2704).
Vector-level SQL generation
-
New
as.sql()safely coerces an input to SQL. -
More translators for
as.character(),as.integer()andas.double()(#2775). -
New
ident_q()makes it possible to specifier identifiers that do not need to be quoted. -
Translation of inline scalars:
-
Logical values are now translated differently depending on the backend. The default is to use "true" and "false" which is the SQL-99 standard, but not widely support. SQLite translates to "0" and "1" (#2052).
-
Infand-Infare correctly escaped -
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
-
Quoting defaults to
DBI::dbEscapeString()andDBI::dbQuoteIdentifier()respectively.
-
-
::and:::are handled correctly (#2321) -
x %in% 1is now correctly translated tox IN (1)(#511). -
ifelse()andif_else()use correct argument names in SQL translation (#2225). -
ident()now returns an object with classc("ident", "character"). It no longer contains "sql" to indicate that this is not already escaped. -
is.na()andis.null()gain extra parens in SQL translation to preserve correct precedence (#2302). -
[API]
log(x, b)is now correctly translated to the SQLlog(b, x)(#2288). SQLite does not support the 2-argument log function so it is translated tolog(x) / log(b). -
nth(x, i)is now correctly translated tonth_value(x, i). -
n_distinct()now accepts multiple variables (#2148). -
[API]
substr()is now translated to SQL, correcting for the difference in the third argument. In R, it's the position of the last character, in SQL it's the length of the string (#2536). -
win_over()escapes expression using current database rules.
Backends
-
copy_to()now usesdb_write_table()instead ofdb_create_table()anddb_insert_into().db_write_table.DBIConnection()usesdbWriteTable(). -
New
db_copy_to(),db_compute()anddb_collect()allow backends to override the entire database process behindcopy_to(),compute()andcollect().db_sql_render()allow additional control over the SQL rendering process. -
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
-
sql_escape_logical()allows you to control the translation of literal logicals (#2614). -
src_desc()has been replaced bydb_desc()and now dispatches on the connection, eliminating the last method that required dispatch on the class of the src. -
win_over(),win_rank(),win_recycled(),win_cumulative(),win_current_group()andwin_current_order()are now exported. This should make it easier to provide customised SQL for window functions (#2051, #2126). -
SQL translation for Microsoft SQL Server (@edgararuiz)
-
SQL translation for Apache Hive (@edgararuiz)
-
SQL translation for Apache Impala (@edgararuiz)
Minor bug fixes and improvements
-
collect()once again defaults to return all rows in the data (#1968). This makes it behave the same asas.data.frame()andas_tibble(). -
collect()only regroups by variables present in the data (#2156) -
collect()will automatically LIMIT the result to then, the number of rows requested. This will provide the query planner with more information that it may be able to use to improve execution time (#2083). -
common_by()gets a better error message for unexpected inputs (#2091) -
copy_to()no longer checks that the table doesn't exist before creation, instead preferring to fall back on the database for error messages. This should reduce both false positives and false negative (#1470) -
copy_to()now succeeds for MySQL if a character column containsNA(#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe). -
copy_to()now returns it's output invisibly (since you're often just calling for the side-effect). -
distinct()reports improved variable information for SQL backends. This means that it is more likely to work in the middle of a pipeline (#2359). -
Ungrouped
do()on database backends now collects all data locally first (#2392). -
Call
dbFetch()instead of the deprecatedfetch()(#2134). UseDBI::dbExecute()for non-query SQL commands (#1912) -
explain()andshow_query()now invisibly return the first argument, making them easier to use inside a pipeline. -
print.tbl_sql()displays ordering (#2287) and prints table name, if known. -
print(df, n = Inf)andhead(df, n = Inf)now work with remote tables (#2580). -
db_desc()andsql_translate_env()get defaults for DBIConnection. -
Formatting now works by overriding the
tbl_sum()generic instead ofprint(). This means that the output is more consistent with tibble, and thatformat()is now supported also for SQL sources (tidyverse/dbplyr#14).
Lazy ops
-
[API] The signature of
op_basehas changed toop_base(x, vars, class) -
[API]
translate_sql()andpartial_eval()have been refined:-
translate_sql()no longer takes a vars argument; instead callpartial_eval()yourself. -
Because it no longer needs the environment
translate_sql()_now works with a list of dots, rather than alazy_dots. -
partial_eval()now takes a character vector of variable names rather than a tbl. -
This leads to a simplification of the
opdata structure: dots is now a list of expressions rather than alazy_dots.
-
-
[API]
op_vars()now returns a list of quoted expressions. This enables escaping to happen at the correct time (i.e. when the connection is known).