copy_to() now works directly with DBI connections (#2423, #2576),
so there is no longer a need to generate a dplyr src.
    library(dplyr)
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, mtcars)
    mtcars2 <- tbl(con, "mtcars")
    mtcars2
glimpse() now works with remote tables (#2665).
dplyr has gained a basic SQL optimiser, which collapses certain nested
SELECT queries into a single query (#1979). This will improve query
execution performance for databases with less sophisticated query optimisers,
and fixes certain problems with ordering and limits in subqueries (#1979).
A big thanks goes to @hhoeflin for figuring out this optimisation.
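As a rough illustration of the collapsing described above, the sketch below renders the SQL for two stacked select() calls against a simulated table. It assumes a recent dbplyr release in which lazy_frame() and sql_render() are available; the column names are made up for the example.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# A simulated remote table; no database connection is needed.
lf <- lazy_frame(x = 1, y = 2, z = 3)

# Two stacked select() calls would naively become a nested subquery.
query <- lf %>%
  select(x, y) %>%
  select(x)

sql_text <- as.character(sql_render(query))

# Count how many SELECT keywords appear in the rendered SQL.
n_select <- lengths(regmatches(sql_text, gregexpr("SELECT", sql_text, fixed = TRUE)))
print(sql_text)
```

If the optimiser applies, the rendered SQL should contain a single SELECT rather than one per verb.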
collapse() now preserves the "ordering" of rows.
This only affects the computation of window functions, as the rest
of SQL does not care about row order (#2281).
A new overwrite argument allows you to overwrite
an existing table. Use with care! (#2296)
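A minimal sketch of overwriting a table, assuming the overwrite argument belongs to copy_to() and that the RSQLite package is installed. The table name "mtcars" is only an example.

```r
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# The first copy creates the table.
copy_to(con, mtcars, "mtcars")

# Copying to the same name again needs overwrite = TRUE;
# the existing contents are replaced, so use with care.
copy_to(con, head(mtcars, 5), "mtcars", overwrite = TRUE)

# Count the rows now stored under that name.
n_rows <- tbl(con, "mtcars") %>% count() %>% pull(n)

DBI::dbDisconnect(con)
```

After the second call the table holds only the five rows from the replacement data.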
The in_schema() function makes it easy to refer to tables in a schema.
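A short sketch of how a schema-qualified reference might be built. The names "my_schema" and "my_table" are placeholders, and the tbl() call is shown only as a comment because it needs a live connection.

```r
library(dbplyr)

# "my_schema" and "my_table" are placeholder names.
table_ref <- in_schema("my_schema", "my_table")
print(table_ref)

# With a live DBI connection `con`, the reference would be passed to tbl():
# dplyr::tbl(con, table_ref)
```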
Deprecated and defunct
query() is no longer exported. It hasn't been useful for a while
so this shouldn't break any code.
Verb-level SQL generation
Partial evaluation occurs immediately when you execute a verb (like
mutate()) rather than happening when the query is executed
mutate.tbl_sql() will now generate as many subqueries as necessary so
that you can refer to variables that you just created (as in mutate()
with regular data frames) (#2481, #2483).
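The subquery generation can be seen in a small sketch like the one below, which assumes a recent dbplyr with lazy_frame() and sql_render(). The columns x, y and z are invented for the example.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# A simulated remote table; no database connection is needed.
lf <- lazy_frame(x = 1:3)

# z refers to y, which is created in the same mutate() call.
query <- lf %>% mutate(y = x + 1, z = y * 2)

sql_text <- as.character(sql_render(query))

# Count how many SELECT keywords appear in the rendered SQL.
n_select <- lengths(regmatches(sql_text, gregexpr("SELECT", sql_text, fixed = TRUE)))
print(sql_text)
```

Because z depends on y, the rendered SQL should wrap the computation of y in a subquery and compute z in the outer query.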
SQL joins have been improved:
SQL joins always use the ON ... syntax, avoiding USING ... even for
natural joins. Improved handling of tables with columns of the same name
(#1997, @javierluraschi). They now generate SQL more similar to what you'd
write by hand, eliminating a layer or two of subqueries (#2333).
[API] They now follow the same rules for including duplicated key variables
that the data frame methods do, namely that key variables are only kept
from x, and never from y.
The sql_join() generic now gains a vars argument which lists
the variables taken from the left and right sides of the join. If you
have a custom sql_join() method, you'll need to update how your
code generates joins, following the template in
full_join() throws a clear error when you attempt to use it with a
MySQL backend (#2045)
full_join() now returns results consistent with
local data frame sources when there are records in the right table with
no match in the left table. right_join() returns values of the key
columns from the right table. full_join() returns coalesced values of the
key columns from the left and right tables (#2578, @ianmcook).
group_by() can now perform an inline mutate for database backends (#2422).
The SQL generation for set operations (including union_all()) has been
considerably improved.
By default, the component SELECTs are surrounded with parentheses, except on
SQLite. The SQLite backend will now throw an error if you attempt a set operation
on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching
variables with NULL (#2556).
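The column matching can be sketched with two simulated tables that share only one column. This assumes a recent dbplyr with lazy_frame() and sql_render(); the column names are invented.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# Two simulated tables that share only the column x.
lf1 <- lazy_frame(x = 1, y = 2)
lf2 <- lazy_frame(x = 1, z = 3)

combined <- union_all(lf1, lf2)

sql_text <- as.character(sql_render(combined))
print(sql_text)
```

Each side of the UNION ALL should select all three columns, with NULL standing in for the column that the input lacks.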
group_by() now combine correctly (#1962)
lazy_tbl() have been exported. These help you test
generated SQL without an active database connection.
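A small sketch of testing generated SQL without a database. It uses lazy_frame() and simulate_sqlite() as exported by current dbplyr releases, which is an assumption about the helpers this entry refers to; the data are invented.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# A lazy table backed by a simulated SQLite connection.
lf <- lazy_frame(x = 1, y = "a", con = simulate_sqlite())

query <- lf %>%
  filter(x > 1) %>%
  select(y)

# Render the SQL that would be sent to the database.
sql_text <- as.character(sql_render(query))
print(sql_text)
```

The rendered string can then be compared against expectations in a unit test, for example by checking that it contains a WHERE clause.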
ungroup() correctly resets grouping variables (#2704).
Vector-level SQL generation
as.sql() safely coerces an input to SQL.
More translators for
ident_q() makes it possible to specify identifiers that do not
need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend.
The default is to use "true" and "false" which is the SQL-99 standard,
but not widely supported. SQLite translates to "0" and "1" (#2052).
-Inf are correctly escaped
Better test for whether or not a double is similar to an integer and
hence needs a trailing 0.0 added (#2004).
Quoting defaults to
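The backend-specific handling of logical scalars can be checked with simulated connections, as in this sketch. It assumes a recent dbplyr in which translate_sql() takes a con argument and simulate_dbi() and simulate_sqlite() are exported.

```r
library(dbplyr)

# Translate the literal TRUE for a generic backend and for SQLite.
default_true <- as.character(translate_sql(TRUE, con = simulate_dbi()))
sqlite_true  <- as.character(translate_sql(TRUE, con = simulate_sqlite()))

print(default_true)
print(sqlite_true)
```

The generic backend should produce a boolean keyword, while SQLite should produce an integer.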
::: are handled correctly (#2321)
x %in% 1 is now correctly translated to x IN (1) (#511).
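A quick way to inspect this translation, assuming a recent dbplyr where translate_sql() takes a con argument. Here x is an invented column name.

```r
library(dbplyr)

# A single value on the right-hand side should still be parenthesised.
in_sql <- as.character(translate_sql(x %in% 1, con = simulate_dbi()))
print(in_sql)
```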
if_else() use correct argument names in SQL translation
ident() now returns an object with class c("ident", "character"). It
no longer contains "sql" to indicate that this is not already escaped.
is.null() gain extra parens in SQL translation to preserve
correct precedence (#2302).
log(x, b) is now correctly translated to the SQL 2-argument log function.
SQLite does not support the 2-argument log function, so there it is
translated to log(x) / log(b).
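The difference between backends can be sketched with simulated connections. This assumes a recent dbplyr where translate_sql() takes a con argument; x is an invented column and 2 an arbitrary base.

```r
library(dbplyr)

# Generic backend: expected to use a 2-argument SQL log function.
default_log <- as.character(translate_sql(log(x, 2), con = simulate_dbi()))

# SQLite backend: expected to fall back to a ratio of two logs.
sqlite_log <- as.character(translate_sql(log(x, 2), con = simulate_sqlite()))

print(default_log)
print(sqlite_log)
```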
nth(x, i) is now correctly translated to
n_distinct() now accepts multiple variables (#2148).
substr() is now translated to SQL, correcting for the difference
in the third argument. In R, it's the position of the last character;
in SQL, it's the length of the substring (#2536).
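The adjustment of the third argument can be seen in a sketch like this one, assuming a recent dbplyr where translate_sql() takes a con argument. x is an invented column.

```r
library(dbplyr)

# In R, substr(x, 2, 4) means characters 2 through 4, i.e. 3 characters.
substr_sql <- as.character(translate_sql(substr(x, 2, 4), con = simulate_dbi()))
print(substr_sql)
```

The SQL call should keep the start position of 2 and pass a length of 3 (4 - 2 + 1) rather than the end position 4.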
win_over() escapes expressions using the current database rules.
db_collect() allow backends to
override the entire database process behind
db_sql_render() allow additional control over the SQL
All generics whose behaviour can vary from database to database now
provide a DBIConnection method. That means that you can easily scan
the NAMESPACE to see the extension points.
sql_escape_logical() allows you to control the translation of
literal logicals (#2614).
src_desc() has been replaced by db_desc() and now dispatches on the
connection, eliminating the last method that required dispatch on the class
of the src.
win_current_order() are now exported. This
should make it easier to provide customised SQL for window functions
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
Minor bug fixes and improvements
collect() once again defaults to returning all rows in the data (#1968).
This makes it behave the same as
collect() only regroups by variables present in the data (#2156).
collect() will automatically LIMIT the result to n, the number of
rows requested. This will provide the query planner with more information
that it may be able to use to improve execution time (#2083).
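A minimal sketch of requesting a fixed number of rows, assuming the RSQLite package is installed so that memdb_frame() can create an in-memory table. The id column is invented.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# A ten-row table in an in-memory SQLite database.
remote <- memdb_frame(id = 1:10)

# Ask for only the first three rows; omitting n returns everything.
first_rows <- collect(remote, n = 3)
all_rows   <- collect(remote)

print(first_rows)
```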
common_by() gets a better error message for unexpected inputs (#2091).
copy_to() no longer checks that the table doesn't exist before creation,
instead preferring to fall back on the database for error messages. This
should reduce both false positives and false negatives (#1470).
copy_to() now returns its output invisibly (since you're often just
calling for the side-effect).
distinct() reports improved variable information for SQL backends. This
means that it is more likely to work in the middle of a pipeline (#2359).
do() on database backends now collects all data locally first
dbFetch() instead of the deprecated
DBI::dbExecute() for non-query SQL commands (#1912)
show_query() now invisibly return the first argument,
making them easier to use inside a pipeline.
print.tbl_sql() displays ordering (#2287) and prints table name, if known.
print(df, n = Inf) and head(df, n = Inf) now work with remote tables
sql_translate_env() get defaults for DBIConnection.
Formatting now works by overriding the tbl_sum() generic instead of
print(). This means that the output is more consistent with tibble, and that
format() is now also supported for SQL sources (#14).
[API] The signature of op_base has changed to op_base(x, vars, class)
partial_eval() have been refined:
translate_sql() no longer takes a vars argument; instead call
Because it no longer needs the environment
works with a list of dots, rather than a
partial_eval() now takes a character vector of variable names
rather than a tbl.
This leads to a simplification of the
dots is now a list of expressions rather than a
op_vars() now returns a list of quoted expressions. This
enables escaping to happen at the correct time (i.e. when the connection