add_count() is now generic (#5837).
if_all() abort when a predicate is mistakenly used as
Multiple calls to if_all() in the same expression are now
properly disambiguated (#5782).
if_all() expressions. This greatly
improves performance with grouped data frames.
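As a rough illustration of the if_all() entries above, here is a minimal sketch of if_all() inside filter(), including two calls in the same expression. The data frame and its columns x and y are made up for this example.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(x = c(1, -1, 2, -3), y = c(1, 1, -2, -4))

# Keep rows where every selected column is positive
both_positive <- filter(df, if_all(c(x, y), ~ .x > 0))

# Two if_all() calls combined in a single expression
either_positive <- filter(df, if_all(x, ~ .x > 0) | if_all(y, ~ .x > 0))
```

The first call keeps only the rows where both x and y are positive; the second keeps rows where at least one of them is.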
across() now inlines lambda-formulas. This is slightly more performant and
will allow more optimisations in the future.
select() no longer creates duplicate variables when renaming a variable
to the same name as a grouping variable (#5841).
Fixed quosure handling in dplyr::group_by() that caused issues with extra
Row-wise data frames of 0 rows and list columns are supported again (#5804).
Fixed edge case of
weight_by= is used and there
are 0 rows (#5729).
across() can again use columns in functions defined inline (#5734).
Using testthat 3rd edition.
Fixed bugs introduced in across() in the previous version (#5765).
group_by() keeps attributes unrelated to the grouping (#5760).
Improved performance for across(). This makes
mutate(across()) perform as well as the superseded colwise equivalents (#5697).
summarise() silently ignores NULL results (#5708).
Fixed a performance regression in mutate() when warnings occur once per
group (#5675). We no longer instrument warnings with debugging information
mutate() is called within
summarise() no longer informs when the result is ungrouped (#5633).
tally() are now generic.
Removed default fallbacks to lazyeval methods; this will yield better error messages when
you call a dplyr function with the wrong input, and is part of our long term
plan to remove the deprecated lazyeval interface.
Improved performance with many columns, with a dynamic data mask using active
bindings and lazy chops (#5017).
mutate() and friends preserve row names in data frames once more (#5418).
relocate() can rename columns it relocates (#5569).
group_by() have better error messages when the mutate step fails (#5060).
between() is not vectorised (#5493).
across() issue where data frame columns could not be referred to
all_of() in the nested case (
across() handles data frames with 0 columns (#5523).
mutate() always keeps grouping variables, unconditional to
dplyr now depends on R 3.3.0
cur_data() but includes the grouping variables (#5342).
tally() no longer automatically weights by column
present (#5298). dplyr 1.0.0 introduced this behaviour because of Hadley's
faulty memory. Historically
tally() automatically weighted and
did not, but this behaviour was accidentally changed in 0.8.2 (#4408) so that
neither automatically weighted by
n. Since 0.8.2 is almost a year old,
and the automatic weighting behaviour was a little confusing anyway,
we've removed it from both
wt = n() is now deprecated; now just omit the
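To make the weighting change concrete, here is a small sketch using count() on a made-up data frame that happens to contain a column called n. The data and column names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data with a column named `n`
df <- tibble(g = c("a", "a", "b"), n = c(10, 20, 30))

# No automatic weighting: this counts rows per group
row_counts <- count(df, g)

# Weighting by `n` must now be requested explicitly
weighted <- count(df, g, wt = n)
```

Without wt, the result counts rows; with wt = n, it sums the n column within each group.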
coalesce() now supports data frames correctly (#5326).
The call stack is preserved on error. This makes it possible to
into problematic code called from dplyr verbs (#5308).
bind_cols() no longer converts to a tibble; it returns a data frame if the input is a data frame.
mutate() use vctrs coercion
rules. There are two main user facing changes:
Combining factor and character vectors silently creates a character
vector; previously it created a character vector with a warning.
Combining multiple factors creates a factor with combined levels;
previously it created a character vector with a warning.
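The two coercion changes above can be sketched with bind_rows(), chosen here as one convenient verb that combines columns; the single-column data frames are invented for the example.

```r
library(dplyr)

# Factor + character: the combined column is character, with no warning
fct_df <- data.frame(x = factor("a"))
chr_df <- data.frame(x = "b", stringsAsFactors = FALSE)
mixed <- bind_rows(fct_df, chr_df)

# Factor + factor with different levels: a factor with the combined levels
other_fct_df <- data.frame(x = factor("b"))
factors <- bind_rows(fct_df, other_fct_df)
```

Inspecting class(mixed$x) and levels(factors$x) shows the resulting types.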
bind_rows() and other functions use vctrs name repair, see
Data frames, tibbles and grouped data frames are no longer considered equal, even if the data is the same.
Equality checks for data frames no longer ignore row order or groupings.
all.equal() internally. When comparing data frames, tests that used to pass may now fail.
distinct() keeps the original column order.
distinct() on missing columns now raises an error; it has been a compatibility warning for a long time.
group_modify() puts the grouping variable to the front.
row_number() can no longer be called directly when dplyr is not loaded,
and this now generates an error:
dplyr::mutate(mtcars, x = n()).
Fix by prefixing with
dplyr::mutate(mtcars, x = dplyr::n())
The old data format for grouped_df is no longer supported. This may affect you if you have serialized grouped data frames to disk, e.g. with saveRDS() or when using knitr caching.
lag() are stricter about their inputs.
Extending data frames requires that the extra class or classes are added first, not last.
Having the extra class at the end causes some vctrs operations to fail with a message like:
Input must be a vector, not a `<data.frame/...>` object
right_join() no longer sorts the rows of the resulting tibble according to the order of the RHS
by argument in tibble
cur_group_rows()) provide a full set of options for accessing information
about the "current" group in dplyr verbs. They are inspired by
rows_delete()) provide a new API to insert and delete rows from a second data frame or table. Support for updating mutable backends is planned (#4654).
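A minimal sketch of the row-manipulation API described above, using rows_insert() and rows_delete() on two small made-up tibbles keyed by a hypothetical id column:

```r
library(dplyr)

# Hypothetical target table and a second table of rows to add
x <- tibble(id = 1:3, val = c("a", "b", "c"))
new_rows <- tibble(id = 4L, val = "d")

# Insert rows whose keys are not yet present in x
inserted <- rows_insert(x, new_rows, by = "id")

# Delete rows of x whose keys appear in the second table
deleted <- rows_delete(x, tibble(id = 2L), by = "id")
```

Both functions return a modified copy; x itself is left unchanged.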
summarise() create multiple columns from a single expression
if you return a data frame (#2326).
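For example, an unnamed data frame returned inside summarise() is unpacked into several columns. The sketch below assumes a made-up grouped data set and the illustrative column names lo and hi.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g = c(1, 1, 2), x = c(1, 2, 3))

# One expression that returns a one-row tibble creates two columns
ranges <- df %>%
  group_by(g) %>%
  summarise(tibble(lo = min(x), hi = max(x)))
```

The result has one row per group, with the columns g, lo and hi.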
rename() use the latest version of the tidyselect interface.
Practically, this means that you can now combine selections using Boolean
|), and use predicate functions with
where(is.character)) to select variables by type (#4680). It also makes
it possible to use
rename() to repair data frames with
duplicated names (#4615) and prevents you from accidentally introducing
duplicate names (#4643). This also means that dplyr now re-exports
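A short sketch of the selection features mentioned above, combining a predicate with Boolean operators inside select(). The tibble and its column names are assumptions made for the example.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(id = 1:2, name = c("a", "b"), score = c(1.5, 2.5))

# Select by type with a predicate function
chr_cols <- select(df, where(is.character))

# Combine a predicate with a Boolean negation: numeric columns except `id`
num_not_id <- select(df, where(is.numeric) & !id)
```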
slice() gains a new set of helpers:
slice_tail() select the first and last rows, like
tail(), but return
n rows per group.
slice_sample() randomly selects rows, taking over from
slice_max() select the rows with the minimum or
maximum values of a variable, taking over from the confusing
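The slice helpers can be sketched on a small grouped tibble as follows. The data are invented, and note that n must be passed by name.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 2, 4))
by_g <- group_by(df, g)

# First row of each group
firsts <- slice_head(by_g, n = 1)

# Row with the largest x in each group
largest <- slice_max(by_g, x, n = 1)
```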
summarise() can create summaries of greater than length 1 if you use a
summary function that returns multiple values.
.groups= argument to control the grouping structure.
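As an illustration of the .groups= argument, the sketch below summarises a made-up data frame grouped by two hypothetical variables and compares two of the documented settings.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g1 = c("a", "a", "b"), g2 = c("x", "y", "x"), v = 1:3)
grouped <- group_by(df, g1, g2)

# Drop all grouping from the result
dropped <- summarise(grouped, total = sum(v), .groups = "drop")

# Peel off only the last grouping variable
peeled <- summarise(grouped, total = sum(v), .groups = "drop_last")
```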
relocate() verb makes it easy to move columns around within a data
rename_with() is designed specifically for the purpose of renaming
selected columns with a function (#4771).
ungroup() can now selectively remove grouping variables (#3760).
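The three entries above (relocate(), rename_with() and selective ungroup()) can be tried out with a sketch like this one; all data and column names are made up.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(a = 1, b = 2, c = 3)

# Move column c in front of column a
moved <- relocate(df, c, .before = a)

# Rename selected columns with a function
renamed <- rename_with(df, toupper, starts_with("a"))

# Remove only one of two grouping variables
grouped <- group_by(tibble(g1 = 1, g2 = 2, x = 3), g1, g2)
partly_ungrouped <- ungroup(grouped, g2)
```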
mutate() (for data frames only) gains experimental new arguments
.after that allow you to control where the new columns are
mutate() (for data frames only) gains an experimental new argument
.keep that allows you to control which variables are kept from
.keep = "all" is the default; it keeps all variables.
.keep = "none" retains no input variables (except for grouping keys),
so behaves like
.keep = "unused" keeps only variables
not used to make new columns.
.keep = "used" keeps only the input variables
used to create new columns; it's useful for double checking your work (#3721).
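The .keep options listed above can be compared side by side on a small made-up tibble:

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(a = 1:2, b = 3:4, c = 5:6)

keep_used   <- mutate(df, d = a + b, .keep = "used")
keep_unused <- mutate(df, d = a + b, .keep = "unused")
keep_none   <- mutate(df, d = a + b, .keep = "none")
```

Comparing names() of the three results shows which input columns survive in each case.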
with_groups() makes it easy to temporarily group or
across() that can be used inside
and other verbs to apply a function (or a set of functions) to a selection of
vignette("colwise") for more details.
c_across() that can be used inside
in row-wise data frames to easily (e.g.) compute a row-wise mean of all
numeric variables. See
vignette("rowwise") for more details.
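A minimal sketch of across() in a grouped summary and c_across() in a row-wise mutate. The data frame and column names are assumptions for illustration; the vignettes remain the authoritative reference.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6))

# Apply one function to a selection of columns, per group
col_means <- df %>%
  group_by(g) %>%
  summarise(across(c(x, y), mean))

# Compute a mean across columns within each row
row_means <- df %>%
  rowwise() %>%
  mutate(m = mean(c_across(c(x, y)))) %>%
  ungroup()
```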
rowwise() is no longer questioning; we now understand that it's an
important tool when you don't have vectorised code. It now also allows you to
specify additional variables that should be preserved in the output when
summarising (#4723). The rowwise-ness is preserved by all operations;
you need to explicitly drop it with
nest_by(). It has the same interface as
but returns a rowwise data frame of grouping keys, supplemented with a
list-column of data frames containing the rest of the data.
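To show what nest_by() returns, here is a small sketch on made-up data. The n_rows column is a hypothetical name chosen for the example.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# One row per key, with the remaining columns in a list-column called `data`
nested <- nest_by(df, g)

# Because the result is row-wise, `data` refers to one data frame at a time
sizes <- mutate(nested, n_rows = nrow(data))
```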
The implementation of all dplyr verbs has been changed to use primitives
provided by the vctrs package. This makes it easier to add support for
new types of vector, radically simplifies the implementation, and makes
all dplyr verbs more consistent.
The place where you are most likely to be impacted by the coercion
changes is when working with factors in joins or grouped mutates:
now when combining factors with different levels, dplyr creates a new
factor with the union of the levels. This matches base R more closely,
and while perhaps strictly less correct, is much more convenient.
dplyr dropped its two heaviest dependencies: Rcpp and BH. This should make
it considerably easier and faster to build from source.
The implementation of all verbs has been carefully thought through. This
mostly makes implementation simpler but should hopefully increase consistency,
and also makes it easier to adapt dplyr to new data structures in the
near future. Pragmatically, the biggest difference for most people will be
that each verb documents its return value in terms of rows, columns, groups,
and data frame attributes.
Row names are now preserved when working with data frames.
group_by() uses hashing from the
Grouped data frames now have
re-generate the underlying grouping. Note that modifying grouping variables
in multiple steps (i.e.
df$grp1 <- 1; df$grp2 <- 1) will be inefficient
since the data frame will be regrouped after each modification.
[.grouped_df now regroups to respect any grouping columns that have
been removed (#4708).
summarise() can now modify grouping variables (#4709).
group_by() does not create an arbitrary NA group when grouping by factors
drop = TRUE (#4460).
All deprecations now use the lifecycle package;
by default you'll only see a deprecation warning once per session,
and you can control this with
options(lifecycle_verbosity = x) where
x is one of NULL, "quiet", "warning", and "error".
id(), deprecated in dplyr 0.5.0, is now defunct.
failwith(), deprecated in dplyr 0.7.0, is now defunct.
nasa have been pulled out into a separate cubelyr package
Use of pkgconfig for setting
na_matches argument to join functions is now
deprecated (#4914). This was rarely used, and I'm now confident that the
default is correct for R.
drop argument has been deprecated because it didn't
actually affect the output.
add_rownames(): please use
tbl_df(): please use
eval_tbls2() are now deprecated. They were only used in a handful of
packages, and we now believe that you're better off performing comparisons
more directly (#4675).
combine(): please use
funs(): please use
group_by(add = ): please use
group_by(.dots = )/
group_by_prepare(.dots = ): please use
The use of zero-arg
group_indices() to retrieve the group id for the
"current" group is deprecated; instead use
Passing arguments to
group_indices() to change the
grouping has been deprecated; instead, do the grouping first yourself.
changes(): please use
progress_estimated() is soft deprecated; it's not the responsibility of
dplyr to provide progress bars (#4935).
src_local() has been deprecated; it was part of an approach to testing
dplyr backends that didn't pan out.
src_sqlite() has been deprecated.
We've recommended against them for some time. Instead please use the approach
described at http://dbplyr.tidyverse.org/.
The scoped helpers (all functions ending in
been superseded by
across(). This dramatically reduces the API surface for
dplyr, while at the same time providing a more flexible and less
error-prone interface (#4769).
select_*() have been superseded by
do() is superseded in favour of
sample_frac() have been superseded by
?sample_n for details about why, and for examples converting from
old to new usage.
top_n() has been superseded by
for details about why, and how to convert old to new usage (#4494).
all_equal() is questioning; it solves a problem that no longer seems
rowwise() is no longer questioning.
vignette("programming") has been completely rewritten to reflect our
latest vocabulary, the most recent rlang features, and our current
recommendations. It should now be substantially easier to program with
Minor improvements and bug fixes
dplyr now has a rudimentary, experimental, and stop-gap, extension mechanism
dplyr no longer provides an
all.equal.tbl_df() method. It never should have
done so in the first place because it owns neither the generic nor the class.
It also provided a problematic implementation because, by default, it
ignored the order of the rows and the columns which is usually important.
This is likely to cause new test failures in downstream packages; but on
the whole we believe those failures to either reflect unexpected behaviour
or tests that need to be strengthened (#2751).
coalesce() now uses vctrs recycling and common type coercion rules (#5186).
add_count() do a better job of preserving input class
and attributes (#4086).
distinct() errors if you ask it to use variables that don't exist
(this was previously a warning) (#4656).
summarise() get better error messages.
filter() handles data frame results when all columns are logical vectors
by reducing them with
& (#4678). In particular this means
be used in
that you can optionally choose to keep both sets of join keys (#4589). This is
useful when you want to figure out which rows were missing from either side.
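The argument name is not visible in the entry above; in dplyr the join functions call it keep. Under that assumption, and with two made-up tables, keeping both sets of join keys looks roughly like this:

```r
library(dplyr)

# Hypothetical tables, for illustration only
x <- tibble(id = 1:2, a = c("x1", "x2"))
y <- tibble(id = 2:3, b = c("y2", "y3"))

# keep = TRUE retains the key columns from both inputs
joined <- full_join(x, y, by = "id", keep = TRUE)

# Rows present in y but missing from x have a missing key on the x side
only_in_y <- filter(joined, is.na(id.x))
```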
Join functions can now perform a cross-join by specifying
by = character()
list() for ungrouped data; previously it returned
NULL which was type-unstable (when there are groups it returns a list
The first argument of
has been changed to
.data for consistency with other generics.
group_keys.rowwise_df() gives a 0 column data frame with
group_map() is now a generic (#4576).
group_by(..., .add = TRUE) replaces
group_by(..., add = TRUE),
with a deprecation message. The old argument name was a mistake because
it prevents you from creating a new grouping var called
it violates our naming conventions (#4137).
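A brief sketch of the renamed argument, using made-up grouping columns: by default a second group_by() replaces the existing grouping, while .add = TRUE appends to it.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g1 = c("a", "b"), g2 = c("x", "y"), v = 1:2)

# Default: the second group_by() overrides the first
replaced <- df %>% group_by(g1) %>% group_by(g2)

# .add = TRUE: the new variable is added to the existing groups
added <- df %>% group_by(g1) %>% group_by(g2, .add = TRUE)
```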
setequal() generics are now
imported from the generics package. This reduces a conflict with lubridate.
order_by() gives an informative hint if you accidentally call it instead
count() now message if the default output
exists in the data frame. To quiet the message, you'll need to supply an
name (#4284). You can override the default weighting to use a
constant by setting
wt = 1.
starwars dataset now does a better job of separating biological sex from
gender identity. The previous
gender column has been renamed to
since it actually describes the individual's biological sex. A new
column encodes the actual gender identity using other information about
the Star Wars universe (@MeganBeckett, #4456).
Better performance for extracting slices of factors and ordered factors (#4501).
rename_all() call the function with a simple character
vector, not a
ntile() is now more consistent with database implementations if the buckets have irregular size (#4495).