-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bring back legacy arrange()
behavior with dplyr.legacy_locale
#6327
Bring back legacy arrange()
behavior with dplyr.legacy_locale
#6327
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we can drop dplyr.locale
— I don't think we need some escape hatch to control arrange done elsewhere because in most cases, you can just re-arrange, and it's not likely to be that expensive (because you're probably rearranging for display, so there's not going to be that much data).
R/arrange.R
Outdated
#' - Defaults to [dplyr_locale()], which uses the `"C"` locale unless this is | ||
#' explicitly overriden. See the help page for [dplyr_locale()] for the | ||
#' exact details. | ||
#' - If `NULL`, the default, then the `"C"` locale will be used unless this is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we allow "legacy"
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the only way to get legacy behavior in group_by()
is to set the dplyr.legacy_locale
global option, I think I'd like to keep that as the only way to get legacy behavior in arrange()
too. I think it should be "all or nothing" legacy behavior that gets applied to both.
46b0156
to
98d18ec
Compare
if (length(dots) == 0L) { | ||
out <- seq_len(nrow(data)) | ||
return(out) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this because we no longer need an early exit when there aren't any ...
supplied. This was problematic before because of the paste0()
recycling issue and some problems with order()
with no inputs, but I've nicely worked around those now.
Okay I have removed support for Also updated my original comment in this PR to reflect the new state of how I'm going to request one last review to ensure that this all looks good |
NEWS.md
Outdated
change improves reproducibility across R sessions and operating systems | ||
|
||
* `arrange()` and `group_by()` now both default to using the C locale when | ||
ordering or grouping character vectors. This brings _substantial_ performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to include some benchmarks in the blog post.
I wonder if we should say that this makes dplyr consistent with data.table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another pass through the NEWS, simplifying the individual group-by and arrange bullets, since the combined bullet does a better job of talking about the C locale parts
I mentioned dplyr/data.table consistency in the 3rd bullet as part of a list of reasons why we made this change
@@ -176,8 +166,22 @@ arrange_rows <- function(data, | |||
}) | |||
|
|||
directions <- directions[quo_names %in% names(data)] | |||
|
|||
if (is.null(locale) && dplyr_legacy_locale()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could simplify the logic by doing this earlier — if the legacy situation, we currently jump through a bunch of hoops to remove desc()
calls before eventually adding them back in again. But maybe it's not worth it because of the transmute gymnastics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we add them back in?
I think it has to happen here, both paths do the transmute()
bits and both paths need to strip out desc()
.
Unless im missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you mean at the end of dplyr_proxy_order_legacy()
. CRAN dplyr does it this way too, so I'm inclined to keep it. I think figuring out the transmute stuff in combination with moving this code is going to be more challenging than its worth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good.
- Matches legacy `group_by()` behavior which uses the same option - No longer export `dplyr_locale()`, which is no longer useful - Default `.locale` to `NULL` instead of `dplyr_locale()` - If multiple locale options are set, hierarchy of precedence is `.locale`, `dplyr.locale`, `dplyr.legacy_locale`
Co-authored-by: Hadley Wickham <h.wickham@gmail.com>
4b4c844
to
c9b34a0
Compare
Follow up to #6297 and #6263
So the general idea is:
group_by()
andarrange()
sort in the C localedplyr.legacy_locale = TRUE
arrange()
locale with an explicit.locale
argument,or withdplyr.locale = "fr"
, but we prefer the formergroup_by()
(unless you revert to legacy behavior)A few other notes about this PR:
dplyr_locale()
, which is no longer useful.locale
toNULL
instead ofdplyr_locale()
.locale
,,dplyr.locale
dplyr.legacy_locale
One question: Do we even need
dplyr.locale
anymore? Is being able to revert to the legacy behavior good enough? If we thinkdplyr.locale
would stick around permanently, then it makes sense to keep it too, since I would likedplyr.legacy_locale
to phase out.dplyr.locale
might be useful as a permanent escape hatch for the case where you are calling something that callsarrange()
for you and doesn't expose.locale
. (Update: We removeddplyr.locale
)