Add a .locale argument to arrange() and implement dplyr_locale() #6263

Merged: 9 commits, merged Jun 13, 2022

@DavisVaughan (Member) commented May 10, 2022

Supersedes and closes #5942 (once again; it had become too out of date to be useful).
Closes #4962
Closes #5090
Part of #5808

This PR adds a .locale argument to arrange(), with three possible values:

  • dplyr_locale(), the default. See below.
  • A single stringi locale identifier, like "en_US" or "fr". This requires stringi.
  • "C", for sorting in the C locale. This does not require stringi.
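For illustration, the "C" value means strings are compared by byte order, so all uppercase ASCII letters sort before all lowercase ones. The sketch below uses only base R and does not call dplyr; it assumes the behavior documented in `?sort`, namely that `method = "radix"` always collates character vectors in the C locale. A language-aware locale such as "en_US" usually interleaves the cases instead (a, A, b, B), but that result depends on the collation rules in use, so it is not asserted here.

```r
# Character values whose order depends on the collation rules in use
x <- c("b", "A", "a", "B")

# method = "radix" compares strings in the C locale (byte order),
# regardless of the session's LC_COLLATE setting
c_sorted <- sort(x, method = "radix")
print(c_sorted)
#> [1] "A" "B" "a" "b"

# The same ordering expressed as an index vector, which is the form a
# row-sorting function needs in order to reorder a whole data frame
idx <- order(x, method = "radix")
print(x[idx])
```

Because byte order does not consult the operating system's collation tables, this result is identical on every platform, which is what makes "C" a reproducible default.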

dplyr_locale() is a new exported helper and returns a string representing the default locale to use. It has the following properties:

  • It defaults to "C"
  • If the dplyr.locale option is set to a stringi locale identifier or "C", that overrides the above default behavior.
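The lookup rule above can be sketched in a few lines of base R. This is a hypothetical re-creation for illustration only, not dplyr's source: `locale_sketch()` is an invented name, and the check that the option is a single string is an assumption added for the example rather than something stated in this PR.

```r
# Hypothetical sketch of the described rule; not dplyr's implementation.
locale_sketch <- function() {
  # Fall back to "C" when the dplyr.locale option is unset
  locale <- getOption("dplyr.locale", default = "C")

  # Assumed validation: the option must be one non-missing string,
  # e.g. "C", "en_US" or "fr"
  if (!is.character(locale) || length(locale) != 1L || is.na(locale)) {
    stop("`dplyr.locale` must be a single string.", call. = FALSE)
  }

  locale
}

# No option set: the C locale is used
old <- options(dplyr.locale = NULL)
print(locale_sketch())
#> [1] "C"

# Setting the option overrides the default for the rest of the session
options(dplyr.locale = "fr")
print(locale_sketch())
#> [1] "fr"

# Restore whatever value the option had before
options(old)
```

Reading the default from a global option, rather than from the system locale, keeps the result independent of the machine's environment while still letting a user opt into a language-specific collation.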

This is a breaking change: arrange() no longer respects the system's LC_COLLATE locale setting.

  • Run revdeps one more time once we decide that this is The Way: "1bfa0724-c563-48de-9068-51476e19cf82" (Update: no changes for the worse in revdeps.)


This allows us to isolate the changes in this PR to `arrange()` alone, keeping other uses of `vec_order()` the same for now.
Even though these are superseded, this should ease the transition a little: without this argument it would be difficult to choose a different locale, and if stringi wasn't installed you would unconditionally get a warning you couldn't silence.
The intention is that it shouldn't show up on the pkgdown reference page. We expect most people to reach this page through `arrange()`.
This is reproducible across all R sessions and OSes, and is much faster, which makes it a good default.

It will also make the default of `arrange()` continue to align with what `group_by() + summarize()` returns, since that will also unconditionally use the C locale.
This is a technical detail of `vec_order_radix()`, so it is better to keep the conversion close to where we actually call it. It should also make calling `arrange_rows()` on its own a little easier.
Successfully merging this pull request may close these issues:

  • Performance drop-off for arrange()