Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fct_reorder() yields unexpected result in presence of missing values #315

Closed
DanChaltiel opened this issue Aug 10, 2022 · 0 comments · Fixed by #341
Closed

fct_reorder() yields unexpected result in presence of missing values #315

DanChaltiel opened this issue Aug 10, 2022 · 0 comments · Fixed by #341
Labels
feature a feature request or enhancement
Milestone

Comments

@DanChaltiel
Copy link

Hi,

When using fct_reorder() in presence of missing values, you often do not get the expected result.

For instance, in the following code, the "blue" level gets an NA summary and is therefore sent to the last level of the result. In larger datasets, where missing values happen everywhere, this results in fct_reorder() doing frustratingly nothing.

library(tidyverse)
df = tribble(
  ~color,     ~a,
  "purple",      1,
  "purple",      2,
  "blue",     3, #NA
  "blue",     4,
  "green",    5,
  "green",    6
)
df$color = factor(df$color)
df$color %>% levels
#> [1] "blue"   "green"  "purple"
fct_reorder(df$color, df$a) %>% levels
#> [1] "purple" "blue"   "green"

df$a[3]=NA
fct_reorder(df$color, df$a) %>% levels
#> [1] "purple" "green"  "blue"
fct_reorder(df$color, df$a, na.rm=TRUE) %>% levels
#> [1] "purple" "blue"   "green"

Created on 2022-08-10 by the reprex package (v2.0.1)

This is especially unexpected as the default function, median(), has na.rm=FALSE by default. Using other common summary functions like min() and max() has the same problem.

There is a mention of this in the documentation (... Other arguments passed on to .fun. A common argument is na.rm = TRUE.), but I don't think this is explicit enough.

Could there be some kind of warning to suggest we add na.rm=TRUE? For instance if(any(is.na(summary))) warn("missing").

Otherwise, maybe this should be mentioned up in the description, for instance something like "Any missing value returned by the summary function for a level will cause this level to be sent to the end." (ok that's not well written but you get the point)

You might even want the user to explicitly opt-in for na.rm=FALSE, and by default inject na.rm=TRUE to the summary function if na.rm is in formals(.fun).. This is a bit invasive, I'll give you that, but I cannot see any real use case where na.rm=FALSE could be wanted.

@hadley hadley added the feature a feature request or enhancement label Dec 21, 2022
@hadley hadley added this to the v1.0.0 milestone Jan 3, 2023
@hadley hadley changed the title fct_reorder() yields unexpected result in presence of missing values fct_reorder() yields unexpected result in presence of missing values Jan 3, 2023
hadley added a commit that referenced this issue Jan 9, 2023
* Drop missing values in `.x` by default. Fixes #315.
* New `.na_last` argument to control position of NAs. Fixes #266.
hadley added a commit that referenced this issue Jan 10, 2023
* Drop missing values in `.x` by default. Fixes #315.
* New `.default` argument to control position of NAs. Fixes #266.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants