Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should the grid functions retain NA values in a factor? #1275

Closed
DavisVaughan opened this issue Dec 15, 2021 · 3 comments · Fixed by #1276
Closed

Should the grid functions retain NA values in a factor? #1275

DavisVaughan opened this issue Dec 15, 2021 · 3 comments · Fixed by #1276
Labels
ask :bowtie: bug an unexpected problem or unintended behavior grids #️⃣ expanding, nesting, crossing, ...

Comments

@DavisVaughan
Copy link
Member

DavisVaughan commented Dec 15, 2021

It currently drops them because of the way sorted_unique() works. (This also would affect forcats::fct_unique() since the implementation came from there). We get them back in complete() because that does a full_join() and the NA was in the original data.

library(tidyr)

df <- tibble(a = factor(c("x", NA), levels = c("x", "y")))

# dropped `NA`
expand(df, a)
#> # A tibble: 2 × 1
#>   a    
#>   <fct>
#> 1 x    
#> 2 y

# we get it back from the full-join
complete(df, a)
#> # A tibble: 3 × 1
#>   a    
#>   <fct>
#> 1 x    
#> 2 y    
#> 3 <NA>

tidyr:::sorted_unique
#> function (x) 
#> {
#>     if (is.factor(x)) {
#>         factor(levels(x), levels(x), exclude = NULL, ordered = is.ordered(x))
#>     }
#>     else if (is_bare_list(x)) {
#>         vec_unique(x)
#>     }
#>     else {
#>         vec_sort(vec_unique(x))
#>     }
#> }
#> <bytecode: 0x7fa27f1ac8b0>
#> <environment: namespace:tidyr>

I am fairly certain we want to keep them, since it keeps NAs with non-factor types?

library(tidyr)

df <- tibble(a = c(NA, 1))

# keeps NA at the end
expand(df, a)
#> # A tibble: 2 × 1
#>       a
#>   <dbl>
#> 1     1
#> 2    NA
@DavisVaughan DavisVaughan added ask :bowtie: bug an unexpected problem or unintended behavior grids #️⃣ expanding, nesting, crossing, ... labels Dec 15, 2021
@DavisVaughan
Copy link
Member Author

This affects crossing() and nesting() too

library(tidyr)

a <- factor(c("x", NA), levels = c("x", "y"))

crossing(a)
#> # A tibble: 2 × 1
#>   a    
#>   <fct>
#> 1 x    
#> 2 y

Created on 2021-12-15 by the reprex package (v2.0.1)

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Dec 15, 2021

If we kept the NA, we'd add it to the end of the levels() set if:

  • The original data contained an NA AND
  • The factor levels() don't already contain NA as a level

This is a little tricky, but I think that is correct

@DavisVaughan DavisVaughan changed the title Should expand() retain NA values in a factor? Should the grid functions retain NA values in a factor? Dec 15, 2021
@DavisVaughan
Copy link
Member Author

TIL base R has addNA() for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ask :bowtie: bug an unexpected problem or unintended behavior grids #️⃣ expanding, nesting, crossing, ...
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant