crossing() removes NA from factor levels #410

Closed
echasnovski opened this issue Feb 10, 2018 · 5 comments

@echasnovski echasnovski commented Feb 10, 2018

library(tidyr)
packageVersion("tidyr")
#> [1] '0.8.0.9000'
x_fac <- factor(c(1, NA), exclude = NULL)
levels(x_fac)
#> [1] "1" NA
x_cross <- crossing(x_fac)
levels(x_cross$x_fac)
#> [1] "1"

The root of this seems to be in ulevels():

ulevels(factor(c(1, NA), exclude = NULL))
#> [1] 1    <NA>
#> Levels: 1

Adding exclude = NULL in factor() inside ulevels() solves the issue (and passes all current tests).

If this behavior is unintended, I am ready to make a PR.
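For context, the level dropping described above can be reproduced in base R alone, without tidyr. This is only a minimal sketch of how `factor()` treats `NA` under its default `exclude = NA`; the variable names are illustrative and it does not reproduce the actual body of `ulevels()`.

```r
levs <- c("1", NA)

# Default `exclude = NA`: NA is removed from the set of levels
dropped <- factor(levs, levels = levs)

# `exclude = NULL`: NA is kept as a level
kept <- factor(levs, levels = levs, exclude = NULL)

levels(dropped)
#> [1] "1"
levels(kept)
#> [1] "1" NA
```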

@echasnovski echasnovski commented Feb 14, 2018

This might be a little more complicated. Modifying ulevels() to use exclude = NULL changes the factor levels when the vector contains NA but NA is not among its levels. It seems very important that crossing() (and hence expand() and complete()) always preserves factor levels, but these functions should also account for NAs present in the vector. So maybe this version of ulevels() is better?

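ulevels <- function(x) {
  if (is.factor(x)) {
    # Remember the original levels so they are returned unchanged
    orig_levs <- levels(x)
    # Add an NA level only if `x` actually contains NA values
    x <- addNA(x, ifany = TRUE)
    levs <- levels(x)
    # Values include NA when present; levels stay exactly as in the input
    factor(levs, levels = orig_levs, ordered = is.ordered(x), exclude = NULL)
  } else {
    sort(unique(x), na.last = TRUE)
  }
}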

This version also passes all tests. With it, crossing() preserves factor levels, and complete() no longer converts factors to characters (a conversion that stems from dplyr::left_join() behaviour):

# Code is run with the MODIFIED version of tidyr
library(tidyr)

# `crossing()` preserves levels
x_na_lev <- factor(c(1, NA), exclude = NULL)
crossing(x_na_lev)$x_na_lev
#> [1] 1    <NA>
#> Levels: 1 <NA>
x_na_lev_extra <- factor(c(1, NA), levels = c(1, 2, NA), exclude = NULL)
crossing(x_na_lev_extra)$x_na_lev_extra
#> [1] 1    2    <NA>
#> Levels: 1 2 <NA>
x_no_na_lev <- factor(c(1, NA))
crossing(x_no_na_lev)$x_no_na_lev
#> [1] 1    <NA>
#> Levels: 1
x_no_na_lev_extra <- factor(c(1, NA), levels = c(1, 2))
crossing(x_no_na_lev_extra)$x_no_na_lev_extra
#> [1] 1    2    <NA>
#> Levels: 1 2

# `complete()` also preserves levels, with no warnings
df <- data.frame(x_na_lev, x_na_lev_extra, x_no_na_lev, x_no_na_lev_extra,
                 data = 10:11)
str(complete(df, x_na_lev, x_na_lev_extra, x_no_na_lev, x_no_na_lev_extra))
#> Classes 'tbl_df', 'tbl' and 'data.frame':    36 obs. of  5 variables:
#>  $ x_na_lev         : Factor w/ 2 levels "1",NA: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ x_na_lev_extra   : Factor w/ 3 levels "1","2",NA: 1 1 1 1 1 1 2 2 2 2 ...
#>  $ x_no_na_lev      : Factor w/ 1 level "1": 1 1 1 NA NA NA 1 1 1 NA ...
#>  $ x_no_na_lev_extra: Factor w/ 2 levels "1","2": 1 2 NA 1 2 NA 1 2 NA 1 ...
#>  $ data             : int  10 NA NA NA NA NA NA NA NA NA ...
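The difference between the two ideas in this thread can also be checked in base R, without a modified tidyr. The sketch below is an assumption-laden illustration: `ulevels_exclude_null()` and `ulevels_keep_levels()` are hypothetical names, the first is only one reading of "adding exclude = NULL", and the second is the factor branch of the function proposed above.

```r
# Hypothetical helper: one reading of the first idea, where the levels are
# taken from the NA-augmented factor and kept via exclude = NULL.
ulevels_exclude_null <- function(x) {
  levs <- levels(addNA(x, ifany = TRUE))
  factor(levs, levels = levs, ordered = is.ordered(x), exclude = NULL)
}

# Hypothetical helper: factor branch of the version proposed above, which
# reuses the original levels instead of the NA-augmented ones.
ulevels_keep_levels <- function(x) {
  orig_levs <- levels(x)
  levs <- levels(addNA(x, ifany = TRUE))
  factor(levs, levels = orig_levs, ordered = is.ordered(x), exclude = NULL)
}

# NA is present in the data but is not a level
x_no_na_lev <- factor(c(1, NA))

levels_first <- levels(ulevels_exclude_null(x_no_na_lev))
levels_second <- levels(ulevels_keep_levels(x_no_na_lev))
values_second <- ulevels_keep_levels(x_no_na_lev)

levels_first
#> [1] "1" NA
levels_second
#> [1] "1"
```

Under these assumptions, the first helper adds an NA level that the input did not have, while the second returns the NA value and leaves the levels as they were.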
