parse_factor implicitly relevels empty string "" and actual NA to factor "NA" #864

Closed
djbirke opened this Issue Jun 11, 2018 · 2 comments

2 participants
djbirke commented Jun 11, 2018

readr_1.1.1 and tidyverse_1.2.1

I want to convert a string column to a factor, treating the string "NA" as an ordinary factor level (say "NA" means "North America") and the empty string "" as an ordinary level too. However, parse_factor maps both "" and a real missing value (<NA>) to the level "NA", while "NC", which is not in the level set, is mapped to <NA> with a parsing-failure warning.

library(tidyverse)

df <- tibble(
  id = 1:5,
  group = c('NA', 'NB', 'NC', '', NA)
)

df %>%
  mutate(f = parse_factor(group,
                          levels = c('', 'NA', 'NB'),
                          na = character()))  # treat no input string as missing
#> Warning: 1 parsing failure.
#> # A tibble: 1 x 4
#>     row   col expected           actual
#>   <int> <int> <chr>              <chr>
#> 1     3    NA value in level set NC
#> # A tibble: 5 x 3
#>      id group f    
#>   <int> <chr> <fct>
#> 1     1 NA    NA   
#> 2     2 NB    NB   
#> 3     3 NC    <NA> 
#> 4     4 ""    NA   
#> 5     5 <NA>  NA

But I expected "" to stay "" and <NA> to stay <NA>:

#> # A tibble: 5 x 3
#>      id group f    
#>   <int> <chr> <fct>
#> 1     1 NA    NA   
#> 2     2 NB    NB   
#> 3     3 NC    <NA> 
#> 4     4 ""    ""
#> 5     5 <NA>  <NA>
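For comparison, here is a minimal sketch using only base R's factor() (not readr), which already produces the expected mapping. It relies on match() treating the string "NA" and a real NA_character_ as distinct values. One difference worth noting: base R turns the out-of-set "NC" into <NA> silently instead of reporting a parsing failure.

```r
# Base R comparison only; this is not readr's parse_factor().
group <- c("NA", "NB", "NC", "", NA)

# The string "NA" and the empty string "" are ordinary levels here,
# while a real NA_character_ stays missing.
f <- factor(group, levels = c("", "NA", "NB"))

# "NC" is not in the level set, so it becomes NA (no warning is raised).
print(as.character(f))
print(is.na(f))
```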

@djbirke djbirke changed the title from parse_factor implictly relevels empty string "" and actual NA to factor "NA" to parse_factor implicitly relevels empty string "" and actual NA to factor "NA" Jun 12, 2018

@jimhester jimhester added the bug label Nov 13, 2018

@jimhester jimhester closed this in e0189f2 Nov 14, 2018


Member

jimhester commented Nov 14, 2018

Thanks, should now be fixed!

library(readr)
x <- c("", "NC", "NC", "NC", "", "", "NB", "NA")

tibble::as_tibble(parse_factor(x, levels = c("NA", "NB", "NC", ""), na = character()))
#> # A tibble: 8 x 1
#>   value
#>   <fct>
#> 1 ""   
#> 2 NC   
#> 3 NC   
#> 4 NC   
#> 5 ""   
#> 6 ""   
#> 7 NB   
#> 8 NA

Created on 2018-11-14 by the reprex package (v0.2.1)


djbirke commented Nov 14, 2018

Awesome, thank you for your work!
