parse_factor implicitly relevels empty string "" and actual NA to factor "NA" #864

Closed
djbirke opened this issue Jun 11, 2018 · 3 comments
Labels
bug an unexpected problem or unintended behavior

djbirke commented Jun 11, 2018

Using readr_1.1.1 and tidyverse_1.2.1.

I want to convert a string column to a factor, treating "NA" as an actual factor level (say "NA" means "North America") and "" as an actual factor level too. However, parse_factor maps both "" and <NA> to the level "NA". (NC is mapped to <NA> with a parsing failure, which is expected because it is not in the level set.)

library(tidyverse)

df <- tibble(
  id = 1:5,
  group = c('NA', 'NB', 'NC', '', NA)
)

df %>%
  mutate(f = parse_factor(group,
                          levels = c('', 'NA', 'NB'),
                          na = character()))
#> Warning: 1 parsing failure.
#> # A tibble: 1 x 4
#>     row   col expected           actual
#>   <int> <int> <chr>              <chr>
#> 1     3    NA value in level set NC
#> # A tibble: 5 x 3
#>      id group f    
#>   <int> <chr> <fct>
#> 1     1 NA    NA   
#> 2     2 NB    NB   
#> 3     3 NC    <NA> 
#> 4     4 ""    NA   
#> 5     5 <NA>  NA

But I expected

#> # A tibble: 5 x 3
#>      id group f    
#>   <int> <chr> <fct>
#> 1     1 NA    NA   
#> 2     2 NB    NB   
#> 3     3 NC    <NA> 
#> 4     4 ""    ""
#> 5     5 <NA>  <NA>
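
For reference, the expected mapping above appears to be what base R's factor() produces with the same levels, so it may serve as a stopgap on affected readr versions. This is only a sketch of a workaround, not a fix in readr itself; the variable names `group` and `f` are simply reused from the example above.

```r
# Same input as the example above: the string "NA", an empty string,
# and a real missing value.
group <- c("NA", "NB", "NC", "", NA)

# factor() matches values against the levels as strings, so "NA" and ""
# stay real levels, while NC (not in the level set) and the missing
# value both become <NA>.
f <- factor(group, levels = c("", "NA", "NB"))

levels(f)
is.na(f)
#> [1] FALSE FALSE  TRUE FALSE  TRUE
```

Note that, unlike parse_factor, factor() gives no warning for NC; it silently turns values outside the level set into <NA>.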
@djbirke djbirke changed the title parse_factor implictly relevels empty string "" and actual NA to factor "NA" parse_factor implicitly relevels empty string "" and actual NA to factor "NA" Jun 12, 2018
@jimhester jimhester added the bug an unexpected problem or unintended behavior label Nov 13, 2018
jimhester (Member) commented

Thanks, should now be fixed!

library(readr)
x <- c("", "NC", "NC", "NC", "", "", "NB", "NA")

tibble::as_tibble(parse_factor(x, levels = c("NA", "NB", "NC", ""), na = character()))
#> # A tibble: 8 x 1
#>   value
#>   <fct>
#> 1 ""   
#> 2 NC   
#> 3 NC   
#> 4 NC   
#> 5 ""   
#> 6 ""   
#> 7 NB   
#> 8 NA

Created on 2018-11-14 by the reprex package (v0.2.1)


djbirke commented Nov 14, 2018

Awesome, thank you for your work!

lock bot commented May 13, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators May 13, 2019