write_dta can't save labelled class without labels #442

Closed
cimentadaj opened this issue Mar 15, 2019 · 4 comments
Labels: bug

@cimentadaj cimentadaj commented Mar 15, 2019

According to the documentation of labelled, the labels argument accepts NULL, which is reasonable, since you might want to save a column that has no value labels but does have a variable label. However, write_dta throws an obscure error when the column was created with labels = NULL.

library(haven)

# Note that this throws a warning, but according to the documentation of `labelled`,
# `labels` accepts NULL
x <- labelled(1:20, labels = NULL, label = 'This is clearly a integer')
#> Warning in is.na(object): is.na() applied to non-(list or vector) of type
#> 'NULL'

x
#> <Labelled integer>: This is clearly a integer
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

write_dta(data.frame(x = x), tempfile(fileext = '.dta'), version = 14)
#> Error in write_dta_(data, normalizePath(path, mustWork = FALSE), version = stata_file_format(version)): Not compatible with requested type: [type=NULL; target=integer].

# Add toy labels; writing now succeeds
x <- labelled(1:20, labels = c('M' = 1), label = 'This is clearly a integer')

write_dta(data.frame(x = x), tempfile(fileext = '.dta'), version = 14)
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.3 (2017-11-30)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       Europe/Madrid               
#>  date     2019-03-15
#> Packages -----------------------------------------------------------------
#>  package   * version    date       source                          
#>  backports   1.1.2      2017-12-13 cran (@1.1.2)                   
#>  base      * 3.4.3      2017-12-07 local                           
#>  compiler    3.4.3      2017-12-07 local                           
#>  crayon      1.3.4      2017-09-16 cran (@1.3.4)                   
#>  datasets  * 3.4.3      2017-12-07 local                           
#>  devtools    1.13.4     2017-11-09 CRAN (R 3.4.2)                  
#>  digest      0.6.18     2018-10-10 cran (@0.6.18)                  
#>  evaluate    0.10.1     2017-06-24 CRAN (R 3.4.0)                  
#>  forcats     0.4.0      2019-02-17 cran (@0.4.0)                   
#>  graphics  * 3.4.3      2017-12-07 local                           
#>  grDevices * 3.4.3      2017-12-07 local                           
#>  haven     * 2.1.0.9000 2019-03-15 Github (tidyverse/haven@a4f3f86)
#>  hms         0.4.2      2018-03-10 cran (@0.4.2)                   
#>  htmltools   0.3.6      2017-04-28 CRAN (R 3.4.0)                  
#>  knitr       1.19       2018-01-29 CRAN (R 3.4.3)                  
#>  magrittr    1.5        2014-11-22 CRAN (R 3.4.0)                  
#>  memoise     1.1.0      2018-09-09 Github (hadley/memoise@06d16ec) 
#>  methods   * 3.4.3      2017-12-07 local                           
#>  pillar      1.3.1      2018-12-15 cran (@1.3.1)                   
#>  pkgconfig   2.0.2      2018-08-16 cran (@2.0.2)                   
#>  Rcpp        1.0.0      2018-11-07 cran (@1.0.0)                   
#>  rlang       0.3.1      2019-01-08 cran (@0.3.1)                   
#>  rmarkdown   1.8        2017-11-17 CRAN (R 3.4.2)                  
#>  rprojroot   1.3-2      2018-01-03 cran (@1.3-2)                   
#>  stats     * 3.4.3      2017-12-07 local                           
#>  stringi     1.2.3      2018-06-12 cran (@1.2.3)                   
#>  stringr     1.3.1      2018-05-10 cran (@1.3.1)                   
#>  tibble      2.0.1      2019-01-12 cran (@2.0.1)                   
#>  tools       3.4.3      2017-12-07 local                           
#>  utils     * 3.4.3      2017-12-07 local                           
#>  withr       2.1.2.9000 2018-12-05 Github (r-lib/withr@be57595)    
#>  yaml        2.1.17     2018-02-27 cran (@2.1.17)

@Gootjes Gootjes commented May 6, 2019

It is the same for write_sav!
However, as a workaround, I believe you can still set a label by doing attributes(x)$label <- "a label".
So, somewhere in the code, I think haven expects labels to be set when something is of class haven_labelled, and rejects a NULL value.
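
A minimal sketch of the workaround described above, assuming (as suggested in this comment) that haven picks up a plain `label` attribute when writing. The variable names and label text are illustrative only, and the write calls are guarded so the snippet also runs where haven is not installed.

```r
# Attach the variable label as a plain attribute instead of building a
# labelled vector with labels = NULL.
x <- 1:20
attr(x, "label") <- "a label"  # same effect here as attributes(x)$label <- "a label"

# data.frame() keeps the attribute on the column.
df <- data.frame(x = x)

# Writing needs the haven package; skipped when it is not available.
if (requireNamespace("haven", quietly = TRUE)) {
  haven::write_dta(df, tempfile(fileext = ".dta"), version = 14)
  haven::write_sav(df, tempfile(fileext = ".sav"))
}
```

Because `x` stays a plain integer vector, the code path that expects non-NULL `labels` on a labelled column should not be reached.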

@nbchan nbchan commented Aug 26, 2019

I wrote this function based on the workaround suggested by @Gootjes. Hope someone finds it useful.

library(tibble)
library(dplyr)
library(magrittr)
library(haven)

fix_haven_labelled_write_problem <- function(df){
    # Provides a workaround to the problem of writing haven-labelled variables (with null `labels`
    # and non-null `label`) to Stata / SPSS files. Returns a dataframe compatible with functions
    # such as `write_sav`. 
    # Reference: https://github.com/tidyverse/haven/issues/442
    
    # df: dataframe
    
    # vector of problematic column names
    cnames <- df %>%
        select_if(~ is.labelled(.) && is.null(attributes(.)$labels)) %>%
        colnames
    
    # convert the problematic columns back to their base data types
    df_fixed <- df %>%
        mutate_if(~ is.labelled(.) && is.null(attributes(.)$labels) && typeof(.) == 'character',
                  as.character) %>%
        mutate_if(~ is.labelled(.) && is.null(attributes(.)$labels) && typeof(.) == 'integer',
                  as.integer) %>%
        mutate_if(~ is.labelled(.) && is.null(attributes(.)$labels) && typeof(.) == 'double',
                  as.double)
    
    # add an attribute called `label` to the respective columns
    for(cname in cnames){
        attributes(df_fixed[[cname]])$label <- df[[cname]] %>% attributes %$% label
    }
    
    return(df_fixed)
}

df <- tibble(x=labelled(1:20, labels = NULL, label = 'This is clearly a integer'))

df %>% write_dta(tempfile(fileext = '.dta'), version = 14)
#> Error in write_dta_(data, normalizePath(path, mustWork = FALSE), version = stata_file_format(version)): Not compatible with requested type: [type=NULL; target=integer].

df %>% fix_haven_labelled_write_problem %>% write_dta(tempfile(fileext = '.dta'), version = 14)
# works

Created on 2019-08-26 by the reprex package (v0.3.0)


@Gootjes Gootjes commented Aug 30, 2019

Member

@hadley hadley commented Nov 6, 2019

Minimal reprex:

library(haven)

x <- labelled(1:3, labels = NULL, label = 'x')
df <- data.frame(x = x)

write_dta(df, tempfile(fileext = '.dta'), version = 14)
#> Error in get_result(output = out, options) : 
#>  callr subprocess failed: could not start R, exited with non-zero status, has crashed or was killed
@hadley hadley added the bug label Nov 6, 2019
@hadley hadley added this to the v2.2.0 milestone Nov 6, 2019
@hadley hadley closed this in 6b689a8 Nov 6, 2019