read_spss and write_spss: missing labels for strings #409

Closed
beckerbenj opened this issue Oct 1, 2018 · 10 comments · Fixed by #581
Labels: feature (a feature request or enhancement), labelled_spss

Comments

@beckerbenj

beckerbenj commented Oct 1, 2018

When importing string variables via read_spss, missing labels (user-defined missing values), and sometimes also value labels, behave inconsistently depending on the width of the variables. (At least in educational large-scale assessments, missing and value labels for string variables are not that uncommon.) I used the current GitHub version of haven.

The attached .sav file looks like this (viewed in SPSS 22.0.0.1 or SPSS 25):

test1.zip

[Screenshots: spss_variables, spss_data]

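Importing the file gives the following variable-level attributes. The numeric v1 keeps its missing value (99), but the A8 string v2 gets na_values of NA, the A9 string v3 has no na_values at all, and the A21 string v4 keeps only its format: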

rawDat <- haven::read_spss(file = "N:/spss/test1.sav", user_na = TRUE)
lapply(rawDat, attributes)
#> $v1
#> $v1$na_values
#> [1] 99
#> 
#> $v1$class
#> [1] "haven_labelled_spss" "haven_labelled"     
#> 
#> $v1$format.spss
#> [1] "F8.2"
#> 
#> $v1$labels
#> one 
#>   1 
#> 
#> 
#> $v2
#> $v2$na_values
#> [1] NA
#> 
#> $v2$class
#> [1] "haven_labelled_spss" "haven_labelled"     
#> 
#> $v2$format.spss
#> [1] "A8"
#> 
#> $v2$labels
#> one 
#> "1" 
#> 
#> 
#> $v3
#> $v3$format.spss
#> [1] "A9"
#> 
#> $v3$class
#> [1] "haven_labelled"
#> 
#> $v3$labels
#> one 
#> "1" 
#> 
#> 
#> $v4
#> $v4$format.spss
#> [1] "A21"

Created on 2018-10-01 by the reprex package (v0.2.1)
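To make the width dependence easier to see, the attributes printed above can be tabulated per column against the string width parsed from format.spss. This is a hypothetical sketch, not part of haven: summarise_spss_meta() is an illustrative name, and the cols list only copies the attributes shown in the output onto empty vectors (the data values are not reproduced), so it runs without haven or test1.sav.

```r
# Hypothetical helper (not part of haven): one row per column with its SPSS
# format, string width (NA for non-string formats), and which metadata survived.
summarise_spss_meta <- function(cols) {
  fmt <- unname(vapply(cols, function(x) {
    f <- attr(x, "format.spss")
    if (is.null(f)) NA_character_ else f
  }, character(1)))
  # String formats look like "A8", "A21"; numeric ones like "F8.2".
  is_string <- grepl("^A[0-9]+$", fmt)
  width <- rep(NA_integer_, length(fmt))
  width[is_string] <- as.integer(substring(fmt[is_string], 2))
  data.frame(
    column = names(cols),
    format = fmt,
    string_width = width,
    # na_values counts as usable only if present and not just NA.
    usable_na_values = unname(vapply(cols, function(x) {
      na <- attr(x, "na_values")
      !is.null(na) && !all(is.na(na))
    }, logical(1))),
    has_labels = unname(vapply(cols, function(x) {
      !is.null(attr(x, "labels"))
    }, logical(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Attributes copied from the read_spss() output above (values left empty).
cols <- list(
  v1 = structure(numeric(0), na_values = 99, format.spss = "F8.2",
                 labels = c(one = 1)),
  v2 = structure(character(0), na_values = NA, format.spss = "A8",
                 labels = c(one = "1")),
  v3 = structure(character(0), format.spss = "A9", labels = c(one = "1")),
  v4 = structure(character(0), format.spss = "A21")
)
summarise_spss_meta(cols)
```

On the real import, summarise_spss_meta(rawDat) would be the equivalent call, since vapply() iterates over the columns of a data frame or tibble.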

When writing to .sav, missing labels for string variables are also dropped:

# set up data frame
df <- data.frame(v1 = c(1, 99), v2 = c("aa", "99"), stringsAsFactors = FALSE)
attributes(df$v1) <- list(na_values = 99, class = c("haven_labelled_spss", "haven_labelled"), format.spss = "F8.2", labels = c(one = 1))
attributes(df$v2) <- list(na_values = "99", class = c("haven_labelled_spss", "haven_labelled"), format.spss = "A2", labels = c(sth = "aa"))
# write sav
haven::write_sav(df, path = "N:/spss/test2.sav")
# read sav
spssDF <- haven::read_spss(file = "N:/spss/test2.sav", user_na = TRUE)
lapply(spssDF, attributes)
#> $v1
#> $v1$na_values
#> [1] 99
#> 
#> $v1$class
#> [1] "haven_labelled_spss" "haven_labelled"     
#> 
#> $v1$format.spss
#> [1] "F8.2"
#> 
#> $v1$labels
#> one 
#>   1 
#> 
#> 
#> $v2
#> $v2$format.spss
#> [1] "A2"
#> 
#> $v2$class
#> [1] "haven_labelled"
#> 
#> $v2$labels
#>  sth 
#> "aa"

Created on 2018-10-01 by the reprex package (v0.2.1)
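A quick way to see exactly which attributes a write/read round trip drops is to compare attribute names column by column. This is a hypothetical sketch, not part of haven: lost_attributes() is an illustrative name, and the before/after objects rebuild only the v2 column from the reprex above with structure(), so it runs without haven or a .sav file.

```r
# Hypothetical diagnostic helper (not part of haven): for each column present in
# both objects, list attribute names that existed before the round trip but not after.
lost_attributes <- function(before, after) {
  cols <- intersect(names(before), names(after))
  lost <- lapply(cols, function(col) {
    setdiff(names(attributes(before[[col]])), names(attributes(after[[col]])))
  })
  names(lost) <- cols
  lost
}

# before: attributes assigned to df$v2; after: attributes read_spss() returned.
before <- list(v2 = structure(c("aa", "99"), na_values = "99",
                              class = c("haven_labelled_spss", "haven_labelled"),
                              format.spss = "A2", labels = c(sth = "aa")))
after <- list(v2 = structure(c("aa", "99"),
                             format.spss = "A2", class = "haven_labelled",
                             labels = c(sth = "aa")))
lost_attributes(before, after)
```

With the real objects, lost_attributes(df, spssDF) is the equivalent call; based on the output shown above it should return character(0) for v1 and "na_values" for v2.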

And the SPSS Variable View looks like this:

[Screenshot: spss_test2]

Is there any way to import missing and value labels for string variables consistently from .sav files into R?

Thank you!

@hadley
Member

hadley commented Jan 24, 2019

Can you please confirm that I've captured the problem in the following reprex?

library(haven)

path <- tempfile()
df1 <- tibble::tibble(
  x = labelled_spss(c("1", "99"), na_values = "99", c(one = "1"))
)

write_sav(df1, path)
df2 <- read_sav(path, user_na = TRUE)

attr(df1$x, "na_values")
#> [1] "99"
attr(df2$x, "na_values")
#> NULL

Created on 2019-01-24 by the reprex package (v0.2.1.9000)

hadley added the reprex label (needs a minimal reproducible example) on Jan 24, 2019

hadley added the bug label (an unexpected problem or unintended behavior) and removed the reprex label on Feb 1, 2019

hadley added the feature label (a feature request or enhancement) and removed the bug label on Feb 1, 2019

@evanmiller
Collaborator

Try WizardMac/ReadStat@963b6eb (dev branch)

@hadley
Member

hadley commented Feb 3, 2019

@evanmiller does readstat_variable_get_missing_ranges_count() need an update too? When I read test1.zip, I see values of 1, 1, 0, 0.

@evanmiller
Collaborator

@hadley
Member

hadley commented Feb 4, 2019

Note to self: WIP in local missing-string branch.

@evanmiller
Collaborator

Okay, the new record type should work in ReadStat master.

3 participants