write_sav() truncates long labels #157

Closed
sjpierce opened this Issue Apr 23, 2016 · 4 comments

sjpierce commented Apr 23, 2016

Below is a reproducible example in which haven::write_sav() exports an SPSS file containing incorrect value labels. When labels are added to a character variable (making it a labelled vector in R, so that it has value labels in the exported SPSS file) and at least one of the actual values in that variable is 9 or more characters long, the labels for that variable are not exported correctly. Labels may be fine for other variables in the same file, and they are exported correctly if every value of the variable in question has 8 or fewer characters. When I inspect the value labels in test_export.sav using SPSS, I see a value that looks like "12345678lz h□" where I should see "12345678E". Similarly, I see "1234ABCD{z h□" where I should see "1234ABCD". I've extended the code to show what is actually read back in via read_sav().

```r
library(haven)

# write_sav() does not properly export value labels in the SPSS file for
# labelled character variables when at least one value of that variable
# contains 9 or more characters.

# Create example data. v1 should export correctly because all values are
# <= 8 characters; v2 will not because its first value is 9 characters long.
dat <- data.frame(v1 = c("12345678",  "ABCDEFGH", "1234ABCD"),
                  v2 = c("12345678E", "ABCDEFGH", "1234ABCD"),
                  l1 = c("Text1",  "Text2", "Text3"),
                  l2 = c("Text4",  "Text5", "Text6"),
                  stringsAsFactors = FALSE)

# Create a named vector from a vector of values plus a vector of labels.
named <- function(x, labels) {names(x) <- labels; return(x)}

# Turn v1 & v2 into labelled variables so they'll have value labels
# assigned in the exported SPSS file. Length & contents of the strings
# in the label vectors seem not to matter.
dat$v1 <- labelled(dat$v1, named(dat$v1, dat$l1))
dat$v2 <- labelled(dat$v2, named(dat$v2, dat$l2))
dat
str(dat)

# The SPSS file created below will have correct value labels only for v1.
# You have to open the file in SPSS to see that.
write_sav(dat, path = "test_export.sav")

# Read the file back in to show that what was stored in test_export.sav
# does not match properties of the data frame written out.
dat2 <- read_sav(path = "test_export.sav")
str(dat2)

# Correct set of labels we tried to write out.
attr(dat$v2, "labels")

# Incorrect set of labels recovered from the exported file.
attr(dat2$v2, "labels")
```

hadley changed the title from "write_sav exports labeled character vector with incorrect value labels" to "write_sav() truncates long labels" on May 30, 2016

Member

hadley commented May 30, 2016

Minimal reprex:

```r
x <- labelled(c("1", "2"), c("2" = "1234567890", "1" = "1"))
x
tmp <- tempfile()
write_sav(tibble::data_frame(x), tmp)
read_sav(tmp)$x
```

@evanmiller this seems like a readstat bug

Contributor

tklebel commented May 30, 2016

SPSS seems to default to creating new variables with a maximum "width" of 8, therefore restricting every value in that variable to eight digits/characters. Maybe that causes a problem along the way.

hadley added the readstat label on May 30, 2016

Contributor

evanmiller commented May 31, 2016

Just wanted to acknowledge this as a bug in ReadStat. Should have a fix available soon. (Values longer than 8 bytes and labels longer than 255 bytes require a separate "long value labels" record in the SAV file.)
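
The two limits mentioned above (8 bytes for a value, 255 bytes for a label) can be checked from R before exporting. The following is only a rough sketch in base R: `needs_long_value_labels()` is a hypothetical helper, not part of haven or ReadStat, and it checks each entry separately, which may not match how ReadStat actually decides when to write the "long value labels" record.

```r
# Hypothetical helper (not part of haven or ReadStat): flag entries of a named
# label vector that exceed the limits quoted in the comment above.
# Convention follows the original report: the vector's values are the data
# values, and its names are the label text.
needs_long_value_labels <- function(labels) {
  stopifnot(is.character(labels), !is.null(names(labels)))
  value_bytes <- nchar(unname(labels), type = "bytes")
  label_bytes <- nchar(names(labels), type = "bytes")
  data.frame(
    value = unname(labels),
    value_bytes = value_bytes,
    label_bytes = label_bytes,
    # Assumed thresholds: values > 8 bytes or labels > 255 bytes
    needs_long_record = value_bytes > 8 | label_bytes > 255,
    stringsAsFactors = FALSE
  )
}

# Labels for v2 from the original report.
v2_labels <- c(Text4 = "12345678E", Text5 = "ABCDEFGH", Text6 = "1234ABCD")
res <- needs_long_value_labels(v2_labels)
res
```

With the labels from the original report, only the 9-character value "12345678E" is flagged, which is consistent with v1 exporting correctly and v2 not.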

Contributor

evanmiller commented May 31, 2016

hadley closed this in 101b072 on May 31, 2016

lock bot locked and limited conversation to collaborators on Jun 27, 2018
