write_sav needs to coerce labels to UTF-8 #87

Closed
larmarange opened this Issue Jul 7, 2015 · 4 comments

larmarange commented Jul 7, 2015

An example:

library(haven)  # labelled(), write_sav()
library(dplyr)  # data_frame()

v1 <- labelled(c(1,1,2,3), c(éè = 1, à = 2, ï = 3))
v2 <- c("éè", "éè", "à", "ï")
v3 <- c("ee", "ee", "a", "i")
dt <- data_frame(v1, v2, v3)
attr(dt$v1, "label") <- "a làbèl wïth acèènts"
attr(dt$v2, "label") <- "a label with no accent"
write_sav(dt, "example.sav")

When opening the resulting file with SPSS, it appears that:

  • v2 and v3 are correct
  • the value labels of v1 are not correct (i.e. they are displayed as ??, ??, ?, ? by SPSS)
  • variable labels are correct

sjPlot commented Jul 7, 2015

Do you have the same problem with the write_spss function from the sjmisc package? Try converting the label attributes to factor levels (as_factor or to_label); this might work.

larmarange commented Jul 7, 2015

I just tried with write_spss. Exactly the same bug with accents in value labels.

hadley commented Jul 7, 2015

In the short term, you can work around it by setting the Encoding() of every character vector to UTF-8. In the long term, haven should do that for you automatically.
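
The short-term workaround can be sketched in base R roughly as follows. This is only an illustration under a few assumptions: `labels_to_utf8()` is a hypothetical helper, not a haven function; it uses `enc2utf8()` to convert the text rather than only re-marking it with `Encoding<-`; and it assumes value labels live in the names of the `labels` attribute and the variable label in the `label` attribute, as in the example above.

```r
# Hypothetical helper, not part of haven: convert label text to UTF-8.
labels_to_utf8 <- function(x) {
  # Value labels: assumed to be the names of the "labels" attribute.
  value_labels <- attr(x, "labels", exact = TRUE)
  if (!is.null(names(value_labels))) {
    names(value_labels) <- enc2utf8(names(value_labels))
    attr(x, "labels") <- value_labels
  }
  # Variable label: assumed to be a string in the "label" attribute.
  variable_label <- attr(x, "label", exact = TRUE)
  if (is.character(variable_label)) {
    attr(x, "label") <- enc2utf8(variable_label)
  }
  # Plain character data: convert the values, keep the attributes.
  if (is.character(x)) {
    x[] <- enc2utf8(as.character(x))
  }
  x
}

# Small check with a Latin-1 encoded value label.
latin1_label <- iconv("\u00e0", "UTF-8", "latin1")
v <- c(1, 2)
attr(v, "labels") <- setNames(2, latin1_label)
fixed <- labels_to_utf8(v)
Encoding(names(attr(fixed, "labels")))
```

Applied to the data frame from the original report, the idea would be something like `dt[] <- lapply(dt, labels_to_utf8)` before calling `write_sav()`. This has not been checked against the haven version discussed here.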

hadley changed the title from "Encoding pb with value labels in write_sav" to "write_sav needs to coerce labels to UTF-8" on May 30, 2016

hadley commented May 30, 2016

Minimal reprex:

# c("éè", "à", "ï")
labels_utf8 <- c("\u00e9\u00e8", "\u00e0", "\u00ef")
labels_latin1 <- iconv(labels_utf8, "utf-8", "latin1")

v_utf8 <- labelled(3:1, setNames(1:3, labels_utf8))
v_latin1 <- labelled(3:1, setNames(1:3, labels_latin1))

roundtrip_var(v_utf8) # ok
roundtrip_var(v_latin1) # not ok
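
To see why the two cases can behave differently, it may help to compare the two label vectors directly. The sketch below uses only base R and does not call `roundtrip_var()`, which is not defined in this thread. It shows that the labels compare equal as text while carrying different declared encodings and different bytes, which is presumably what a writer expecting UTF-8 trips over.

```r
# c("éè", "à", "ï"), once as UTF-8 and once as Latin-1
labels_utf8 <- c("\u00e9\u00e8", "\u00e0", "\u00ef")
labels_latin1 <- iconv(labels_utf8, "UTF-8", "latin1")

# Declared encodings differ ...
Encoding(labels_utf8)
Encoding(labels_latin1)

# ... but R treats the strings as the same text when comparing them.
all(labels_utf8 == labels_latin1)

# The underlying bytes are different: "à" is two bytes in UTF-8, one in Latin-1.
charToRaw(labels_utf8[2])
charToRaw(labels_latin1[2])

# Converting the Latin-1 labels yields the UTF-8 bytes again.
charToRaw(enc2utf8(labels_latin1[2]))
```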

hadley closed this in 092059d on May 30, 2016

lock bot locked and limited conversation to collaborators on Jun 27, 2018
