SAV import fails to import umlauts in labels #19

Closed
sjPlot opened this issue Feb 22, 2015 · 8 comments

Comments

@sjPlot

sjPlot commented Feb 22, 2015

Variable labels and value labels that contain ä, ö or ü are not imported correctly.

@hadley
Member

hadley commented Feb 22, 2015

Could you please attach a minimal example?

@sjPlot
Author

sjPlot commented Feb 23, 2015

Here's an example:

library(haven)

# umlaut-test
mydf <- read_sav("umlauts.sav")
str(mydf)

# > str(mydf)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of  1 variable:
#   $ var1:Class 'labelled'  atomic [1:4] 1 2 1 3
# .. ..- attr(*, "label")= chr "This is an ä-umlaut"
# .. ..- attr(*, "labels")= Named int [1:3] 1 2 3
# .. .. ..- attr(*, "names")= chr [1:3] "the ä umlaut" "the ü umlaut" "the ö umlaut"


library(sjPlot)
# my read_spss function is a wrapper for read.spss
# of the foreign package. read_spss is only available
# in the current development snapshot at github.com/sjPlot/devel
mydf <- read_spss("umlauts.sav")
str(mydf)

# > str(mydf)
# 'data.frame': 4 obs. of  1 variable:
#   $ var1: atomic  1 2 1 3
# ..- attr(*, "value.labels")= Named num  3 2 1
# .. ..- attr(*, "names")= chr  "the ö umlaut" "the ü umlaut" "the ä umlaut"
# - attr(*, "variable.labels")= Named chr "This is an ä-umlaut"
# ..- attr(*, "names")= chr "var1"
# - attr(*, "codepage")= int 65001

# changing the encoding helps
mydf <- read_spss("umlauts.sav", enc = "UTF-8")
str(mydf)

# > str(mydf)
# 'data.frame': 4 obs. of  1 variable:
#   $ var1: atomic  1 2 1 3
# ..- attr(*, "value.labels")= Named chr  "3" "2" "1"
# .. ..- attr(*, "names")= chr  "the ö umlaut" "the ü umlaut" "the ä umlaut"
# - attr(*, "variable.labels")= Named chr "This is an ä-umlaut"
# ..- attr(*, "names")= chr "var1"
# - attr(*, "codepage")= int 65001

@evanmiller
Collaborator

Can you attach the umlauts.sav file itself?

@hadley
Member

hadley commented Feb 23, 2015

@evanmiller it's possible this is a bug on my end - can I assume readstat_string_value() is always UTF-8?

@evanmiller
Collaborator

Yes, it should always return UTF-8.

hadley added a commit that referenced this issue Feb 23, 2015
@hadley
Member

hadley commented Feb 23, 2015

@sjPlot can you please try again with the latest version? I pushed a possible fix.

@hadley
Member

hadley commented Feb 23, 2015

Oh hmm, I only fixed it in one place (variable values) - I'll need to think a bit more to fix the labels.
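
For readers following along, here is a minimal base-R sketch of the kind of problem being discussed. It assumes the cause is that the strings hold valid UTF-8 bytes but are not marked as UTF-8 inside R; that is an assumption for illustration, not a description of the actual change made in haven. The `labels` vector is a hypothetical stand-in shaped like the one in the report.

```r
# UTF-8 bytes for "ä", built from raw bytes so the example does not depend
# on the encoding of this script or on the current locale.
a_umlaut <- rawToChar(as.raw(c(0xc3, 0xa4)))
Encoding(a_umlaut)  # "unknown": the bytes are UTF-8, but R has not been told so

# Hypothetical value-labels vector, similar in shape to the reported one.
labels <- c(1L, 2L)
names(labels) <- c(paste("the", a_umlaut, "umlaut"), "plain ascii")

# Marking the values alone is not enough; the names of the labels vector
# carry their own encoding mark and have to be marked as well.
nms <- names(labels)
Encoding(nms) <- "UTF-8"  # ASCII-only strings stay "unknown", which is harmless
names(labels) <- nms

Encoding(names(labels))
```

Marking the encoding does not change the bytes; it only tells R how to interpret them, which is why the same idea has to be applied to every place a string can live (values, variable labels, and the names of value labels).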

@sjPlot
Author

sjPlot commented Feb 23, 2015

I don't know how to attach .sav files here, so I sent the file to Hadley directly.

@hadley hadley closed this as completed in 2a86fd2 Feb 24, 2015
@lock lock bot locked and limited conversation to collaborators Jun 27, 2018