
SAS catalog: value label sets with long names aren't being matched #121

Closed
evanmiller opened this issue Oct 16, 2015 · 7 comments

@evanmiller
Collaborator

SAS catalog files can name their formats (value label sets) with both an 8-byte short name and a 32-byte long name. At present ReadStat returns the short name, which may not be the correct behavior. @cullenjd has reported that his value labels are not being properly applied (see previous discussion at #34). It is likely an issue with ReadStat, but it could also be something in haven.

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Oct 16, 2015
Currently the short (8-byte) name is always being returned, but I'm not
sure that's correct. See tidyverse/haven#121. Let's return the long name
and see if that fixes user-reported issues.
@evanmiller
Collaborator Author

I've pushed a possible fix to ReadStat here: WizardMac/ReadStat@ac1a2db

@evanmiller
Collaborator Author

@cullenjd Please try:

devtools::install_github("evanmiller/haven", ref="update-readstat")

@cullenjd

@evanmiller - I updated but I'm still not getting the value labels from the catalog file.

table(opdata$workstat)

0 1 2 3
5604 2441 67 42

Should be:

"No" "Yes" "Don't Know" "Refused"
5604 2441 67 42

@evanmiller
Collaborator Author

@cullenjd - Do you have a small sample data file you can share?

@cullenjd

@evanmiller
Collaborator Author

I am seeing the labels:

> df$workstat
<Labelled>
  [1]  2  0  1  0  0  0  0  1  0  0  0  0  0  1  1  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  0  1  1  0  0  0  1  0  1
 [52]  0  1  0  0  0  0  1  1  1 NA  1  0  1  0  0  0  0  1  1  1  0  0  0  1  0  0  0  0  0 NA  0  0  1  0 NA  0  0  0  0  0  0  1  0  0  0  0  0  0  0
attr(,"label")
[1] workstat: H1. At the time of your fracture, were you working full- or part-time for pay?

Labels:
 value                                  label is_na
     1                                  A.Yes FALSE
     0                                   B.No FALSE
     2                           C.Don't know FALSE
     3 D.Patient declined to answer (Refused) FALSE

It appears that table() doesn't display the labels. I think this is a separate issue (@hadley?).

In any event it appears that labels are imported correctly with the patch. I will leave this issue open until the changes are merged into haven/master.

hadley added a commit that referenced this issue Oct 19, 2015
@gergness
Contributor

@cullenjd - the variables are imported as labelled class vectors and require some more manipulation before you can refer to them by their labelled values. I think the idea is that haven imports the data in this intermediate form, and expects the user to convert them to more standard R structures that suit the particular analysis.

In your example, use as_factor() to convert the variable to a factor, and table() will use the labels.

opdata$workstat <- as_factor(opdata$workstat)
table(opdata$workstat)
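For anyone without the original file, here is a minimal self-contained sketch of the same conversion. It builds a hypothetical toy vector with haven's labelled() constructor, using made-up codes and the label names from the expected output above, instead of reading a real SAS file. It assumes a reasonably recent haven, where labelled(x, labels) and as_factor() are both exported.

```r
library(haven)

# Hypothetical toy data modelled on the workstat variable in this thread:
# numeric codes with value labels attached, similar to what read_sas() returns.
workstat <- labelled(
  c(0, 1, 0, 2, 1, 0, 3, 0),
  labels = c("No" = 0, "Yes" = 1, "Don't Know" = 2, "Refused" = 3)
)

# as_factor() replaces each code with its label, so table() counts by label
workstat_f <- as_factor(workstat)
tab <- table(workstat_f)
print(tab)
```

The labelled class keeps the original numeric codes and stores the labels alongside them. Conversion to a factor is left to the user, so the codes are not lost when a particular analysis needs them.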

@lock bot locked and limited conversation to collaborators Jun 27, 2018

3 participants