labelled sensitive to labels ordering in Stata #327

Closed
kuriwaki opened this issue Dec 20, 2017 · 6 comments

kuriwaki commented Dec 20, 2017

When defining a labelled object, the label-value vector needs to be sorted by its numerical values, or else Stata cannot correctly read the label:

library(haven)
suppressPackageStartupMessages(library(dplyr))

labs <- c(Democrat = 1, Republican = 2, Independent = 3) # named vector

lbl <- tibble(pid3 = labelled(1L:3L, labs)) # both sorted
lbl_num <- tibble(pid3 = labelled(3L:1L, labs)) # values not sorted
lbl_lab <- tibble(pid3 = labelled(3L:1L, labs[c(1, 3, 2)])) # labels not sorted

write_dta(lbl, "foo.dta")
write_dta(lbl_num, "foo_num.dta")
write_dta(lbl_lab, "foo_lab.dta") # this one gets misread in Stata

When I open foo_lab.dta in Stata, I get


. tab pid3

       pid3 |      Freq.     Percent        Cum.
------------+-----------------------------------
   Democrat |          1       33.33       33.33
          2 |          1       33.33       66.67
Independent |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00

So the label for 2 (Republican) dropped out. Is this a bug?

This error does not occur when I read the dta file back into R with read_dta and analyze it there; it only occurs when I open the file in Stata.
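For anyone on a haven version that shows this behavior, one possible workaround is to reorder the label vector by value before passing it to labelled(). The snippet below is a minimal sketch in base R, not a fix in haven or ReadStat; the names `unsorted` and `sorted` are illustrative only.

```r
# Workaround sketch (an assumption, not part of haven's API):
# reorder a named label vector by its numeric values.
labs <- c(Democrat = 1, Republican = 2, Independent = 3)

# Same labels, but stored out of value order, as in lbl_lab above
unsorted <- labs[c(1, 3, 2)]

# order() returns the permutation that sorts the values;
# indexing a named vector keeps each name attached to its value
sorted <- unsorted[order(unsorted)]

is.unsorted(unsorted)  # checks whether the values are out of order
is.unsorted(sorted)
names(sorted)
```

The reordered vector could then be passed to labelled() in place of labs[c(1, 3, 2)] before calling write_dta.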

kuriwaki changed the title from "labelled sensitive to label ordering in Stata" to "labelled sensitive to labels ordering in Stata" Dec 20, 2017
hadley added the bug label Jan 7, 2018

hadley (Member) commented Jan 7, 2018

@evanmiller any thoughts on whether this is the responsibility of haven or readstat?

evanmiller (Collaborator) commented Jan 7, 2018

@hadley ReadStat should probably order the labels the way that Stata expects before writing.

evanmiller (Collaborator) commented Jan 8, 2018

@hadley Try pulling in this change: WizardMac/ReadStat@f4d8706

hadley added the readstat label Jan 8, 2018
hadley added a commit that referenced this issue Jan 8, 2018

hadley (Member) commented Jan 8, 2018

@kuriwaki please try with the latest dev version.

hadley added the wip label Jan 8, 2018

kuriwaki (Author) commented Jan 8, 2018

@hadley @evanmiller works great. Thank you.

hadley closed this Jan 9, 2018

lock bot commented Jul 8, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Jul 8, 2018