labelled sensitive to labels ordering in Stata #327

Closed
kuriwaki opened this Issue Dec 20, 2017 · 6 comments

kuriwaki commented Dec 20, 2017

When defining a labelled object, the label-value vector needs to be sorted by its numerical values, or else Stata cannot correctly read the label:

library(haven)
suppressPackageStartupMessages(library(dplyr))

labs <- c(Democrat = 1, Republican = 2, Independent = 3) # named vector

lbl <- tibble(pid3 = labelled(1L:3L, labs)) # both sorted
lbl_num <- tibble(pid3 = labelled(3L:1L, labs)) # values not sorted
lbl_lab <- tibble(pid3 = labelled(3L:1L, labs[c(1, 3, 2)])) # labels not sorted

write_dta(lbl, "foo.dta")
write_dta(lbl_num, "foo_num.dta")
write_dta(lbl_lab, "foo_lab.dta") # this one gets misread in Stata

When I open foo_lab.dta in Stata, I get


. tab pid3

       pid3 |      Freq.     Percent        Cum.
------------+-----------------------------------
   Democrat |          1       33.33       33.33
          2 |          1       33.33       66.67
Independent |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00

So the label for value 2 (Republican) was dropped. Is this a bug?

This error does not occur when I read the dta file back into R with read_dta and analyze it there; it only occurs when I open the file in Stata.
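A possible workaround, going by the fact that the two files written with value-sorted labels read correctly in Stata: sort the named label vector by its values before passing it to labelled(). This is only a sketch; `labs_unsorted` and `labs_sorted` are illustrative names, and it has not been checked against Stata in this thread.

```r
# Label vector in the problematic order, equivalent to labs[c(1, 3, 2)] above
labs_unsorted <- c(Democrat = 1, Independent = 3, Republican = 2)

# Reorder by numeric value; indexing with order() keeps the names attached
labs_sorted <- labs_unsorted[order(labs_unsorted)]

print(labs_sorted)

# The sorted vector can then be used in place of the unsorted one, e.g.
# lbl_lab <- tibble(pid3 = labelled(3L:1L, labs_sorted))
```

Only the order of the label vector changes; the data values themselves can stay in any order, as the foo_num.dta case shows.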

@kuriwaki kuriwaki changed the title from "labelled sensitive to label ordering in Stata" to "labelled sensitive to labels ordering in Stata" Dec 20, 2017

@hadley hadley added the bug label Jan 7, 2018

Member

hadley commented Jan 7, 2018

@evanmiller any thoughts on whether this is the responsibility of haven or readstat?

Contributor

evanmiller commented Jan 7, 2018

@hadley ReadStat should probably order the labels the way that Stata expects before writing.

Contributor

evanmiller commented Jan 8, 2018

@hadley Try pulling in this change: WizardMac/ReadStat@f4d8706

@hadley hadley added the readstat label Jan 8, 2018

hadley added a commit that referenced this issue Jan 8, 2018

Member

hadley commented Jan 8, 2018

@kuriwaki please try with the latest dev version.

@hadley hadley added the wip label Jan 8, 2018

kuriwaki commented Jan 8, 2018

@hadley @evanmiller works great. Thank you.

@hadley hadley closed this Jan 9, 2018

lock bot commented Jul 8, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 8, 2018
