
write_dta: numeric labelleds not written correctly #144

Closed
diogocp opened this Issue Dec 4, 2015 · 2 comments

diogocp commented Dec 4, 2015

I caught this in #140 but have not yet discovered the cause of the problem.

> num <- labelled(c(1, 2), c(a = 1, b = 3))
> num
<Labelled>
[1] 1 2

Labels:
 value label is_na
     1     a FALSE
     3     b FALSE
> roundtrip_var(num)
<Labelled>
[1] 1 2

Labels:
 value label is_na
     0     a FALSE
     0     b FALSE

I opened the file in Stata and verified that the labels are not written correctly:

. label list
x..i..:
           0 a
           0 b

It might be a bug in ReadStat, since the label-writing code in DfWriter seems to be essentially the same as the code for integer labelleds, which work fine.


evanmiller commented Dec 4, 2015

Stata only supports value labels for integers:

http://www.stata.com/help.cgi?dta#value_labels

You could try creating the label set for integer values (readstat_label_int32_value), but I am not sure how Stata behaves when an integer label set is applied to a double-precision column.
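To make the integer-only constraint concrete, here is a minimal R sketch of a check that could run before a label set is written. `is_int32_label()` is a hypothetical helper (it is not part of haven or ReadStat), and it assumes a label value is safe to pass through an int32-based writer such as `readstat_label_int32_value` only if it is a finite whole number within R's integer range.

```r
# Hypothetical helper, not part of haven or ReadStat.
# Returns TRUE for label values that are finite whole numbers small enough
# to fit in R's 32-bit integer range, i.e. values an int32-based label
# writer could accept without losing information.
is_int32_label <- function(values) {
  is.finite(values) &
    values == trunc(values) &
    abs(values) <= .Machine$integer.max
}

# Integer-valued labels like those in the report, plus a non-integer one.
labels <- c(a = 1, b = 3, pi = 3.14)
ok <- is_int32_label(labels)

# Only the integer-valued labels can be converted without loss.
as.integer(labels[ok])
```

Note that NA, NaN and infinite values are all rejected, because `is.finite()` is FALSE for them.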


diogocp commented Dec 4, 2015

You're right. I thought I had tried to label a double in Stata, but apparently I did not.

. label define test 3.14 "pi"
may not label 3.14

Applying an integer label to a double variable does work, but is probably not very useful.

. list, clean

          x  
  1.      1  
  2.   3.14  

. label define test 1 "one"
. label values x test
. list, clean

          x  
  1.    one  
  2.   3.14 

The easiest way to fix this would be to warn and discard the labels when writing double (non-integer) labelled vectors. Writing only the labels that have integer values could be very surprising for users.
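The warn-and-discard policy could look roughly like the following R sketch. `drop_double_labels()` is a hypothetical name, and the sketch assumes labels are stored in a `labels` attribute, as haven's `labelled` class does. It only illustrates the policy and is not necessarily what the closing commit does.

```r
# Hypothetical pre-write step illustrating "warn and discard": if a labelled
# vector is stored as double, drop all of its value labels (rather than
# keeping only the integer-valued ones) and emit a warning.
# Assumes labels live in a "labels" attribute, as in haven's labelled class.
drop_double_labels <- function(x) {
  if (is.double(x) && !is.null(attr(x, "labels"))) {
    warning("Stata only supports value labels for integers; ",
            "discarding labels on a double vector.", call. = FALSE)
    attr(x, "labels") <- NULL
  }
  x
}

# A double vector like the one in the report: its labels are dropped.
num <- structure(c(1, 2), labels = c(a = 1, b = 3))
num_out <- drop_double_labels(num)   # emits a warning

# An integer vector keeps its labels unchanged.
int <- structure(c(1L, 2L), labels = c(a = 1L, b = 3L))
identical(drop_double_labels(int), int)
```

Dropping every label, rather than only the non-integer ones, follows the reasoning above: a loud warning plus no labels is easier to notice than a silently partial label set.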

hadley closed this in aba3f4e May 30, 2016

lock bot locked and limited conversation to collaborators Jun 27, 2018
