write_dta can't write labelled doubles with decimal values #401

Closed
helge-baumann opened this issue Sep 5, 2018 · 8 comments

Comments

@helge-baumann
commented Sep 5, 2018

Hello, and apologies if I'm doing something wrong; I'm a newbie. I'll try my best.

The issue: Until recently, write_dta couldn't write labelled doubles even if all values of a variable were integers. As far as I know, that issue (#343) was fixed.

However, write_dta still won't write a labelled double if any of its values has a decimal part, even if those values carry no labels.

Minimal example:

library(haven)

x <- data.frame(
  a = labelled(c(-7, -8, 1, 2, 3), labels = c("refuse" = -7, "don't know" = -8)),
  # "b" contains one unlabelled decimal value (3.2)
  b = labelled(c(-7, -8, 1, 2, 3.2), labels = c("refuse" = -7, "don't know" = -8))
)
write_dta(x, "test.dta")

I think these variables should be supported, since many survey variables hold decimal data (e.g., hourly wages) but still need labels for certain values (refuse, does not apply, ...). Of course, I see why the labelled values themselves don't need decimal parts.

By the way: With write_sav, this works fine.

@hadley

Member

commented Jan 23, 2019

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

@hadley added the reprex label Jan 23, 2019

@helge-baumann

Author

commented Jan 24, 2019

Dear Hadley,

thanks for your reply. Actually I haven't used reprex() before so I hope I've used it correctly (and the markdown output is fine):

pac <- "pacman" %in% rownames(installed.packages())
if(pac == FALSE) install.packages("pacman"); rm(pac)
library(pacman)
p_load("haven", "reprex")

# Creating the data frame with two labelled variables 
  # "a" is a variable with no decimal values
  # "b" includes an (unlabelled) decimal value
x <- 
  data.frame(
    a=labelled(
      c(-7, -8, 1, 2, 3), 
      labels=c("refuse" = -7, "don't know" = -8)
      ),
    b=labelled(
      c(-7, -8, 1, 2, 3.2), # "3.2" is the culprit
      labels=c("refuse" = -7, "don't know" = -8)
      )
)

write_sav(x, "test.sav") # (file turns out to be fine)
write_dta(x, "test.dta")
#> Error: Stata only supports labelled integers.
#> Problems: `b`
@hadley

Member

commented Jan 24, 2019

Minimal reprex:

library(haven)

x <- data.frame(
  b = labelled(
    c(-7, -8, 1, 2, 3.2), # "3.2" is the culprit
    labels = c("refuse" = -7, "don't know" = -8)
  )
)

write_dta(x, "test.dta")
#> Error: Stata only supports labelled integers.
#> Problems: `b`

Created on 2019-01-24 by the reprex package (v0.2.1.9000)

Now that I have the reprex, I'm not sure what you're asking for. As far as I know, Stata does not support labelled doubles, but SPSS does. See #144 for details.

@hadley closed this Jan 24, 2019

@helge-baumann

Author

commented Jan 24, 2019

@hadley

Member

commented Jan 24, 2019

I don't understand what you are trying to say. Can you please try again? Sending me a file would not help because I don't have Stata.

@helge-baumann

Author

commented Jan 24, 2019

Oh, my English obviously sucks - please let me apologize for the inconvenience. I'll do my best:

Suppose you have a variable/column like "hourly wage in dollars". Obviously this variable can have values which contain decimal values, e.g. 8.32, 11.50, 7.45.

Now this variable/column can also have values which do not contain decimal values, like the codes for "don't know" or "I don't have a job". Let's just say these (missing) values are 999997 and 999998.

Now, you want to assign labels for these values in Stata. Stata will allow you to label the values 999997 and 999998 because they do not contain decimal values. However, Stata will not allow you to label 11.50.

What I'm trying to say: Stata does not prohibit you from labelling values in a column which also contains decimal values. It only prohibits you from attaching a label to a value that itself has a decimal part.
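
To make the distinction concrete, here is a minimal base R sketch of the check being described: test the labelled codes for whole numbers, not the data values. The helper `labels_are_whole()` and the variable names are hypothetical and are not part of haven; this only illustrates the rule, not how write_dta is actually implemented.

```r
# Hypothetical helper (not part of haven): TRUE when every labelled value
# is a whole number, regardless of what the data values look like.
labels_are_whole <- function(labels) {
  all(is.na(labels) | labels == trunc(labels))
}

# Hourly wages with decimal parts, plus two integer missing-value codes.
wage <- c(8.32, 11.50, 7.45, 999997, 999998)
wage_labels <- c("don't know" = 999997, "I don't have a job" = 999998)

# The data themselves are not all whole numbers ...
data_is_whole <- all(wage == trunc(wage))

# ... but only the labelled codes matter for value labels.
labels_ok <- labels_are_whole(wage_labels)

# Attaching a label to a decimal value such as 11.50 is the case to reject.
bad_labels <- c("typical wage" = 11.50)
bad_ok <- labels_are_whole(bad_labels)

print(c(data_is_whole = data_is_whole, labels_ok = labels_ok, bad_ok = bad_ok))
```

Under this rule, the wage column would be writable with its two labels, while a label on 11.50 would still be refused.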

@hadley

Member

commented Jan 24, 2019

Ok, got it — thanks for the explanation!

@hadley reopened this Jan 24, 2019

@hadley closed this in 345a696 Jan 24, 2019

@lock


commented Jul 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock bot locked and limited conversation to collaborators Jul 23, 2019
