validate_dta only checks first column for labelled #326

Closed
kuriwaki opened this issue Dec 18, 2017 · 4 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@kuriwaki

validate_dta only checks the first column for integer+labelled:

haven/R/haven.R

Line 247 in 7f2b479

bad_labels <- is_labelled && !is_integer

Shouldn't it check all columns? MWE:

library(haven)

s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F"))
s2 <- labelled(c(1L, 1L, 2L), c(Male = 1L, Female = 2L))
labelled_df <- data.frame(s1, s2)

## appropriately fails because s1 is not integer
write_dta(labelled_df, "labelled.dta")
#> Error: Stata only supports labelled integers.
#> Problems: `s1`, `s2`

## swapping columns should fail for same reason (?), but doesn't
write_dta(labelled_df[, c("s2", "s1")], "labelled.dta")
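
A minimal sketch of why column order matters here, using hand-written per-column flags rather than haven itself (the vectors below are illustrative assumptions, not taken from the package). In the R versions current at the time of this issue, `&&` on vectors longer than one silently used only the first element; recent R versions raise an error instead, so the old behaviour is mimicked with `[[1]]`. A vectorised `&` would flag each column independently.

```r
# Hypothetical per-column flags for a data frame with columns s2, s1
# (s2 is a labelled integer, s1 is a labelled character).
is_labelled <- c(s2 = TRUE, s1 = TRUE)
is_integer  <- c(s2 = TRUE, s1 = FALSE)

# What `&&` effectively evaluated in older R: the first element only.
first_only <- is_labelled[[1]] && !is_integer[[1]]

# Element-wise `&` evaluates every column.
per_column <- is_labelled & !is_integer

first_only                      # s1 is never inspected
names(per_column)[per_column]   # columns that should be reported
```

With `s1` first, the first-element check happens to catch the problem; with `s2` first, it passes and `s1` goes unreported.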
@hadley
Member

hadley commented Jan 7, 2018

Doh!

@hadley hadley added the bug an unexpected problem or unintended behavior label Jan 7, 2018
@hadley hadley closed this as completed in 19256d0 Jan 7, 2018
@cimentadaj

But this raises a contradiction: read_dta can read labelled doubles but write_dta can't write labelled doubles. Download data here

library(haven)
#> Warning: package 'haven' was built under R version 3.4.3
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.4.2
#> ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.3.4     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> Warning: package 'tidyr' was built under R version 3.4.2
#> Warning: package 'purrr' was built under R version 3.4.2
#> Warning: package 'dplyr' was built under R version 3.4.2
#> ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

stata_dt <- read_dta("st_ex.dta") %>% 
  select(tvtot, tvpol)

map_chr(stata_dt, typeof)
#>    tvtot    tvpol 
#> "double" "double"
map_lgl(stata_dt, is.labelled)
#> tvtot tvpol 
#>  TRUE  TRUE

write_dta(stata_dt, tempfile(fileext = ".dta"))
#> Error: Stata only supports labelled integers.
#> Problems: `tvtot`, `tvpol`

I'm not sure why write_dta isn't allowed to save labelled doubles. Up until this new version it did so just fine (at least in my work, I've been saving labelled doubles whose values were integers but were read as doubles by read_dta).
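
One possible workaround, sketched under assumptions: convert whole-number labelled doubles to integer storage before writing. `as_integer_storage()` is a hypothetical helper, not part of haven, and the example column is built by hand with a plain `"labelled"` class and `labels` attribute so that it runs without haven installed; objects from other haven versions may be structured differently.

```r
# Hypothetical helper (not part of haven): switch a whole-number double
# vector to integer storage while keeping its attributes.
as_integer_storage <- function(x) {
  if (!is.double(x)) return(x)
  vals <- as.numeric(x)  # attribute-free copy for the checks below
  ok <- is.na(vals) | (abs(vals) <= .Machine$integer.max & vals == trunc(vals))
  if (!all(ok)) return(x)  # fractional or out-of-range values: leave as-is

  labels <- attr(x, "labels")
  if (is.double(labels)) {
    storage.mode(labels) <- "integer"  # keeps the label names
    attr(x, "labels") <- labels
  }
  storage.mode(x) <- "integer"  # keeps class and other attributes
  x
}

# Stand-in for a column such as tvtot after read_dta().
tvtot <- structure(c(1, 2, 2, NA),
                   labels = c(Low = 1, High = 2),
                   class = "labelled")
tvtot_int <- as_integer_storage(tvtot)
typeof(tvtot_int)
```

Applied to every column (for example with `lapply()`), this would leave character and fractional columns unchanged, so those would still be rejected by the writer.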

@kuriwaki
Author

@cimentadaj I agree, I've found this to be a pain point too; perhaps file it as a new issue so it doesn't get buried under this closed issue?

Perhaps the restriction in haven is too strict and could allow <dbl + lbl> classes as long as the doubles are whole numbers (no fractional part).
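
A rough sketch of what such a relaxed check could look like, as an illustration of the idea and not as haven's actual validation code. `is_integerish()` is a hypothetical helper, and the columns are a plain list of hand-built `"labelled"` vectors standing in for a data frame.

```r
# Hypothetical predicate: TRUE for integers and for doubles with no
# fractional part (missing values are ignored).
is_integerish <- function(x) {
  if (is.integer(x)) return(TRUE)
  if (!is.double(x)) return(FALSE)
  vals <- as.numeric(x)  # attribute-free copy
  all(is.na(vals) | vals == trunc(vals))
}

# Stand-in columns: whole-number double, fractional double, character.
cols <- list(
  whole = structure(c(1, 2, NA), labels = c(a = 1), class = "labelled"),
  frac  = structure(c(1.5, 2), labels = c(a = 1.5), class = "labelled"),
  chr   = structure(c("M", "F"), labels = c(Male = "M"), class = "labelled")
)

is_labelled <- vapply(cols, inherits, logical(1), what = "labelled")
is_ok       <- vapply(cols, is_integerish, logical(1))

# Element-wise `&` so that every column is checked, not just the first.
bad_labels <- is_labelled & !is_ok
names(cols)[bad_labels]
```

Under this rule the whole-number double column passes, while the fractional and character columns are still reported.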

@lock

lock bot commented Jul 25, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 25, 2018