validate_dta only checks first column for labelled #326

Closed
kuriwaki opened this Issue Dec 18, 2017 · 4 comments

kuriwaki commented Dec 18, 2017

validate_dta only checks the first column for integer+labelled:

haven/R/haven.R

Line 247 in 7f2b479

bad_labels <- is_labelled && !is_integer

Shouldn't it check all columns? MWE:

library(haven)

s1 <- labelled(c("M", "M", "F"), c(Male = "M", Female = "F"))
s2 <- labelled(c(1L, 1L, 2L), c(Male = 1L, Female = 2L))
labelled_df <- data.frame(s1, s2)

## appropriately fails because s1 is not integer
write_dta(labelled_df, "labelled.dta")
#> Error: Stata only supports labelled integers.
#> Problems: `s1`, `s2`

## swapping columns should fail for same reason (?), but doesn't
write_dta(labelled_df[, c("s2", "s1")], "labelled.dta")
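
A minimal sketch of why the operator matters, using base R only. The two flag vectors are hard-coded assumptions standing in for what `validate_dta` would compute for the swapped data frame (`s2`, `s1`); this is an illustration, not necessarily the change made in haven. `&&` is intended for length-one conditions: older R versions used only the first element of each side, and recent versions raise an error for longer inputs. The element-wise `&` evaluates every column.

```r
# Assumed per-column flags for the swapped data frame (s2, s1):
# both columns are labelled, but only s2 is stored as integer.
is_labelled <- c(s2 = TRUE, s1 = TRUE)
is_integer  <- c(s2 = TRUE, s1 = FALSE)

# Element-wise AND: one result per column instead of a single value.
bad_labels <- is_labelled & !is_integer

# Names of the columns that would be reported as problems.
bad_columns <- names(bad_labels)[bad_labels]
print(bad_columns)
```

With the element-wise operator, only `s1` is flagged, regardless of column order.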

hadley commented Jan 7, 2018

Doh!

hadley added the bug label Jan 7, 2018

hadley closed this in 19256d0 Jan 7, 2018

cimentadaj commented Jan 22, 2018

But this raises a contradiction: read_dta can read labelled doubles but write_dta can't write labelled doubles. Download data here

library(haven)
#> Warning: package 'haven' was built under R version 3.4.3
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.4.2
#> ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.3.4     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> Warning: package 'tidyr' was built under R version 3.4.2
#> Warning: package 'purrr' was built under R version 3.4.2
#> Warning: package 'dplyr' was built under R version 3.4.2
#> ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

stata_dt <- read_dta("st_ex.dta") %>% 
  select(tvtot, tvpol)

map_chr(stata_dt, typeof)
#>    tvtot    tvpol 
#> "double" "double"
map_lgl(stata_dt, is.labelled)
#> tvtot tvpol 
#>  TRUE  TRUE 

write_dta(stata_dt, tempfile(fileext = ".dta"))
#> Error: Stata only supports labelled integers.
#> Problems: `tvtot`, `tvpol`

I'm not sure why write_dta is not allowed to save labelled doubles, especially since it was doing so just fine up to this new version (at least in my work I've been saving labelled doubles, which were integers but were read as doubles by read_dta).

kuriwaki commented Jan 26, 2018

@cimentadaj I agree, I've found this to be a pain point; perhaps file it as a new issue so it doesn't get buried under this closed one?

Perhaps the restriction in haven is too strict, and it could allow <dbl + lbl> classes as long as the doubles are not explicit decimals.
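
As a rough illustration of that idea, the check below tests whether a double vector holds only whole numbers. `is_whole_number()` is a hypothetical helper written for this sketch in base R; it is not part of haven, and how haven would actually relax the restriction is not specified here.

```r
# Hypothetical helper: TRUE when x is numeric and every non-missing value
# has no fractional part. Missing values are allowed.
is_whole_number <- function(x) {
  is.numeric(x) && all(is.na(x) | x == trunc(x))
}

# Doubles that are really integers, as read_dta might return them.
whole <- is_whole_number(c(1, 2, NA))

# Doubles with an explicit fractional part.
fractional <- is_whole_number(c(1.5, 2))

print(c(whole = whole, fractional = fractional))
```

A relaxed validator could, under this assumption, accept labelled doubles for which such a check passes and reject only those with fractional values.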

lock bot commented Jul 25, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Jul 25, 2018