non-unique labels #424

Closed
gergness opened this issue Feb 7, 2019 · 7 comments · Fixed by #425

@gergness gergness commented Feb 7, 2019

Similar but not the same as #364 - it's currently possible to have multiple values mapped to a label with the same text.

library(haven)
library(tibble)

x <- labelled(1:2, c("label" = 1, "label" = 2))
x
#> <Labelled integer>
#> [1] 1 2
#> 
#> Labels:
#>  value label
#>      1 label
#>      2 label

However, if you do this, you can't convert the vector to a factor using the option levels = "labels":

as_factor(x, "labels")
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

This is especially problematic because #390 used this in the print method.

tibble(x = x)
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

Stata at least considers this data valid (test.zip attached), so I think the best solution is to fix the as_factor() code, but if not, I think it's important to at least change the tibble print method.

read_dta("~/test.dta")
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

@briatte briatte commented Mar 5, 2019

This issue currently makes it impossible to work with many Stata datasets using haven, since it breaks the print method.

@gergness gergness commented Mar 5, 2019

FWIW as an interim fix, you can set options(haven.show_pillar_labels = FALSE) to use the old print method.
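
A minimal sketch of applying that interim fix for part of a session and then restoring the previous setting. It relies only on base R's options() and getOption(); the variable names `old` and `during` are illustrative.

```r
# options() returns the previous values invisibly, so they can be restored later
old <- options(haven.show_pillar_labels = FALSE)

# While the option is FALSE, haven should fall back to the old print method
during <- getOption("haven.show_pillar_labels")
during

# ... print tibbles containing labelled columns here ...

# Restore whatever was set before (NULL if the option was unset)
options(old)
```

To keep the workaround for every session, the first line could instead go in a startup file such as .Rprofile.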

@elbersb elbersb commented Oct 8, 2019

Anything we can help with to fix this? I run into this with Stata datasets frequently.

@courtiol courtiol commented Oct 17, 2019

@elbersb The issue has already been solved in pull requests (at least twice, actually: #425 and #476). You now need to wait a little for the package maintainer (@hadley) to include the change in the package.

@gorcha gorcha commented Oct 17, 2019

Hi @hadley / @batpigandme, can this fix be merged some time soon? For those of us who use haven regularly this essentially breaks dplyr printing (which is incredibly useful) for a lot of datasets.

It seems like a simple and non-controversial fix.

@hadley hadley commented Oct 18, 2019

I will try to take a look in the near future, but it’s likely to be 2-3 weeks. In the meantime, you can just install the version from the PR with the fix: install_github("tidyverse/haven#425")

@gorcha gorcha commented Oct 23, 2019

Thanks!

@hadley hadley added bug wip labels Nov 6, 2019
@hadley hadley closed this in #425 Nov 6, 2019
hadley added a commit that referenced this issue Nov 6, 2019