non-unique labels #424

Closed
gergness opened this issue Feb 7, 2019 · 8 comments · Fixed by #425
Labels
bug wip

Comments

gergness (Contributor) commented Feb 7, 2019

Similar to, but not the same as, #364: it's currently possible to map multiple values to labels with the same text.

library(haven)
library(tibble)

x <- labelled(1:2, c("label" = 1, "label" = 2))
x
#> <Labelled integer>
#> [1] 1 2
#> 
#> Labels:
#>  value label
#>      1 label
#>      2 label

However, if you do this, you can't convert to a factor using the option levels = "labels":

as_factor(x, "labels")
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

This is especially problematic because the print method added in #390 relies on this conversion:

tibble(x = x)
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated

Stata at least considers this data valid (test.zip attached), so I think the best solution is to fix the as_factor() code, but if not, I think it's important to at least change the tibble print method.

read_dta("~/test.dta")
#> Error in `levels<-`(`*tmp*`, value = as.character(levels)): factor level [2] is duplicated
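
For readers following along, the failure can be reproduced without haven. The following is a minimal base R sketch, assuming a plain integer vector plus a named `labels` vector as a stand-in for a labelled column (the names `values`, `labels`, and `text` are illustrative, not haven internals). It shows that passing duplicated label text as factor levels raises the error, while de-duplicating the level set first does not. This is only one possible approach and is not necessarily what #425 does; note that it collapses both values into a single level.

```r
# Stand-in for a labelled vector (assumption: not haven's internal representation).
values <- c(1L, 2L, 1L)
labels <- c("label" = 1, "label" = 2)

# Replace each value with the text of its label.
text <- names(labels)[match(values, labels)]

# Using the duplicated label text as levels reproduces the error message.
err <- tryCatch(
  factor(text, levels = names(labels)),
  error = function(e) conditionMessage(e)
)

# De-duplicating the levels first avoids the error;
# values 1 and 2 then share the single level "label".
f <- factor(text, levels = unique(names(labels)))
```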

briatte commented Mar 5, 2019

This issue currently makes it impossible to work with many Stata datasets using haven, since it breaks the print function.

gergness (Contributor, Author) commented Mar 5, 2019

FWIW as an interim fix, you can set options(haven.show_pillar_labels = FALSE) to use the old print method.
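
A small sketch of applying that workaround temporarily, assuming you want to restore the previous setting afterwards. It uses only base R's `options()`, which returns the old values so they can be put back; the variable name `old` is illustrative.

```r
# Switch to the old print method and remember the previous setting.
old <- options(haven.show_pillar_labels = FALSE)

# ... read and print the affected datasets here ...

# Restore whatever was set before (including "unset").
options(old)
```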

elbersb commented Oct 8, 2019

Anything we can help with to fix this? I run into this with Stata datasets frequently.

courtiol commented Oct 17, 2019

@elbersb the issue has been solved in pull requests (at least twice, actually: #425 and #476). You now need to wait a little for the package maintainer (@hadley) to include the change in the package.

gorcha (Member) commented Oct 17, 2019

Hi @hadley / @batpigandme, can this fix be merged some time soon? For those of us who use haven regularly this essentially breaks dplyr printing (which is incredibly useful) for a lot of datasets.

It seems like a simple and non-controversial fix.

hadley (Member) commented Oct 18, 2019

I will try to take a look in the near future, but it’s likely to be 2-3 weeks. In the meantime, you can just install the version from the PR with the fix: install_github("tidyverse/haven#425")

gorcha (Member) commented Oct 23, 2019

Thanks!

hadley added the bug and wip labels Nov 6, 2019

lock bot commented May 5, 2020

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators May 5, 2020