Importing a variable with missing values but no labels #219

Closed
larmarange opened this issue Sep 20, 2016 · 6 comments

Comments

@larmarange (Contributor) commented Sep 20, 2016

In SPSS, a variable can have defined missing values but no value labels. In that case, read_spss produces a vector with a structure like:

> x <- structure(c(1.2, 2.4), class = c("labelled_spss", "labelled"), na_values = 9)

Printing works, but there is a bug when using summary:

> x
<Labelled SPSS double>
[1] 1.2 2.4
Missing values: 9
> summary(x)
Error: `x` and `labels` must be same type

In fact, as currently written, labelled_spss doesn't allow a vector with no value labels:

> labelled_spss(c(1.2, 2.4), NULL, na_values = 9)
Error: `x` and `labels` must be same type

@hadley (Member) commented Jan 25, 2017

I think there are two problems here:

# First, you should be able to create a labelled spss vector with no labels
# (That's a bit inelegant but it's expedient)
labelled_spss(c(1.2, 2.4), na_values = 9)
labelled_spss(c(1.2, 2.4), double(), na_values = 9)

# Also need to make sure the C++ code returns an object with that structure

# Second you should be able to subset that object
# (That's the root cause of the summary failure)
x <- structure(c(1.2, 2.4), class = c("labelled_spss", "labelled"), na_values = 9)
x[1]

Would you mind providing an SPSS file that contains that variable so I can test?
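
The two problems listed above can be illustrated without haven. What follows is a minimal sketch in base R, not haven's actual implementation: the class `toy_labelled_spss` and the constructor `new_toy()` are hypothetical names. It assumes the error arises because the type check compares `x` against `labels` even when `labels` is `NULL`, and it shows one way a subsetting method could carry `na_values` across when no labels exist.

```r
# Hypothetical constructor: the type check runs only when labels are supplied.
new_toy <- function(x, labels = NULL, na_values = NULL) {
  if (!is.null(labels) && typeof(x) != typeof(labels)) {
    stop("`x` and `labels` must be same type", call. = FALSE)
  }
  # Attributes given as NULL are simply not set by structure().
  structure(x, labels = labels, na_values = na_values,
            class = "toy_labelled_spss")
}

# Plain subsetting drops the custom attributes, so rebuild the object
# from the original's labels and na_values.
`[.toy_labelled_spss` <- function(x, ...) {
  new_toy(unclass(x)[...],
          labels = attr(x, "labels"),
          na_values = attr(x, "na_values"))
}

x <- new_toy(c(1.2, 2.4), na_values = 9)
y <- x[1]
attr(y, "na_values")
```

Skipping the check for `NULL` labels is only one possible design; defaulting to an empty vector of the same type as `x`, as in the `double()` call above, would be another.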

@hadley (Member) commented Feb 15, 2018

@larmarange do you still have any interest in this issue?

@huftis (Contributor) commented Apr 24, 2018

I’m not the original submitter, but here’s an example SPSS file as requested, @hadley:
missing-no-label.zip

Example code to trigger the error message:

library(haven)
d <- read_sav("https://github.com/tidyverse/haven/files/1942576/missing-no-label.zip", user_na = TRUE)
summary(d$x)
#> Error: `x` and `labels` must be same type

@dusadrian commented Oct 15, 2018

I was recently thinking about a similar example, and concluded it simply does not make any sense to have a declared missing value with no label. This would be similar to a general NA in R.

How about using the value itself as a label? Something like:

x <- structure(c(1.2, 2.4), class = c("labelled_spss", "labelled"),
               labels = c("9" = 9), na_values = 9)

Something like that could be done at import time.
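
The suggestion above could be sketched as a small base R helper. `fill_na_labels()` is a hypothetical name, not a haven function; it assumes labels are a named vector of the same type as the data, and it adds a label equal to the value itself for every declared missing value that does not already have one.

```r
# Hypothetical helper: give each unlabelled missing value itself as a label.
fill_na_labels <- function(labels, na_values) {
  # Missing values that do not appear among the existing labels.
  unlabelled <- setdiff(na_values, labels)
  c(labels, stats::setNames(unlabelled, as.character(unlabelled)))
}

# No labels at all: the missing value 9 gets the label "9".
fill_na_labels(NULL, 9)

# Existing labels are kept; only 99 needs a generated label.
fill_na_labels(c(Refused = 9), c(9, 99))
```

This only handles discrete missing values. A declared missing range has no finite set of values to label, so it would need separate treatment.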

@hadley (Member) commented Jan 24, 2019

Reprex with built-in dataset:

library(haven)
sav <- system.file("files", "testdata.sav", package = "foreign")
x <- read_spss(file = sav, user_na = TRUE)

x$numeric_long_label
#> <Labelled SPSS double>
#> [1] 1.00000 2.00000 3.33333 4.00000      NA
#> Missing range:  [1, 2]
x$numeric_long_label[1:3]
#> Error: `x` and `labels` must be same type

Created on 2019-01-23 by the reprex package (v0.2.1.9000)

@lock (bot) commented Jul 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 23, 2019