Support multiple missing values #170
I think this might need two approaches: one for SAS/Stata and one for SPSS.

SAS/Stata

One way to handle tagged missing values (as suggested by @tslumley) would be to use the payload of NaNs. An IEEE 754 NaN has its exponent field (the eleven bits after the sign bit) filled with ones. R sets the final 32 bits to 1954, which leaves 20 bits of the 52-bit significand that we could use to store extra information while the value remains a valid NaN and a valid NA. It's probably easiest to work with the 2nd byte of the double, treating it like a char (so tagged NA "A" would be .

Advantages: tagged missing values are treated like regular missing values (or possibly NaNs).

Disadvantages: default print methods won't show the difference; we'll need to write C code to access the values, re-label them, and format them to show the tag.

This will also need some API for getting/setting tagged NAs, and probably a helper for relabelling. Maybe:

is_tagged_na(x)
is_tagged_na(x, "A")
relabel_na(x, D = "did not respond", "N" = "not applicable")

SPSS

SPSS supports flagging ranges of values as missing, which means it will require another approach. To be consistent with SAS/Stata, it seems reasonable that these missing values should be given the value NA (so by default they are treated correctly by R), but an extra attribute could store the original value so it can be retrieved if desired. It seems reasonable to subclass
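A minimal C sketch of the SPSS approach, assuming a hypothetical per-column struct `spss_missing` (not an existing haven or ReadStat type) and using C's `NAN` as a stand-in for R's `NA_real_`. SPSS allows either up to three discrete missing values or a range plus one discrete value; the sketch validates that rule, then masks matching values while leaving the input array untouched, which plays the role of the extra attribute holding the original values.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-column SPSS user-missing specification. */
typedef struct {
    int n_values;     /* number of discrete missing values in use */
    double values[3];
    bool has_range;   /* whether [lo, hi] is also treated as missing */
    double lo, hi;    /* inclusive bounds, used only if has_range */
} spss_missing;

/* SPSS rule: up to 3 discrete values, or a range plus at most 1 value. */
bool spec_is_valid(const spss_missing *m) {
    if (m->n_values < 0)
        return false;
    if (m->has_range)
        return m->n_values <= 1 && m->lo <= m->hi;
    return m->n_values <= 3;
}

bool is_user_missing(const spss_missing *m, double x) {
    assert(spec_is_valid(m));
    if (m->has_range && x >= m->lo && x <= m->hi)
        return true;
    for (int i = 0; i < m->n_values; i++)
        if (x == m->values[i])
            return true;
    return false;
}

/* Writes NAN (standing in for R's NA) to out[i] for user-missing values
 * and copies everything else; `in` is left untouched so the original
 * codes remain recoverable. Returns the number of values masked. */
size_t mask_user_missing(const spss_missing *m, const double *in,
                         double *out, size_t n) {
    size_t masked = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_user_missing(m, in[i])) {
            out[i] = NAN;
            masked++;
        } else {
            out[i] = in[i];
        }
    }
    return masked;
}
```

Keeping the original codes in a separate array (in R, an attribute) means default R semantics see plain NAs, while the user-missing codes can still be restored if needed.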
The first of the 20 bits (the most significant bit of the significand) distinguishes quiet from signalling NaNs, so we need to be careful not to touch it.
@tslumley oops, yes, we should use the 3rd or 4th bytes, not the 2nd.
Now done.
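A minimal C sketch of the corrected byte choice, assuming R's `NA_real_` has 0x7FF00000 in its high 32-bit word and 1954 in its low word. The tag goes in bits 32 to 39 (the 4th byte counting from the most significant end), above R's 1954 marker and well clear of the quiet/signalling bit (bit 51). `make_tagged_na()` and `tagged_na_value()` are hypothetical names chosen for illustration; working on a `uint64_t` via `memcpy` rather than indexing bytes keeps the sketch independent of byte order.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit pattern of R's NA_real_: high word 0x7FF00000
 * (all-ones exponent), low word 1954. */
static const uint64_t R_NA_BITS = 0x7FF0000000000000ULL | 1954ULL;

/* Tag byte occupies bits 32..39: above the 1954 marker in the low
 * word and below the quiet/signalling bit (bit 51). */
#define TAG_SHIFT 32
#define TAG_MASK (0xFFULL << TAG_SHIFT)

static double bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d); /* type-pun without aliasing issues */
    return d;
}

static uint64_t double_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: an NA carrying a one-character tag such as 'A'.
 * A zero tag would be indistinguishable from a plain NA, so reject it. */
double make_tagged_na(char tag) {
    assert(tag != 0);
    uint64_t bits = R_NA_BITS | ((uint64_t)(unsigned char)tag << TAG_SHIFT);
    return bits_to_double(bits);
}

/* Hypothetical helper: the tag of x, or 0 if x is not a tagged NA. */
char tagged_na_value(double x) {
    if (!isnan(x))
        return 0;
    uint64_t bits = double_to_bits(x);
    if ((uint32_t)bits != 1954u) /* low word must be R's NA marker */
        return 0;
    return (char)((bits & TAG_MASK) >> TAG_SHIFT);
}
```

Because the tagged value is still a NaN whose low word is 1954, R's `is.na()` should treat it as an ordinary NA; only code that inspects the payload sees the tag.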
SAS and Stata support "tagged" missing values: .A, .B, ..., .Z, and, for SAS, ._. SPSS supports per-column user-defined missing values: either up to 3 distinct values, or a range plus 1 distinct value. Haven needs to capture these special missing values in a way that preserves the semantics of regular R missing values, while enabling them to be labelled.

Replaces #33 & #118