write_sav does not write "Measure" info to file #133

Closed
dgromer opened this Issue Nov 10, 2015 · 14 comments

dgromer commented Nov 10, 2015

SPSS files have a "Measure" type for each variable (one of Scale, Ordinal or Nominal). When opening a file written with write_sav, SPSS shows "Unknown" in the Measure column (just as it does when you create a new variable in SPSS).

[Screenshot: spss_measure]

SPSS seems to use this information for plotting and some statistical tests (not sure though).

Looking at the ReadStat source (https://github.com/WizardMac/ReadStat/blob/master/src/readstat_sav_write.c) it looks like the Variable Display Parameter Record (see http://www.gnu.org/software/pspp/pspp-dev/html_node/Variable-Display-Parameter-Record.html#Variable-Display-Parameter-Record) is not implemented yet. @evanmiller

From haven's point of view, the question is how to set the appropriate Measure, e.g. R unordered factor -> Nominal, R ordered factor -> Ordinal, R numeric -> Scale.
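
The mapping proposed above could be sketched in C roughly as follows. This is only an illustration: `measure_t`, `r_col_kind_t`, and `measure_for_column` are hypothetical names made up for this sketch and are not part of haven or ReadStat.

```c
#include <stdbool.h>

/* Hypothetical measure levels, named after the SPSS "Measure" column. */
typedef enum {
    MEASURE_UNKNOWN = 0,
    MEASURE_NOMINAL,
    MEASURE_ORDINAL,
    MEASURE_SCALE
} measure_t;

/* Hypothetical summary of an R column's type, as a writer might see it. */
typedef enum {
    R_COL_NUMERIC,
    R_COL_FACTOR,
    R_COL_OTHER
} r_col_kind_t;

/*
 * Map a column to a measure:
 *   numeric          -> Scale
 *   ordered factor   -> Ordinal
 *   unordered factor -> Nominal
 * `ordered` is only consulted for factors. Anything else stays Unknown.
 */
measure_t measure_for_column(r_col_kind_t kind, bool ordered) {
    switch (kind) {
    case R_COL_NUMERIC:
        return MEASURE_SCALE;
    case R_COL_FACTOR:
        return ordered ? MEASURE_ORDINAL : MEASURE_NOMINAL;
    default:
        return MEASURE_UNKNOWN;
    }
}
```

Falling back to Unknown for other column types (for example character vectors) is a choice made for this sketch; the thread does not say how those should be treated.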

hadley commented May 30, 2016

@evanmiller is this something you're planning on supporting?

evanmiller commented May 30, 2016

tklebel commented May 30, 2016

It seems reasonable to me, especially if you have a workflow like this:

*.sav -> import to R -> compute things -> re-export to SPSS

It would be good to preserve the initial information in this case.

hadley commented May 30, 2016

@tklebel preserving full information for a round trip is beyond the scope of haven.

evanmiller commented May 31, 2016

@hadley are you OK if I break readstat_add_variable to support this? The parameter list is getting cumbersome -- I seem to recall the design was partly to ensure that variables were immutable after creation -- something in haven benefited from this immutability IIRC. Anyway I'd like to refactor the API into something like:

readstat_variable_t *readstat_add_variable(readstat_writer_t *writer,
         readstat_types_t type, const char *name);

void readstat_variable_set_width(readstat_variable_t *variable,
         size_t width);

void readstat_variable_set_label(readstat_variable_t *variable,
         const char *label);

void readstat_variable_set_measure(readstat_variable_t *variable,
         readstat_measure_t measure); // an enum or something 

That way we can continue to add more "stuff" like measure, alignment, etc. Any objections?
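
To illustrate why the setter style scales better than a long parameter list, here is a minimal self-contained sketch of the pattern. All `demo_*` names, the fixed-size buffers, and the 16-variable limit are invented for this example; this is not ReadStat's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real ReadStat types. */
typedef enum {
    DEMO_MEASURE_UNKNOWN,
    DEMO_MEASURE_NOMINAL,
    DEMO_MEASURE_ORDINAL,
    DEMO_MEASURE_SCALE
} demo_measure_t;

typedef struct {
    char name[33];
    char label[256];
    size_t width;
    demo_measure_t measure;
} demo_variable_t;

typedef struct {
    demo_variable_t variables[16];
    size_t count;
} demo_writer_t;

/* Create a variable with defaults; returns NULL when the writer is full. */
demo_variable_t *demo_add_variable(demo_writer_t *writer, const char *name) {
    if (writer->count >= 16)
        return NULL;
    demo_variable_t *v = &writer->variables[writer->count++];
    memset(v, 0, sizeof *v);
    /* The buffer was zeroed, so copying size-1 bytes keeps it terminated. */
    strncpy(v->name, name, sizeof v->name - 1);
    v->measure = DEMO_MEASURE_UNKNOWN;
    return v;
}

/* Optional attributes are set one at a time after creation. */
void demo_variable_set_label(demo_variable_t *variable, const char *label) {
    strncpy(variable->label, label, sizeof variable->label - 1);
    variable->label[sizeof variable->label - 1] = '\0';
}

void demo_variable_set_measure(demo_variable_t *variable, demo_measure_t measure) {
    variable->measure = measure;
}
```

With this shape, a new attribute such as alignment only needs one more field and one more setter, and existing callers of `demo_add_variable` keep compiling unchanged.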

hadley commented May 31, 2016

Fine with me

evanmiller commented May 31, 2016

Ok, I have a branch going (https://github.com/WizardMac/ReadStat/tree/set-variable-attributes) with breaking changes along these lines. Relevant API:

typedef enum readstat_measure_e {
    READSTAT_MEASURE_UNKNOWN = -1,
    READSTAT_MEASURE_NOMINAL = 1,
    READSTAT_MEASURE_ORDINAL,
    READSTAT_MEASURE_INTERVAL,
    READSTAT_MEASURE_RATIO
} readstat_measure_t;

typedef enum readstat_alignment_e {
    READSTAT_ALIGNMENT_UNKNOWN = -1,
    READSTAT_ALIGNMENT_LEFT = 1,
    READSTAT_ALIGNMENT_CENTER,
    READSTAT_ALIGNMENT_RIGHT
} readstat_alignment_t;

readstat_measure_t readstat_variable_get_measure(readstat_variable_t *variable);
readstat_alignment_t readstat_variable_get_alignment(readstat_variable_t *variable);

// Define your variables. Note that `width' is only used for READSTAT_TYPE_STRING variables.
readstat_variable_t *readstat_add_variable(readstat_writer_t *writer, const char *name,
    readstat_types_t type, size_t width);
void readstat_variable_set_label(readstat_variable_t *variable, const char *label);
void readstat_variable_set_format(readstat_variable_t *variable, const char *format);
void readstat_variable_set_label_set(readstat_variable_t *variable, readstat_label_set_t *label_set);
void readstat_variable_set_measure(readstat_variable_t *variable, readstat_measure_t measure);
void readstat_variable_set_alignment(readstat_variable_t *variable, readstat_alignment_t alignment);

Internally SPSS treats both interval and ratio variables as "scale" variables. When parsing, ReadStat will return these as READSTAT_MEASURE_INTERVAL. I'm not really sure if the ratio/interval distinction is useful, but it seems to be traditional in the stats world.
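
A rough sketch of that collapse, to show why a ratio variable comes back as interval after a write/read cycle. The enum is copied locally from the listing above so the snippet compiles on its own. The helper names `sav_code_from_measure` and `measure_from_sav_code` are hypothetical, and the numeric codes (1 nominal, 2 ordinal, 3 scale, with 0 used here for unknown) are an assumption based on the PSPP developer documentation linked earlier in this thread, not taken from the ReadStat source.

```c
#include <stdint.h>

/* Local copy of the enum posted above. */
typedef enum readstat_measure_e {
    READSTAT_MEASURE_UNKNOWN = -1,
    READSTAT_MEASURE_NOMINAL = 1,
    READSTAT_MEASURE_ORDINAL,
    READSTAT_MEASURE_INTERVAL,
    READSTAT_MEASURE_RATIO
} readstat_measure_t;

/* Hypothetical writer-side helper: interval and ratio share one code. */
int32_t sav_code_from_measure(readstat_measure_t measure) {
    switch (measure) {
    case READSTAT_MEASURE_NOMINAL:  return 1;
    case READSTAT_MEASURE_ORDINAL:  return 2;
    case READSTAT_MEASURE_INTERVAL: /* fall through */
    case READSTAT_MEASURE_RATIO:    return 3; /* "scale" */
    default:                        return 0; /* assumed "unknown" code */
    }
}

/* Hypothetical reader-side helper: "scale" always comes back as interval. */
readstat_measure_t measure_from_sav_code(int32_t code) {
    switch (code) {
    case 1:  return READSTAT_MEASURE_NOMINAL;
    case 2:  return READSTAT_MEASURE_ORDINAL;
    case 3:  return READSTAT_MEASURE_INTERVAL;
    default: return READSTAT_MEASURE_UNKNOWN;
    }
}
```

Under these assumptions, `measure_from_sav_code(sav_code_from_measure(READSTAT_MEASURE_RATIO))` yields `READSTAT_MEASURE_INTERVAL`, so the ratio label does not survive a round trip.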

These methods still need to be adapted to Stata and SAS. This will be a little strange as Stata packs the alignment information into the format string (%20s versus %-20s).
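
For illustration, the Stata convention mentioned above (a leading `-` in the format string means left-aligned) could be handled along these lines. The function names are hypothetical and only the left/right cases from the `%20s` versus `%-20s` example are covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True when a Stata-style format such as "%-20s" requests left alignment. */
bool stata_format_is_left_aligned(const char *format) {
    return format != NULL && format[0] == '%' && format[1] == '-';
}

/*
 * Build a Stata-style string format of the given width into buf.
 * Returns the snprintf result (number of characters the full format needs).
 */
int stata_string_format(char *buf, size_t size, int width, bool left) {
    return snprintf(buf, size, left ? "%%-%ds" : "%%%ds", width);
}
```

Since the alignment lives inside the format string, a setter like readstat_variable_set_alignment would presumably have to be reconciled with readstat_variable_set_format when writing Stata files; how ReadStat resolves that is not shown here.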

evanmiller commented Jun 2, 2016

I've merged these breaking changes into ReadStat master, and also adapted the left/right/center alignment code to work with Stata. (The alignment stuff is probably irrelevant to haven.) Still not sure if the interval/ratio distinction is useful; might rip it out so put on your hard hats.

hadley commented Jun 2, 2016

FWIW the ratio/interval distinction isn't important to me

hadley closed this in bbf66c5 Jun 2, 2016

hadley commented Jun 2, 2016

I have no way to check this, so @dgromer can you please try it out?

dgromer commented Jun 3, 2016

@hadley SPSS does not open a SAV file written with the latest version and fails with the following error:

GET
  FILE='C:\Users\***\Documents\haven.sav'.

>Error.  Command name: GET FILE
>Invalid SPSS Statistics data file: C:\Users\***\Documents\haven.sav (DATA1204)
>Execution of this command stops.

>Error # 1405 in column 8.  Text: C:\Users\***\Documents\haven.sav
>Error when attempting to get a data file.
DATASET NAME DataSet1 WINDOW=FRONT.

The CRAN version of haven works. I used the following code to create the SAV file:

data_ <- data.frame(x = 1:3, y = factor(1:3))
haven::write_sav(data_, "haven.sav")

evanmiller commented Jun 3, 2016

@hadley I am not sure but this commit might fix the issue: WizardMac/ReadStat@8ef4dec

hadley added a commit that referenced this issue Jun 3, 2016

hadley commented Jun 3, 2016

@dgromer can you please try again now?

dgromer commented Jun 3, 2016

@hadley @evanmiller yes, it works! And it sets the "Measure" column correctly.
