SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings #266

Open
rubenarslan opened this Issue Jan 26, 2017 · 18 comments

rubenarslan commented Jan 26, 2017

SPSS (v.20.0) cannot read files written by haven if they contain any string longer than 255 bytes. For plain ASCII text that means 255 characters; strings with umlauts need to be shorter still.
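
To make the characters-versus-bytes point concrete, here is a small base-R sketch. It assumes the strings end up UTF-8 encoded in the file (where "ä" takes two bytes); the variable names are only for illustration.

```r
# Compare character count and byte count for two 255-character strings.
ascii  <- strrep("a", 255)
umlaut <- enc2utf8(strrep("\u00e4", 255))  # "ä" is 2 bytes in UTF-8

nchar(ascii,  type = "chars")  # 255
nchar(ascii,  type = "bytes")  # 255
nchar(umlaut, type = "chars")  # 255
nchar(umlaut, type = "bytes")  # 510, already past the 255-byte mark
```

So a string can look short enough by character count and still exceed the byte limit.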

to reproduce

library(haven)

# One column holding a single 256-character ASCII string
n <- 256
df <- data.frame(long = paste(rep("a", n), collapse = ""), stringsAsFactors = FALSE)
write_sav(df, path = "test.sav")

test files

I made three test files.
In all, I wrote a long string consisting of "a" with one "b" at the end.

  1. 255 characters, written with haven. It opens in SPSS.
  2. 256 characters, written with haven. It doesn't open in SPSS.
  3. The first file, with the string width increased from 255 to 256 in SPSS (added one "a"). This one opens fine too.

test_files.zip

original discussion

PS.: @evanmiller Let me know if I should raise these issues directly in ReadStat. I only use ReadStat through haven though, so wouldn't be able to make reprexes.

Contributor

evanmiller commented Jul 8, 2017

Should be fixed in WizardMac/ReadStat@f6aef4c

rubenarslan commented Jul 10, 2017

Great. Can I test with my real example already or do I need to wait until this is pulled into haven?

Contributor

evanmiller commented Jul 10, 2017

I think this fix just barely missed the 1.1.0 cutoff, so you'll have to wait until it gets pulled into haven.

aganalyticsemc commented Dec 14, 2017

Never mind. I managed to fix the issue myself by adjusting the readstat_sav_read.c and
readstat_sav_write.c according to your fix above. Additionally I changed these constants:

#define MAX_STRING_SIZE 2000
#define MAX_LABEL_SIZE 2000

Lastly, I just want to thank you for creating an awesome package!! :).

@evanmiller When do you expect to have this fix pulled into haven? I am experiencing the same issue as @rubenarslan. SPSS won't open a .sav file saved using write_sav if the saved data frame contains a string with more than 255 bytes. I've tried to manually adjust ReadStat as you mentioned in the fix. However, this didn't seem to work :/

Member

hadley commented Feb 16, 2018

This appears to still be a problem even with the latest ReadStat.

Contributor

evanmiller commented Feb 16, 2018

Error message from #346:

Error. Command name: GET FILE
Invalid SPSS Statistics data file: test.sav (DATA1204)
Execution of this command stops.
Error # 1405 in column 8. Text: test.sav
Error when attempting to get a data file.
GET
FILE='test.sav'.

My current theory is as follows. Variables in an SAV file contain a "virtual" variable for each 256-byte chunk. ReadStat gives each virtual variable the same name. However, in the sample file provided, it appears that SPSS assigns a unique name to each virtual variable (e.g. LONG and LONG0). I'll see if the PSPP docs have anything to say about this.

Contributor

evanmiller commented Feb 16, 2018

PSPP has this to say:

The ‘name’ fields should be unique within a system file. System files written by SPSS that contain very long string variables with similar names sometimes contain duplicate names that are later eliminated by resolving the very long string names (see Very Long String Record). PSPP handles duplicates by assigning them new, unique names.

Source
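
For anyone who wants to reason about how many "virtual" variables to expect, here is a rough sketch of the segment arithmetic. It assumes the layout described in the Very Long String Record section of the PSPP documentation (one segment per 252 bytes of content, every segment but the last declared with width 255); please check that against the PSPP docs before relying on it. `sav_segments` is a made-up helper name, not part of ReadStat or haven.

```r
# Segment layout for a very long string of width `w` bytes.
# Assumption: 252 content bytes per segment, per the PSPP documentation.
sav_segments <- function(w, per_segment = 252L) {
  stopifnot(w > 255)  # only widths above 255 are split into segments
  n <- (w + per_segment - 1L) %/% per_segment   # ceiling division
  last_width <- w - (n - 1L) * per_segment      # remainder goes in the last one
  list(n_segments = n, widths = c(rep(255L, n - 1L), last_width))
}

sav_segments(256L)  # 2 segments, widths 255 and 4
sav_segments(512L)  # 3 segments, widths 255, 255 and 8
```

Under that assumption a 256-byte string already needs two segments, which would fit with the LONG and LONG0 names seen in the SPSS-written sample file.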

rubenarslan commented Feb 16, 2018

Would it help you verify your theory if I generated files with more than 5*256 letters or so?

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Feb 16, 2018

hadley added a commit that referenced this issue Feb 16, 2018

Member

hadley commented Feb 16, 2018

@rubenarslan I just updated haven with @evanmiller's latest code. Can you please try again now?

rubenarslan commented Feb 16, 2018

Unfortunately, the same error occurs after updating with devtools. I also attached a file with a 512-character string in case that helps.

tests.zip

GET FILE='test_x.sav'.
Error. Command name: GET FILE
Invalid SPSS Statistics data file: test_x.sav (DATA1204)
Execution of this command stops.
Error # 1405 in column 8. Text: test_x.sav
Error when attempting to get a data file.

Edit: Sorry, just realised I should have made the long file with SPSS, not R. Stupid. Here it is: test_720.sav.zip

Contributor

evanmiller commented Feb 16, 2018

Thanks. Might be an issue with the Variable Display Parameter... I'll have another update to test soon.

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Feb 16, 2018

SAV writer: Fix Variable Display Parameter w/ long strings
The Variable Display Parameter record contains information about
ordinality and text alignment. Previously, one entry was emitted for
each real variable. However, SPSS emits an entry for ghost variables as
well. This change mimics the SPSS behavior, and possibly fixes
tidyverse/haven#266

Contributor

evanmiller commented Feb 16, 2018

@hadley Try the latest ReadStat.

For future reference, someone should give this issue a more descriptive name, e.g.

SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings

rubenarslan changed the title from "support long strings in write_sav" to "SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings" Feb 16, 2018

rubenarslan commented Feb 16, 2018

@evanmiller that might well be it. In SPSS you also cannot lengthen a string without first increasing the variable display parameter. I'll try with the next update.

hadley added a commit that referenced this issue Feb 16, 2018

Member

hadley commented Feb 16, 2018

@rubenarslan try now

rubenarslan commented Feb 16, 2018

Well, it now writes the 256 character file and SPSS can open it. Yay!
But if I go to 512 characters, I get two visible variables, V0000001 and V0000002. They are not concatenated into the "long" variable. In addition, the "long" variable has the type Comma at 512 chars, Date at 1024 chars, etc. Seems pretty weird...

rubenarslan commented Feb 16, 2018

I just noticed that foreign also cannot read the haven-generated files, but it has more informative error messages. Maybe this is helpful for writing tests in R? Unless the foreign implementation is really bad, but at least it's quite robust, right?

256 chars

foreign::read.spss("test_x.sav")
Error in foreign::read.spss("test_x.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_x.sav") :
test_x.sav: Bad format specifier byte (0)

512 chars

foreign::read.spss("test_y.sav")
Error in foreign::read.spss("test_y.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_y.sav") :
test_y.sav: String variable VAR0 has numeric format specifier COMMA

1024 chars

foreign::read.spss("test_z.sav")
Error in foreign::read.spss("test_z.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_z.sav") :
test_z.sav: String variable VAR0 has numeric format specifier F
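
A rough sketch of what such an R-side check could look like, in case it is useful for tests. `write_sav()` and `foreign::read.spss()` are the real functions used above; `roundtrip_ok` is just a made-up helper name, and I have not verified how foreign itself copes with very long strings, so a FALSE here is only a hint, not proof that the file is invalid.

```r
library(haven)

# Write a one-column data frame holding an n-character ASCII string, then try
# to read it back with foreign. Returns TRUE if foreign reads it without error.
roundtrip_ok <- function(n) {
  path <- tempfile(fileext = ".sav")
  on.exit(unlink(path))
  df <- data.frame(long = strrep("a", n), stringsAsFactors = FALSE)
  write_sav(df, path)
  tryCatch({
    suppressWarnings(foreign::read.spss(path, to.data.frame = TRUE))
    TRUE
  }, error = function(e) FALSE)
}

# Probe the widths discussed in this thread
sapply(c(255, 256, 512, 1024), roundtrip_ok)
```

The same loop could be pointed at files produced by the ReadStat command-line tool instead of haven.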

Contributor

evanmiller commented Feb 16, 2018

@rubenarslan Please open new and separate issues against ReadStat for these.

Note that ReadStat has a command-line interface, so we should be able to debug these without getting haven involved.

readstat /path/to/spss-input.sav /path/to/readstat-output.sav

wibrt commented May 22, 2018

More feedback:
When I open a .sav file made by SPSS, there are no errors.
After loading this dataset in R, making some calculations, and saving it as .sav from R,
SPSS is unable to open the same dataset.

I don't know if this has anything to do with the labels, since the data column that SPSS refers to in its error message was not changed.

Referring to: SPSS Error # 1405
