SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings #266

Closed
rubenarslan opened this issue Jan 26, 2017 · 21 comments

@rubenarslan rubenarslan commented Jan 26, 2017

SPSS (v.20.0) cannot read files written by haven if there are any strings longer than 255 characters (or bytes, strings with umlauts need to be shorter still).
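The character/byte distinction can be checked in base R. This is a minimal sketch, assuming the strings are encoded as UTF-8 (where "ä" takes two bytes); the variable names are only illustrative.

```r
# 255 ASCII letters: one byte each
ascii <- strrep("a", 255)

# 128 umlauts: two bytes each once the string is UTF-8 encoded
umlauts <- enc2utf8(strrep("\u00e4", 128))

nchar(ascii, type = "chars")    # 255
nchar(ascii, type = "bytes")    # 255
nchar(umlauts, type = "chars")  # 128
nchar(umlauts, type = "bytes")  # 256, already past the 255-byte boundary
```

So a string with umlauts can hit the problem at well under 255 visible characters.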

to reproduce

library(haven)
# One string of n "a" characters; n = 256 is one past the 255 that still works
n <- 256
df <- data.frame(long = paste(rep("a", n), collapse = ""), stringsAsFactors = FALSE)
write_sav(df, path = "test.sav")

test files

I made three test files.
In all, I wrote a long string consisting of "a" with one "b" at the end.

  1. is 255 characters, written in haven, it opens in SPSS.
  2. is 256 characters, written in haven, it doesn't open in SPSS.
  3. is the first one, with the string width increased from 255 to 256 in SPSS (added one a). This one opens fine too.
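The haven-written test files described above could be generated roughly like this. It is only a sketch: the helper name `make_test_string` and the output file names are hypothetical, and the files are only written if haven is installed.

```r
# Build a string of (n - 1) "a"s followed by a single "b",
# so the last character shows whether the value was truncated.
make_test_string <- function(n) {
  paste0(strrep("a", n - 1), "b")
}

# Lengths on either side of the 255-byte boundary.
lengths_to_test <- c(255L, 256L)

for (n in lengths_to_test) {
  df <- data.frame(long = make_test_string(n), stringsAsFactors = FALSE)
  # Hypothetical file names, written to the session's temp directory.
  if (requireNamespace("haven", quietly = TRUE)) {
    haven::write_sav(df, file.path(tempdir(), sprintf("test_%d.sav", n)))
  }
}
```

The third file (widened from 255 to 256 inside SPSS) cannot be produced this way, since it has to be saved by SPSS itself.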

test_files.zip

original discussion

PS.: @evanmiller Let me know if I should raise these issues directly in ReadStat. I only use ReadStat through haven though, so wouldn't be able to make reprexes.

@evanmiller evanmiller commented Jul 8, 2017

Should be fixed in WizardMac/ReadStat@f6aef4c

@rubenarslan rubenarslan commented Jul 10, 2017

Great. Can I test with my real example already or do I need to wait until this is pulled into haven?

@evanmiller evanmiller commented Jul 10, 2017

I think this fix just barely missed the 1.1.0 cutoff, so you'll have to wait until it gets pulled into haven.

@aganalyticsemc aganalyticsemc commented Dec 14, 2017

Never mind. I managed to fix the issue myself by adjusting readstat_sav_read.c and readstat_sav_write.c according to your fix above. Additionally, I changed these constants:

#define MAX_STRING_SIZE 2000
#define MAX_LABEL_SIZE 2000

Lastly, I just want to thank you for creating an awesome package!! :).

@evanmiller When do you expect to have this fix pulled into haven? I am experiencing the same issue as @rubenarslan. SPSS won't open a .sav file saved using write_sav if the saved data frame contains a string with more than 255 bytes. I've tried to manually adjust ReadStat as you mentioned in the fix. However, this didn't seem to work :/

@hadley hadley commented Feb 16, 2018

This appears to still be a problem even with the latest ReadStat.

@evanmiller evanmiller commented Feb 16, 2018

Error message from #346:

Error. Command name: GET FILE
Invalid SPSS Statistics data file: test.sav (DATA1204)
Execution of this command stops.
Error # 1405 in column 8. Text: test.sav
Error when attempting to get a data file.
GET
FILE='test.sav'.

My current theory is as follows. Variables in an SAV file contain a "virtual" variable for each 256-byte chunk. ReadStat gives each virtual variable the same name. However, in the sample file provided, it appears that SPSS assigns a unique name to each virtual variable (e.g. LONG and LONG0). I'll see if the PSPP docs have anything to say about this.

@evanmiller evanmiller commented Feb 16, 2018

PSPP has this to say:

The ‘name’ fields should be unique within a system file. System files written by SPSS that contain very long string variables with similar names sometimes contain duplicate names that are later eliminated by resolving the very long string names (see Very Long String Record). PSPP handles duplicates by assigning them new, unique names.

Source

@rubenarslan rubenarslan commented Feb 16, 2018

Would it help you verify your theory if I generated files with more than 5*256 letters or so?

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Feb 16, 2018
Possibly fixes tidyverse/haven#266
hadley added a commit that referenced this issue Feb 16, 2018

@hadley hadley commented Feb 16, 2018

@rubenarslan I just updated haven with @evanmiller's latest code. Can you please try again now?

@rubenarslan rubenarslan commented Feb 16, 2018

Unfortunately, the same error occurs after an update with devtools. I also attached a 512 character-string file in case that helps.

tests.zip

GET FILE='test_x.sav'.
Error. Command name: GET FILE
Invalid SPSS Statistics data file: test_x.sav (DATA1204)
Execution of this command stops.
Error # 1405 in column 8. Text: test_x.sav
Error when attempting to get a data file.

Edit: Sorry, just realised I should have made the long file with SPSS, not R. Stupid. Here it is: test_720.sav.zip

@evanmiller evanmiller commented Feb 16, 2018

Thanks. Might be an issue with the Variable Display Parameter... I'll have another update to test soon.

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Feb 16, 2018
The Variable Display Parameter record contains information about
ordinality and text alignment. Previously, one entry was emitted for
each real variable. However, SPSS emits an entry for ghost variables as
well. This change mimics the SPSS behavior, and possibly fixes
tidyverse/haven#266

@evanmiller evanmiller commented Feb 16, 2018

@hadley Try the latest ReadStat.

For future reference someone should give this issue a more descriptive name e.g.

SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings

@rubenarslan rubenarslan changed the title from "support long strings in write_sav" to "SPSS Error # 1405 when reading haven-created SAV files containing 256+ byte strings" Feb 16, 2018

@rubenarslan rubenarslan commented Feb 16, 2018

@evanmiller that might well be it. In SPSS you also cannot lengthen a string without first increasing the variable display parameter. I'll try with the next update.

hadley added a commit that referenced this issue Feb 16, 2018

@hadley hadley commented Feb 16, 2018

@rubenarslan try now

@rubenarslan rubenarslan commented Feb 16, 2018

Well, it now writes the 256 character file and SPSS can open it. Yay!
But if I go to 512, I get two visible variables, V0000001 and V0000002. They are not concatenated into the long variable. In addition, the "long" variable has the type Comma at 512 chars, Date at 1024 chars, etc. Seems pretty weird...

@rubenarslan rubenarslan commented Feb 16, 2018

I just noticed that foreign also cannot read the haven-generated files, but it has more informative error messages. Maybe this is helpful for writing tests in R? Unless the foreign implementation is really bad, but at least it's quite robust, right?

256 chars

foreign::read.spss("test_x.sav")
Error in foreign::read.spss("test_x.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_x.sav") :
test_x.sav: Bad format specifier byte (0)

512 chars

foreign::read.spss("test_y.sav")
Error in foreign::read.spss("test_y.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_y.sav") :
test_y.sav: String variable VAR0 has numeric format specifier COMMA

1024 chars

foreign::read.spss("test_z.sav")
Error in foreign::read.spss("test_z.sav") :
error reading system-file header
In addition: Warning message:
In foreign::read.spss("test_z.sav") :
test_z.sav: String variable VAR0 has numeric format specifier F
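A check along these lines could be wrapped in a small helper so that several files can be tested in one go. This is only a sketch: `check_with_foreign` is a hypothetical name, the file names are the ones from the outputs above, and the helper simply collects whatever error and warning messages foreign::read.spss produces.

```r
# Try to read a .sav file with foreign and return the outcome as one string
# instead of stopping, keeping any warnings (they carry the useful detail).
check_with_foreign <- function(path) {
  if (!requireNamespace("foreign", quietly = TRUE)) {
    return("foreign not installed")
  }
  warnings_seen <- character(0)
  result <- tryCatch(
    withCallingHandlers(
      {
        foreign::read.spss(path, to.data.frame = TRUE)
        "ok"
      },
      warning = function(w) {
        # Record the warning text, then carry on reading.
        warnings_seen <<- c(warnings_seen, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) paste("error:", conditionMessage(e))
  )
  paste(c(result, warnings_seen), collapse = " | ")
}

# File names taken from the outputs above; missing files are reported as errors.
files <- c("test_x.sav", "test_y.sav", "test_z.sav")
vapply(files, check_with_foreign, character(1))
```

Passing this check only shows that foreign can parse the file; it does not prove that SPSS will open it.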

@evanmiller
Copy link
Collaborator

@evanmiller evanmiller commented Feb 16, 2018

@rubenarslan Please open new and separate issues against ReadStat for these.

Note that ReadStat has a command-line interface, so we should be able to debug these without getting haven involved.

readstat /path/to/spss-input.sav /path/to/readstat-output.sav

@wibrt wibrt commented May 22, 2018

More feedback:
when I open an SPSS .sav file made by SPSS, there are no errors.
After loading this dataset in R, making some calculations, and saving it as .sav from R,
the same dataset can no longer be opened in SPSS.

I don't know if this has anything to do with the labels, since the data column that SPSS refers to in its error message was not changed.

The error in question: SPSS Error # 1405
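To narrow down whether a data frame like this runs into the long-string problem from this thread, one could list the character columns whose values exceed 255 bytes before writing. A rough sketch in base R; `long_string_columns` is a hypothetical helper, and the 255-byte limit is the one reported at the top of this issue.

```r
# Return the maximum byte length of each character column whose longest
# value exceeds `limit` bytes (measured after conversion to UTF-8).
long_string_columns <- function(df, limit = 255) {
  is_chr <- vapply(df, is.character, logical(1))
  max_bytes <- vapply(
    df[is_chr],
    function(col) {
      n <- nchar(enc2utf8(col), type = "bytes")
      # Ignore missing values; an all-NA or empty column counts as 0 bytes.
      if (all(is.na(col))) 0L else max(n[!is.na(col)])
    },
    integer(1)
  )
  max_bytes[max_bytes > limit]
}

# Small made-up example: only `long` crosses the limit.
df <- data.frame(
  id = 1:2,
  short = c("x", "y"),
  long = c(strrep("a", 300), "z"),
  stringsAsFactors = FALSE
)
long_string_columns(df)  # named integer vector: long = 300
```

If this returns nothing for the data in question, the error probably has a different cause and belongs in a separate issue.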

@hadley hadley commented Jan 23, 2019

Closing since this thread is long and seems to combine multiple issues. Please file new issues (with reprexes) if you're still seeing problems.

@hadley hadley closed this Jan 23, 2019

@rubenarslan rubenarslan commented Jan 24, 2019

The original issue has been fixed in ReadStat. WizardMac/ReadStat#122

@lock lock bot commented Jul 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 23, 2019