SPSS value labels longer than 255 bytes cause crash #262
Can you please create a minimal reproducible example, and format your issue nicely so it's easier for me to read? |
I think I ran into the same problem. Here's a minimal reproducible example:
|
@rubenarslan perfect - thanks! |
Possibly fixed by: WizardMac/ReadStat@e4b4c1b |
@evanmiller with the latest readstat, both examples now crash:
|
@hadley That is strange since the 252 and 253-byte string are now under test coverage... do other <252 and >255 length strings crash too? |
@evanmiller yeah, I've only run a few test cases but length 100 is ok, but 200 is not. I can try and narrow it down more if that would help |
@evanmiller btw the problem isn't actually strings, but is actually value labels. |
Slightly simpler reprex:
n <- 100
df <- data.frame(long = paste(rep("a", n), collapse = ""))
write_sav(df, path = tempfile()) |
@hadley Thanks, that makes sense, will investigate. |
Welp: https://github.com/WizardMac/ReadStat/blob/master/src/spss/readstat_sav_write.c#L22 The file format limits value labels to 120 chars, so ReadStat attempts to truncate to 120. Likely a buffer overrun or something for 120+ bytes. |
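To make the truncation step concrete, here is a minimal Python sketch of cutting a label down to a fixed limit without splitting a multi-byte character. It is not ReadStat's code: the function name `truncate_label` is hypothetical, and it assumes the 120 limit is counted in encoded bytes rather than characters.

```python
def truncate_label(label: str, limit: int = 120, encoding: str = "utf-8") -> str:
    """Cut `label` to at most `limit` encoded bytes (assumed unit of the limit)."""
    raw = label.encode(encoding)
    if len(raw) <= limit:
        return label
    # Slice the bytes, then drop any partial multi-byte sequence left at the end.
    return raw[:limit].decode(encoding, errors="ignore")


short = truncate_label("a" * 200)
accented = truncate_label("é" * 100, limit=121)  # each "é" is 2 bytes in UTF-8
```

The point of slicing a copy rather than writing into a fixed-size buffer is that the result can never exceed the limit, whatever the input length.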
If these are value labels, what is the underlying data type and storage size? (The SAV writer has separate code paths for string values longer than 8 bytes vs 8 bytes or shorter.) |
May or may not fix things: WizardMac/ReadStat@7e2965d |
Looks good - thanks @evanmiller ! |
FYI the news note is misleading as value labels are still truncated to 120 chars. |
Ooops, not only that but I guess I forgot to test it with a long enough label :/ |
I've added an R-side check for now. If you figure out the root problem, I can change the error to a warning. |
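As an illustration of that kind of pre-write check, here is a small Python sketch (not haven's actual R code). The names `check_value_labels` and `MAX_LABEL_BYTES` are hypothetical, and the 120 limit is again assumed to be in bytes. The `strict` flag shows how the same check could be switched from an error to a warning later.

```python
import warnings

MAX_LABEL_BYTES = 120  # limit quoted in this thread; unit assumed to be bytes


def check_value_labels(labels, strict=True, limit=MAX_LABEL_BYTES):
    """Return the values whose label text is longer than `limit` bytes.

    `labels` maps each value to its label text. With strict=True an
    over-long label raises ValueError; otherwise it only warns.
    """
    too_long = [v for v, text in labels.items()
                if len(text.encode("utf-8")) > limit]
    if too_long:
        msg = f"{len(too_long)} value label(s) exceed {limit} bytes: {too_long!r}"
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
    return too_long
```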
@hadley I think the root problem is SPSS, I also cannot make value labels longer than 120 in the software itself. But long strings work fine now, thanks a lot! |
So strings longer than 255 still don't work with haven installed just now from GitHub. Sorry, I think I misunderstood some of the above. Is this now a problem with value labels? But strings don't have value labels in SPSS, right? Value labels have an actual hard limit, while strings longer than 255 just get special treatment under the hood?
|
@rubenarslan you did not create a string there. |
Sorry, I have
|
@rubenarslan Are you using the latest code? Your example works fine for me.
> devtools::install_github("tidyverse/haven")
> library(haven)
> n <- 256
> df <- data.frame(long = paste(rep("a", n), collapse = ""), stringsAsFactors = FALSE)
> write_sav(df, path = "test.sav")
> df1 <- read_sav("test.sav")
> df1
# A tibble: 1 × 1
long
<chr>
1 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> |
I am. SPSS refuses to open the file, haven has no problem.
|
I assume you don't have SPSS to test? Maybe this helps, I made three test files. |
@rubenarslan Please open a new issue if the issue is SPSS compatibility rather than a crash. |
Sorry, hadley changed the title of this issue from "support" to "crash" after I started commenting. I assumed we were talking about the same thing (I mentioned the SPSS error only in the off-screen comment of my original reply), but didn't read the title of the original issue carefully enough. |
I can confirm this issue still seems to persist.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Haven cannot create an SPSS file with a string longer than 255 bytes, but Python can.
Vanilla SAS cannot either, but it splits the string into 200-byte chunks over multiple variables;
Suggestion on improving the write_sav and write_sas in Haven.
but crashes with longer strings;
%utl_submit_r64('
source("C:/Program Files/R/R-3.3.2/etc/Rprofile.site", echo=T);
library(haven);
str<-as.data.frame(paste(replicate(100, "roger"), collapse = ""));
colnames(str)<-"String";
str;
write_sav(str,"d:/sav/str.sav");
fro<-read_sav("d:/sav/str.sav");
fro;
');
hangs
visual studio 5 exception
unhandled win 32 exception
Maybe you can look at the code?
I am a statistician and drop down to WPS, Stattransfer, R, Perl and Python from SAS.
Using the functionality
This works with long strings;
CREATE SPSS dataset using Python;
create an SPSS file where var1 is 3000 bytes;
Python seems better than R for SPSS;
seems to work better when you overspecify the string length metadata;
PYTHON;
%utl_submit_py64('
import savReaderWriter as sav;
savFileName = "d:/rio/mtcars.sav";
newstring = "a"*300;
print(newstring);
records = [[newstring, 1, 1], [newstring, 2, 1]];
varNames = ["var1", "v2", "v3"];
varTypes = {"var1": 500, "v2": 0, "v3": 0};
with sav.SavWriter(savFileName, varNames, varTypes) as writer:;
. for record in records:;
. writer.writerow(record);
');
Using the free express version of WPS (proc r) I can
create a SAS dataset using the Python output.
Haven can input the long strings;
Stattransfer can handle the long string (in and out);
create a SAS dataset from sav file;
%utl_submit_wps64('
options set=R_HOME "C:/Program Files/R/R-3.3.2";
libname wrk "%sysfunc(pathname(work))";
proc r;
submit;
source("C:/Program Files/R/R-3.3.2/etc/Rprofile.site", echo=T);
library(haven);
strsas<-read_sav("d:/rio/mtcars.sav");
strsas;
endsubmit;
import r=strsas data=wrk.strsas;
run;quit;
');
/* SAS dataset strsas */
Variables in Creation Order
Variable Type Len
1 VAR1 Char 300
2 V2 Num 8
3 V3 Num 8
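The chunking workaround mentioned at the top of this post (one long string spread over several 200-byte variables) can be sketched in plain Python. This is only an illustration of the idea, not what SAS does internally. The helper names and the `String_1`, `String_2`, ... naming scheme are hypothetical, and the text is assumed to be ASCII so that bytes and characters coincide.

```python
def split_into_chunks(value: str, base_name: str, chunk_bytes: int = 200) -> dict:
    """Spread `value` over numbered columns of at most `chunk_bytes` bytes each."""
    raw = value.encode("ascii")  # assumption: single-byte text
    pieces = [raw[i:i + chunk_bytes] for i in range(0, len(raw), chunk_bytes)] or [b""]
    return {f"{base_name}_{n}": p.decode("ascii") for n, p in enumerate(pieces, start=1)}


def join_chunks(columns: dict) -> str:
    """Reassemble the original string from the numbered columns."""
    ordered = sorted(columns, key=lambda name: int(name.rsplit("_", 1)[1]))
    return "".join(columns[name] for name in ordered)


original = "roger" * 100  # the 500-character string used in the R example above
chunks = split_into_chunks(original, "String")
restored = join_chunks(chunks)
```

Sorting on the numeric suffix rather than on the column name keeps the pieces in order once there are ten or more of them.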