write_sav: long Cyrillic labels in factors cause crash #395

alexeyknorre opened this issue Aug 2, 2018 · 5 comments

@alexeyknorre commented Aug 2, 2018

Hey there,

I am working with a Russian survey dataset (16k observations, 200 variables) and was trying to save it in SPSS format using haven::write_sav. The dataset contains many character and factor variables, and calling write_sav crashes the R session. I reproduced the crash on another Windows machine with plenty of RAM.

I managed to pin down the cause of the crash: long factor labels in Cyrillic (Russian). A reproducible example is below. Interestingly, the workaround is to shorten the factor labels to 65-68 characters. Possibly non-Latin characters require more bytes each, so the default limit of 120 characters should be smaller for non-Latin text?

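# Cyrillic test label (roughly: "What could be simpler than a primitive null-transmitter? Only a primitive null-accumulator.")
text <- "Что может быть проще примитивного нуль-передатчика? Только примитивный нуль-аккумулятор."

df <- data.frame(variable = factor(sample(c(text, "another value"), 10000, replace = TRUE)))

# Uncomment to avoid the crash: truncate the levels to 65 characters
#levels(df$variable) <- substr(levels(df$variable), 1, 65)

haven::write_sav(df, "test.sav")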

Session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251    LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.0  magrittr_1.5    hms_0.4.2       tools_3.5.0     pillar_1.2.3    haven_1.1.2    
 [7] tibble_1.4.2    yaml_2.1.19     Rcpp_0.12.17    forcats_0.3.0   pkgconfig_2.0.1 rlang_0.2.1   

@hadley commented Jan 23, 2019

@evanmiller I think this is one for you. I see:

* thread #1, queue = '', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6df7f23e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6e035c1c libsystem_pthread.dylib`pthread_kill + 285
    frame #2: 0x00007fff6dee81c9 libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff6dee833c libsystem_c.dylib`abort_report_np + 177
    frame #4: 0x00007fff6df0cc8e libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff6df0cc5e libsystem_c.dylib`__chk_fail_overflow + 16
    frame #6: 0x00007fff6df0d16d libsystem_c.dylib`__memcpy_chk + 18
    frame #7: 0x000000010863de29`sav_begin_data at readstat_sav_write.c:481 [opt]
    frame #8: 0x000000010863dc0d`sav_begin_data(writer_ctx=<unavailable>) at readstat_sav_write.c:1071 [opt]
    frame #9: 0x00000001086299a7`readstat_begin_row(writer=0x000000010b50d750) at readstat_writer.c:494 [opt]
    frame #10: 0x000000010865823a`Writer::write(this=<unavailable>) at DfWriter.cpp:130 [opt]
    frame #11: 0x0000000108657c65`write_sav_(data=<unavailable>, path=<unavailable>, compress=false) at DfWriter.cpp:352 [opt]
    frame #12: 0x000000010865df20`::_haven_write_sav_(dataSEXP=0x0000000108e1bde8, pathSEXP=0x0000000108e0aef8, compressSEXP=0x0000000108ca5858) at RcppExports.cpp:145 [opt]

@evanmiller commented Jan 24, 2019

Thanks. Looks like an integer wraparound issue on my end.

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Jan 24, 2019

@evanmiller commented Jan 24, 2019

Should be fixed in

@hadley closed this in e0d3d2e Jan 24, 2019

@hadley commented Jan 24, 2019

Thanks @evanmiller!

@lock bot commented Jul 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue.

@lock bot locked and limited conversation to collaborators Jul 23, 2019