write_sav: long Cyrillic labels in factors cause crash #395

Closed
alexeyknorre opened this issue Aug 2, 2018 · 5 comments

@alexeyknorre
commented Aug 2, 2018

Hey there,

I am working with a Russian survey dataset (16k observations, 200 variables) and was trying to save it to SPSS format using haven::write_sav. The dataset contains many character and factor variables, and calling write_sav crashes the R session. I reproduced the crash on another Windows machine with plenty of RAM.

I managed to establish the cause of the crash: long factor labels in Cyrillic/Russian. A reproducible example is below. Interestingly, shortening the factor labels to 65-68 characters works around the crash. Possibly non-Latin characters require more bytes, so the default limit of 120 characters should be smaller for non-Latin characters?

text <- "Что может быть проще примитивного нуль-передатчика? Только примитивный нуль-аккумулятор."

df <- data.frame(variable = factor(sample(c(text, "another value"), 10000, replace = TRUE)))

# Uncomment to avoid the crash (truncates each level to its first 65 characters)
#levels(df$variable) <- substr(levels(df$variable), 1, 65)

haven::write_sav(df, "test.sav")
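
The guess that non-Latin characters need more bytes can be checked with a minimal sketch like the one below. It compares character and byte counts for a short Cyrillic string, and adds a helper that truncates labels by UTF-8 byte length rather than by character count. The helper name `truncate_to_bytes` and its `max_bytes` parameter are hypothetical and not part of haven. Whether the 120 limit mentioned above is counted in bytes, and whether byte length is what actually triggers the crash, are assumptions that this thread does not confirm.

```r
# Cyrillic text written with \u escapes, so the string is UTF-8 in any locale.
label <- "\u0427\u0442\u043e \u043c\u043e\u0436\u0435\u0442"  # "Что может"

n_chars <- nchar(label, type = "chars")  # number of characters
n_bytes <- nchar(label, type = "bytes")  # number of bytes in UTF-8

# Hypothetical helper (not a haven function): drop trailing characters
# until the UTF-8 encoding of each string fits within max_bytes.
# Assumes x contains no NA values and max_bytes >= 0.
truncate_to_bytes <- function(x, max_bytes) {
  x <- enc2utf8(x)
  vapply(x, function(s) {
    while (nchar(s, type = "bytes") > max_bytes) {
      s <- substr(s, 1, nchar(s, type = "chars") - 1)
    }
    s
  }, character(1), USE.NAMES = FALSE)
}
```

As a usage sketch, `levels(df$variable) <- truncate_to_bytes(levels(df$variable), 120)` would cap each level by bytes instead of the character-based `substr` workaround. The value 120 is taken from the limit mentioned above and is an assumption, not a confirmed threshold.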

Session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Russian_Russia.1251  LC_CTYPE=Russian_Russia.1251    LC_MONETARY=Russian_Russia.1251
[4] LC_NUMERIC=C                    LC_TIME=Russian_Russia.1251    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.0  magrittr_1.5    hms_0.4.2       tools_3.5.0     pillar_1.2.3    haven_1.1.2    
 [7] tibble_1.4.2    yaml_2.1.19     Rcpp_0.12.17    forcats_0.3.0   pkgconfig_2.0.1 rlang_0.2.1   
@hadley

Member

commented Jan 23, 2019

@evanmiller I think this is one for you. I see:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6df7f23e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6e035c1c libsystem_pthread.dylib`pthread_kill + 285
    frame #2: 0x00007fff6dee81c9 libsystem_c.dylib`abort + 127
    frame #3: 0x00007fff6dee833c libsystem_c.dylib`abort_report_np + 177
    frame #4: 0x00007fff6df0cc8e libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff6df0cc5e libsystem_c.dylib`__chk_fail_overflow + 16
    frame #6: 0x00007fff6df0d16d libsystem_c.dylib`__memcpy_chk + 18
    frame #7: 0x000000010863de29 haven.so`sav_begin_data at readstat_sav_write.c:481 [opt]
    frame #8: 0x000000010863dc0d haven.so`sav_begin_data(writer_ctx=<unavailable>) at readstat_sav_write.c:1071 [opt]
    frame #9: 0x00000001086299a7 haven.so`readstat_begin_row(writer=0x000000010b50d750) at readstat_writer.c:494 [opt]
    frame #10: 0x000000010865823a haven.so`Writer::write(this=<unavailable>) at DfWriter.cpp:130 [opt]
    frame #11: 0x0000000108657c65 haven.so`write_sav_(data=<unavailable>, path=<unavailable>, compress=false) at DfWriter.cpp:352 [opt]
    frame #12: 0x000000010865df20 haven.so`::_haven_write_sav_(dataSEXP=0x0000000108e1bde8, pathSEXP=0x0000000108e0aef8, compressSEXP=0x0000000108ca5858) at RcppExports.cpp:145 [opt]
@evanmiller

Contributor

commented Jan 24, 2019

Thanks. Looks like an integer wraparound issue on my end.

evanmiller added a commit to WizardMac/ReadStat that referenced this issue Jan 24, 2019

@evanmiller

Contributor

commented Jan 24, 2019

Should be fixed in WizardMac/ReadStat@e1c41e1

@hadley hadley closed this in e0d3d2e Jan 24, 2019

@hadley

Member

commented Jan 24, 2019

Thanks @evanmiller!

@lock

commented Jul 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 23, 2019
