write_sav() has endless loop, writes a 40GB file and is still writing #245

Closed
h2appy opened this Issue Nov 30, 2016 · 4 comments

@h2appy

h2appy commented Nov 30, 2016

haven_test.Rdata.zip

When I use write_sav() to create an SPSS file "test.sav", the file grows past 40 GB and is still being written.

My Environment:
haven 1.0.0

@h2appy h2appy changed the title from "SPSS cannot open the file that is created with write_sav()" to "write_sav() has endless loop, writes a 40GB file and is still writing" Nov 30, 2016

@pascaltanner


pascaltanner commented Dec 13, 2016

I've got the same problem with my data. write_sav gets stuck in an endless loop and the file cannot be opened by SPSS, while write_dta works perfectly fine (writing the file and opening it in Stata). Unfortunately I can't provide my data at the moment.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252 LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
[5] LC_TIME=German_Germany.1252

> packageVersion("haven")
[1] ‘1.0.0’
@JhossePaul


JhossePaul commented Dec 14, 2016

I can confirm the bug: write_sav() cannot write SAV files for large datasets (24,000 rows, 670 columns, all character). I cannot share my data for copyright reasons. I hope you can fix this.

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Mexico.1252 
[2] LC_CTYPE=Spanish_Mexico.1252   
[3] LC_MONETARY=Spanish_Mexico.1252
[4] LC_NUMERIC=C                   
[5] LC_TIME=Spanish_Mexico.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] tidyr_0.6.0      haven_1.0.0      forcats_0.1.1   
[4] data.table_1.9.7 dplyr_0.5.0     

loaded via a namespace (and not attached):
[1] readr_1.0.0    magrittr_1.5   R6_2.2.0      
[4] assertthat_0.1 DBI_0.5-1      tools_3.3.2   
[7] tibble_1.2     Rcpp_0.12.7   

ecortens added a commit to ecortens/haven that referenced this issue Dec 23, 2016

@ecortens


ecortens commented Dec 23, 2016

Just created a pull request that fixes this bug. It's caused by the data frame containing one or more string columns whose maximum string length is 0, e.g.:

data <- data.frame(a = c("", "", ""), stringsAsFactors = FALSE)

The pull request fixes this by making the minimum length of a string column 1, rather than 0. (More details in the pull request comment.)

Hopefully this is easy to integrate, @hadley, as it's causing some issues in production for me and, it sounds like, for a few other folks. I don't anticipate any unintended consequences; at worst, it restores the pre-August 4th behavior of ReadStat for 0-length string columns.
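The fix described above can be sketched language-neutrally as follows. This is a hypothetical illustration, not haven's or ReadStat's actual code: the declared storage width of an SPSS string column is the length of the longest value, clamped to a minimum of 1 so that an all-empty column is never declared with width 0.

```python
def sav_string_width(values):
    """Declared storage width for an SPSS string column.

    Hypothetical sketch of the fix: take the length of the longest
    value, but clamp to at least 1 so a column of empty strings
    never gets a declared width of 0.
    """
    longest = max((len(v) for v in values), default=0)
    return max(longest, 1)

print(sav_string_width(["", "", ""]))   # 1, not 0
print(sav_string_width(["ab", "cde"]))  # 3
```

With a declared width of 0, a writer that advances through the output buffer by the column width makes no progress, which is consistent with the endless loop and ever-growing file reported above.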

@evanmiller


evanmiller commented Dec 23, 2016

Fixed in WizardMac/ReadStat@c443d7f

ecortens added a commit to ecortens/haven that referenced this issue Dec 23, 2016

@hadley hadley closed this in 42c8883 Jan 25, 2017

@lock lock bot locked and limited conversation to collaborators Jun 26, 2018
