write_sav() has endless loop, write a 40GB file and still writting #245

Closed
h2appy opened this issue Nov 30, 2016 · 4 comments

@h2appy h2appy commented Nov 30, 2016

haven_test.Rdata.zip

When I use write_sav() to create an SPSS file "test.sav", the file grows to more than 40 GB and is still being written.

My Environment:
haven 1.0.0

@h2appy h2appy changed the title SPSS cannot open the file that is created with write_sav() write_sav() has endless loop, write a 40GB file and still writting Nov 30, 2016

@pascaltanner pascaltanner commented Dec 13, 2016

I've got the same problem with my data. write_sav() gets stuck in an endless loop and the resulting file cannot be opened by SPSS, while write_dta() works perfectly fine (the file is written and opens in Stata). Unfortunately I can't provide my data at the moment.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252 LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C                       
[5] LC_TIME=German_Germany.1252

> packageVersion("haven")
[1] ‘1.0.0’

@JhossePaul JhossePaul commented Dec 14, 2016

I can confirm the bug. I cannot write SAV files for large datasets (24000 rows, 670 columns, all character). I cannot share my data for copyright reasons. Hope you can fix this.

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Mexico.1252 
[2] LC_CTYPE=Spanish_Mexico.1252   
[3] LC_MONETARY=Spanish_Mexico.1252
[4] LC_NUMERIC=C                   
[5] LC_TIME=Spanish_Mexico.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] tidyr_0.6.0      haven_1.0.0      forcats_0.1.1   
[4] data.table_1.9.7 dplyr_0.5.0     

loaded via a namespace (and not attached):
[1] readr_1.0.0    magrittr_1.5   R6_2.2.0      
[4] assertthat_0.1 DBI_0.5-1      tools_3.3.2   
[7] tibble_1.2     Rcpp_0.12.7   
ecortens added a commit to ecortens/haven that referenced this issue Dec 23, 2016

@ecortens ecortens (Contributor) commented Dec 23, 2016

Just created a pull request that fixes this bug. It's caused by having one or more string columns in the data frame whose values all have a length of 0 (every entry is the empty string), e.g.:

data <- data.frame(a = c("", "", ""))

The pull request fixes this by making the minimum length of a string column 1, rather than 0. (More details in the pull request comment.)
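
Until a fixed version is installed, it may help to check a data frame for such columns before calling write_sav(). The following is a minimal base R sketch, not the code from the pull request. The helper name `zero_width_cols` is hypothetical, and treating all-NA character columns as zero-width is an assumption. Padding with a single space is likewise only a possible workaround and has not been verified against haven 1.0.0.

```r
# Hypothetical helper: names of character columns whose widest value
# has 0 characters, i.e. the columns described above.
zero_width_cols <- function(df) {
  is_zero_width <- vapply(df, function(col) {
    if (!is.character(col)) return(FALSE)
    widths <- nchar(col[!is.na(col)], type = "bytes")
    # Assumption: a column with no non-missing values also counts as width 0.
    length(widths) == 0L || max(widths) == 0L
  }, logical(1))
  names(df)[is_zero_width]
}

data <- data.frame(
  a = c("", "", ""),    # all empty: maximum width 0
  b = c("x", "", "yz"), # mixed: maximum width 2
  n = 1:3,              # not a string column
  stringsAsFactors = FALSE
)

affected <- zero_width_cols(data)
print(affected)

# Possible workaround (an assumption, not the fix from the pull request):
# give every all-empty column a width of 1 by storing a single space.
padded <- data
for (col in affected) {
  padded[[col]][!is.na(padded[[col]])] <- " "
}
```

Here only column `a` is flagged, and `zero_width_cols(padded)` comes back empty once the padding has been applied. Whether a single space is an acceptable stand-in for the empty string depends on how the file is used downstream.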

Hopefully this is easy to integrate, @hadley, as it's causing some issues in production for me and, it sounds like, for a few other folks. I don't anticipate any unintended consequences; at worst, it restores the pre-August 4th behavior of ReadStat for 0-length string columns.

@evanmiller evanmiller (Collaborator) commented Dec 23, 2016

Fixed in WizardMac/ReadStat@c443d7f

ecortens added a commit to ecortens/haven that referenced this issue Dec 23, 2016
@hadley hadley closed this in 42c8883 Jan 25, 2017
@lock lock bot locked and limited conversation to collaborators Jun 26, 2018