spread causes system to run out of memory #13

Closed
jtlowell opened this issue Jul 18, 2014 · 4 comments

Comments

@jtlowell

How long or wide can a data frame be when going from gathered to spread form?

I have a 200,000 row data frame I'm spreading to create 200,000 columns and I'm running out of memory.

Have you tested the limits of these operations on various machines?
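For a sense of scale, the arithmetic behind the report above can be sketched as follows. This is a rough estimate, not a measurement: it assumes the spread result keeps 200,000 rows as well as 200,000 columns, that every cell is stored as an 8-byte double, and that per-column overhead is negligible. The helper name `dense_size_gb` is hypothetical.

```python
# Rough size of a dense table. Assumptions: every cell is an 8-byte
# double and per-column overhead is ignored.
BYTES_PER_DOUBLE = 8


def dense_size_gb(n_rows: int, n_cols: int,
                  bytes_per_cell: int = BYTES_PER_DOUBLE) -> float:
    """Approximate size in gigabytes (10**9 bytes) of an n_rows x n_cols table."""
    return n_rows * n_cols * bytes_per_cell / 1e9


# Worst case for the report above: 200,000 rows spread into 200,000 columns.
worst_case_gb = dense_size_gb(200_000, 200_000)
print(f"{worst_case_gb:.0f} GB")  # 320 GB
```

Under these assumptions the result would need about 320 GB, so optimising the implementation alone may not be enough to fit it in memory on a typical machine.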

@hadley
Member

hadley commented Jul 21, 2014

I have not. It might be possible to replace the vectorised R code with optimised C++ code that would need less memory.

@hadley closed this as completed Aug 22, 2014
@JamesOwers

Just a quick note that I'm having memory issues with spread(..., drop=FALSE). If I use spread(..., drop=TRUE), everything works fine: the process takes just a few seconds, and the result is 0.2 MB.

My input dataset is 0.4 MB, with 6000 rows and 11 variables. It is the result of a filter on a 200 MB dataset. When running with spread(..., drop=FALSE), the rsession memory expands to over 20 GB.

Unfortunately I can't provide the exact dataset, but if there is anything I can provide to help, I'll be happy to do so.

@hadley
Member

hadley commented Dec 11, 2015

How many unique values are there in the variables that you are spreading? It is easy to create very very large data frames with spread.

@JamesOwers

There are some numeric variables with a few thousand unique values, but isn't spread just going to make a variable for each key? Also, since spread(..., drop=TRUE) works fine, the only variables remaining to spread have just one value: NA.
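One possible explanation for the gap between the two runs, offered as an assumption rather than something confirmed in this thread: if drop=FALSE fills in every combination of the remaining identifier columns, the number of result rows is the product of their unique-value counts, not the 6000 rows actually observed. The sketch below only does that arithmetic. The counts are hypothetical, the function name `full_grid_rows` is made up for illustration, and the size estimate assumes 11 columns of 8-byte doubles.

```python
from math import prod


def full_grid_rows(unique_counts):
    """Row count if every combination of identifier values is materialised."""
    return prod(unique_counts)


# Hypothetical counts of distinct values in three identifier columns;
# the real dataset's counts are not given in the thread.
hypothetical_counts = [3000, 2000, 5]

rows = full_grid_rows(hypothetical_counts)
print(rows)  # 30000000

# Approximate size, assuming 11 columns of 8-byte doubles.
size_gb = rows * 11 * 8 / 1e9
print(f"{size_gb:.2f} GB")  # 2.64 GB
```

With these made-up counts, 6000 input rows would expand to 30 million rows. One more column with a few thousand unique values would multiply that again, which is the kind of growth that could exhaust 20 GB.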
