Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I have a file with 20 GB of data that I need to process. When I use a pandas DataFrame, the full 20 GB must be loaded into memory, which slows the computer down or can even crash it. Could this be made more efficient by automatically (very important: the user should not have to do anything here) loading a chunk, processing it, writing it out, loading the next chunk, and so on?
This is possible; ROOT does it, for instance.
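pandas already supports this loop manually via the `chunksize` parameter of `read_csv`; a minimal sketch of the load/process/write cycle the request asks pandas to run automatically (file names, chunk size, and the filter inside `process` are illustrative):

```python
import os
import pandas as pd

# Build a small example file standing in for the 20 GB input
# (file name and contents are illustrative).
pd.DataFrame({"value": range(-5, 5)}).to_csv("big.csv", index=False)

def process(chunk: pd.DataFrame) -> pd.DataFrame:
    # Placeholder transformation; replace with the real per-chunk logic.
    return chunk[chunk["value"] > 0]

if os.path.exists("out.csv"):
    os.remove("out.csv")

# Load a chunk, process it, append it to the output, then load the next
# chunk -- so only one chunk is resident in memory at a time.
with pd.read_csv("big.csv", chunksize=4) as reader:
    for i, chunk in enumerate(reader):
        process(chunk).to_csv("out.csv", mode="a", header=(i == 0), index=False)

result = pd.read_csv("out.csv")
```

The feature request is essentially for pandas to run this loop itself, without the user writing it.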
Feature Description
This would just work with normal DataFrames; there could be an option like
pd.chunk_size=100
which would process 100 MB at a time, so that no more than 100 MB would be in memory.
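A memory budget in MB could be translated into a row count by sampling the file first; a rough sketch of that translation, where `rows_for_budget` and its parameters are hypothetical names, not an existing pandas API:

```python
import pandas as pd

def rows_for_budget(path: str, budget_mb: int = 100, sample_rows: int = 1000) -> int:
    # Estimate per-row memory from a small sample, then size chunks so
    # that roughly budget_mb megabytes are resident at a time.
    sample = pd.read_csv(path, nrows=sample_rows)
    bytes_per_row = sample.memory_usage(deep=True).sum() / max(len(sample), 1)
    return max(int(budget_mb * 1024 * 1024 / bytes_per_row), 1)

# Illustrative input file standing in for the large dataset.
pd.DataFrame({"value": range(100)}).to_csv("sample.csv", index=False)
chunk_rows = rows_for_budget("sample.csv", budget_mb=100)
```

The resulting row count could then be fed to `chunksize=` internally, hiding the whole mechanism from the user as the request asks.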
Alternative Solutions
Alternatively we can
Additional Context
No response