Problem statement
Loading the structured files used to be relatively fast, since the files contained only the interpolated dataframes and these were much smaller. Having the raw data available is very convenient, but it slows down loading and dramatically increases the amount of memory required.
Solution suggestion
Making the loading of the raw data optional should solve the problem: unless a flag is set, the attribute would remain empty (using the auto_load_processed method). However, it's also desirable to keep the to/from file methods simple and transparent, so if a lot of additional complexity is required this might not be the best solution.
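As a rough illustration of the flag-based approach, something like the sketch below could work. The class and attribute names here are hypothetical stand-ins, not the actual beep API:

```python
class Datapath:
    """Minimal sketch of a structured-data container; not the real beep class."""

    def __init__(self, structured_data, raw_data=None):
        self.structured_data = structured_data
        self.raw_data = raw_data  # stays empty unless loading is requested

    @classmethod
    def from_dict(cls, d, load_raw=False):
        # Only attach the (large) raw data when explicitly asked for;
        # otherwise the attribute remains None, as suggested above.
        raw = d.get("raw_data") if load_raw else None
        return cls(d["structured_data"], raw_data=raw)


d = {"structured_data": {"cycle": [1, 2]}, "raw_data": {"huge": "payload"}}
slim = Datapath.from_dict(d)                 # default: raw data skipped
full = Datapath.from_dict(d, load_raw=True)  # opt in to the raw data
```

The cost of this is exactly the extra keyword argument on `from_dict`, which is where the concern about keeping the to/from file methods simple comes in.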
Alternative solutions
Another solution could be to remove the raw data attribute or turn it into a loading method that goes and looks for the original raw file. This is less desirable since it returns to the problem of locating the raw data that was used to create the structured file.
So after examining this a bit, it seems the breakdown is this:
Reading the data file into a dictionary: 58% of the time
Creating the Datapath object (from_dict): 42% of the time
By simply not adding the raw data to the object, we can reduce the "Creating the Datapath object" step by about 78%, i.e. reduce the total load time by about 31%. (Source: repeatedly loading the big structured test file, 250 MB, on a laptop.)
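As a quick sanity check on those percentages (using only the figures reported above):

```python
# Object creation is 42% of total load time; skipping the raw data cuts
# that step by ~78%, so the expected drop in total load time is the product:
build_frac = 0.42
build_reduction = 0.78

total_reduction = build_frac * build_reduction
print(f"{total_reduction:.0%}")  # about 33%, consistent with the measured ~31%
```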
This simple fix might also take care of the memory issues.
I'm not sure of the best way to programmatically avoid loading the raw file from disk into the dictionary; one way around it would be to change the defaults so that raw data is not saved unless specified. The current BEEPDatapath code would then automatically ignore the missing data and the run would be marked as "legacy". I'll update my PR with some timing results.
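Sketching that change of defaults on the serialization side (the class and method bodies here are hypothetical illustrations, not the real BEEPDatapath code):

```python
class BEEPDatapathSketch:
    """Illustrative stand-in for BEEPDatapath serialization; not the real code."""

    def __init__(self, structured_data, raw_data=None):
        self.structured_data = structured_data
        self.raw_data = raw_data

    def as_dict(self, include_raw=False):
        # Proposed default: raw data is simply not written out, so the
        # existing loading code finds nothing to load and would mark the
        # run as "legacy" -- no change to the from-dict path is needed.
        d = {"structured_data": self.structured_data}
        if include_raw and self.raw_data is not None:
            d["raw_data"] = self.raw_data
        return d


dp = BEEPDatapathSketch({"cycle": [1]}, raw_data={"big": "blob"})
slim_dict = dp.as_dict()                  # default: raw data dropped on save
full_dict = dp.as_dict(include_raw=True)  # opt in to persisting raw data
```

The appeal of this variant is that all the complexity lives in the save path; loading stays exactly as it is today.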
Edit:
Decided to just not save raw data when using CLI by default. It's the easiest and cleanest solution, requiring no changes to as/from_dict. See #250