Memory management is a little worrying #219

Open
dt-woods opened this issue Dec 19, 2023 · 4 comments

Comments

@dt-woods
Collaborator

The latest runs of ELCI_1 trigger ba_io_trading_model in eia_io_trading.py. There is a bulk data file, and during the call to ba_exchange_to_df the memory demand spikes to >11 GB. I've hit Python segmentation faults during this, which I resolved by restarting my computer and re-running. Seems worth a cautionary note for users.

I see a few instances of memory management where the massive lists of strings are deleted after processing.

In response, I started to parse out subroutines from the really long method. Not sure what else can be done given the sheer size of the bulk text file (>3 GB) and that it's stored primarily as Python string objects. I might look into optimizing the data types when the text file is processed. An alternative may just be to break the monster file into smaller files, process them individually, then put the results back together.
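To illustrate the data type idea, here is a minimal sketch using only the standard library. It assumes some of the parsed fields are numeric values held as strings; the helper names and the sample data are made up for illustration and are not taken from eia_io_trading.py.

```python
import sys
from array import array


def strings_to_doubles(values):
    """Convert an iterable of numeric strings to a compact array of C doubles."""
    return array("d", (float(v) for v in values))


def list_of_strings_size(values):
    """Approximate bytes held by a list of str objects (list plus each string)."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


# Made-up stand-in for numeric fields parsed out of the bulk file.
raw = [f"{i * 0.5:.1f}" for i in range(100_000)]
compact = strings_to_doubles(raw)

print("list of str:", list_of_strings_size(raw), "bytes")
print("array of double:", sys.getsizeof(compact), "bytes")
```

Each Python string carries its own object overhead, while the array stores 8 bytes per value, so the saving grows with the number of rows. Whether this applies depends on how many fields in the bulk file are actually numeric.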

@m-jamieson
Collaborator

I think the bulk data file was only about 1 GB when we first developed this.

Originally my concerns were with the time it took to process the file, and I didn't really see many ways to optimize it because of the way the data is stored. Breaking the file up seems a reasonable interim plan. Is it possible to search for changes in data year and break it up that way? Reading the file in chunks would be a similar option.

In some future version where we use the EIA API, this problem likely goes away, and we can at least process JSON rather than plain text.
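Reading in chunks could look something like the sketch below. It assumes the bulk file is line-oriented text; `iter_line_batches` and `count_matching_lines` are hypothetical helpers, not functions in ElectricityLCI, and the keyword filter is only a placeholder for the real per-line processing.

```python
from itertools import islice


def iter_line_batches(path, batch_size=10_000):
    """Yield lists of at most batch_size lines, so only one batch is in memory."""
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            batch = list(islice(handle, batch_size))
            if not batch:
                break
            yield batch


def count_matching_lines(path, keyword, batch_size=10_000):
    """Example consumer: count the lines that contain keyword, batch by batch."""
    total = 0
    for batch in iter_line_batches(path, batch_size):
        total += sum(1 for line in batch if keyword in line)
    return total
```

Because each batch is dropped before the next one is read, peak memory is bounded by the batch size rather than the file size. Splitting on data year would need a per-line check on top of this, which depends on the file's actual layout.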

@dt-woods
Collaborator Author

Please, please, please, let there be an API for that!

@dt-woods
Collaborator Author

Now that I'm on to testing ELCI_3, I'm hitting more seg faults (and one bus fault) and it's giving me flashbacks to my early coding career when I used to do too much with passing variables globally. There are a lot of hints of that going on here, especially where modules are imported within scope of a method, initializing globals used elsewhere, globals being referenced in methods, globals being sliced and modified. All are a good recipe for unmanaged memory.
Best advice I can give (and not sure how much can be implemented given the scope) is the following:

  • Plan (and know) your use cases. It helps to diagram your program's operational procedure (e.g., what calls are made and when for each use case). This helps identify where and what information is required and may shed light on where seg faults can occur.
  • Initialize your globals at the onset. Since your configuration is defined at the first step of the program's operation, you should know (as the developer) exactly what data you need. Initialize it so it's ready when the method(s) are called. This limits the chance that a running method calls a subroutine that imports a module, which in turn holds the global definition that the original method needed to have initialized. I've hit circular dependency errors with electricitylci that lead me to believe this is quite possible (i.e., module import order matters).
  • Use function parameters vigilantly. There's no reason why you can't send data as a parameter to a method and have the method return data back. This makes mapping/managing memory a lot easier than manipulating globals (or depending on them).
  • Try to avoid global variables that change. In my experience a global variable is better suited to a constant value shared among methods (i.e., it assumes its value once and all methods that need it know where to find it). Mutable variables that are in global scope are nightmares. Do yourself a favor and pass them as arguments (adheres better to the transparency clause associated with your coding project).
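As a rough sketch of the last three points: configuration is created once at the start as a read-only object, and data moves between functions as parameters and return values. `ModelConfig`, `filter_rows`, `total_by_region`, and `run` are hypothetical names for illustration; none of them exist in electricitylci.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical read-only settings, created once at program start."""
    data_year: int
    regions: tuple


def filter_rows(rows, config):
    """Return a new list of rows for the configured year and regions."""
    return [
        row for row in rows
        if row["year"] == config.data_year and row["region"] in config.regions
    ]


def total_by_region(rows):
    """Return a new dict of summed values per region; the input is not modified."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["value"]
    return totals


def run(rows, config):
    """Each step receives its inputs as arguments and returns its result."""
    selected = filter_rows(rows, config)
    return total_by_region(selected)
```

With `frozen=True`, assigning to a field after creation raises an error, so the shared configuration behaves like a constant. Everything else is visible in the function signatures, which makes it easier to see where each piece of data lives and when it can be released.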

dt-woods changed the title from "Memory management in ba_io_trading_model is a little worrying" to "Memory management is a little worrying" on Dec 22, 2023
dt-woods added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Dec 22, 2023
dt-woods added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Dec 22, 2023
@dt-woods
Collaborator Author

Added new checks for bulk data vintage to trigger a new download with the latest data.

dt-woods added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Mar 12, 2024
2 participants