Refactored GenomicAnnotation to reduce memory usage #395

Merged
merged 5 commits into from
Feb 19, 2022

zhuchcn
Member

@zhuchcn zhuchcn commented Feb 16, 2022

The GenomicAnnotation class was refactored slightly. When loading data from GTF, only useful annotation information is kept. All features of the same transcript now share their IDs by reference to a single object. The memory usage for annotation itself is about 6.5 GB when reading directly from GTF and 10.5 GB when loading from the pickled object. Issue #394 is still open.
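
For readers unfamiliar with the ID-sharing idea, here is a minimal sketch of how features of one transcript can point at a single string object instead of each holding its own copy. The `Feature` class, `build_features` function, and the sample IDs are hypothetical stand-ins for illustration and are not taken from the project's code.

```python
class Feature:
    """Hypothetical stand-in for a GTF feature; not the project's real class."""
    __slots__ = ("start", "end", "transcript_id")

    def __init__(self, start, end, transcript_id):
        self.start = start
        self.end = end
        self.transcript_id = transcript_id


def build_features(records):
    """Build features so that all features of a transcript share one ID string."""
    id_pool = {}  # maps ID value -> the single canonical str object
    features = []
    for start, end, tx_id in records:
        # setdefault returns the first object stored under an equal key, so
        # later equal-but-distinct strings are dropped in favour of that one.
        canonical = id_pool.setdefault(tx_id, tx_id)
        features.append(Feature(start, end, canonical))
    return features


# Simulate parsing: each line yields a fresh str object with the same value.
records = [(i, i + 10, "".join(["ENST", "00000123456"])) for i in range(3)]

features = build_features(records)
# Every feature now references the very same str object.
assert all(f.transcript_id is features[0].transcript_id for f in features)
```

With millions of features, keeping one string per transcript rather than one per feature can save a noticeable amount of memory, although the actual saving depends on the data.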

@lydiayliu
Collaborator

The memory usage for annotation itself is about 6.5 GB when reading directly from GTF and 10.5 GB when loading from the pickled object.

Oh I missed this issue. Why would memory usage be double when reading from the pickled object? Isn't the pickled object just the loaded annotation saved?

@zhuchcn
Member Author

zhuchcn commented Feb 16, 2022

I looked at this all morning but still don't have a good explanation. It seems to happen only with the annotation file and not the genome FASTA: reading directly from FASTA uses just as much memory as loading from the pickled one.
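
The thread does not identify the cause, so the following is only a sketch of one way to compare the two loading paths using the standard library's `tracemalloc`. The `traced_size` and `make_annotation` helpers and the toy data are assumptions for illustration; the numbers they print say nothing about the real annotation object.

```python
import pickle
import tracemalloc


def traced_size(build):
    """Return (object, bytes still allocated) for whatever build() creates."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        obj = build()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return obj, after - before


def make_annotation(n=10_000):
    # Toy stand-in: many records whose ID field is one shared str object.
    shared_id = "".join(["ENST", "00000123456"])
    return [{"start": i, "end": i + 10, "transcript_id": shared_id} for i in range(n)]


direct, direct_bytes = traced_size(make_annotation)
payload = pickle.dumps(direct, protocol=pickle.HIGHEST_PROTOCOL)
loaded, loaded_bytes = traced_size(lambda: pickle.loads(payload))

assert loaded == direct  # same content, possibly a different memory layout
print(f"built directly: {direct_bytes:,} bytes")
print(f"unpickled:      {loaded_bytes:,} bytes")
```

Note that `tracemalloc` only counts allocations made through Python's memory allocator, so it will not match the process-level figures quoted in this thread.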

@lydiayliu
Collaborator

Wow interesting. Maybe a very specific data structure is not agreeing with pickling...

@zhuchcn
Member Author

zhuchcn commented Feb 18, 2022

Quoting from the PR that I accidentally opened for no reason:

I now save the genomic annotation in a compressed text format instead of pickle to solve #394. The other objects are fine with pickle. This takes slightly longer to load, but not by much. The memory usage is now about 18 GB after loading everything, with 3.5 GB before launching. It still needs to be improved, hopefully in the future.

I think we can close #394 for now. We should further optimize the memory usage, but that is not the top priority for now.
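
The thread does not show the actual file layout, so as a rough illustration of the "compressed text instead of pickle" approach, here is a sketch that round-trips records through a gzip-compressed tab-separated file. The column layout, the `save_annotation` and `load_annotation` names, and the sample records are all assumptions.

```python
import gzip
import os
import tempfile


def save_annotation(records, path):
    """Write (transcript_id, start, end) tuples as gzip-compressed TSV."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for tx_id, start, end in records:
            handle.write(f"{tx_id}\t{start}\t{end}\n")


def load_annotation(path):
    """Read the records back, reusing one str object per distinct ID."""
    id_pool = {}
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            tx_id, start, end = line.rstrip("\n").split("\t")
            # Keep a single str object per ID while rebuilding the records.
            tx_id = id_pool.setdefault(tx_id, tx_id)
            records.append((tx_id, int(start), int(end)))
    return records


original = [("ENST0001", 100, 200), ("ENST0001", 300, 400), ("ENST0002", 50, 80)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "annotation.tsv.gz")
    save_annotation(original, path)
    restored = load_annotation(path)

assert restored == original
assert restored[0][0] is restored[1][0]  # shared ID object after loading
```

Because the loader rebuilds every object itself, it controls which values are shared, at the cost of parsing text on each load.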

@zhuchcn zhuchcn linked an issue Feb 18, 2022 that may be closed by this pull request
@lydiayliu
Collaborator

sorry i forgot about this. no thoughts whatsoever XD sounds like work

@lydiayliu lydiayliu merged commit a42e0a3 into main Feb 19, 2022
@lydiayliu lydiayliu deleted the czhu-fix-genomic-annotation branch February 22, 2022 19:25
Development

Successfully merging this pull request may close these issues.

Large memory usage of pickle