Refactored GenomicAnnotation to reduce memory usage #395

zhuchcn · 2022-02-16T19:10:11Z

The GenomicAnnotation class was refactored slightly. When loading data from a GTF file, only the annotation information that is actually needed is kept. All features of the same transcript now share their ID strings by reference instead of each holding its own copy. The memory usage of the annotation itself is about 6.5 GB when reading directly from the GTF, and 10.5 GB when loading from the pickled object. Issue #394 is still open.
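A minimal sketch of the two ideas described above: dropping unused GTF attributes at parse time, and reusing one string object per distinct ID. It assumes a hypothetical attribute whitelist (`KEEP_ATTRS`) and hypothetical names (`Feature`, `parse_gtf`); it is an illustration, not this project's actual code. The parser is simplified and does not handle semicolons inside quoted values.

```python
from typing import Dict, Iterable, List, NamedTuple

# Hypothetical whitelist: the attributes assumed to be used downstream.
KEEP_ATTRS = {"gene_id", "transcript_id", "gene_name", "protein_id"}


class Feature(NamedTuple):
    chrom: str
    feature_type: str
    start: int  # converted to 0-based, half-open
    end: int
    strand: str
    attrs: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        if key in KEEP_ATTRS:  # discard attributes that are never used
            attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf(lines: Iterable[str]) -> List[Feature]:
    id_cache: Dict[str, str] = {}  # maps an ID's text to its first-seen object

    def shared(value: str) -> str:
        # Return the cached object if this text was seen before, so every
        # feature of the same transcript points at one string in memory.
        return id_cache.setdefault(value, value)

    features = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, _, ftype, start, end, _, strand, _, attr_field = (
            line.rstrip("\n").split("\t")
        )
        attrs = {k: shared(v) for k, v in parse_attributes(attr_field).items()}
        features.append(Feature(chrom, ftype, int(start) - 1, int(end), strand, attrs))
    return features


gtf = [
    'chr1\tsrc\ttranscript\t11\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tag "basic";\n',
    'chr1\tsrc\texon\t11\t50\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";\n',
]
feats = parse_gtf(gtf)
print(feats[0].attrs["transcript_id"] is feats[1].attrs["transcript_id"])
```

`sys.intern` from the standard library is an alternative way to get similar sharing; a local dict simply keeps the cache scoped to one load.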


lydiayliu · 2022-02-16T20:06:00Z

> The memory usage for annotation itself is about 6.5 GB when reading directly from GTF and 10.5 GB when loading from the pickled object.

Oh, I missed this issue. Why would memory usage nearly double when reading from the pickled object? Isn't the pickled object just the loaded annotation, saved?

zhuchcn · 2022-02-16T20:08:10Z

I was looking at this the whole morning but still don't have a good conclusion. It seems to happen only with the annotation file, not the genome FASTA: reading directly from the FASTA uses just as much memory as loading from the pickled one.
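One way to compare two load paths side by side is to measure peak allocation with the standard-library `tracemalloc` module. This is a minimal sketch: `peak_memory` and `build_annotation` are hypothetical helpers, and the toy data stands in for a real annotation.

```python
import pickle
import tracemalloc
from typing import Any, Callable, Tuple


def peak_memory(load: Callable[[], Any]) -> Tuple[Any, int]:
    """Call `load()` and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = load()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start
    finally:
        tracemalloc.stop()
    return result, peak


def build_annotation(n: int):
    # Toy stand-in for parsing an annotation file (hypothetical structure).
    return [{"transcript_id": f"T{i}", "exons": list(range(5))} for i in range(n)]


blob = pickle.dumps(build_annotation(10_000))
built, peak_build = peak_memory(lambda: build_annotation(10_000))
loaded, peak_unpickle = peak_memory(lambda: pickle.loads(blob))
print(f"build: {peak_build / 1e6:.1f} MB, unpickle: {peak_unpickle / 1e6:.1f} MB")
```

`tracemalloc` only counts allocations made through Python's allocators and slows execution considerably. For multi-GB runs, the process peak RSS from `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` (Unix only; KiB on Linux, bytes on macOS) is a cheaper, coarser check.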

lydiayliu · 2022-02-16T20:15:41Z

Wow, interesting. Maybe some specific data structure doesn't play well with pickling...
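A quick check, independent of this codebase, of how pickle treats shared references. Within a single `dump`/`dumps` call, the pickler's memo stores an object once and restores every reference to it as the same object. Objects pickled in separate calls come back as separate copies. The names `make_id` and `features` are illustrative.

```python
import pickle


def make_id(n: int) -> str:
    # Build the string at runtime so it is not a shared compile-time constant.
    return "ENST" + str(n).zfill(11)


tid = make_id(1)
features = [{"transcript_id": tid}, {"transcript_id": tid}]

# One dumps call: the memo records `tid` once, so sharing survives the round trip.
restored = pickle.loads(pickle.dumps(features))
print(restored[0]["transcript_id"] is restored[1]["transcript_id"])

# Separate dumps calls: each has its own memo, so each load creates a new string.
a = pickle.loads(pickle.dumps(features[0]))
b = pickle.loads(pickle.dumps(features[1]))
print(a["transcript_id"] == b["transcript_id"], a["transcript_id"] is b["transcript_id"])
```

Sharing is therefore preserved only when the shared objects end up in the same pickle stream.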


zhuchcn · 2022-02-18T18:34:55Z

Quoting from the PR that I accidentally opened for no reason:

I now save the genomic annotation in a compressed text format instead of pickle to solve #394. The other objects are fine with pickle. This takes slightly longer to load, but not by much. Memory usage is now about 18 GB after loading everything, compared with 3.5 GB before launching. This still needs improvement, hopefully in the future.
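A minimal sketch of saving annotation records as gzip-compressed, tab-separated text and reading them back. The flat `Record` layout and the `save_records`/`load_records` names are hypothetical; the real file format used in this PR may differ. One plausible benefit of a text loader is that it can re-establish ID sharing explicitly while parsing, as the cache below does; whether that is why memory dropped here is an assumption.

```python
import gzip
import os
import tempfile
from typing import Dict, Iterable, List, Tuple

# Hypothetical flat record: (transcript_id, gene_id, chrom, start, end, strand)
Record = Tuple[str, str, str, int, int, str]


def save_records(path: str, records: Iterable[Record]) -> None:
    """Write one tab-separated line per record into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for tx_id, gene_id, chrom, start, end, strand in records:
            handle.write(f"{tx_id}\t{gene_id}\t{chrom}\t{start}\t{end}\t{strand}\n")


def load_records(path: str) -> List[Record]:
    """Read records back, reusing one string object per distinct ID or chrom."""
    cache: Dict[str, str] = {}
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            tx_id, gene_id, chrom, start, end, strand = line.rstrip("\n").split("\t")
            records.append((
                cache.setdefault(tx_id, tx_id),
                cache.setdefault(gene_id, gene_id),
                cache.setdefault(chrom, chrom),
                int(start),
                int(end),
                strand,
            ))
    return records


recs = [("T1", "G1", "chr1", 10, 50, "+"), ("T1", "G1", "chr1", 60, 100, "+")]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "annotation.tsv.gz")
    save_records(path, recs)
    loaded_recs = load_records(path)
print(loaded_recs == recs)
```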

I think we can close #394 for now. We should optimize memory usage further, but it's not the top priority right now.

lydiayliu · 2022-02-19T03:08:01Z

sorry i forgot about this. no thoughts whatsoever XD sounds like work

fix (gtf): removed unused attributes to reduce the size of genomic annotation in memory

9b8ab2c

zhuchcn requested a review from lydiayliu February 16, 2022 19:10

lydiayliu approved these changes Feb 16, 2022


style: fixed bad indentation

757ab90

lydiayliu approved these changes Feb 17, 2022


zhuchcn added 3 commits February 18, 2022 10:05

refactor (generageIndex): change to use compressed text file to store genomic annotation to solve #394

0289bea

style fixed for pylint

51aeeea

Merge branch 'czhu-fix-generate-index' into czhu-fix-genomic-annotation

b1b2e9a

zhuchcn linked an issue Feb 18, 2022 that may be closed by this pull request: Large memory usage of pickle #394 (Closed)

zhuchcn requested a review from lydiayliu February 18, 2022 18:36

lydiayliu merged commit a42e0a3 into main Feb 19, 2022

lydiayliu deleted the czhu-fix-genomic-annotation branch February 22, 2022 19:25
