Memory leak #142

Closed
Floreuzan opened this issue Nov 18, 2021 · 3 comments
Comments

Floreuzan commented Nov 18, 2021

Setup

I am reporting a problem with GSEApy version, Python version, and operating
system as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)

3.7.11 (default, Jul 27 2021, 09:42:29) [MSC v.1916 64 bit (AMD64)]
CPython
Windows-10-10.0.19041-SP0
0.10.5

Expected behaviour

I want to run the gp.prerank() function.

        import gseapy as gp

        pre_res = gp.prerank(rnk=rnk_file, gene_sets=geneset_file,
                             processes=4,
                             permutation_num=100,  # reduce number to speed up testing
                             outdir='rnk_output/' + rnk_file + '_' + geneset_file,
                             format='png',
                             seed=6,
                             no_plot=True)

Actual behaviour

I chose the C2 collection from the MSigDB website; it has approximately 6,300 gene sets.
Even on a system with ~80 GB of RAM plus swap space, there appears to be a memory leak: using swap space does not slow the calculation down, which suggests the process is not going back to the memory it used previously.
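
One way to check whether memory is really being retained between calls, rather than just peaking once, is to record how much traced memory is still held after each repeated call. The following is a minimal sketch using only the standard library's tracemalloc module. The names measure_growth, leaky and clean are hypothetical helpers for illustration and are not part of GSEApy, and tracemalloc only sees allocations made through Python's allocator, so this is a rough diagnostic rather than a full memory profile.

```python
import gc
import tracemalloc


def measure_growth(func, repeats=5):
    """Call func repeatedly; return traced bytes still held after each call."""
    tracemalloc.start()
    held = []
    try:
        for _ in range(repeats):
            func()
            gc.collect()  # drop unreachable objects before measuring
            current, _peak = tracemalloc.get_traced_memory()
            held.append(current)
    finally:
        tracemalloc.stop()
    return held


# Hypothetical workloads, used only to illustrate the two patterns.
_leak = []


def leaky():
    # Keeps a reference alive, so roughly 1 MB more is retained per call.
    _leak.append(bytearray(1_000_000))


def clean():
    # The buffer is released when the function returns.
    tmp = bytearray(1_000_000)
    return len(tmp)
```

A steadily increasing series from measure_growth points to retained memory. A flat series suggests the high usage is a transient peak instead. In practice, func would be a small wrapper around the call under suspicion.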

Attempted fix

To solve this issue, I modified the file GSEApy/gseapy/algorithm.py. In the function gsea_compute(), which calls the Parallel() function from the joblib package, I removed the require='sharedmem' option (line 509).

In other words,

        temp_esnu = Parallel(n_jobs=processes, require='sharedmem')(delayed(enrichment_score)( 
                        gl, cor_vec, gmt.get(subset), w, n, 
                        rs, single, scale) 
                        for subset, rs in zip(subsets, random_seeds)) 

becomes:

        temp_esnu = Parallel(n_jobs=processes)(delayed(enrichment_score)( 
                        gl, cor_vec, gmt.get(subset), w, n, 
                        rs, single, scale) 
                        for subset, rs in zip(subsets, random_seeds)) 
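
For context, joblib's require='sharedmem' constraint selects a thread-based backend, so all workers operate on the same in-memory objects. Without it, joblib is free to use its default process-based backend. The sketch below is a standard-library analogy of the shared-memory case only. It uses concurrent.futures rather than joblib, and run_shared is a hypothetical helper, not code from GSEApy.

```python
from concurrent.futures import ThreadPoolExecutor


def run_shared(values, workers=4):
    """Double each value in a thread pool and report whether every worker
    saw the very same list object (shared memory, no copies)."""
    seen_ids = []

    def task(i):
        # list.append is safe to call from several threads in CPython.
        seen_ids.append(id(values))
        return values[i] * 2

    with ThreadPoolExecutor(max_workers=workers) as pool:
        doubled = list(pool.map(task, range(len(values))))

    shared = all(obj_id == id(values) for obj_id in seen_ids)
    return doubled, shared
```

With threads, the input is never copied to the workers. A process-based pool would instead pickle the arguments and send them to each worker, which changes where and how memory is consumed.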
zqfang pushed a commit that referenced this issue Dec 7, 2021

zqfang (Owner) commented Dec 7, 2021

Thanks @Floreuzan. I also want cleaner code that uses less memory, but refactoring the code seems to require some effort.

zqfang closed this as completed Jun 16, 2022

zqfang (Owner) commented Jul 20, 2022

An upcoming release of GSEApy, rewritten in Rust, will fix this problem. Stay tuned!

zqfang reopened this Jul 20, 2022

zqfang (Owner) commented Aug 3, 2022

The Rust binding of GSEApy (v0.11.0) has been released. Closing the issue now.

zqfang closed this as completed Aug 3, 2022