P values of Enrichr are not uniform calculated when switching from online mode to local mode #122
Comments
One thing I notice is that the total number of genes in the PI3K-Akt signaling pathway term differs between the two modes (354 and 352). Does that mean the gmt files used by the Enrichr server are different from those we can download from the Enrichr server? |
Another thought on this issue: there are 49 genes in total under the term 'Malaria', but the local-mode result shows only 48. |
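One way to check where a gene goes missing is to compare each term in the gmt file against the background list and report the genes that would be dropped. The following is only a minimal sketch: `parse_gmt` and `genes_dropped_by_background` are hypothetical helper names (not GSEApy functions), and the gmt lines and background are toy data, assuming the usual tab-separated layout of term, description, then genes.

```python
from typing import Dict, Iterable, List, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse GMT-style lines: term <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        term, genes = fields[0], fields[2:]
        gene_sets[term] = {g.strip() for g in genes if g.strip()}
    return gene_sets


def genes_dropped_by_background(
    gene_sets: Dict[str, Set[str]], background: Iterable[str]
) -> Dict[str, List[str]]:
    """For each term, list the genes that are absent from the background."""
    bg = set(background)
    return {
        term: sorted(genes - bg)
        for term, genes in gene_sets.items()
        if genes - bg
    }


# Toy data for illustration only; in practice, pass open("KEGG_2019_Human.gmt").
toy_gmt = [
    "Malaria\t\tHBB\tKLRC4-KLRK1\tTLR2\n",
    "Toy term\t\tAKT1\tPTEN\n",
]
toy_background = ["HBB", "TLR2", "AKT1", "PTEN"]

gene_sets = parse_gmt(toy_gmt)
dropped = genes_dropped_by_background(gene_sets, toy_background)
print(dropped)
```

Run against the real gmt file and the real background, a report like this would show which terms shrink after the intersection and by which genes.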
I suspect this is the origin of the issue: Line 69 in 149f3ec
There might be some genes in KEGG_2019_HUMAN not included in the |
For example, KLRC4-KLRK1 gene (it's associated with the term 'Malaria') is not in |
You could set the background gene list yourself:
enr2_down_local = gp.enrichr(downregulated_genes.index.to_list(),
                             gene_sets='./data/KEGG_2019_Human.gmt',
                             organism='Human',
                             background=['gene1', 'gene2', 'gene3', ....],
                             outdir='enrichr_kegg', cutoff=0.5)
enr2_down_local.results.sort_values('P-value').head(20)
You could get the updated file from Ensembl BioMart: http://uswest.ensembl.org/biomart/martview/e666d46813690a3366756dfcbc11466e |
Thanks. I have downloaded it and opened a pull request #123. |
Even with the updated background, the p-values are still different. The number of genes in "PI3K-Akt signaling pathway" has now become 351. I am wondering whether we should skip the intersection here: Line 69 in 149f3ec
|
@hsiaoyi0504, sorry, the intersection with the background genes is necessary here. We need to make sure all genes can be found in the background. That is, it would be weird (wrong) to draw a blue ball from an urn that contains only red and white balls. |
Yes, I understand that. However, given the results I get here, some genes in KEGG_2019_HUMAN are not in the human gene list fetched from Ensembl through BioMart, which is very weird. What I propose is a temporary fix: if background is 'hsapiens_gene_ensembl' or 'mmusculus_gene_ensembl', skip the intersection. |
@hsiaoyi0504, if you go deeper, you need to curate a background of genes annotated only in the KEGG database (for KEGG_2019_HUMAN.gmt). As far as I know, KEGG covers far fewer genes than Ensembl or NCBI. Why not just input a number for |
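The urn argument can be made concrete with a one-sided hypergeometric test, a common choice for over-representation analysis. This is a sketch under stated assumptions, not necessarily the exact computation performed by either mode: `hypergeom_sf` and `enrichment_pvalue` are hypothetical names, the gene names are toy data, and the intersection step mirrors the idea discussed above rather than the library's actual code.

```python
from math import comb
from typing import Iterable


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n marked, N drawn)."""
    total = comb(M, N)
    hits = sum(
        comb(n, i) * comb(M - n, N - i)
        for i in range(k, min(n, N) + 1)
    )
    return hits / total


def enrichment_pvalue(
    query: Iterable[str], term_genes: Iterable[str], background: Iterable[str]
) -> float:
    """Restrict query and term to the background, then test the overlap."""
    bg = set(background)
    q = set(query) & bg        # only genes that are "in the urn" can be drawn
    t = set(term_genes) & bg   # term genes outside the background are dropped
    overlap = len(q & t)
    return hypergeom_sf(overlap, len(bg), len(t), len(q))


# Toy example: "x" is a term gene missing from the background.
background = [f"g{i}" for i in range(10)]
term = ["g0", "g1", "g2", "x"]
query = ["g0", "g1", "g5"]

p = enrichment_pvalue(query, term, background)
print(p)
```

Because the term shrinks from four genes to three after the intersection, both the term size and the resulting p-value depend on which background list is used.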
That's a good workaround. I didn't think of that. Thanks for the suggestion. |
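The suggestion above is cut off, but it appears to concern supplying a background size rather than a gene list; the sketch below assumes that reading. It only illustrates how sensitive a hypergeometric upper-tail p-value is to the term size and to the background size. The term sizes 354, 352 and 351 come from this thread; the overlap, query size and background sizes are hypothetical, and `upper_tail` is an illustrative helper, not a GSEApy function.

```python
from math import comb


def upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn from M, of which n belong to the term."""
    total = comb(M, N)
    hits = sum(
        comb(n, i) * comb(M - n, N - i)
        for i in range(k, min(n, N) + 1)
    )
    return hits / total


overlap, query_size = 20, 200        # hypothetical values
term_sizes = (354, 352, 351)         # sizes reported in this thread
background_sizes = (20000, 25000)    # hypothetical background sizes

p = {}
for term_size in term_sizes:
    for bg_size in background_sizes:
        p[(term_size, bg_size)] = upper_tail(overlap, bg_size, term_size, query_size)
        print(term_size, bg_size, p[(term_size, bg_size)])
```

With the overlap and query held fixed, a smaller term or a larger background both make the same overlap look more surprising, so identical p-values across modes would require both to agree on the term size and on the background size.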
I'm wondering why you need the local mode instead of using the Enrichr API. Enrichr is more powerful and has more metrics |
As you mentioned in #121, background genes can only be customized in the local mode. |
Setup
I am reporting a problem with GSEApy version, Python version, and operating
system as follows:
Expected behaviour
The p-values should be the same when switching from online mode to local mode.
Actual behaviour
The p-values differ when switching from online mode to local mode.
Steps to reproduce
On-line mode:
Local mode (the KEGG_2019_Human.gmt file was downloaded from the Enrichr website):