Enrichr: Local mode of GO/enrichment analysis does not provide Odds Ratios in the results #132
Comments
Would you mind sharing the code you used, @sreichl? |
Hi @zqfang, Sure, I can share the code I currently use to determine the odds ratio manually, based on the following variables and the results I get from GSEApy:
The last two parameters can be extracted directly from the "Overlap" column of the GSEApy results.
def odds_ratio_calc(bg_n, gene_list_n, gene_set_n, overlap_n):
    import numpy as np
    import scipy.stats as stats
    # make contingency table
    table = np.array([[gene_set_n, bg_n - gene_set_n], [overlap_n, gene_list_n - overlap_n]])
    # perform Fisher's exact test
    oddsratio, pvalue = stats.fisher_exact(table)
    # return (inverse) odds ratio
    return 1 / oddsratio
You could also construct the contingency table differently to get the odds ratio you are looking for directly, without the 1/x step at the end. I hope this helps! Cheers, S |
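As a sketch of that last remark: swapping the two rows of the table inverts the sample odds ratio, so the 1/x step is no longer needed. The snippet computes the sample odds ratio a*d/(b*c) by hand instead of calling scipy (to my understanding this is the value `stats.fisher_exact` returns as its first element by default, but treat that as an assumption). The helper name `sample_odds_ratio` and all counts are made up for illustration.

```python
def sample_odds_ratio(table):
    """Sample odds ratio a*d / (b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Hypothetical counts, for illustration only
bg_n, gene_list_n, gene_set_n, overlap_n = 20000, 200, 100, 10

# Table laid out as in the function above: the result has to be inverted
original = [[gene_set_n, bg_n - gene_set_n], [overlap_n, gene_list_n - overlap_n]]
inverted_result = 1 / sample_odds_ratio(original)

# Same table with the two rows swapped: no inversion needed
swapped = [[overlap_n, gene_list_n - overlap_n], [gene_set_n, bg_n - gene_set_n]]
direct_result = sample_odds_ratio(swapped)

print(inverted_result, direct_result)
```

Both values agree up to floating-point rounding, which is all the row swap is meant to show.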
Sorry for the late reply; I just got back to work. This feature can be implemented. The odds ratio is a new addition since the Enrichr server was updated. |
thanks for implementing! |
Hi, I looked into the committed code, and I think the formula you are applying to determine the odds ratio is not correct. Instead of this:
expect_count = k*m/bg
oddr = x / expect_count
you should use this:
oddr = (x*(bg-m))/(m*(k-x))
if I am not mistaken. I hope this helps. |
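To see that the two expressions give different numbers, here is a small check with made-up counts. It assumes x is the overlap, k the query size, m the category size and bg the background size, as the variables are defined further down the thread.

```python
# Hypothetical counts, for illustration only
x, k, m, bg = 10, 200, 100, 20000

# Formula from the committed code: observed overlap / expected overlap
expect_count = k * m / bg
ratio_observed_expected = x / expect_count

# Formula suggested in this comment
oddr = (x * (bg - m)) / (m * (k - x))

print(ratio_observed_expected)
print(oddr)
```

With these counts the observed/expected ratio is 10.0, while the suggested formula gives roughly 10.47, so the two are related but not interchangeable.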
Great. Thanks for the correction. I thought the odds ratio was (experiment_hits / expected_hits). |
Hi, no worries and thanks. As per Wikipedia: "An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B." Or in this recommended paper: "The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure." Exposures in our case are the gene lists (e.g. from a differential analysis) and outcomes are the gene sets in which we are testing enrichment (e.g. GO terms), or the other way around. Cheers, S |
Thanks for the clarification. I really appreciate it. |
The added code causes a division-by-zero crash when x==k in the expression |
Hi @yossi-liron, great catch, that is true! I think this only occurs if every gene in the query gene list is also in the category gene list (e.g. a GO term). A pragmatic solution is to add 0.5 to every cell of the contingency table to avoid division by zero, known as the Haldane-Anscombe correction. In the case of k==x this would mean:
oddr = ((x+0.5)*(bg-m+0.5))/((m+0.5)*(k-x+0.5))
What do you think? Cheers, S |
Two alternatives: returning an 'inf' value, or using the suggested correction formula.
And if the caller insists on rejecting the oddr, she still has the information that x==k from the overlap value. |
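A minimal sketch of the two alternatives, applied to the formula quoted earlier in this thread. The function names are hypothetical, and this is not meant to represent the code that ended up in GSEApy.

```python
import math

def oddr_or_inf(x, k, m, bg):
    """Return inf instead of raising when the denominator is zero."""
    denom = m * (k - x)
    if denom == 0:
        return math.inf
    return (x * (bg - m)) / denom

def oddr_corrected(x, k, m, bg):
    """Add 0.5 to each term, as in the correction suggested above."""
    return ((x + 0.5) * (bg - m + 0.5)) / ((m + 0.5) * (k - x + 0.5))

# Hypothetical case where the whole query lies inside the category (x == k)
x, k, m, bg = 5, 5, 100, 20000
inf_result = oddr_or_inf(x, k, m, bg)
corrected_result = oddr_corrected(x, k, m, bg)

print(inf_result)
print(corrected_result)
```

The first variant keeps the formula untouched and flags the degenerate case with an infinite value; the second always returns a finite number, at the cost of slightly shifting every result.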
Hi, Sorry for replying to this closed issue. I just had a question about the implementation based on this discussion. Why did @zqfang adopt the formula where:
x = len(query.intersection(category)) # = a
bg = len(background) # = a + b + c + d
m = len(category) # = a + b
k = len(query) # = a + c
In the above equation, the 0.5 of the Haldane-Anscombe correction is omitted for simplicity. It would be helpful if you could point out my misunderstanding. |
Hi @136s , If the formula is indeed wrong and instead of 1.0*... it is always 1.0+..., then the two formulas are equivalent, right? |
Hi @sreichl, Since there seem to be no replies, I read the "Definition in terms of joint and conditional probabilities" section on Wikipedia and thought the following: Where |
Hi @136s, I think the correct formula given our variables should be, in Python with the Haldane-Anscombe correction:
oddr = ((x + 0.5) * (bg - m - k + x + 0.5)) / ((m - x + 0.5) * (k - x + 0.5))
What do you think @136s @zqfang @yossi-liron? |
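One way to sanity-check this formula is to count the four cells of the 2x2 table directly from sets and compare. The toy gene sets and the helper name below are invented; the cell labels a, b, c, d follow the earlier comment (m = a + b, k = a + c, bg = a + b + c + d), and the sketch assumes both query and category are subsets of the background.

```python
def oddr_from_counts(x, k, m, bg):
    """Formula proposed above, with 0.5 added to each cell (Haldane-Anscombe)."""
    return ((x + 0.5) * (bg - m - k + x + 0.5)) / ((m - x + 0.5) * (k - x + 0.5))

# Toy data, for illustration only
background = {f"g{i}" for i in range(100)}
category = {f"g{i}" for i in range(0, 20)}   # m = 20
query = {f"g{i}" for i in range(15, 25)}     # k = 10, overlapping in g15..g19

# Cells of the contingency table, counted directly from the sets
a = len(query & category)                    # in query and in category
b = len(category - query)                    # in category only
c = len(query - category)                    # in query only
d = len(background - query - category)       # in neither

from_cells = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
from_counts = oddr_from_counts(
    len(query & category), len(query), len(category), len(background)
)

print(from_cells, from_counts)
```

If the two printed values match, the expression in terms of x, k, m and bg is the usual corrected a*d/(b*c) for this toy table.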
Hi, thanks for this package!
This is more of a feature request than a bug report. I think the odds ratio is quite an important value to interpret enrichment analysis results.
Setup
I am reporting a problem with GSEApy version, Python version, and operating
system as follows:
3.9.5 (default, Jun 4 2021, 12:28:51)
[GCC 7.5.0]
CPython
Linux-3.10.0-1062.18.1.el7.x86_64-x86_64-with-glibc2.17
0.10.5
Expected behaviour
When using the Enrichr functionality of GSEApy
gseapy.enrichr()
in local mode (to be able to provide a custom background gene set), the resulting data frame should contain the same columns (including the odds ratio) as in the vanilla mode (direct query of Enrichr).
Actual behaviour
When using the Enrichr functionality of GSEApy
gseapy.enrichr()
in local mode (to be able to provide a custom background gene set), the resulting data frame does not contain odds ratios, although the vanilla mode returns odds ratios from Enrichr.
Steps to reproduce
It is already apparent in the respective example in the docs.