Information Content varies with version #156

Closed
huhrichard opened this issue Mar 3, 2020 · 2 comments

@huhrichard

Previously I used goatools version 0.99, where the information content (IC) of every term in a list of GO terms was very low (nearly all < 2). After updating to version 1.02, the IC values are around 2 to 4. Is there a reason for this? I used the same script and the same OBO file to compute the IC.

Thanks

huhrichard changed the title from "Information Content vary with version" to "Information Content varies with version" on Mar 3, 2020.
@dvklopfenstein
Collaborator

Hello @huhrichard,

Thank you for your interest in GOATOOLS and taking the time to write to us.

Yes. You are correct. The information content values have changed.

The change is in the calculation of the "aspect counts" (total counts). Our original code calculated the aspect counts incorrectly by counting the same items multiple times, which produced inflated aspect values. This error scaled the information content values lower than they should have been.
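For readers comparing versions, here is a minimal sketch of the usual information content definition, IC(t) = -ln(count(t) / count(aspect root)), where each annotated gene is counted at most once per term after its annotations are propagated to ancestors. It assumes the aspect total is the count at the aspect's root term. The toy DAG, gene names, and helper functions are hypothetical; this is not the GOATOOLS code or API. It only illustrates how summing every term's count into one total counts the same gene several times and inflates the aspect value.

```python
import math
from collections import Counter

# Hypothetical toy DAG for a single aspect: child term -> set of direct parents.
# "root" stands in for the aspect's root term.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A", "B"},  # C reaches root by two paths
}

# Hypothetical annotations: gene -> directly annotated terms.
ANNOTS = {
    "g1": {"C"},
    "g2": {"A"},
    "g3": {"B"},
    "g4": {"C", "A"},
}


def ancestors(term):
    """Return all ancestors of a term (excluding the term itself)."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def term_counts(annots):
    """Count each gene at most once per term, after propagating to ancestors."""
    counts = Counter()
    for terms in annots.values():
        closure = set()
        for term in terms:
            closure.add(term)
            closure |= ancestors(term)
        counts.update(closure)  # a set, so a gene adds at most 1 to each term
    return counts


def info_content(term, counts, root="root"):
    """IC(t) = -ln(count(t) / count(root)); 0.0 if the term has no annotations."""
    if counts[term] == 0:
        return 0.0
    return -math.log(counts[term] / counts[root])


counts = term_counts(ANNOTS)
# Summing every term's count counts the same gene once per term it reaches.
naive_total = sum(counts.values())

print(counts["root"], naive_total)              # 4 12
print(round(info_content("C", counts), 3))     # 0.693 (= ln 2)
```

Here the root count (4 genes) is the aspect total, while the naive sum (12) counts each gene up to four times.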

We found and fixed the error after adding more tests that compare our calculations to those in other open-source code.

Our new semantic similarity test DAGs and annotations are found here:
https://github.com/tanghaibao/goatools/blob/master/tests/data/yangRWC/README.md

We now compare our results to those described by Yang et al. [1] and to those produced by GOssTo, a high-quality Java implementation whose first authors are Alfonso E. Romero and Horacio Caniza [2].

[1] Yang, Haixuan, et al. Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty. Bioinformatics (2012).

[2] Caniza, H., et al. GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology. Bioinformatics (2014).

Thank you again for being alert and asking about the changes. Thanks for using GOATOOLS.

@dvklopfenstein
Collaborator

dvklopfenstein commented Mar 6, 2020

FYI: This fix was implemented in goatools/semantic.py in commit 839ad71 by @dvklopfenstein on 2019-08-27.

Here is the diff: 839ad71#diff-688e6e2aa684f60dd85887a80cf6c258
