Consider evidence type in GOEA #119
Comments
Hello, Thank you for your interest in GOATOOLS and for taking the time to write us. Yes, GOATOOLS does have a facility for limiting associations to specific evidence types for pval testing. Are you running from your own script or from our scripts/find_enrichment.py script? If you are running from your own scripts, use the keyword argument, evidence_set, when calling the function read_ncbi_gene2go. The value for evidence_set should be a set containing the evidence codes that you would like. If you are running from our find_enrichment script, we need to add a command-line argument to pass this information to the GOEA. Here is the plan to address your issue:
Thank you again for your interest in GOATOOLS. |
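For readers following along, here is a minimal, self-contained sketch of what filtering gene2go-style associations by an evidence set amounts to. It is not the GOATOOLS implementation and does not call read_ncbi_gene2go; the function name read_gene2go_text and the sample rows (including the GO IDs) are made up for illustration. It assumes the tab-separated NCBI gene2go column order tax_id, GeneID, GO_ID, Evidence, and so on.

```python
import csv
import io


def read_gene2go_text(text, taxid, evidence_set=None):
    """Return {GeneID: set of GO IDs} for one taxid.

    Hypothetical helper: rows whose evidence code is not in
    evidence_set are skipped; evidence_set=None keeps every row.
    """
    id2gos = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        tax, geneid, go_id, evidence = row[0], row[1], row[2], row[3]
        if int(tax) != taxid:
            continue
        if evidence_set is not None and evidence not in evidence_set:
            continue
        id2gos.setdefault(int(geneid), set()).add(go_id)
    return id2gos


# Made-up rows in gene2go column order (placeholder GO IDs, not real annotations).
SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1\tGO:0000001\tIDA\t-\tterm A\t-\tProcess\n"
    "9606\t1\tGO:0000002\tIEA\t-\tterm B\t-\tFunction\n"
    "9606\t2\tGO:0000003\tIEA\t-\tterm C\t-\tComponent\n"
    "10090\t3\tGO:0000001\tIDA\t-\tterm A\t-\tProcess\n"
)

if __name__ == "__main__":
    print(read_gene2go_text(SAMPLE, 9606, evidence_set={"IDA"}))
```

Note that a gene whose only annotations carry excluded evidence codes drops out of the result entirely, which also changes the population seen by the enrichment test.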
Hello @jeffsmith8, Thank you for your excellent request. It will surely be a popular feature. I am working on it now. To better inform the tests that need to be written: how are you doing your enrichment analyses? Are you running the script, find_enrichment.py? Or are you writing your own scripts? |
Hey there, I haven't really progressed with this because I've been occupied with a lot of lab work and some more urgent analysis. Nevertheless, it's still a feature I would utilise in the future because my usual practice is to only review GO associations that fall in the experimental evidence groups as below (comment snippet from one of my own GO filtering functions):
Not sure if that helps? |
Thank you for taking the time to write us and for your interest in GOATOOLS. It was a great request. The information you provided in the last issue post was extremely helpful in determining the user-interface for the extended functionality. I have standardized the way evidence codes can be used across all annotation formats (GPAD, GAF, NCBI's and gene2go). To get a list of the Evidence codes, do:
To get a more detailed list of evidence codes containing their descriptions do:
If you only wanted to use annotations returned from experimental evidence groups when using the find_enrichment.py script, you would use the include evidence argument, ev_inc:
This is the same as listing all experimental codes individually:
You can also EXCLUDE evidence codes. If you want to use all evidence codes except the ones inferred from Electronic Annotation, use this argument:
Please give it a try and let us know what you think. Thank you again for taking the time to contact us. |
This is awesome! Great work! |
I am curious if there are plans to incorporate ECO codes for evidence type? I'm wondering if y'all would want a mapper or something, since I think the GOC is switching to ECO for gpad/gaf annotation development. For example,
etc...etc... |
Thank you! @cross12tamu! It was a lot of work because we unified handling of all the different annotation formats. Each format had been added at different times from different requests. Now they are all derived off of one base class. We added a mapper to map ECO IDs to the evidence code letters, which was needed to keep support across all annotation formats consistent. The mapper downloads this file: And then creates this Python module in GOATOOLS:
|
awesome! 👍 |
@cross12tamu : That is a great idea to add ECO IDs too. I will not be able to do it right now due to work on other high priority issues. If you want to take a crack at it, go ahead. Please before submitting, write a test(s) that covers your new functionality and be sure to run:
Only GPAD supports the ECO IDs, so if a user specifies an ECO ID and is not using the GPAD format, you should ignore the user's ECO ID and write a message telling them that the other formats do not contain ECO IDs and so the user ECO IDs will be ignored. |
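A rough sketch of the behaviour described above, for anyone picking this up: drop user-supplied ECO IDs and warn when the annotation format is not GPAD. The function name usable_evidence_filters and the format strings are hypothetical and not part of GOATOOLS; the only assumption about the inputs is that ECO IDs start with "ECO:".

```python
import warnings


def usable_evidence_filters(user_codes, annotation_format):
    """Return the evidence filters that can be applied to this format.

    Hypothetical helper: ECO IDs are kept only for GPAD; for any other
    format they are dropped and a warning tells the user why.
    """
    user_codes = set(user_codes)
    eco_ids = {code for code in user_codes if code.startswith("ECO:")}
    letter_codes = user_codes - eco_ids
    if eco_ids and annotation_format.lower() != "gpad":
        warnings.warn(
            "{fmt} annotations do not contain ECO IDs; ignoring: {ids}".format(
                fmt=annotation_format, ids=", ".join(sorted(eco_ids))
            )
        )
        return letter_codes
    return letter_codes | eco_ids


if __name__ == "__main__":
    # ECO:0000314 is used here only as an example ECO ID string.
    print(usable_evidence_filters({"IDA", "ECO:0000314"}, "gaf"))
```

Returning the letter codes rather than raising keeps the run going, which matches the "ignore and tell the user" behaviour requested above.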
Wonderful! I'm flattered you found my suggestion useful, and grateful that it has even been added as a feature! I look forward to using this tool, and am particularly pleased you have added a facility for custom evidence code combinations. Great work :) This is a bit of a segue, though I'll add it in case it is already a GOATOOLS feature or deemed useful... One other filter I like to include is the QUALIFIER so that searches can also be limited by biological process, molecular function or cellular compartment. I generally find this useful because much of my work is in proteomics where our experimental techniques are usually selected to target these categories, i.e. affinity proteomics targets molecular function, cell fractionation targets cellular compartment. As such I often wonder whether p-val testing in GO enrichment offers more power when it is similarly matched to a chosen technique or hypothesis being tested (at present I limit myself to a descriptive approach rather than a statistical one). Also seems like a worthwhile research question if it hasn't been tackled before. |
One more final note: You can combine including evidence codes and evidence groups with excluding evidence codes. For example,
Results in these codes being used:
For reference:
|
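The include/exclude combination described above boils down to set arithmetic. Below is a minimal sketch under stated assumptions: the group table, the list of all codes, and the function names are illustrative only and are not taken from GOATOOLS. A name is treated as a group if it appears in the group table and as a single evidence code otherwise.

```python
# Illustrative subset of GO evidence codes and one example group.
EVIDENCE_GROUPS = {
    "Experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
}
ALL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "ISS", "TAS", "IEA"}


def expand(names, groups):
    """Expand group names into their codes; other names pass through as codes."""
    codes = set()
    for name in names or ():
        codes |= groups.get(name, {name})
    return codes


def resolve_codes(all_codes, groups, include=None, exclude=None):
    """Start from the included codes (or everything), then remove the excluded ones."""
    selected = expand(include, groups) if include else set(all_codes)
    selected -= expand(exclude, groups)
    return selected & set(all_codes)


if __name__ == "__main__":
    used = resolve_codes(
        ALL_CODES, EVIDENCE_GROUPS, include=["Experimental", "ISS"], exclude=["IEP"]
    )
    print(sorted(used))
```

Applying the exclusion after the inclusion means an excluded code always wins, even when it belongs to an included group.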
@jeffsmith8 , Thank you very much for the detailed description of your usage model and how it would benefit to be able to specify which namespace to use in a GOEA run. I just added this for #127 To specify running only biological process from the command line, use the --ns option:
Where the namespace abbreviations are BP, MF, and CC:
So an example of a full command-line call to run a GOEA on just the molecular function (--ns=MF) branch is:
You can combine this with --ev_inc and --ev_exc to include and exclude evidence codes. |
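For those filtering in their own scripts, namespace selection can be sketched as below. This is not the find_enrichment.py code: the function names and the tuple layout of the annotations are hypothetical, and accepting a comma-separated list of abbreviations is an assumption made for this example. The full names are the three standard GO namespaces.

```python
NS2NAMESPACE = {
    "BP": "biological_process",
    "MF": "molecular_function",
    "CC": "cellular_component",
}


def parse_ns(ns_arg):
    """Turn a string such as 'BP,MF' into a set of full namespace names."""
    namespaces = set()
    for abbr in ns_arg.split(","):
        abbr = abbr.strip().upper()
        if abbr not in NS2NAMESPACE:
            raise ValueError("unknown namespace abbreviation: {}".format(abbr))
        namespaces.add(NS2NAMESPACE[abbr])
    return namespaces


def keep_namespaces(annotations, ns_arg):
    """Keep (gene, go_id, namespace) tuples whose namespace was requested."""
    wanted = parse_ns(ns_arg)
    return [anno for anno in annotations if anno[2] in wanted]


if __name__ == "__main__":
    # Placeholder annotations, not real gene-to-GO associations.
    annos = [
        ("geneA", "GO:0000001", "biological_process"),
        ("geneA", "GO:0000002", "molecular_function"),
        ("geneB", "GO:0000003", "cellular_component"),
    ]
    print(keep_namespaces(annos, "MF"))
```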
@tanghaibao , Can you upload a new version to PyPI and Bioconda? The latest updates for the user-experience:
The internal changes are:
These changes will close And then the next high priority issues that will be addressed are: Thank you @BrianLohman, @jeffsmith8, and @cross12tamu for your ideas and requests, and taking the time to convey them to us. Your detailed descriptions of your usage case help us to build a better user experience for the new functionality. Thank you so much. Thank you @dgpinheiro and @risserlin for opening the issues concerning relationships. This will be the next task to tackle. |
Version tag updated and uploaded to PyPI. (hijacking this message thread ...) Somehow the recent Travis CI tests have been failing often due to network timeouts (see: https://travis-ci.org/tanghaibao/goatools/jobs/529067076). Not sure what the nature of the problem is here. @dvklopfenstein would you mind taking a look, and/or disabling the tests that failed on the Travis server? Thanks, |
@tanghaibao, Regarding TravisCI: Before committing, to ensure a good build, I always run all of the tests using this:
Only if these tests all pass do I commit. Then it gets to TravisCI ... You are correct that the TravisCI failures have all been timeouts lately. The Travis config "No language set" always passes. The configs "Python 2.7" and "Python 3.6" often fail with timeouts. I always check the failures and ensure that all are timeout failures. The timeouts occur when the tests download the annotation files. In an attempt to fix this, I placed the download tests first, so the tests that follow should not need to download anything. But TravisCI seems to keep trying to download the files, so perhaps the tests are split into individual jobs and distributed across different machines where there are no previously downloaded annotation files. Many people have had issues with TravisCI timeouts and there appears to be no satisfying resolution (travis-ci/travis-ci#9587). I am taking a look at this... |
The TravisCI tests are now passing. Here is the TravisCI resolution for test failures due to timeouts on downloading the annotation files: TravisCI will run all tests on Python versions, 2.7, 3.6, and 3.7, using generic language, osx, and install python by hand rather than using the built virtual environments for linux builds as the built environments have the annotation file download timeouts. |
Good to know the tests are now passing. Thanks. |
Hi
Apologies if I've missed it somewhere- but does goatools also have a facility for limiting associations to specific evidence types for pval testing?
I'm also interested in filtering by qualifier as raised earlier, though I've noticed that the gene2go file is nearly wholly empty of qualifier statements (at least for human, ergo I assume for all).