Consider evidence type in GOEA #119

jeffsmith8 · 2019-01-30T22:11:49Z

Hi
Apologies if I've missed it somewhere, but does goatools also have a facility for limiting associations to specific evidence types for pval testing?

I'm also interested in filtering by qualifier as raised earlier, though I've noticed that the gene2go file is nearly wholly empty of qualifier statements (at least for human, ergo I assume for all).

dvklopfenstein · 2019-01-31T21:44:28Z

Hello,

Thank you for your interest in GOATOOLS and for taking the time to write us.

Yes. GOATOOLS does have a facility for limiting associations to specific evidence types for pval testing. Are you running from your own script or from our scripts/find_enrichment.py script?

If you are running from your own scripts, use the keyword argument, evidence_set, when calling the function read_ncbi_gene2go. The value for evidence_set should be a set containing the evidence codes that you would like.
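
To illustrate what restricting associations to a set of evidence codes means for a gene2go-style file, here is a minimal stand-alone sketch. It does not call GOATOOLS: the helper name `filter_gene2go_by_evidence` and the sample rows are made up for illustration, and the column order is assumed to follow NCBI's gene2go layout (tax_id, GeneID, GO_ID, Evidence, ...).

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, assumed to be in NCBI gene2go column order:
# tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category
SAMPLE_GENE2GO = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1\tGO:0005576\tIDA\t-\textracellular region\t-\tComponent\n"
    "9606\t1\tGO:0003674\tND\t-\tmolecular_function\t-\tFunction\n"
    "9606\t2\tGO:0005515\tIPI\t-\tprotein binding\t-\tFunction\n"
    "9606\t2\tGO:0006508\tIEA\t-\tproteolysis\t-\tProcess\n"
)


def filter_gene2go_by_evidence(handle, evidence_set):
    """Return {GeneID: set of GO IDs}, keeping rows whose evidence code is in evidence_set."""
    gene2gos = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#'-prefixed header line
        if not row or row[0].startswith("#"):
            continue
        gene_id, go_id, evidence = int(row[1]), row[2], row[3]
        if evidence in evidence_set:
            gene2gos[gene_id].add(go_id)
    return dict(gene2gos)


# Keep only annotations backed by experimental evidence codes
experimental = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
gene2gos = filter_gene2go_by_evidence(io.StringIO(SAMPLE_GENE2GO), experimental)
print(gene2gos)
```

In this sketch the ND and IEA rows are dropped, so each gene keeps only its experimentally supported GO term.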

If you are running from our find_enrichment script, we need to add a command-line-argument to pass this information to the GOEA.

Here is the plan to address your issue:

Give the user the ability to specify the evidence codes to the find_enrichment script.
Add an example in the Jupyter notebooks showing how to do it from inside your own scripts.
Add evidence code tests.

Thank you again for your interest in GOATOOLS.

dvklopfenstein · 2019-04-16T23:25:42Z

Hello @jeffsmith8 , Thank you for your excellent request. It will surely be a popular feature. I am working on it now.

To better inform the tests that need to be written: how are you doing your enrichment analyses? Are you running the script find_enrichment.py, or are you writing your own scripts?

jeffsmith8 · 2019-04-17T05:24:08Z

Hey there

I haven't really progressed with this because I've been occupied with a lot of lab work and some more urgent analysis. Nevertheless it's still a feature I would utilise in the future because my usual practice is to only review GO Associations that fall in the experimental evidence groups as below (comment snippet from one of my own GO filtering functions):

Evidence Codes:
1: ['GO_ExpEvidence','EXP','IDA','IPI','IMP','IGI','IEP']
2: ['GO_HTPEvidence','HTP','HDA','HMP','HGI','HEP']
3: ['GO_CompEvidence','ISS','ISO','ISA','ISM','IGC','IBA','IBD','IKR','IRD','RCA']
4: ['GO_AuthEvidence','TAS','NAS']
5: ['GO_CurEvidence','IC','ND']
6: ['GO_eEvidence','IEA']

Not sure if that helps?

dvklopfenstein · 2019-05-02T00:06:31Z

Thank you for taking the time to write us and for your interest in GOATOOLS. It was a great request. The information you provided in the last issue post was extremely helpful in determining the user-interface for the extended functionality.

I have standardized the way evidence codes can be used across all annotation formats (GPAD, GAF, and NCBI's gene2go).

To get a list of the Evidence codes, do:

$ python3 scripts/find_enrichment.py --ev_help_short

EVIDENCE GROUP AND CODES:
    Experimental       : EXP IDA IPI IMP IGI IEP
    Similarity         : ISS ISO ISA ISM IGC IBA IBD IKR IRD IMR
    Combinatorial      : RCA
    High_Throughput    : HTP HDA HMP HGI HEP
    Author             : TAS NAS
    Curatorial         : IC
    No biological data : ND
    Automatic          : IEA

To get a more detailed list of evidence codes containing their descriptions do:

$ python3 scripts/find_enrichment.py --ev_help

If you only wanted to use annotations returned from experimental evidence groups when using the find_enrichment.py script, you would use the include evidence argument, ev_inc:

--ev_inc=Experimental

This is the same as listing all experimental codes individually:

--ev_inc=EXP,IDA,IPI,IMP,IGI,IEP

You can also EXCLUDE evidence codes. If you want to use all evidence codes except the ones inferred from Electronic Annotation, use this argument:

--ev_exc=IEA

Please give it a try and let us know what you think. Thank you again for taking the time to contact us.

cross12tamu · 2019-05-02T00:08:47Z

This is awesome! Great work!

cross12tamu · 2019-05-02T00:12:02Z

I am curious if there are plans to incorporate ECO codes for evidence type? I'm wondering if y'all would want a mapper or something, since I think the GOC is switching to ECO for gpad/gaf annotation development.

For example,

'IDA' == "ECO:0000314",
'IMP' == "ECO:0000315",

etc...etc...

dvklopfenstein · 2019-05-02T00:18:49Z

Thank you! @cross12tamu! It was a lot of work because we unified handling of all the different annotation formats. Each format had been added at different times from different requests. Now they are all derived off of one base class.

We added a mapper to map ECO IDs to the evidence code letters, which was needed to keep support across all annotation formats consistent.

The mapper downloads this file:

https://raw.githubusercontent.com/evidenceontology/evidenceontology/master/gaf-eco-mapping-derived.txt

And then creates this Python module in GOATOOLS:

goatools/anno/eco2group.py

ECO2GRP = {
    'ECO:0000030': 'ISA',
    'ECO:0000031': 'ISA',
    'ECO:0000032': 'ISA',
    'ECO:0000053': 'IEA',
    'ECO:0000209': 'IEA',
    'ECO:0000210': 'IEA',
...
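
As a rough sketch of the mapping step described above (not the actual GOATOOLS mapper), the following turns mapping lines into an ECO-ID-to-evidence-code dictionary. It assumes each line is tab-separated with the ECO ID in the first column and the evidence code in the second. The function name `parse_eco_mapping`, the third "Default" column, and the sample lines are illustrative assumptions; the three ID/code pairs are taken from the excerpt above.

```python
def parse_eco_mapping(lines):
    """Build {ECO ID: evidence code} from lines assumed to look like 'ECO_ID<TAB>code<TAB>...'."""
    eco2code = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip malformed lines that lack a code column
        if len(fields) < 2:
            continue
        eco2code[fields[0]] = fields[1]
    return eco2code


# Hypothetical excerpt; the third column is an assumed placeholder and is ignored
SAMPLE_LINES = [
    "# illustrative mapping lines",
    "ECO:0000030\tISA\tDefault",
    "ECO:0000053\tIEA\tDefault",
    "ECO:0000209\tIEA\tDefault",
]

eco2code = parse_eco_mapping(SAMPLE_LINES)
print(eco2code)
```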

cross12tamu · 2019-05-02T00:22:25Z

awesome! 👍

dvklopfenstein · 2019-05-02T00:24:08Z

@cross12tamu : That is a great idea to add ECO IDs too.

I will not be able to do it right now due to work on other high priority issues.

If you want to take a crack at it, go ahead. Before submitting, please write one or more tests that cover your new functionality, and be sure to run:

$ make clobber
$ make pytest

Only GPAD supports the ECO IDs, so if a user specifies an ECO ID and is not using the GPAD format, you should ignore the user's ECO ID and write a message telling them that the other formats do not contain ECO IDs and so the user ECO IDs will be ignored.
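
One possible shape for that behavior is sketched below. This is only an illustration of the rule just described, not existing GOATOOLS code: the function name `usable_eco_ids`, the format strings, and the message wording are all hypothetical.

```python
def usable_eco_ids(user_eco_ids, anno_format):
    """Return (ECO IDs to apply, message or None) for the given annotation format.

    ECO IDs are kept only for GPAD; for any other format they are dropped
    and a message explaining why is returned.
    """
    eco_ids = set(user_eco_ids or [])
    if not eco_ids:
        return set(), None
    if anno_format.lower() == "gpad":
        return eco_ids, None
    msg = (
        "NOTE: {fmt} annotations do not contain ECO IDs; "
        "ignoring user-specified ECO IDs: {ids}"
    ).format(fmt=anno_format, ids=" ".join(sorted(eco_ids)))
    return set(), msg


for fmt in ("gpad", "gaf", "gene2go"):
    kept, message = usable_eco_ids(["ECO:0000314", "ECO:0000315"], fmt)
    print(fmt, sorted(kept), message)
```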

jeffsmith8 · 2019-05-02T03:04:47Z

Wonderful! I'm flattered you found my suggestion useful, and grateful that it has even been added as a feature! I look forward to using this tool, and am particularly pleased you have added a facility for custom evidence code combinations. Great work :)

This is a bit of a segue, though I'll add it in case it is already a GOATOOLS feature or deemed useful...

One other filter I like to include is the QUALIFIER so that searches can also be limited by biological process, molecular function or cellular compartment.

I generally find this useful because much of my work is in proteomics where our experimental techniques are usually selected to target these categories- i.e. affinity proteomics targets molecular function, cell fractionation targets cellular compartment. As such I often wonder whether p-val testing in GO enrichment offers more power when it is similarly matched to a chosen technique or hypothesis being tested (at present I limit myself to a descriptive approach rather than a statistical one). Also seems like a worthwhile research question if it hasn't been tackled before.

dvklopfenstein · 2019-05-06T22:54:48Z

One final note: You can combine including evidence codes and evidence groups with excluding evidence codes.

For example,

--ev_inc=Experimental,Similarity
--ev_exc=IEP,IMR

Results in these codes being used:

    Experimental       : EXP IDA IPI IMP IGI
    Similarity         : ISS ISO ISA ISM IGC IBA IBD IKR IRD

For reference:

$ python3 scripts/find_enrichment.py --ev_help_short

EVIDENCE GROUP AND CODES:
    Experimental       : EXP IDA IPI IMP IGI IEP
    Similarity         : ISS ISO ISA ISM IGC IBA IBD IKR IRD IMR
    Combinatorial      : RCA
    High_Throughput    : HTP HDA HMP HGI HEP
    Author             : TAS NAS
    Curatorial         : IC
    No biological data : ND
    Automatic          : IEA
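
The include/exclude combination described in this comment can be sketched in a few lines of plain Python. This is an illustration of the set logic only, not the GOATOOLS implementation. The group table is copied from the --ev_help_short output above, while the names `EVIDENCE_GROUPS`, `expand`, and `resolve_evidence` are made up for this example.

```python
# Evidence groups and codes as printed by --ev_help_short in this thread
EVIDENCE_GROUPS = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "Similarity": ["ISS", "ISO", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "IMR"],
    "Combinatorial": ["RCA"],
    "High_Throughput": ["HTP", "HDA", "HMP", "HGI", "HEP"],
    "Author": ["TAS", "NAS"],
    "Curatorial": ["IC"],
    "No biological data": ["ND"],
    "Automatic": ["IEA"],
}
ALL_CODES = {code for codes in EVIDENCE_GROUPS.values() for code in codes}


def expand(spec):
    """Expand a comma-separated string of group names and/or codes into a set of codes."""
    codes = set()
    for item in spec.split(","):
        item = item.strip()
        if item in EVIDENCE_GROUPS:
            codes.update(EVIDENCE_GROUPS[item])
        elif item in ALL_CODES:
            codes.add(item)
        elif item:
            raise ValueError("unknown evidence group or code: {!r}".format(item))
    return codes


def resolve_evidence(ev_inc=None, ev_exc=None):
    """Return the evidence codes left after applying includes, then excludes."""
    included = expand(ev_inc) if ev_inc else set(ALL_CODES)
    excluded = expand(ev_exc) if ev_exc else set()
    return included - excluded


print(sorted(resolve_evidence("Experimental,Similarity", "IEP,IMR")))
```

With no include argument the sketch starts from every known code, so `resolve_evidence(ev_exc="IEA")` keeps everything except IEA.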

dvklopfenstein · 2019-05-06T23:08:21Z

@jeffsmith8 , Thank you very much for the detailed description of your usage model and how you would benefit from being able to specify which namespace to use in a GOEA run.

I just added this for #127

To specify running only biological process from the command line, use the --ns option:

Namespace examples:
--ns=BP
--ns=BP,MF
--ns=CC

Where the namespace abbreviations are BP, MF, and CC:

NS	Namespace
BP	Biological Process
MF	Molecular Function
CC	Cellular Component

So an example of a full command-line call to run a GOEA on just the molecular function (--ns=MF) branch is:

python3 scripts/find_enrichment.py ids_stu_gene2go_9606.txt ids_pop_gene2go_9606.txt gene2go --pval=0.05 --method=fdr_bh --pval_field=fdr_bh --outfile=results_gene2go_9606.xlsx --ns=MF

You can combine this with --ev_inc and --ev_exc to include and exclude evidence codes.
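
For readers scripting around these options, here is a small sketch of how a comma-separated --ns value could be validated and expanded using the abbreviation table above. It is not taken from find_enrichment.py; `NS2NAME` and `parse_ns` are hypothetical names used only for this example.

```python
# Namespace abbreviations from the table above
NS2NAME = {
    "BP": "Biological Process",
    "MF": "Molecular Function",
    "CC": "Cellular Component",
}


def parse_ns(ns_arg):
    """Turn a value such as 'BP,MF' into an ordered list of unique namespace abbreviations."""
    namespaces = []
    for abbr in ns_arg.split(","):
        abbr = abbr.strip().upper()
        if abbr not in NS2NAME:
            raise ValueError("unknown namespace {!r}; expected one of BP, MF, CC".format(abbr))
        if abbr not in namespaces:
            namespaces.append(abbr)
    return namespaces


for arg in ("BP", "BP,MF", "CC"):
    print(arg, "->", [NS2NAME[ns] for ns in parse_ns(arg)])
```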

dvklopfenstein · 2019-05-08T16:07:04Z

@tanghaibao , Can you upload a new version to PyPI and Bioconda? The latest updates for the user-experience:

Add option to run GOEAs with user-specified evidence codes or evidence code groups
Add Evidence Code help to inform user of Evidence groups and codes, with descriptions
Add option to run a GOEA on only one branch

The internal changes are:

Add one base class that all annotation format readers derive from
GOEAs are run on each branch

These changes will close
#127 and
#119

And then the next high priority issues that will be addressed are:
#126 and
#117

Thank you @BrianLohman, @jeffsmith8, and @cross12tamu for your ideas and requests, and for taking the time to convey them to us. Your detailed descriptions of your use cases help us to build a better user experience for the new functionality. Thank you so much.

Thank you @dgpinheiro and @risserlin for opening the issues concerning relationships. This will be the next task to tackle.

tanghaibao · 2019-05-09T01:51:36Z

@dvklopfenstein

Version tag updated and uploaded to PyPI.

(hijacking this message thread ...) Somehow the recent Travis CI tests have been failing often due to network timeouts (see: https://travis-ci.org/tanghaibao/goatools/jobs/529067076). Not sure what the nature of the problem is here. @dvklopfenstein would you mind taking a look, and/or disabling the tests that failed on the Travis server?

Thanks,
Haibao

dvklopfenstein · 2019-05-09T13:47:26Z

@tanghaibao,
Thank you for the version in PyPI.

Regarding TravisCI: Before committing, to ensure a good build, I always run all of the tests using this:

$ make clobber
$ make pytest

I commit only if all of these tests pass. Then it gets to TravisCI ...

You are correct that the TravisCI failures have all been timeouts lately. The Travis config "No language set" always passes. The configs "Python 2.7" and "Python 3.6" often fail with timeouts. I always check the failures and ensure that all are timeout failures.

The timeouts are occurring when the tests download the annotation files. In an attempt to fix, I placed the download tests first. The tests that follow should not need to download after the first tests. But TravisCI seems to continue to try downloading the files, so perhaps the tests are split into individual jobs and distributed across different machines where there are no previously downloaded annotation files.

Many people have had issues with TravisCI timeouts and there appears to be no satisfying resolution (travis-ci/travis-ci#9587).

I am taking a look at this...

dvklopfenstein · 2019-05-09T15:25:14Z

The TravisCI tests are now passing.

Here is the TravisCI resolution for test failures due to timeouts on downloading the annotation files:

TravisCI will now run all tests on Python versions 2.7, 3.6, and 3.7 using the generic language setting on osx, installing Python by hand rather than using the prebuilt virtual environments for Linux builds, because the prebuilt environments are where the annotation file download timeouts occur.

tanghaibao · 2019-05-09T18:51:12Z

@dvklopfenstein

Good to know the tests are now passing. Thanks.

dvklopfenstein added a commit that referenced this issue Jan 31, 2019

Check evidence codes seen in NCBI's gene2go for issue #119

53764ae

dvklopfenstein added the High Priority label Mar 7, 2019

dvklopfenstein mentioned this issue Apr 11, 2019

Enrichment in single GO catetory/scope of multiple test correction #127

Closed

dvklopfenstein added a commit that referenced this issue Apr 12, 2019

Added a new NCBI gene2go association reader which mimics the namedtup…

3cf2fb8

…le base used in the GafReader. Will be used to address these two issues: #119 #127

dvklopfenstein added a commit that referenced this issue Apr 21, 2019

Include or exclude sets or individual evidence codes\nhttps://github.…

9d29a00

…com//issues/119

dvklopfenstein added a commit that referenced this issue Apr 21, 2019

GPAD has ECO IDs, but not group. Add evidence code group dict

1b9ccc8

#119

dvklopfenstein added a commit that referenced this issue Apr 21, 2019

Add annotation derived class, id2gos, so all anno files may be manage…

8ae9fd1

…d by one base class #119 #127

dvklopfenstein added a commit that referenced this issue Apr 27, 2019

Add code to include/exclude annotations by evidence code.

c840d13

#119

dvklopfenstein closed this as completed May 10, 2019
