Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An example of GOEA with Uniprots #251

Open
Maryam-Haghani opened this issue Nov 24, 2022 · 6 comments
Open

An example of GOEA with Uniprots #251

Maryam-Haghani opened this issue Nov 24, 2022 · 6 comments

Comments

@Maryam-Haghani
Copy link

Maryam-Haghani commented Nov 24, 2022

Hi,
I have noticed in the paper that we can use GPAD format from European Bioinformatics Institute’s FTP site for Gene Ontology Enrichment Analysis instead of NCBI's GAF format:

"The annotations are currently available for download from the GOC as GO Annotation Format (GAF), from NCBI’s FTP server in a gene2go format, or from the European Bioinformatics Institute’s FTP site in the Gene Product Association Data format (GPAD). GOATOOLS can efficiently parse these relevant file formats to retrieve rich attributes of each term..."

Based on this, as I have my data in Uniprot format, I am trying to work with the GPAD format to calculate over-represented GOs based on UniprotIDs. But, unfortunately, could not find any examples of that, the way it exists for NCBI gene data.
Having a look at the python files in the repository, I came up to the code below which gives an AssertionError while reading the associations at GpadReader(fin_gpad).
Could you please help me to resolve this issue?

Here is my code:

from goatools.base import download_go_basic_obo
from goatools.anno.dnld_ebi_goa import DnldGoa
from goatools.base import get_godag
from goatools.anno.gpad_reader import GpadReader

obo_fname = download_go_basic_obo()

objdnld = DnldGoa()
fin_gpad = objdnld.dnld_goa('human', 'gpa', None)

godag = get_godag()
objgpad = GpadReader(fin_gpad)

P.S.: For the last line, I also tried objgpad = GpadReader(fin_gpad, godag=godag), but nothing changed.

@dvklopfenstein
Copy link
Collaborator

Thank you for your interest in GOA Tools and for giving your time to write this issue.

Download the annotations before reading using wget:

$ wget http://geneontology.org/ontology/go-basic.obo
$ wget http://current.geneontology.org/annotations/goa_human.gpad.gz
$ gunzip goa_human.gpad.gz

Then read the annotations like this:

from goatools.base import get_godag
from goatools.anno.gpad_reader import GpadReader

godag = get_godag('go-basic.obo')
anno = GpadReader('goa_human.gpad', godag=godag)

I added a new notebook demonstrating how to read annotations from a GPAD file here

@Maryam-Haghani
Copy link
Author

Thank you for your helpful answer @dvklopfenstein.
As far as I see, you have downloaded the gpad file in your notebook from GO website that is gpa-version: 1.2 and using this version resolves my issue.
But problem with my first code using goatools.anno.dnld_ebi_goa.py file is that it downloads the gpa file from ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gpa.gz, with gpa-version: 1.1. As error details says, GpadReader cannot find some evidence code of this file in eco2group dictionary, and hence it raises an error.
I think to get rid of the error, we have to change goatools.anno.dnld_ebi_goa.py file to use version 1.2 of the gpad file.

@dvklopfenstein
Copy link
Collaborator

Thank you very much for the error detail and the name and location of the annotation file. This is extremely helpful.

Likely, we need to update the ECO codes. This should go quickly. I'll have an update ASAP.

Thank you for your time to report the error on the ECO code and thank you for your interest in GOA Tools.

@Maryam-Haghani
Copy link
Author

You are welcome!
BTW, I used evidenceontology to have the updated ECO codes and have written a code to convert this file to a dictionary automatically.
This code can be used in eco2group.py to have the ECO2GRP dictionary automatically rather than hard-coding.

Here is the code:

import pandas as pd

#Use eco file existing in evidenceontology
URL = "https://raw.githubusercontent.com/evidenceontology/evidenceontology/master/gaf-eco-mapping-derived.txt"

# Read file in df format
df_eco = pd.read_csv(URL, comment='#', sep="\t", header = None)

ECO2GRP = dict(df_eco[[0,1]].values)

@dvklopfenstein
Copy link
Collaborator

Very nice.

Thank you so much for the additional information regarding downloading the ECO codes to be used by GOA Tools.

Your idea works well for GOA Tools as our philosophy is to decouple from storing other group's databases, which benefits researchers by giving them the power to use the latest data or to cite a specific data version, which is needed when publishing.

Thank you so much. I will be sure to credit you as a contributor on the next pull request which implements the change you suggest.

Thank you for your time and for your interest in GOA Tools.

@Maryam-Haghani
Copy link
Author

And thanks for your valuable work.

dvklopfenstein added a commit that referenced this issue Dec 12, 2022
dvklopfenstein added a commit that referenced this issue Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants