Copyright (c) 2019 Dario Ghersi, Matt L. Hale, Ishwor Thapa.
The GOUtil suite can be used to perform enrichment analysis, semantic similarity calculations, and other basic analyses using Gene Ontology. The suite should work with other controlled vocabularies or pathway languages (e.g., Reactome).
The program uses the C++ Boost library, which should be installed on your system before attempting to compile the source code. The Boost library can be found at:
To compile the enrichment program with the GCC compiler, run the following command:
g++ -O3 -o enrich enrich.C utilities.C --std=gnu++11
The semantic similarity program can be compiled by running the following command:
g++ -O3 -o funSim funSim.C utilities.C --std=gnu++11
To perform enrichment analysis calculations, run the enrich
program as follows:
## downloading dataset for input to enrich program##
git clone https://github.com/MLHale/GOUtildata.git
./enrich -a GOUtildata/jan-2018/ann.hsa.bp.txt -e GOUtildata/jan-2018/edgeList.bp.txt\
-t GOUtildata/jan-2018/target.txt -b GOUtildata/jan-2018/background.hsa.bp.txt \
-o target_output_2018.txt -p 0.05
The annotation file ann.hsa.bp.txt contains Gene Ontology annotations with the following structure:
GENE1 TERM1 TERM2 ...
GENE2 TERM1 TERM3 TERM6 ...
...
extractAnnotations.py
script can be used to generate this format of annotation file.
It requires GO annotation input file from http://geneontology.org/page/download-go-annotations
edgeList.txt
describes the Gene Ontology graph with a list of
child-parent directed edges and is obtained by running the
buildEdgeList.py
script on a Gene Ontology obo file.
target.txt
and background.txt
contain the target gene list and
the background gene list, respectively.
output.txt
is the name of the file where the results will be stored.
The output contains the following fields:
GO_TERM GO_DEFINITION ADJUSTED_PVALUE ENRICHMENT_SCORE GENES
The suite also contains a program to calculate the semantic similarity between pairs of Gene Ontology terms, using either the Lin or the Resnik index. To run the calculations, use the following command:
./funSim -a GOUtildata/jan-2018/ann.hsa.bp.txt -e GOUtildata/jan-2018/edgeList.bp.txt\
-o output.txt -t INDEX_TYPE
where INDEX_TYPE is one of [Lin, Resnik, AIC]. If the optional -f ENRICH_OUTPUT
is passed
to the program, then funSim
only computes the semantic similarity between terms in the
enrichment output.
A containerized version of the runtime is also provided using docker.
- Install Docker
- Run the following:
docker-compose build
docker-compose run enrich g++ -O3 -o enrich enrich.C utilities.C --std=gnu++11
To run with the default parameters:
docker-compose run enrich
to provide alternate parameters:
docker-compose run enrich ./enrich -a GOUtildata/jan-2018/ann.hsa.bp.txt -e GOUtildata/jan-2018/edgeList.bp.txt\
-t GOUtildata/jan-2018/target.txt -b GOUtildata/jan-2018/background.hsa.bp.txt \
-o target_output_2018.txt -p 0.05