# Report counts of GO terms at various levels and depths

Reports the number of GO terms at each level and depth.   

  * **Level** refers to the length of the shortest path from the root term (the top of the ontology) to the GO term.   
  * **Depth** refers to the length of the longest path from the root term to the GO term.

See the Gene Ontology Consortium's (GOC) advice regarding
[levels and depths of a GO term](http://geneontology.org/faq/how-can-i-calculate-level-go-term).
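
The two definitions differ only in whether the shortest or the longest path is taken. The following is a minimal sketch on a hypothetical four-term DAG (the `PARENTS` mapping and the `level` and `depth` helpers are illustrative names, not part of goatools, and the data is not real GO data):

```python
from functools import lru_cache

# Hypothetical toy DAG: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["root", "b"],  # reachable by a short path and by a long path
}


@lru_cache(maxsize=None)
def level(term):
    """Length of the shortest path from the root to `term`."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(level(p) for p in parents)


@lru_cache(maxsize=None)
def depth(term):
    """Length of the longest path from the root to `term`."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


for term in PARENTS:
    print(f"{term}: level-{level(term):02d} depth-{depth(term):02d}")
```

In this toy graph, term `c` sits directly under the root but is also reachable through `a` and `b`, so its level and depth differ.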
    
## GO level and depth reporting

The report can cover every GO term in an ontology or a subset of GO terms.     
Examples of subsets include all GO terms annotated for a species or all GO terms in a study.
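
As a rough illustration of what a subset report tallies, the sketch below counts terms per namespace at each level and depth for a hypothetical study. The `Term` tuple, the `count_levels_depths` helper, and all IDs are stand-ins invented for this example; it assumes only that each term object exposes `namespace`, `level`, and `depth` attributes.

```python
from collections import Counter
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical stand-in for a GO term object."""
    item_id: str
    namespace: str
    level: int
    depth: int


def count_levels_depths(terms):
    """Return (level_counts, depth_counts), each keyed by (namespace, value)."""
    level_counts = Counter()
    depth_counts = Counter()
    for term in set(terms):  # a term reached through several IDs counts once
        level_counts[(term.namespace, term.level)] += 1
        depth_counts[(term.namespace, term.depth)] += 1
    return level_counts, depth_counts


# Toy data, not real GO terms.
toy_dag = {
    "T:1": Term("T:1", "biological_process", 1, 1),
    "T:2": Term("T:2", "biological_process", 2, 3),
    "T:3": Term("T:3", "molecular_function", 2, 2),
    "T:4": Term("T:4", "biological_process", 2, 2),
}

study_ids = ["T:2", "T:3", "T:4", "T:9"]  # T:9 is absent from the toy DAG
subset = [toy_dag[i] for i in study_ids if i in toy_dag]
level_counts, depth_counts = count_levels_depths(subset)
print(level_counts)
print(depth_counts)
```

Filtering the IDs against the DAG before counting keeps identifiers that are missing from the loaded ontology from raising a `KeyError`.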

Example report on the full ontology (from the 2019-01-12 release, so the counts differ from the run further down):

```
go-basic.obo: fmt(1.2) rel(2019-01-12) 47,374 GO Terms

Summary for all Ontologies:
Dep <-Depth Counts->  <-Level Counts->
Lev   BP    MF    CC    BP    MF    CC
--- ----  ----  ----  ----  ----  ----
00     1     1     1     1     1     1
01    29    16    21    29    16    21
02   264   125   345   421   154   746
03  1273   570   494  2205   866  1073
04  2376  1516   735  4825  2072  1359
05  3692  4801   913  7297  5035   697
06  4474  1834   787  7287  1934   230
07  4699  1029   600  4696   728    68
08  4214   508   254  2018   194    10
09  3516   312    51   631    79     1
10  2399   153     4   241    13     0
11  1511   140     1    38    19     0
12   854    42     0     0     0     0
13   303    35     0     0     0     0
14    66    21     0     0     0     0
15    14     7     0     0     0     0
16     4     1     0     0     0     0

```

## 1. Download Ontologies, if necessary

In [1]:
# Get http://geneontology.org/ontology/go-basic.obo
from goatools.base import download_go_basic_obo
obo_fname = download_go_basic_obo()

  EXISTS: go-basic.obo


## 2. Download Associations, if necessary

In [2]:
# Get ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
from goatools.base import download_ncbi_associations
gene2go = download_ncbi_associations()

  EXISTS: gene2go


## 3. Initialize GODag object

In [3]:
from goatools.obo_parser import GODag

obodag = GODag("go-basic.obo")

go-basic.obo: fmt(1.2) rel(2021-05-01) 47,284 GO Terms


## 4. Initialize Reporter class

In [4]:
from goatools.rpt.rpt_lev_depth import RptLevDepth

rptobj = RptLevDepth(obodag)

## 5. Generate depth/level report for all GO terms

In [5]:
rptobj.write_summary_cnts_all()

Dep <-Depth Counts->  <-Level Counts->
Lev   BP    MF    CC    BP    MF    CC
--- ----  ----  ----  ----  ----  ----
00     1     1     1     1     1     1
01    30    30     3    30    30     3
02   278   190   788   430   254   788
03  1201   593   772  2215   977   942
04  2351  1545   933  4841  2050  1098
05  3626  4815   675  6859  4984   623
06  4557  1922   465  6796  1947   386
07  4727  1158   302  4628   704   199
08  4285   519   159  1977   183    83
09  3433   230    59   614    39    43
10  2129    87    18   209     0    10
11  1158    72     1    42     0     0
12   608     7     0     0     0     0
13   203     0     0     0     0     0
14    42     0     0     0     0     0
15     9     0     0     0     0     0
16     4     0     0     0     0     0


## 6. Count of GO terms

In [6]:
all_terms = obodag.values()
all_terms_unique = set(all_terms)
print(f"All terms: {len(all_terms)}, Unique terms: {len(all_terms_unique)}")

All terms: 47284, Unique terms: 43987


In [7]:
MF = len(obodag["GO:0003674"].get_all_children())
CC = len(obodag["GO:0005575"].get_all_children())
BP = len(obodag["GO:0008150"].get_all_children())
total = MF + CC + BP + 3  # +3 for the three root terms; should match "Unique terms" above
print(f"MF: {MF}, CC: {CC}, BP: {BP}, Total terms: {total}")

MF: 11168, CC: 4175, BP: 28641, Total terms: 43987
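
The `+ 3` is needed because a root term is not one of its own descendants. The sketch below shows the same bookkeeping on a hypothetical toy ontology with two roots; `CHILDREN` and `all_descendants` are illustrative names, not goatools API, and the sum only works because the toy branches share no terms.

```python
# Hypothetical toy ontology: each term maps to its direct children.
CHILDREN = {
    "root1": ["a", "b"],
    "a": ["c"],
    "b": ["c"],  # "c" has two parents but must be counted once
    "c": [],
    "root2": ["d"],
    "d": [],
}


def all_descendants(term):
    """Return the set of every term below `term`, excluding `term` itself."""
    found = set()
    stack = list(CHILDREN[term])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN[node])
    return found


roots = ["root1", "root2"]
# Descendant counts exclude the roots, so add one per root.
total = sum(len(all_descendants(r)) for r in roots) + len(roots)
print(f"Total terms: {total}, Terms in toy ontology: {len(CHILDREN)}")
```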


In [8]:
# Find some examples of duplicates
from collections import Counter

counter = Counter(all_terms)
(most_common, most_common_count), = counter.most_common(1)
print(most_common)
print(f"This term shows up {most_common_count} times in GODag")

GO:0003887	level-04	depth-06	DNA-directed DNA polymerase activity [molecular_function]
This term shows up 17 times in GODag


In [9]:
# Find which keys led to GO:0003887
go_0003887 = [k for k, v in obodag.items() if v == most_common]
print(f"Alternative ids: {go_0003887}")

Alternative ids: ['GO:0003887', 'GO:0016449', 'GO:0016450', 'GO:0016451', 'GO:0016000', 'GO:0016452', 'GO:0003889', 'GO:0003890', 'GO:0003891', 'GO:0015999', 'GO:0003888', 'GO:0019984', 'GO:0016448', 'GO:0003895', 'GO:0003893', 'GO:0003894', 'GO:0008723']


In [10]:
print(obodag["GO:0003888"])

GO:0003887	level-04	depth-06	DNA-directed DNA polymerase activity [molecular_function]


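Iterating over `obodag.values()` or `obodag.items()` yields one entry per key, and `GODag` stores a term under its primary ID and under each of its alternative IDs. That is why "All terms" (47,284) exceeds "Unique terms" (43,987). In the example above, the term GO:0003887 appears 17 times: once for its primary ID and once for each of its 16 alternative IDs. The online entry for [GO:0003887](https://salivaryproteome.nidcr.nih.gov/public/index.php/Special:Ontology_Term/GO:0003887) lists the same alternative IDs.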
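
The same effect can be reproduced with a plain dictionary in which several keys point to one object. The sketch below is a simplified model, not goatools code: the `Term` class, the `item_id` attribute name, and the toy IDs are assumptions made for illustration.

```python
from collections import defaultdict


class Term:
    """Hypothetical stand-in for a term object with a primary ID."""

    def __init__(self, item_id, name):
        self.item_id = item_id
        self.name = name


primary = Term("T:1", "example activity")
other = Term("T:2", "another activity")

# The primary ID and two alternative IDs all map to the same object.
toy_dag = {"T:1": primary, "T:1a": primary, "T:1b": primary, "T:2": other}

# A set collapses repeated references to the same object.
unique_terms = set(toy_dag.values())
print(f"All terms: {len(toy_dag)}, Unique terms: {len(unique_terms)}")

# Group the keys by the primary ID of the term they lead to.
keys_by_term = defaultdict(list)
for key, term in toy_dag.items():
    keys_by_term[term.item_id].append(key)

# Every key other than the primary ID is an alternative ID.
alt_keys = {
    term_id: [k for k in keys if k != term_id]
    for term_id, keys in keys_by_term.items()
}
print(alt_keys)
```

Deduplicating with `set()` before counting or reporting avoids counting a term once per alternative ID.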

Copyright (C) 2016-2019, DV Klopfenstein, H Tang. All rights reserved.