### Programming for Biomedical Informatics
#### Week 9 Assignment

In this weekly mini assignment you will use [`pronto`](https://pypi.org/project/pronto/) to find the term name associated with an ontology term ID, and then use those names to build a summary table from the GenCC data (which we worked with earlier in the course).

The GenCC schema uses CURIEs (compact URIs), which have the form `prefix:identifier`. For example, `HGNC:10896` is a HUGO Gene Nomenclature Committee (HGNC) gene identifier. We are going to extract the mode of inheritance (MOI) for the diseases in GenCC and use an ontology to add information to our analysis. In this case the MOI accessions are terms from the Human Phenotype Ontology (HPO).
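
To make the `prefix:identifier` form concrete, here is a minimal sketch of splitting a CURIE into its two parts. The helper name `split_curie` is hypothetical (it is not part of GenCC or `pronto`), and the sketch assumes the first colon separates the prefix from the identifier.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE into (prefix, identifier) at the first colon.

    Hypothetical helper for illustration only.
    """
    prefix, sep, identifier = curie.partition(":")
    # Reject strings with no colon, or with an empty prefix or identifier
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, identifier


prefix, identifier = split_curie("HGNC:10896")
print(prefix, identifier)  # prints "HGNC 10896"
```

The same helper applied to an MOI accession such as `HP:0000006` would return the prefix `HP`, which is how you can tell the term comes from the HPO.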

- [Current HPO OBO file download](http://purl.obolibrary.org/obo/hp.obo)
- [HPO homepage at BioPortal](https://bioportal.bioontology.org/ontologies/HP)

We did something similar in the snippet this week.
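
The overall pattern (count entries per term ID, then attach a human-readable name) can be sketched on toy data before touching the real files. Everything here is a stand-in: the `toy` table mimics only the `moi_curie` column of GenCC, and the `names` dictionary plays the role of the ontology lookup that `pronto` provides in the real assignment.

```python
import pandas as pd

# Toy stand-in for the GenCC table (assumption: only the 'moi_curie' column matters here)
toy = pd.DataFrame(
    {"moi_curie": ["HP:0000006", "HP:0000007", "HP:0000006", None]}
)

# Toy stand-in for the ontology: term ID -> term name
names = {
    "HP:0000006": "Autosomal dominant inheritance",
    "HP:0000007": "Autosomal recessive inheritance",
}

# value_counts() ignores missing values and sorts by count, largest first
counts = (
    toy["moi_curie"]
    .value_counts()
    .rename_axis("TermID")
    .reset_index(name="Number of Entries")
)

# Attach the term name for each ID as the first column
counts.insert(0, "Mode of Inheritance Name", counts["TermID"].map(names))
print(counts)
```

Using `value_counts()` is one possible design choice; an explicit loop over the terms gives the same table and may be easier to follow step by step.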

In [None]:
# You may need to install these packages first
# %pip install pronto prettytable seaborn

# handling www based requests (like APIs)
import urllib.request

# ontology handling
import pronto
# standard Python data handling modules
import pandas as pd
import numpy as np
# working with nicer tables
import prettytable as PrettyTable

In [None]:
# fetch the Human Phenotype Ontology (HPO) OBO file and parse it with pronto
import urllib.request

current_hpo_url = 'http://purl.obolibrary.org/obo/hp.obo'

# download the HPO OBO file
urllib.request.urlretrieve(current_hpo_url, 'hpo.obo');

# parse the file
hpo = pronto.Ontology('hpo.obo')

In [None]:
# load the GenCC data
gencc = pd.read_csv('gencc-submissions.tsv', sep='\t')
gencc.head()

In [None]:
# find the unique MOI CURIEs, skipping rows with no MOI recorded
moi_curies = gencc['moi_curie'].dropna().unique()

# use pronto to find the MOI terms
moi_terms = [hpo[m] for m in moi_curies]

In [None]:
# print the terms using PrettyTable with the ID and name
t = PrettyTable.PrettyTable(['ID','Name'])
for term in moi_terms:
    t.add_row([term.id,term.name])
print(t)

In [None]:
# count the number of GenCC entries associated with each MOI term
# the table columns are 'Mode of Inheritance Name', 'TermID', and 'Number of Entries'
columns = ['Mode of Inheritance Name', 'TermID', 'Number of Entries']

# build one row per MOI term: name, ID, and number of matching entries
rows = [[term.name, term.id, int((gencc['moi_curie'] == term.id).sum())] for term in moi_terms]

# create the table in a single step rather than concatenating row by row
moi_count = pd.DataFrame(rows, columns=columns)

# sort the table by the number of entries in descending order
moi_count = moi_count.sort_values(by='Number of Entries', ascending=False)

# print the table using PrettyTable
t = PrettyTable.PrettyTable(['Mode of Inheritance Name', 'TermID', 'Number of Entries'])

for index, row in moi_count.iterrows():
    t.add_row([row['Mode of Inheritance Name'], row['TermID'], row['Number of Entries']])
print(t)

In [None]:
# plot the table as a barplot using seaborn
import seaborn as sns
sns.barplot(x='Number of Entries', y='Mode of Inheritance Name', data=moi_count);