# Ontology mapping

Ontologies are structured, standardized representations of knowledge in a specific domain that define the concepts, relationships, and properties within that domain. They matter for Electronic Health Records (EHR) because they provide a common vocabulary and framework for organizing and integrating healthcare data. By using ontologies, EHR systems can improve interoperability and semantic understanding and facilitate effective data exchange, which in turn supports decision making, data analysis, and collaboration among healthcare providers and analysts.

ehrapy is compatible with [Bionty](https://github.com/laminlabs/bionty), which provides access to public ontologies and functionality to map values against them.

Here, we'll create an artificial AnnData object containing different diseases and map them against an ontology to ensure that all of our annotations adhere to it.

In [1]:
import anndata as ad
import numpy as np
import pandas as pd

Create an AnnData object with disease annotations in the `obs` slot.

In [2]:
adata = ad.AnnData(
    X=np.random.random((3, 3)),
    var=pd.DataFrame(index=[f"Lab value {val}" for val in range(3)]),
    obs=pd.DataFrame(
        columns=["Immune system disorders", "nervous system disorder", "injury"],
        data=[
            ["Rheumatoid arthritis", "Alzheimer's disease", "Fracture"],
            ["Celiac disease", "Parkinson's disease", "Traumatic brain injury"],
            ["Multipla sclurosis", "Epilepsy", "Fractured Femur"],
        ],
    ),
)
adata



AnnData object with n_obs × n_vars = 3 × 3
    obs: 'Immune system disorders', 'nervous system disorder', 'injury'

In [3]:
adata.obs

| | Immune system disorders | nervous system disorder | injury |
|---|---|---|---|
| 0 | Rheumatoid arthritis | Alzheimer's disease | Fracture |
| 1 | Celiac disease | Parkinson's disease | Traumatic brain injury |
| 2 | Multipla sclurosis | Epilepsy | Fractured Femur |


We notice that one of our immune system disorders, "Multipla sclurosis", is misspelled and therefore does not exist as a disease term. We expect to have to correct it later.

## Introduction to Bionty

First we import Bionty.

In [4]:
import bionty as bt

✅ New records found in the public sources.yaml, updated /home/zeth/.lamin/bionty/versions/sources.local.yaml!


Bionty provides support for several ontologies related to diseases.

In [5]:
bt.display_available_sources().loc["Disease"]

| entity | source | species | version | url | md5 | source_name | source_website |
|---|---|---|---|---|---|---|---|
| Disease | mondo | all | 2023-02-06 | | 2b7d479d4bd02a94eab47d1c9e64c5db | Mondo Disease Ontology | https://mondo.monarchinitiative.org/ |
| Disease | mondo | all | 2022-10-11 | | 04b808d05c2c2e81430b20a0e87552bb | Mondo Disease Ontology | https://mondo.monarchinitiative.org/ |
| Disease | doid | human | 2023-01-30 | | 9f0c92ad2896dda82195e9226a06dc36 | Human Disease Ontology | https://disease-ontology.org/ |


Bionty provides three key functionalities:

1. `inspect`: Check whether any of our values (here diseases) are mappable against a specified ontology.
2. `map_synonyms`: Map values against synonyms. This is not relevant for our diseases.
3. `curate`: Curate ontology values against the ontology to ensure compliance.
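
To make the first two operations more concrete, here is a minimal, library-free sketch of what inspecting and synonym mapping amount to conceptually. It does not use Bionty. The reference table `REFERENCE` and the helpers `inspect_values` and `map_synonyms_sketch` are hypothetical names introduced only for illustration, and the real methods may behave differently (for example with respect to case sensitivity).

```python
# Hypothetical miniature reference table: standardized name -> known synonyms.
# These entries are illustrative and are not loaded from a real ontology.
REFERENCE = {
    "celiac disease": ["coeliac disease", "celiac sprue"],
    "multiple sclerosis": [],
}


def inspect_values(values, reference):
    """Return {value: bool}, True if the value is a standardized name."""
    return {value: value in reference for value in values}


def map_synonyms_sketch(values, reference):
    """Replace known synonyms with their standardized name; keep other values."""
    synonym_to_name = {
        synonym: name
        for name, synonyms in reference.items()
        for synonym in synonyms
    }
    return [synonym_to_name.get(value, value) for value in values]


values = ["coeliac disease", "multiple sclerosis", "Multipla sclurosis"]
mapped = inspect_values(values, REFERENCE)
standardized = map_synonyms_sketch(values, REFERENCE)
print(mapped)
print(standardized)
```

In this sketch a synonym is resolved to its standardized name, while a misspelled value is left untouched and still has to be corrected by other means.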

## Mapping against the MONDO Disease Ontology with Bionty

We will now showcase how to access the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/) with Bionty.
The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.

There are several different sources available that provide definitions and data models for diseases, such as [HPO](https://hpo.jax.org/app), [OMIM](https://omim.org/), [SNOMED CT](http://www.snomed.org/), [ICD](https://www.cdc.gov/nchs/icd/icd10cm.htm), [PhenoDB](https://phenodb.org/), [MedDRA](https://www.meddra.org/), [MedGen](https://www.ncbi.nlm.nih.gov/medgen/), [ORDO](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN), [DO](http://disease-ontology.org/), [GARD](https://rarediseases.info.nih.gov/), and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.

To address the need for a unified disease terminology that offers precise equivalences between disease concepts, Mondo was developed. Mondo is designed to unify multiple disease resources using a logic-based structure.

Bionty is centered around entity objects that provide the functionality introduced above. We'll now create a Bionty Disease object with the MONDO ontology as our source, pinned to a specific version for reproducibility.

In [6]:
disease_bionty = bt.Disease(source="mondo", version="2023-02-06")
disease_bionty

Disease
Species: all
Source: mondo, 2023-02-06

📖 Disease.df(): ontology reference table
🔎 Disease.lookup(): autocompletion of ontology terms
🔗 Disease.ontology: Pronto.Ontology object

We can access the DataFrame that contains all ontology terms:

In [7]:
disease_bionty.df()

| ontology_id | name | definition | synonyms | children |
|---|---|---|---|---|
| http://identifiers.org/hgnc/10001 | RGS5 | | | [] |
| http://identifiers.org/hgnc/10004 | RGS9 | | | [] |
| http://identifiers.org/hgnc/10006 | RHAG | | | [] |
| http://identifiers.org/hgnc/10012 | RHO | | | [] |
| http://identifiers.org/hgnc/10013 | GRK1 | | | [] |
| ... | ... | ... | ... | ... |
| UBERON:8410056 | capillary of anorectum | | | [] |
| UBERON:8410057 | capillary of colon | | | [] |
| UBERON:8420000 | hair of scalp | | | [] |
| UBERON:8440004 | laminar subdivision of the cortex | | | [UBERON:0002301] |


Let's inspect all of our "Immune system disorders" to learn which terms map against the MONDO Disease ontology.

In [8]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)

✅ 1 terms (33.3%) are mapped.
🔶 2 terms (66.7%) are not mapped.


| Immune system disorders | `__mapped__` |
|---|---|
| Rheumatoid arthritis | True |
| Celiac disease | False |
| Multipla sclurosis | False |


"Rheumatoid arthritis" could be mapped to the MONDO Disease ontology, but "Celiac disease" and the misspelled "Multipla sclurosis" could not.

For the terms that could not be mapped, we can use Bionty's lookup functionality, which supports auto-completion, to find the corresponding term in the MONDO Disease ontology.
For this purpose we create a lookup object.

In [9]:
disease_bionty_lookup = disease_bionty.lookup()

In [10]:
disease_bionty_lookup.celiac_disease

disease(ontology_id='MONDO:0005130', name='celiac disease', definition='An Autoimmune Genetic Disorder With An Unknown Pattern Of Inheritance That Primarily Affects The Digestive Tract. It Is Caused By Intolerance To Dietary Gluten. Consumption Of Gluten Protein Triggers An Immune Response Which Damages Small Intestinal Villi And Prevents Adequate Absorption Of Nutrients. Clinical Signs Include Abdominal Cramping, Diarrhea Or Constipation And Weight Loss. If Untreated, The Clinical Course May Progress To Malnutrition, Anemia, Osteoporosis And An Increased Risk Of Intestinal Malignancies. However, The Prognosis Is Favorable With Successful Avoidance Of Gluten In The Diet.', synonyms='gluten-induced enteropathy|celiac sprue|idiopathic steatorrhea|gluten intolerance|coeliac disease|non tropical sprue', children=array(['MONDO:0800124'], dtype=object))

We found a match! Let's look at the definition of our result.

In [11]:
disease_bionty_lookup.celiac_disease.definition

'An Autoimmune Genetic Disorder With An Unknown Pattern Of Inheritance That Primarily Affects The Digestive Tract. It Is Caused By Intolerance To Dietary Gluten. Consumption Of Gluten Protein Triggers An Immune Response Which Damages Small Intestinal Villi And Prevents Adequate Absorption Of Nutrients. Clinical Signs Include Abdominal Cramping, Diarrhea Or Constipation And Weight Loss. If Untreated, The Clinical Course May Progress To Malnutrition, Anemia, Osteoporosis And An Increased Risk Of Intestinal Malignancies. However, The Prognosis Is Favorable With Successful Avoidance Of Gluten In The Diet.'

This is exactly what we've been looking for. To find a match for the misspelled "Multipla sclurosis", we use Bionty's fuzzy matching.

In [12]:
disease_bionty.fuzzy_match(
    "Multipla sclurosis", field=disease_bionty.name, case_sensitive=False
)

| name | ontology_id | definition | synonyms | children | `__ratio__` |
|---|---|---|---|---|---|
| multiple sclerosis | MONDO:0005301 | A Progressive Autoimmune Disorder Affecting Th... | | [MONDO:0005314, MONDO:0005284] | 88.888889 |
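
The idea behind fuzzy matching can be illustrated with Python's standard library. The sketch below is not Bionty's implementation, and its similarity score may be computed differently from the ratio reported above. `candidate_names` is a small hypothetical stand-in for the ontology's name column.

```python
import difflib

# Hypothetical stand-in for the names available in the ontology.
candidate_names = ["multiple sclerosis", "celiac disease", "rheumatoid arthritis"]
query = "Multipla sclurosis"

# Lowercase the query to mimic a case-insensitive comparison,
# then keep only the single closest candidate above the cutoff.
matches = difflib.get_close_matches(query.lower(), candidate_names, n=1, cutoff=0.8)
best_match = matches[0] if matches else None

# Similarity in [0, 1]; higher means the two strings are more alike.
similarity = difflib.SequenceMatcher(None, query.lower(), best_match or "").ratio()
print(best_match, round(similarity, 3))
```

A cutoff guards against accepting an unrelated term when no close candidate exists. Whatever the score, a suggested match should still be reviewed before it replaces the original value.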


In [13]:
disease_bionty_lookup.multiple_sclerosis

disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, children=array(['MONDO:0005314', 'MONDO:0005284'], dtype=object))

Now we can finally replace the values of our obs column with the MONDO Disease ontology values.

In [14]:
adata.obs["Immune system disorders"] = [
    adata.obs["Immune system disorders"].iloc[0],  # positional access; already mapped
    disease_bionty_lookup.celiac_disease.name,
    disease_bionty_lookup.multiple_sclerosis.name,
]
adata.obs["Immune system disorders"]

0    Rheumatoid arthritis
1          celiac disease
2      multiple sclerosis
Name: Immune system disorders, dtype: object

In [15]:
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)

✅ 3 terms (100.0%) are mapped.
🔶 0 terms (0.0%) are not mapped.


| Immune system disorders | `__mapped__` |
|---|---|
| Rheumatoid arthritis | True |
| celiac disease | True |
| multiple sclerosis | True |


Voilà, all of our immune system disorders are mapped against the ontology. We could now repeat this process for all other columns.
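
Repeating the process for every column amounts to a loop that collects the values still needing attention. The following is a library-free sketch under stated assumptions: `columns` stands in for `adata.obs`, `known_names` is a toy set rather than actual ontology content, and `unmapped_per_column` is a hypothetical helper. With Bionty, the membership test would be replaced by a call to `inspect` as shown above.

```python
def unmapped_per_column(columns, known_names):
    """Return {column: [values not in known_names]} for columns with misses."""
    report = {}
    for column, values in columns.items():
        missing = [value for value in values if value not in known_names]
        if missing:
            report[column] = missing
    return report


# Hypothetical stand-ins: two columns from the example and a toy set of names.
columns = {
    "Immune system disorders": [
        "Rheumatoid arthritis",
        "celiac disease",
        "multiple sclerosis",
    ],
    "nervous system disorder": [
        "Alzheimer's disease",
        "Parkinson's disease",
        "Epilepsy",
    ],
}
known_names = {
    "Rheumatoid arthritis",
    "celiac disease",
    "multiple sclerosis",
    "Epilepsy",
}

report = unmapped_per_column(columns, known_names)
print(report)
```

Each reported value can then be resolved with a lookup object or fuzzy matching, exactly as done for the immune system disorders.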

## Mapping against the Disease Ontology with Bionty

Besides the MONDO Disease Ontology, Bionty supports other ontologies such as the [Disease Ontology](https://disease-ontology.org/). The workflow is very similar.

We only need to change the source and the version.

In [16]:
disease_bionty = bt.Disease(source="doid", version="2023-01-30")

The remaining workflow would be the same as above.

## Conclusion

ehrapy provides support for ontology management, inspection and mapping through [Bionty](https://github.com/laminlabs/bionty).
Bionty provides access to ontologies such as the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/), the [Disease Ontology](https://disease-ontology.org/), and many others.
To access these ontologies, we create Bionty Disease objects, which offer methods to map synonyms and to inspect data for adherence to an ontology.
Mismatches can be remedied by finding the actual correct ontology name using lookup objects or fuzzy matching.