# Ontology mapping

Ontologies are structured, standardized representations of knowledge in a specific domain that define the concepts, relationships, and properties within that domain. They matter for Electronic Health Records (EHR) because they provide a common vocabulary and framework for organizing and integrating healthcare data. By using ontologies, EHR systems can improve interoperability and semantic understanding and facilitate effective data exchange, which leads to better decision support, data analysis, and collaboration among healthcare providers and analysts.

ehrapy is compatible with [Bionty](https://github.com/laminlabs/bionty), which provides access to public ontologies and functionality to map values against them.

Here, we'll create an artificial AnnData object containing several diseases, which we will map against an ontology to ensure that all of our annotations adhere to it.

```python
import anndata as ad
import numpy as np
import pandas as pd
```

Create an AnnData object with disease annotations in the `obs` slot.

```python
adata = ad.AnnData(
    X=np.random.random((3, 3)),
    var=pd.DataFrame(index=[f"Lab value {val}" for val in range(3)]),
    obs=pd.DataFrame(
        columns=["Immune system disorders", "nervous system disorder", "injury"],
        data=[
            ["Rheumatoid arthritis", "Alzheimer's disease", "Fracture"],
            ["Celiac disease", "Parkinson's disease", "Traumatic brain injury"],
            ["Multipla sclurosis", "Epilepsy", "Fractured Femur"],
        ],
    ),
)
adata
```



AnnData object with n_obs × n_vars = 3 × 3
    obs: 'Immune system disorders', 'nervous system disorder', 'injury'

```python
adata.obs
```

| | Immune system disorders | nervous system disorder | injury |
|---|---|---|---|
| 0 | Rheumatoid arthritis | Alzheimer's disease | Fracture |
| 1 | Celiac disease | Parkinson's disease | Traumatic brain injury |
| 2 | Multipla sclurosis | Epilepsy | Fractured Femur |


We notice that one of our immune system disorders, "Multipla sclurosis", is misspelled, and we expect to have to correct it later.

## Introduction to Bionty

First, we import Bionty. The functionality used in this tutorial is provided by the `bionty_base` package.

```python
import bionty_base as bt
```

✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!

if you see this message repeatedly, run: import bionty_base; bionty_base.reset_sources()


Bionty provides support for several ontologies related to diseases.

```python
bt.display_available_sources().loc["Disease"]
```

| entity | source | organism | version | url | md5 | source_name | source_website |
|---|---|---|---|---|---|---|---|
| Disease | mondo | all | 2024-02-06 | `http://purl.obolibrary.org/obo/mondo/releases/...` | 78914fa236773c5ea6605f7570df6245 | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
| Disease | mondo | all | 2023-08-02 | `http://purl.obolibrary.org/obo/mondo/releases/...` | 7f33767422042eec29f08b501fc851db | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
| Disease | mondo | all | 2023-04-04 | `http://purl.obolibrary.org/obo/mondo/releases/...` | 700c43dd9ba51aecc7a8edfc3bc2dab1 | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
| Disease | mondo | all | 2023-02-06 | `http://purl.obolibrary.org/obo/mondo/releases/...` | 2b7d479d4bd02a94eab47d1c9e64c5db | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
| Disease | mondo | all | 2022-10-11 | `http://purl.obolibrary.org/obo/mondo/releases/...` | 04b808d05c2c2e81430b20a0e87552bb | Mondo Disease Ontology | https://mondo.monarchinitiative.org |
| Disease | doid | human | 2024-01-31 | `http://purl.obolibrary.org/obo/doid/releases/2...` | b36c15a4610757094f8db64b78ae2693 | Human Disease Ontology | https://disease-ontology.org |
| Disease | doid | human | 2023-03-31 | `http://purl.obolibrary.org/obo/doid/releases/2...` | 64f083a1e47867c307c8eae308afc3bb | Human Disease Ontology | https://disease-ontology.org |
| Disease | doid | human | 2023-01-30 | `http://purl.obolibrary.org/obo/doid/releases/2...` | 9f0c92ad2896dda82195e9226a06dc36 | Human Disease Ontology | https://disease-ontology.org |
| Disease | icd | human | icd-11-2023 | `s3://bionty-assets/df_human__icd__icd-11-2023_...` | 16263aef644d2c62c47b7b1ecfbad9d6 | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |
| Disease | icd | human | icd-10-2020 | `s3://bionty-assets/df_human__icd__icd-10-2020_...` | 93ec5734fcc2edd64686d5ffc6f6105f | International Classification of Diseases (ICD) | https://www.cdc.gov/nchs/icd/icd9cm.htm |


Bionty provides three key functionalities, all of which we use below:

1. `inspect`: Check whether our values (here, diseases) can be validated against a specified ontology.
2. `standardize`: Map values with inconsistent casing or synonyms to the standardized ontology terms.
3. `lookup` and `search`: Find the correct ontology term for values that cannot be validated, via auto-completion or fuzzy matching.
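To make the idea behind `inspect` concrete, here is a minimal, stdlib-only sketch of what validating values against a vocabulary can look like. It is not Bionty's implementation: `inspect_terms` is a hypothetical helper, and the three-term vocabulary is a made-up stand-in for a real ontology. Each value is classified as validated (exact match), casing (matches only when case is ignored), or unknown.

```python
def inspect_terms(values, vocabulary):
    """Classify each value against a vocabulary of canonical term names.

    Hypothetical helper for illustration only. Returns a dict mapping each
    value to "validated" (exact match), "casing" (match only when case is
    ignored), or "unknown" (no match at all).
    """
    exact = set(vocabulary)
    lowered = {name.lower() for name in vocabulary}
    report = {}
    for value in values:
        if value in exact:
            report[value] = "validated"
        elif value.lower() in lowered:
            report[value] = "casing"
        else:
            report[value] = "unknown"
    return report


# Made-up stand-in vocabulary, not the real MONDO term list.
vocabulary = ["rheumatoid arthritis", "celiac disease", "multiple sclerosis"]
report = inspect_terms(
    ["Rheumatoid arthritis", "Celiac disease", "Multipla sclurosis"], vocabulary
)
print(report)
# {'Rheumatoid arthritis': 'casing', 'Celiac disease': 'casing', 'Multipla sclurosis': 'unknown'}
```

Separating "casing" from "unknown" mirrors the distinction Bionty's report makes below: casing-only mismatches can be fixed automatically, while unknown values need a manual decision.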

## Mapping against the MONDO Disease Ontology with Bionty

We will now showcase how to access the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/) with Bionty.
The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world.

There are several different sources available that provide definitions and data models for diseases, such as [HPO](https://hpo.jax.org/app), [OMIM](https://omim.org/), [SNOMED CT](http://www.snomed.org/), [ICD](https://www.cdc.gov/nchs/icd/icd10cm.htm), [PhenoDB](https://phenodb.org/), [MedDRA](https://www.meddra.org/), [MedGen](https://www.ncbi.nlm.nih.gov/medgen/), [ORDO](https://www.orpha.net/consor/cgi-bin/index.php?lng=EN), [DO](http://disease-ontology.org/), [GARD](https://rarediseases.info.nih.gov/), and others. However, these sources often overlap and sometimes conflict with each other, making it challenging to understand how they are related.

Mondo was developed to address the need for a unified disease terminology that offers precise equivalences between disease concepts. It is designed to unify multiple disease resources using a logic-based structure.

Bionty is centered around entity objects that provide the functionality introduced above. We'll now create a Bionty `Disease` object with the MONDO ontology as its source and a pinned version for reproducibility.

```python
disease_bionty = bt.Disease(source="mondo", version="2023-02-06")
disease_bionty
```

PublicOntology
Entity: Disease
Organism: all
Source: mondo, 2023-02-06
#terms: 25913


We can access the DataFrame that contains all ontology terms:

```python
disease_bionty.df()
```

| ontology_id | name | definition | synonyms | parents |
|---|---|---|---|---|
| MONDO:0000001 | disease or disorder | A Disease Is A Disposition To Undergo Patholog... | disorders\|medical condition\|other disease\|dise... | [] |
| MONDO:0000002 | obsolete 46,XX sex reversal | | | [] |
| MONDO:0000003 | obsolete 17-hydroxysteroid dehydrogenase defic... | | | [] |
| MONDO:0000004 | adrenocortical insufficiency | An Endocrine Or Hormonal Disorder That Occurs ... | adrenal gland insufficiency\|adrenal cortical i... | [MONDO:0002816] |
| MONDO:0000005 | alopecia, isolated | | | [MONDO:0021034] |
| ... | ... | ... | ... | ... |
| MONDO:8000030 | obsolete morphological anomaly | | | [] |
| MONDO:8000031 | obsolete subtype of a disorder | | | [] |
| MONDO:8000032 | obsolete malformation syndrome | | | [] |
| MONDO:8000033 | obsolete group of disorders | | | [] |
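The `parents` column links each term to its direct parents (for example, adrenocortical insufficiency, `MONDO:0000004`, has the parent `MONDO:0002816`), so the ontology forms a hierarchy in which a term can have several parents. As a hedged sketch of how such a column can be used, the hypothetical helper `ancestors` below collects all ancestors of a term from a parent mapping. The `TOY:` identifiers are placeholders, not real MONDO IDs.

```python
def ancestors(term_id, parents):
    """Collect all ancestors of `term_id` by following parent links.

    `parents` maps each term ID to a list of its direct parent IDs, like the
    `parents` column above. An explicit stack plus a `seen` set ensures that
    terms reachable via several paths (a DAG, not a tree) are visited once.
    """
    seen = set()
    stack = list(parents.get(term_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


# Hypothetical toy hierarchy: TOY:4 has two parents that share a root.
parents = {
    "TOY:4": ["TOY:2", "TOY:3"],
    "TOY:3": ["TOY:1"],
    "TOY:2": ["TOY:1"],
    "TOY:1": [],
}
print(sorted(ancestors("TOY:4", parents)))  # ['TOY:1', 'TOY:2', 'TOY:3']
```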


Let's inspect all of our "Immune system disorders" to learn which terms map against the MONDO Disease ontology.

```python
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
```

```
❗ 3 terms (100.00%) are not validated for name: Rheumatoid arthritis, Celiac disease, Multipla sclurosis
   detected 2 terms with inconsistent casing/synonyms: Rheumatoid arthritis, Celiac disease
→  standardize terms via .standardize()
```

| | `__validated__` |
|---|---|
| Rheumatoid arthritis | False |
| Celiac disease | False |
| Multipla sclurosis | False |


None of the values can be validated immediately, but "Rheumatoid arthritis" and "Celiac disease" differ from the ontology terms only in casing, so they can be standardized.

```python
adata.obs["Immune system disorders"] = disease_bionty.standardize(
    adata.obs["Immune system disorders"], field=disease_bionty.name
)
```

💡 standardized 2/3 terms


```python
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
```

```
✅ 2 terms (66.70%) are validated for name
❗ 1 term (33.30%) is not validated for name: Multipla sclurosis
```

| | `__validated__` |
|---|---|
| rheumatoid arthritis | True |
| celiac disease | True |
| Multipla sclurosis | False |
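Conceptually, standardization maps each value to its canonical term name when it matches a name or a synonym regardless of case, and leaves everything else untouched. The sketch below illustrates that idea with a hypothetical `standardize` helper and made-up synonym lists; Bionty's own `.standardize()` may differ in details such as how it handles ambiguous synonyms.

```python
def standardize(values, terms):
    """Map values to canonical names via case-insensitive name/synonym lookup.

    `terms` maps each canonical name to a list of synonyms (hypothetical data).
    Values without a match are returned unchanged so they can be fixed later.
    """
    index = {}
    for name, synonyms in terms.items():
        index[name.lower()] = name
        for synonym in synonyms:
            index[synonym.lower()] = name
    return [index.get(value.lower(), value) for value in values]


# Made-up synonym lists for illustration, not taken from MONDO.
terms = {
    "rheumatoid arthritis": ["RA"],
    "celiac disease": ["coeliac disease"],
    "multiple sclerosis": ["MS"],
}
standardized = standardize(
    ["Rheumatoid arthritis", "Coeliac Disease", "Multipla sclurosis"], terms
)
print(standardized)
# ['rheumatoid arthritis', 'celiac disease', 'Multipla sclurosis']
```

Leaving unmatched values unchanged, rather than dropping them, is what allows the remaining typo to show up in the next inspection.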


For terms that could not be validated, we can use Bionty's lookup functionality to find the corresponding term in the MONDO Disease Ontology with auto-completion.
For this purpose, we create a lookup object.
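A lookup object exposes ontology terms as attributes, which is what makes tab completion possible in an interactive session. Here is a minimal sketch of that idea, assuming a simple naming rule (lower-case, runs of non-word characters replaced by underscores) that may not match Bionty's exact rule. `make_lookup` is hypothetical and returns plain names rather than full ontology records.

```python
import re
from types import SimpleNamespace


def make_lookup(names):
    """Build an object that exposes each term name as a Python attribute.

    Assumed naming rule: lower-case the name, replace runs of non-word
    characters with "_", and prefix "_" if the result starts with a digit.
    If two names map to the same attribute, the later one wins.
    """
    attributes = {}
    for name in names:
        key = re.sub(r"\W+", "_", name.lower()).strip("_")
        if not key:
            continue  # nothing usable as an attribute name
        if key[0].isdigit():
            key = f"_{key}"  # attribute names cannot start with a digit
        attributes[key] = name
    return SimpleNamespace(**attributes)


lookup = make_lookup(["multiple sclerosis", "celiac disease", "Alzheimer's disease"])
print(lookup.multiple_sclerosis)   # multiple sclerosis
print(lookup.alzheimer_s_disease)  # Alzheimer's disease
```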

```python
disease_bionty_lookup = disease_bionty.lookup()
```

```python
disease_bionty_lookup.multiple_sclerosis
```

Disease(ontology_id='MONDO:0005301', name='multiple sclerosis', definition='A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.', synonyms=None, parents=array(['MONDO:0006704', 'MONDO:0000568', 'MONDO:0002562', 'MONDO:0005560'],
      dtype=object), _5='multiple sclerosis')

We found a match! Let's look at the definition of our result.

```python
disease_bionty_lookup.multiple_sclerosis.definition
```

'A Progressive Autoimmune Disorder Affecting The Central Nervous System Resulting In Demyelination. Patients Develop Physical And Cognitive Impairments That Correspond With The Affected Nerve Fibers.'

This is exactly the disease we were looking for. Alternatively, we can search the ontology directly, which ranks terms by their similarity to our misspelled value.

```python
disease_bionty.search(
    "Multipla sclurosis", field=disease_bionty.name, case_sensitive=False
)
```

| name | ontology_id | definition | synonyms | parents | `__agg__` | `__ratio__` |
|---|---|---|---|---|---|---|
| multiple sclerosis | MONDO:0005301 | A Progressive Autoimmune Disorder Affecting Th... | | [MONDO:0006704, MONDO:0000568, MONDO:0002562, ... | multiple sclerosis | 88.888889 |
| multiple sclerosis variant | MONDO:0016428 | | | [MONDO:0005071] | multiple sclerosis variant | 72.727273 |
| pediatric multiple sclerosis | MONDO:0018784 | Pediatric Multiple Sclerosis (Ms) Is A Rare Mu... | | [MONDO:0016428] | pediatric multiple sclerosis | 69.565217 |
| lateral sclerosis | MONDO:0018155 | Primary Lateral Sclerosis (Pls) Is An Idiopath... | primary lateral sclerosis\|adult-onset PLS\|PLS\|... | [MONDO:0024257] | lateral sclerosis | 68.571429 |
| glomerulosclerosis | MONDO:0000490 | A Hardening Of The Kidney Glomerulus Caused By... | glomerular sclerosis | [MONDO:0019722] | glomerulosclerosis | 68.421053 |
| ... | ... | ... | ... | ... | ... | ... |
| BAFopathy | MONDO:0700120 | Disorder Caused By Mutations In The Various Su... | | [MONDO:0003847] | bafopathy | 14.814815 |
| hydrocele | MONDO:0004920 | | | [MONDO:0003150] | hydrocele | 14.814815 |
| XH antigen | MONDO:0010760 | | XH antigen | [MONDO:0003847] | xh antigen | 14.285714 |
| angiomyxoma | MONDO:0006086 | A Benign Soft Tissue Neoplasm Characterized By... | | [MONDO:0021581, MONDO:0044335] | angiomyxoma | 13.793103 |
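Fuzzy search ranks ontology terms by their similarity to a query string. The sketch below swaps in Python's `difflib.SequenceMatcher` as the similarity measure. The hypothetical `search` helper and the short candidate list are assumptions, and its scores need not equal Bionty's `__ratio__` column, although for this query the top-ranked candidate is the same.

```python
from difflib import SequenceMatcher


def search(query, names, limit=3):
    """Rank candidate names by case-insensitive similarity to `query`.

    Scores are difflib's SequenceMatcher ratio scaled to 0-100. This is a
    stand-in scorer, not necessarily the one Bionty uses internally.
    """
    scored = [
        (name, 100 * SequenceMatcher(None, query.lower(), name.lower()).ratio())
        for name in names
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


candidates = [
    "multiple sclerosis",
    "multiple sclerosis variant",
    "lateral sclerosis",
    "hydrocele",
]
for name, score in search("Multipla sclurosis", candidates):
    print(f"{name}: {score:.1f}")
```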


Now we can finally replace the values of our obs column with the MONDO Disease ontology values.

```python
# Assign the result back to the column instead of calling
# .replace(..., inplace=True) on the column: that is chained assignment,
# which pandas warns will stop working in pandas 3.0.
adata.obs["Immune system disorders"] = adata.obs["Immune system disorders"].replace(
    {"Multipla sclurosis": disease_bionty_lookup.multiple_sclerosis.name}
)
adata.obs["Immune system disorders"]
```

```
0    rheumatoid arthritis
1          celiac disease
2      multiple sclerosis
Name: Immune system disorders, dtype: object
```

```python
disease_bionty.inspect(
    adata.obs["Immune system disorders"], field=disease_bionty.name, return_df=True
)
```

```
✅ 3 terms (100.00%) are validated for name
```

| | `__validated__` |
|---|---|
| rheumatoid arthritis | True |
| celiac disease | True |
| multiple sclerosis | True |


Voilà, all of our immune system disorders are mapped against the ontology. We could now repeat this process for all other columns.
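As a hedged sketch of repeating the correction for several columns, the hypothetical `apply_corrections` helper below applies a per-column replacement mapping with pandas and assigns each result back to its column. The `Epilepsy` → `epilepsy` mapping is a placeholder that only demonstrates the mechanics, not a verified ontology term; in practice, each mapping would come from `inspect`, `standardize`, `lookup`, or `search`. The result could then be written back with `adata.obs = apply_corrections(adata.obs, corrections)`.

```python
import pandas as pd


def apply_corrections(obs, corrections):
    """Return a copy of `obs` with per-column value replacements applied.

    `corrections` maps a column name to a {wrong value: correct value} dict.
    Assigning the result of `Series.replace` back to the column avoids the
    chained-assignment pattern `obs[col].replace(..., inplace=True)`.
    """
    obs = obs.copy()
    for column, mapping in corrections.items():
        obs[column] = obs[column].replace(mapping)
    return obs


obs = pd.DataFrame(
    {
        "Immune system disorders": ["rheumatoid arthritis", "celiac disease", "Multipla sclurosis"],
        "nervous system disorder": ["Alzheimer's disease", "Parkinson's disease", "Epilepsy"],
    }
)
corrections = {
    "Immune system disorders": {"Multipla sclurosis": "multiple sclerosis"},
    # Hypothetical placeholder mapping; verify real terms with inspect/search.
    "nervous system disorder": {"Epilepsy": "epilepsy"},
}
fixed = apply_corrections(obs, corrections)
print(fixed)
```

Returning a copy keeps the original annotations intact, so the corrected columns can be re-inspected before they replace the data in `adata.obs`.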

## Mapping against other Disease ontologies

Besides the MONDO Disease Ontology, Bionty supports other ontologies such as the [Disease Ontology](https://disease-ontology.org/) or [ICD](https://www.who.int/standards/classifications/classification-of-diseases).

We only need to adapt the source and the version.

```python
disease_bionty = bt.Disease(source="icd", version="icd-11-2023")
disease_bionty
```

PublicOntology
Entity: Disease
Organism: human
Source: icd, icd-11-2023
#terms: 35574


The remaining workflow would be the same as above.

## Conclusion

ehrapy provides support for ontology management, inspection, and mapping through [Bionty](https://github.com/laminlabs/bionty).
Bionty provides access to ontologies such as the [Mondo Disease Ontology](https://mondo.monarchinitiative.org/), the [Disease Ontology](https://disease-ontology.org/), and many others.
To access these ontologies, we create Bionty `Disease` objects whose methods inspect data for adherence to an ontology and standardize synonyms and inconsistent casing.
Mismatches can be remedied by finding the correct ontology term using lookup objects or fuzzy search.