# Parse mapping of SNOMED to ICD-10

2017-06-06

Parse the SNOMED CT to ICD-10 mapping so that drug indications from DrugCentral can be cross-referenced with rare disease information from Orphanet. The goal is to determine whether any of the indications refer to rare diseases.

## Origin of files

The files come from the March 2017 release of the SNOMED CT US Edition (downloaded from https://www.nlm.nih.gov/healthit/snomedct/us_edition.html).

The source file is `tls_Icd10cmHumanReadableMap_US1000124_20170301.tsv`, found under `/Documentation` after extracting the downloaded `zip` file. Carriage returns (`\r`) were stripped with `dos2unix`, and the file was renamed to `snomed_icd10_map.tsv`.
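For readers without `dos2unix`, the line-ending cleanup can be approximated in Python. This is a minimal sketch, not the command that was actually run here: the helper name `strip_carriage_returns` and the throwaway file names are assumptions made for illustration, and the demo runs on a tiny temporary file rather than the real mapping.

```python
import os
import tempfile


def strip_carriage_returns(src_path, dst_path):
    """Copy src_path to dst_path, converting CRLF line endings to LF.

    Hypothetical helper that mimics the default behaviour of dos2unix.
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(raw.replace(b"\r\n", b"\n"))


# Demonstration on a small throwaway file (illustrative names and content).
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "crlf_example.tsv")
dst_file = os.path.join(tmp_dir, "lf_example.tsv")

with open(src_file, "wb") as handle:
    handle.write(b"col_a\tcol_b\r\n1\t2\r\n")

strip_carriage_returns(src_file, dst_file)

with open(dst_file, "rb") as handle:
    converted = handle.read()
```

Working in binary mode avoids any implicit newline translation by Python, so the only change made to the file is the CRLF to LF conversion.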

In [1]:
import pandas as pd

In [2]:
data = (pd
    .read_csv("../data/raw/snomed/snomed_icd10_map.tsv", sep='\t')
    [["referencedComponentId", "sctName", "mapTarget", "icdName"]]
    .rename(columns={
        "referencedComponentId": "snomed_id",
        "sctName": "snomed_name",
        "mapTarget": "icd_id",
        "icdName": "icd_name"
    })
    .drop_duplicates()
)

In [3]:
data.shape

(156897, 4)

In [4]:
data.head()

| | snomed_id | snomed_name | icd_id | icd_name |
|---|---|---|---|---|
| 0 | 109006 | Anxiety disorder of childhood OR adolescence (... | F93.0 | Separation anxiety disorder of childhood |
| 1 | 109006 | Anxiety disorder of childhood OR adolescence (... | F40.8 | Other phobic anxiety disorders |
| 2 | 109006 | Anxiety disorder of childhood OR adolescence (... | F94.8 | Other childhood disorders of social functioning |
| 4 | 109006 | Anxiety disorder of childhood OR adolescence (... | F40.10 | Social phobia, unspecified |
| 6 | 109006 | Anxiety disorder of childhood OR adolescence (... | F93.8 | Other childhood emotional disorders |


In [5]:
data.isnull().sum()

snomed_id          0
snomed_name        0
icd_id         12391
icd_name       12391
dtype: int64

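Note that a single SNOMED ID may map to multiple ICD-10 identifiers (for example, `109006` maps to several codes in the preview above). In addition, 12,391 rows have no ICD-10 code or name at all.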
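One way to check both properties is to count the distinct ICD-10 codes per SNOMED ID and to pull out the rows with no mapping. The snippet below is a small sketch on a toy frame, not part of the original analysis: the first three rows are taken from the preview above, while the row with ID `999999` is a made-up placeholder for an unmapped concept.

```python
import pandas as pd

# Toy data: three real rows from the preview plus one hypothetical
# unmapped concept (999999 is an invented ID for illustration only).
example = pd.DataFrame({
    "snomed_id": [109006, 109006, 109006, 999999],
    "icd_id": ["F93.0", "F40.8", "F94.8", None],
})

# Number of distinct ICD-10 codes per SNOMED ID.
# nunique() ignores missing values by default, so unmapped IDs count as 0.
codes_per_concept = example.groupby("snomed_id")["icd_id"].nunique()

# Rows where the SNOMED concept has no ICD-10 target.
unmapped = example[example["icd_id"].isnull()]

print(codes_per_concept)
print(unmapped)
```

Applied to `data`, the same two expressions would show how many concepts are one-to-many and which concepts would drop out of any join on `icd_id`.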

---

## Output to file

In [6]:
data.to_csv("data/snomed_icd10_map.tsv", sep='\t', index=False)