# Exploring a SNOMED CT UK Extension Release

In [1]:
import pandas as pd
import numpy as np
import pickle

## Loading the SNOMED UK Extension release files

In [2]:
snomed_dir = '/Users/shek/Desktop/medcat/SNOMED UK'

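Use the Snapshot release here rather than Full: Full contains every historical version of each component since 2014, whereas Snapshot holds only the most recent version of each. Delta contains only the differences from the previous release.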
https://confluence.ihtsdotools.org/display/DOCGLOSS/Snapshot+release
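
To make the difference between Full and Snapshot concrete, the following is a minimal sketch of how a snapshot view could be derived from Full-style rows by keeping the latest version of each component. The helper name `snapshot_from_full` and the toy ids are assumptions for illustration, not part of the release.

```python
import pandas as pd

# Toy "Full" table: two versions of component 1001 and one of 1002 (ids are made up)
full = pd.DataFrame({
    'id': ['1001', '1001', '1002'],
    'effectiveTime': ['20020131', '20190401', '20020131'],
    'active': ['1', '0', '1'],
})

def snapshot_from_full(full_df):
    """Keep, for every id, only the row with the most recent effectiveTime."""
    # effectiveTime is a YYYYMMDD string, so lexical order equals date order
    ordered = full_df.sort_values(['id', 'effectiveTime'])
    return ordered.drop_duplicates('id', keep='last').reset_index(drop=True)

snapshot = snapshot_from_full(full)
print(snapshot)
```

In this toy data, component 1001 ends up inactive in the snapshot because its latest version is the inactive one.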

In [3]:
base_term = f'{snomed_dir}/uk_sct2cl_28.0.0_20191001000001/'
int_terminology = base_term + 'SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z/Snapshot/Terminology'
uk_ext_terminology = base_term + 'SnomedCT_UKClinicalRF2_PRODUCTION_20191001T000001Z/Snapshot/Terminology'

In [4]:
def parse_file(filename, first_row_header=True, columns=None):
    with open(filename, encoding='utf-8') as f:
        entities = [[n.strip() for n in line.split('\t')] for line in f]
        return pd.DataFrame(entities[1:], columns=entities[0] if first_row_header else columns)
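
As an alternative to the hand-written parse_file helper above, the same tab-delimited files could probably be read with `pd.read_csv`. This is a sketch under assumptions: the helper name `parse_rf2` is hypothetical, and the in-memory sample stands in for a real release file.

```python
import csv
import io
import pandas as pd

def parse_rf2(source):
    """Read a tab-delimited RF2 file (path or file-like object) with every column as a string."""
    return pd.read_csv(
        source,
        sep='\t',
        dtype=str,               # SCTIDs stay strings, so long ids are never parsed as numbers
        quoting=csv.QUOTE_NONE,  # quote characters inside terms are plain text, not CSV quoting
        keep_default_na=False,   # do not turn strings such as 'NA' into NaN
    )

# Tiny in-memory stand-in for a concept file; the two rows are copied from this release
sample = io.StringIO(
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    "101009\t20020131\t1\t900000000000207008\t900000000000074008\n"
    "104001\t20020131\t1\t900000000000207008\t900000000000073002\n"
)
concepts = parse_rf2(sample)
print(concepts)
```

Reading everything as strings keeps comparisons such as `active == '1'` consistent with the rest of this notebook.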

## SNOMED CT Design

### SNOMED CT Components
SNOMED CT is a clinical terminology containing concepts with unique meanings and formal logic based definitions organised into hierarchies.
For further information please see: https://confluence.ihtsdotools.org/display/DOCSTART/4.+SNOMED+CT+Basics

SNOMED CT content is represented by three main types of components:
- __Concepts__ representing clinical meanings that are organised into hierarchies.
- __Descriptions__ which link appropriate human readable terms to concepts
- __Relationships__ which link each concept to other related concepts

__NOTE:__ The SNOMED CT UK Edition is an extension of the International Edition. Both sets of files (International and UK Extension) are released together as part of one 'UK Release'.

Load and merge the active concepts from the International and UK Extension __Concept snapshot__ files

#### __Table 4.2.1-1:__ Concept file - Detailed Specification

|Field|Data type|Purpose|Mutable|Part of Primary Key|
|:-----|:-----|:-----|:-----|:-----|
|id|SCTID|Uniquely identifies the concept.|NO|YES (Full/Snapshot)|
|effectiveTime|Time|Specifies the inclusive date at which the component version's state became the then current valid state of the component.|YES|YES (Full)<br>Optional (Snapshot)|
|active|Boolean|Specifies whether the concept was active or inactive from the nominal release date specified by the effectiveTime.|YES|NO|
|moduleId|SCTID|Identifies the concept version's module. Set to a descendant of 900000000000443000 (Module) within the metadata hierarchy.|YES|NO|
|definitionStatusId|SCTID|Specifies if the concept version is primitive or defined. Set to a descendant of 900000000000444006 (Definition status) in the metadata hierarchy.|YES|NO|

Taken from: https://confluence.ihtsdotools.org/display/DOCRELFMT
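
As a small illustration of how the definitionStatusId field can be read, the sketch below maps the two definition status identifiers to readable labels. The helper `definition_status` and the dictionary name are assumptions; the two rows reuse values that appear in this release's concept file.

```python
# Definition status concepts from the SNOMED CT metadata hierarchy
DEFINITION_STATUS = {
    '900000000000074008': 'primitive',
    '900000000000073002': 'defined',
}

def definition_status(concept_row):
    """Return a readable definition status for one concept row (a dict of column -> value)."""
    return DEFINITION_STATUS.get(concept_row['definitionStatusId'], 'unknown')

# Two concept rows with values as they appear in the concept snapshot
rows = [
    {'id': '101009', 'active': '1', 'definitionStatusId': '900000000000074008'},
    {'id': '104001', 'active': '1', 'definitionStatusId': '900000000000073002'},
]
labels = {row['id']: definition_status(row) for row in rows}
print(labels)
```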

In [5]:
int_terms = parse_file(f'{int_terminology}/sct2_Concept_Snapshot_INT_20180731.txt')
uk_terms = parse_file(f'{uk_ext_terminology}/sct2_Concept_Snapshot_GB1000000_20191001.txt')
terms = pd.concat([int_terms, uk_terms])
active_terms = terms[terms.active == '1'] # active concepts are represented with 1

In [6]:
# Every concept has a unique concept identifier: active_terms['id'] 
active_terms.head()

Unnamed: 0,id,effectiveTime,active,moduleId,definitionStatusId
1,101009,20020131,1,900000000000207008,900000000000074008
2,102002,20020131,1,900000000000207008,900000000000074008
3,103007,20020131,1,900000000000207008,900000000000074008
4,104001,20020131,1,900000000000207008,900000000000073002
6,106004,20020131,1,900000000000207008,900000000000074008


Load and merge the active descriptions from the International and UK Extension __Description snapshot__ files

#### __Table 4.2.2-1:__ Description file - Detailed Specification

|Field|Data type|Purpose|Mutable|Part of Primary Key|
|:-----|:-----|:-----|:-----|:-----|
|id|SCTID|Uniquely identifies the description.|NO|YES (Full/Snapshot)|
|effectiveTime|Time|Specifies the inclusive date at which the component version's state became the then current valid state of the component.|YES|YES (Full)<br>Optional (Snapshot)|
|active|Boolean|Specifies whether the state of the description was active or inactive from the nominal release date specified by the effectiveTime.|YES|NO|
|moduleId|SCTID|Identifies the description version's module. Set to a child of 900000000000443000\|Module\| within the metadata hierarchy.|YES|NO|
|conceptId|SCTID|Identifies the concept to which this description applies. Set to the identifier of a concept in the 138875005 \|SNOMED CT Concept\| hierarchy within the Concept. Note that a specific version of a description is not directly bound to a specific version of the concept to which it applies. Which version of a description applies to a concept depends on its effectiveTime and the point in time at which it is accessed.|NO|NO|
|languageCode|String|Specifies the language of the description text using the two character ISO-639-1 code. Note that this specifies a language level only, not a dialect or country code.|NO|NO|
|typeId|SCTID|Identifies whether the description is a fully specified name, a synonym or another description type. This field is set to a child of 900000000000446008\|Description type\| in the Metadata hierarchy.|NO|NO|
|term|String|The description version's text value, represented in UTF-8 encoding.|YES|NO|
|caseSignificanceId|SCTID|Identifies the concept enumeration value that represents the case significance of this description version. For example, the term may be completely case sensitive, case insensitive or initial letter case insensitive. This field will be set to a child of 900000000000447004\|Case significance\| within the metadata hierarchy.|YES|NO|

Taken from: https://confluence.ihtsdotools.org/display/DOCRELFMT
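
The caseSignificanceId field matters when terms are later matched against free text. The sketch below shows one way it could be applied; the function `normalise_term` and its lowercasing rule are assumptions made for illustration, not part of the specification.

```python
# Case significance concepts that occur in this release's description files
ENTIRE_TERM_CASE_INSENSITIVE = '900000000000448009'
ENTIRE_TERM_CASE_SENSITIVE = '900000000000017005'
INITIAL_CHAR_CASE_INSENSITIVE = '900000000000020002'

def normalise_term(term, case_significance_id):
    """Lowercase only the part of the term whose case carries no meaning."""
    if case_significance_id == ENTIRE_TERM_CASE_INSENSITIVE:
        return term.lower()
    if case_significance_id == INITIAL_CHAR_CASE_INSENSITIVE:
        return term[:1].lower() + term[1:]
    # Case sensitive (or unknown) terms are returned unchanged
    return term

print(normalise_term('Neoplasm of esophagus', ENTIRE_TERM_CASE_INSENSITIVE))
print(normalise_term('Hemoglobin Okaloosa', ENTIRE_TERM_CASE_SENSITIVE))
print(normalise_term('Urine valine:creatinine ratio', INITIAL_CHAR_CASE_INSENSITIVE))
```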

In [7]:
int_desc = parse_file(f'{int_terminology}/sct2_Description_Snapshot-en_INT_20180731.txt')
uk_desc = parse_file(f'{uk_ext_terminology}/sct2_Description_Snapshot-en_GB1000000_20191001.txt')
descs = pd.concat([int_desc, uk_desc])
active_descs = descs[descs.active == '1']

In [8]:
active_descs.head()

Unnamed: 0,id,effectiveTime,active,moduleId,conceptId,languageCode,typeId,term,caseSignificanceId
0,101013,20170731,1,900000000000207008,126813005,en,900000000000013009,Neoplasm of anterior aspect of epiglottis,900000000000448009
1,102018,20170731,1,900000000000207008,126814004,en,900000000000013009,Neoplasm of junctional region of epiglottis,900000000000448009
2,103011,20170731,1,900000000000207008,126815003,en,900000000000013009,Neoplasm of lateral wall of oropharynx,900000000000448009
3,104017,20170731,1,900000000000207008,126816002,en,900000000000013009,Neoplasm of posterior wall of oropharynx,900000000000448009
4,105016,20170731,1,900000000000207008,126817006,en,900000000000013009,Neoplasm of esophagus,900000000000448009


Load and merge the relationships from the International and UK Extension __Relationship snapshot__ files

#### __Table 4.2.3-1:__ Relationship file - Detailed specification

|Field|Data type|Purpose|Mutable|Part of Primary Key|
|:-----|:-----|:-----|:-----|:-----|
|id|SCTID|Uniquely identifies the relationship.|NO|YES (Full/Snapshot)|
|effectiveTime|Time|Specifies the inclusive date at which the component version's state became the then current valid state of the component.|YES|YES (Full)<br>Optional (Snapshot)|
|active|Boolean|Specifies whether the state of the relationship was active or inactive from the nominal release date specified by the effectiveTime field.|YES|NO|
|moduleId|SCTID|Identifies the relationship version's module. Set to a child of 900000000000443000\|Module\| within the metadata hierarchy.|YES|NO|
|sourceId|SCTID|Identifies the source concept of the relationship version. That is the concept defined by this relationship. Set to the identifier of a concept.|NO|NO|
|destinationId|SCTID|Identifies the concept that is the destination of the relationship version.<br>That is the concept representing the value of the attribute represented by the typeId column.<br>Set to the identifier of a concept.<br>Note that the values that can be applied to particular attributes are formally defined by the SNOMED CT Machine Readable Concept Model.|NO|NO|
|relationshipGroup|Integer|Groups together relationship versions that are part of a logically associated relationshipGroup. All active Relationship records with the same relationshipGroup number and sourceId are grouped in this way.|YES|NO|
|typeId|SCTID|Identifies the concept that represents the defining attribute (or relationship type) represented by this relationship version.<br><br>That is the concept representing the value of the attribute represented by the typeId column. <br><br>Set to the identifier of a concept. The concept identified must be either 116680003\|Is a\| or a subtype of 410662002\|Concept model attribute\|. The concepts that can be used in the typeId column are formally defined as follows:<br>116680003\|is a\| OR < 410662002\|concept model attribute\|<br><br>__Note__ that the attributes that can be applied to particular concepts are formally defined by the SNOMED CT Machine Readable Concept Model.|NO|NO|
|characteristicTypeId|SCTID|A concept enumeration value that identifies the characteristic type of the relationship version (i.e. whether the relationship version is defining, qualifying, etc.) This field is set to a descendant of 900000000000449001\|Characteristic type\| in the metadata hierarchy.|YES|NO|
|modifierId|SCTID|A concept enumeration value that identifies the type of Description Logic(DL) restriction (some, all, etc.). Set to a child of 900000000000450001\|Modifier\| in the metadata hierarchy.<br> __Note__ Currently the only value used in this column is 900000000000451002\|Some\| and thus in practical terms this column can be ignored.|YES|NO|

Taken from: https://confluence.ihtsdotools.org/display/DOCRELFMT
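
To illustrate how sourceId, typeId and destinationId work together, the sketch below separates a concept's "Is a" parents from its other defining attributes. The helper `split_relationships` is hypothetical; the three rows reuse the active relationships of concept 91175000 in this release.

```python
from collections import namedtuple

Relationship = namedtuple('Relationship', 'sourceId destinationId relationshipGroup typeId')

IS_A = '116680003'

# Active relationships of concept 91175000; 363698007 is the Finding site attribute
rows = [
    Relationship('91175000', '313287004', '0', IS_A),
    Relationship('91175000', '299718000', '0', IS_A),
    Relationship('91175000', '12738006', '0', '363698007'),
]

def split_relationships(relationships, concept_id):
    """Separate the "Is a" parents of a concept from its other defining attributes."""
    parents, attributes = [], []
    for rel in relationships:
        if rel.sourceId != concept_id:
            continue
        if rel.typeId == IS_A:
            parents.append(rel.destinationId)
        else:
            attributes.append((rel.typeId, rel.destinationId))
    return parents, attributes

parents, attributes = split_relationships(rows, '91175000')
print(parents)
print(attributes)
```

Filtering on typeId in this way avoids mistaking an attribute value, such as a finding site, for a parent concept.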

In [9]:
int_relat = parse_file(f'{int_terminology}/sct2_Relationship_Snapshot_INT_20180731.txt')
uk_relat = parse_file(f'{uk_ext_terminology}/sct2_Relationship_Snapshot_GB1000000_20191001.txt')
relat = pd.concat([int_relat, uk_relat])
active_relat = relat[relat.active == '1']

In [10]:
active_relat.head()

Unnamed: 0,id,effectiveTime,active,moduleId,sourceId,destinationId,relationshipGroup,typeId,characteristicTypeId,modifierId
1,101021,20020131,1,900000000000207008,10000006,29857009,0,116680003,900000000000011006,900000000000451002
2,102025,20020131,1,900000000000207008,10000006,9972008,0,116680003,900000000000011006,900000000000451002
13,114022,20020131,1,900000000000207008,134035007,84371003,0,116680003,900000000000011006,900000000000451002
26,127021,20020131,1,900000000000207008,134136005,57250008,0,116680003,900000000000011006,900000000000451002
29,130025,20020131,1,900000000000207008,10002003,116175006,0,116680003,900000000000011006,900000000000451002


## SNOMED CT Concept Model

<img src="img/Association Between Files from 2019.png">

Taken from: https://confluence.ihtsdotools.org/display/DOCRELFMT

Find the fully specified name, Synonym or Definition of a SNOMED concept

__Description type__

|Type id|Term|
|:---:|:---|
|900000000000003001|Fully specified name|
|900000000000013009|Synonym|
|900000000000550004|Definition|
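
Because description lookups are repeated many times, it can help to index the descriptions once by concept and description type. The following is a sketch under assumptions: `build_index` is a hypothetical helper, and the rows are a handful of descriptions from this release.

```python
from collections import defaultdict

FSN = '900000000000003001'
SYNONYM = '900000000000013009'

# (conceptId, typeId, term) rows as they appear in the description snapshot
descriptions = [
    ('101009', FSN, 'Quilonia ethiopica (organism)'),
    ('101009', SYNONYM, 'Quilonia ethiopica'),
    ('102002', FSN, 'Hemoglobin Okaloosa (substance)'),
    ('102002', SYNONYM, 'Hemoglobin Okaloosa'),
    ('102002', SYNONYM, 'Hb 48(CD7), Leu-arg'),
]

def build_index(rows):
    """Map each conceptId to its fully specified name and list of synonyms."""
    index = defaultdict(lambda: {'fsn': None, 'synonyms': []})
    for concept_id, type_id, term in rows:
        if type_id == FSN:
            index[concept_id]['fsn'] = term
        elif type_id == SYNONYM:
            index[concept_id]['synonyms'].append(term)
    return index

index = build_index(descriptions)
print(index['102002']['fsn'])
print(index['102002']['synonyms'])
```

A dictionary lookup like this is constant time per concept, whereas filtering the full description DataFrame scans every row on each call.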


In [11]:
# Functions for finding the concept name and all synonyms for a SNOMED concept

def find_name(snomedcode):
    """
    Returns the fully specified name of a SNOMED code.
    Returns an empty string if the code has no active fully specified name.
    """
    df = active_descs[active_descs['conceptId'] == str(snomedcode)]
    concept_name = df[df['typeId'] == '900000000000003001']
    concept_name = concept_name['term'].values
    return ''.join(concept_name)

def find_syn(snomedcode):
    """
    Returns all synonyms of a SNOMED code as one '; '-separated string.
    The fully specified name is not included.
    """
    df = active_descs[active_descs['conceptId'] == str(snomedcode)]
    synonym = df[df['typeId'] == '900000000000013009']
    synonym = synonym['term'].to_list()
    return '; '.join(synonym)


In [12]:
find_name(138875005)

'SNOMED CT Concept (SNOMED RT+CTV3)'

In [13]:
find_syn(138875005)

'SNOMED CT Concept; SNOMED CT has been created by combining SNOMED RT and a computer-based nomenclature and classification known as Read Codes Version 3, which was created on behalf of the U.K. Department of Health.; © 2002-2018 International Health Terminology Standards Development Organisation (IHTSDO). All rights reserved. SNOMED CT®, was originally created by The College of American Pathologists. "SNOMED" and "SNOMED CT" are registered trademarks of the IHTSDO.; SNOMED Clinical Terms version: 20180731 [R] (July 2018 Release); 28.0.0_20191001000001 UK clinical extension; 27.0.0_20190601000001 UK clinical extension'

Create a DataFrame which contains only the active SNOMED codes and their fully specified names

In [14]:
active_with_desc = pd.merge(active_terms, active_descs[active_descs['typeId'] == '900000000000003001'], left_on=['id'], right_on=['conceptId'], how='inner')
active_with_desc.describe()

Unnamed: 0,id_x,effectiveTime_x,active_x,moduleId_x,definitionStatusId,id_y,effectiveTime_y,active_y,moduleId_y,conceptId,languageCode,typeId,term,caseSignificanceId
count,369745,369745,369745,369745,369745,369745,369745,369745,369745,369745,369745,369745,369745,369745
unique,369742,63,1,5,2,369745,62,1,5,369742,1,1,369745,3
top,22711000000107,20020131,1,900000000000207008,900000000000074008,618378016,20170731,1,900000000000207008,22711000000107,en,900000000000003001,Injury of muscle of abdomen (disorder),900000000000448009
freq,2,176398,369745,338954,273358,1,249335,369745,338954,2,369745,369745,1,268398


The id_x column has 369742 unique values across 369745 rows, so three concepts each have two active fully specified names.
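
One way to look at such duplicates is sketched below on toy data. The ids, dates and terms are invented, and keeping the most recent fully specified name is only one possible rule, not necessarily the right one for this release.

```python
import pandas as pd

FSN = '900000000000003001'

# Toy description rows; every value here is invented for illustration
descs = pd.DataFrame({
    'id': ['d1', 'd2', 'd3'],
    'effectiveTime': ['20040131', '20101001', '20020131'],
    'conceptId': ['c1', 'c1', 'c2'],
    'typeId': [FSN, FSN, FSN],
    'term': ['Old name (finding)', 'New name (finding)', 'Other (disorder)'],
})

fsn = descs[descs['typeId'] == FSN]
# keep=False marks every row of a duplicated conceptId, not only the later ones
duplicates = fsn[fsn.duplicated('conceptId', keep=False)]
# One possible rule: keep the fully specified name with the latest effectiveTime
latest = fsn.sort_values('effectiveTime').drop_duplicates('conceptId', keep='last')
print(duplicates)
print(latest)
```

Using `keep=False` shows both competing names side by side, which makes it easier to decide which one to retain.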

In [15]:
# Inspect the duplicates
active_with_desc[active_with_desc.duplicated(['id_x'], keep='first')]

Unnamed: 0,id_x,effectiveTime_x,active_x,moduleId_x,definitionStatusId,id_y,effectiveTime_y,active_y,moduleId_y,conceptId,languageCode,typeId,term,caseSignificanceId
352230,22711000000107,20040131,1,999000011000000103,900000000000074008,47671000000114,20101001,1,999000011000000103,22711000000107,en,900000000000003001,GP82 - sent to Health Board (finding),900000000000017005
353646,298641000000100,20071001,1,999000011000000103,900000000000074008,527611000000119,20071001,1,999000011000000103,298641000000100,en,900000000000003001,Antigen specific effector T cell measurement (...,900000000000020002
354035,321411000000108,20080401,1,999000011000000103,900000000000074008,618321000000116,20080401,1,999000011000000103,321411000000108,en,900000000000003001,Foetus with cardiovascular abnormality (disorder),900000000000020002


In [16]:
# drop duplicates
active_with_desc = active_with_desc.drop_duplicates(['id_x'], keep='first')
assert len(active_with_desc) == len(active_terms)

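Extract the semantic tag, the bracketed suffix of each fully specified name (for example 'disorder'), which indicates the top-level hierarchy the concept belongs to:
tui -> type unique identifier (the UMLS-style name for a semantic type; here it holds the semantic tag)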
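
The idea of reading the tag from the end of the fully specified name can be sketched with plain `re`. The pattern below is a simpler, hypothetical alternative to the one used in this notebook: it takes the last parenthesised group that contains no nested brackets.

```python
import re

# Hypothetical pattern: a final "(...)" group with no brackets inside it
SEMANTIC_TAG = re.compile(r"\(([^()]+)\)\s*$")

def semantic_tag(fsn):
    """Return the trailing bracketed tag of a term, or None if there is none."""
    match = SEMANTIC_TAG.search(fsn)
    return match.group(1) if match else None

examples = [
    'Injury of muscle of abdomen (disorder)',
    'SNOMED CT Concept (SNOMED RT+CTV3)',
    '0.088 (number)',
    'Hb 48(CD7), Leu-arg',  # a synonym: brackets appear, but not at the end
]
tags = [semantic_tag(term) for term in examples]
print(tags)
```

`re.search` is used rather than `re.match` because the tag sits at the end of the string, while `re.match` only matches from the start.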

In [17]:
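import re

# Semantic tag: the final bracketed part of a fully specified name, e.g. "(disorder)"
TUI_PATTERN = r"\((\w+\s?.?\s?\w+.?\w+.?\w+.?)\)$"

def find_tui(concept_name):
    # re.search, not re.match: the tag sits at the end of the string
    match = re.search(TUI_PATTERN, concept_name)
    return match.group(1) if match else None
active_with_desc['tui'] = active_with_desc['term'].str.extract(TUI_PATTERN, expand=False)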

In [18]:
active_with_desc.describe()

Unnamed: 0,id_x,effectiveTime_x,active_x,moduleId_x,definitionStatusId,id_y,effectiveTime_y,active_y,moduleId_y,conceptId,languageCode,typeId,term,caseSignificanceId,tui
count,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742,369742
unique,369742,63,1,5,2,369742,62,1,5,369742,1,1,369742,3,58
top,88531004,20020131,1,900000000000207008,900000000000074008,618378016,20170731,1,900000000000207008,88531004,en,900000000000003001,Injury of muscle of abdomen (disorder),900000000000448009,disorder
freq,1,176398,369742,338954,273355,1,249335,369742,338954,1,369742,369742,1,268398,77086


In [19]:
active_with_desc[active_with_desc['tui'].isnull()].values

array([], shape=(0, 15), dtype=object)

In [20]:
# The unique TUI values
active_with_desc['tui'].unique()

array(['organism', 'substance', 'procedure', 'body structure', 'disorder',
       'occupation', 'finding', 'qualifier value',
       'morphologic abnormality', 'cell structure', 'physical object',
       'regime/therapy', 'product', 'medicinal product', 'cell', 'person',
       'ethnic group', 'environment', 'observable entity', 'event',
       'religion/philosophy', 'attribute', 'physical force', 'situation',
       'medicinal product form', 'navigational concept', 'clinical drug',
       'social concept', 'tumor staging', 'specimen', 'basic dose form',
       'life style', 'dose form', 'linkage concept', 'staging scale',
       'record artifact', 'assessment scale', 'SNOMED RT+CTV3',
       'geographic location', 'environment / location',
       'inactive concept', 'special concept', 'namespace concept',
       'racial group', 'link assertion', 'foundation metadata concept',
       'core metadata concept', 'disposition', 'unit of presentation',
       'OWL metadata concept', 'number'

Explore what each tui contains:

In [24]:
active_with_desc[active_with_desc['tui'] == 'number']

Unnamed: 0,id_x,effectiveTime_x,active_x,moduleId_x,definitionStatusId,id_y,effectiveTime_y,active_y,moduleId_y,conceptId,languageCode,typeId,term,caseSignificanceId,tui
323114,734048000,20170731,1,900000000000207008,900000000000074008,3482145016,20170731,1,900000000000207008,734048000,en,900000000000003001,0.088 (number),900000000000448009,number


### Create the input required for a MedCAT concept database

In [25]:
# 'tty' and 'tui_code' do not exist yet, so reindex adds them as empty (NaN) columns
snomed_cdb_active_only = active_with_desc.reindex(columns=['id_x', 'term', 'tty', 'tui_code', 'tui'])
snomed_cdb_active_only.columns = ['cui', 'str', 'tty', 'tui', 'sty']
snomed_cdb_active_only['cui'] = snomed_cdb_active_only.cui.apply(lambda code: f'S-{code}')
snomed_cdb_active_only['onto'] = 'SNOMED-CT'


In [26]:
#snomed_cdb_csv = concepts_df.loc[:, ['conceptId', 'term', 'tty']]
#snomed_cdb_csv['conceptId'] = snomed_cdb_csv.conceptId.apply(lambda code: f'S-{code}')
#snomed_cdb_csv.columns = ['cui', 'str', 'tty']
#snomed_cdb_csv['onto'] = 'SNOMED-CT'

In [27]:
snomed_cdb_active_only # just for active concepts

Unnamed: 0,cui,str,tty,tui,sty,onto
0,S-101009,Quilonia ethiopica (organism),,,organism,SNOMED-CT
1,S-102002,Hemoglobin Okaloosa (substance),,,substance,SNOMED-CT
2,S-103007,Squirrel fibroma virus (organism),,,organism,SNOMED-CT
3,S-104001,Excision of lesion of patella (procedure),,,procedure,SNOMED-CT
4,S-106004,Structure of posterior carpal region (body str...,,,body structure,SNOMED-CT
...,...,...,...,...,...,...
369740,S-999951000000101,Urine homocysteine:creatinine ratio (observabl...,,,observable entity,SNOMED-CT
369741,S-999961000000103,Urine aspartate:creatinine ratio (observable e...,,,observable entity,SNOMED-CT
369742,S-999971000000105,Urine alanine:creatinine ratio (observable ent...,,,observable entity,SNOMED-CT
369743,S-999981000000107,Urine valine:creatinine ratio (observable entity),,,observable entity,SNOMED-CT


#### Create a MedCAT concept database including all synonyms

In [28]:
merged = pd.merge(active_terms, active_descs, left_on=['id'], right_on=['conceptId'], how='inner')
active_with_primary_desc = merged[merged['typeId'] == '900000000000003001']
active_with_primary_desc = active_with_primary_desc.drop_duplicates(['id_x'], keep='first')
active_with_synonym_desc = merged[merged['typeId'] == '900000000000013009']
active_with_all_desc = pd.concat([active_with_primary_desc, active_with_synonym_desc])
active_with_all_desc

Unnamed: 0,id_x,effectiveTime_x,active_x,moduleId_x,definitionStatusId,id_y,effectiveTime_y,active_y,moduleId_y,conceptId,languageCode,typeId,term,caseSignificanceId
1,101009,20020131,1,900000000000207008,900000000000074008,525331019,20070731,1,900000000000207008,101009,en,900000000000003001,Quilonia ethiopica (organism),900000000000017005
5,102002,20020131,1,900000000000207008,900000000000074008,536464016,20030731,1,900000000000207008,102002,en,900000000000003001,Hemoglobin Okaloosa (substance),900000000000017005
7,103007,20020131,1,900000000000207008,900000000000074008,547184017,20090131,1,900000000000207008,103007,en,900000000000003001,Squirrel fibroma virus (organism),900000000000017005
10,104001,20020131,1,900000000000207008,900000000000073002,557742014,20170731,1,900000000000207008,104001,en,900000000000003001,Excision of lesion of patella (procedure),900000000000448009
13,106004,20020131,1,900000000000207008,900000000000074008,577123019,20170731,1,900000000000207008,106004,en,900000000000003001,Structure of posterior carpal region (body str...,900000000000448009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
963520,999951000000101,20160401,1,999000011000000103,900000000000074008,2577191000000117,20160401,1,999000011000000103,999951000000101,en,900000000000013009,Urine homocysteine:creatinine ratio,900000000000020002
963522,999961000000103,20160401,1,999000011000000103,900000000000074008,2572911000000110,20160401,1,999000011000000103,999961000000103,en,900000000000013009,Urine aspartate:creatinine ratio,900000000000020002
963524,999971000000105,20160401,1,999000011000000103,900000000000074008,2584231000000112,20160401,1,999000011000000103,999971000000105,en,900000000000013009,Urine alanine:creatinine ratio,900000000000020002
963526,999981000000107,20160401,1,999000011000000103,900000000000074008,2555751000000118,20160401,1,999000011000000103,999981000000107,en,900000000000013009,Urine valine:creatinine ratio,900000000000020002


In [29]:
# Check that the number of fully specified names equals the number of active concepts
assert len(active_with_all_desc[active_with_all_desc['typeId'] == '900000000000003001']) == len(active_terms)

In [30]:
snomed_cdb_df = pd.merge(active_with_all_desc, active_with_desc, left_on=['id_x'], right_on=['conceptId'], how='inner')

In [31]:
snomed_cdb_df.columns

Index(['id_x_x', 'effectiveTime_x_x', 'active_x_x', 'moduleId_x_x',
       'definitionStatusId_x', 'id_y_x', 'effectiveTime_y_x', 'active_y_x',
       'moduleId_y_x', 'conceptId_x', 'languageCode_x', 'typeId_x', 'term_x',
       'caseSignificanceId_x', 'id_x_y', 'effectiveTime_x_y', 'active_x_y',
       'moduleId_x_y', 'definitionStatusId_y', 'id_y_y', 'effectiveTime_y_y',
       'active_y_y', 'moduleId_y_y', 'conceptId_y', 'languageCode_y',
       'typeId_y', 'term_y', 'caseSignificanceId_y', 'tui'],
      dtype='object')

In [32]:
# clean up the merge and rename the columns to fit the medcat Concept database criteria
snomed_cdb_df = snomed_cdb_df.loc[:, ['id_x_x','term_x','typeId_x','tui']]
snomed_cdb_df.columns = ['cui', 'str', 'tty', 'sty']
snomed_cdb_df['onto'] = 'SNOMED-CT'
snomed_cdb_df['tty'] = snomed_cdb_df['tty'].replace(['900000000000003001', '900000000000013009'], [1,0])
snomed_cdb_df

Unnamed: 0,cui,str,tty,sty,onto
0,101009,Quilonia ethiopica (organism),1,organism,SNOMED-CT
1,101009,Quilonia ethiopica,0,organism,SNOMED-CT
2,102002,Hemoglobin Okaloosa (substance),1,substance,SNOMED-CT
3,102002,Hemoglobin Okaloosa,0,substance,SNOMED-CT
4,102002,"Hb 48(CD7), Leu-arg",0,substance,SNOMED-CT
...,...,...,...,...,...
963521,999971000000105,Urine alanine:creatinine ratio,0,observable entity,SNOMED-CT
963522,999981000000107,Urine valine:creatinine ratio (observable entity),1,observable entity,SNOMED-CT
963523,999981000000107,Urine valine:creatinine ratio,0,observable entity,SNOMED-CT
963524,999991000000109,Urine tyrosine:creatinine ratio (observable en...,1,observable entity,SNOMED-CT


In [None]:
# write to csv
snomed_cdb_df.to_csv('snomed_cdb_csv_SNOMED-CT-UK_Release_20191001.csv')

## Hierarchies 

### Root and top-level Concepts
All concepts descend from the root concept 138875005 |SNOMED CT Concept (SNOMED RT+CTV3)|


#### Table 3: Top Level Concepts
These concepts are all direct children of the root concept: 138875005, (SNOMED CT Concept (SNOMED RT+CTV3))<br>Each is linked to the root via the relationship typeId: 116680003, (is a)



|SCTID|Name|
|:---:|:---:|
|404684003 |Clinical finding|
|71388002 |Procedure|
|363787002 |Observable entity|
|123037004 |Body structure|
|410607006 |Organism|
|105590001 |Substance|
|373873005 |Pharmaceutical / biologic product|
|123038009 |Specimen|
|370115009 |Special concept|
|900000000000441003 |SNOMED CT Model Component|
|78621006 |Physical force|
|272379006 |Event|
|308916002 |Environment or geographical location|
|48176007 |Social context|
|243796009 |Situation with explicit context|
|254291000 |Staging and scales|
|260787004 |Physical object|
|362981000 |Qualifier value|
|419891008 |Record artifact|


Taken from the Technical Implementation Guide (4.1), Table 4.1-3: https://confluence.ihtsdotools.org/display/DOCTIG
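
A comparison between the documented top-level concepts and those found in a release can be expressed as a set difference. In the sketch below, `expected` is the table above and `found` holds the ten sourceId values visible in this notebook's displayed check, which may not be the complete result.

```python
# Top-level concept ids from the table above
expected = {
    '404684003', '71388002', '363787002', '123037004', '410607006',
    '105590001', '373873005', '123038009', '370115009', '900000000000441003',
    '78621006', '272379006', '308916002', '48176007', '243796009',
    '254291000', '260787004', '362981000', '419891008',
}

# sourceId values visible in the displayed output of this notebook (possibly partial)
found = {
    '254291000', '260787004', '272379006', '308916002', '123037004',
    '123038009', '48176007', '71388002', '78621006', '362981000',
}

missing = expected - found      # documented but not seen in the displayed rows
unexpected = found - expected   # seen in the release but not documented
print(sorted(missing))
print(sorted(unexpected))
```

An empty `unexpected` set means every displayed child of the root is a documented top-level concept.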

In [33]:
# Check if the top-level concepts are the same in the SNOMED UK Extension
top_level_concepts = active_relat[active_relat['destinationId']=='138875005'].copy()  # copy so a column can be added safely
top_level_concepts['conceptname'] = top_level_concepts['sourceId'].apply(find_name)
top_level_concepts[['sourceId', 'conceptname']].reset_index()


Unnamed: 0,index,sourceId,conceptname
0,143166,254291000,Staging and scales (staging scale)
1,143194,260787004,Physical object (physical object)
2,143274,272379006,Event (event)
3,143613,308916002,Environment or geographical location (environm...
4,143811,123037004,Body structure (body structure)
5,143812,123038009,Specimen (specimen)
6,144417,48176007,Social context (social concept)
7,144501,71388002,Procedure (procedure)
8,144577,78621006,Physical force (physical force)
9,144660,362981000,Qualifier value (qualifier value)


### Parents and Children
The subtype relationship 116680003|Is a (attribute)| relates a concept to its immediate supertype concepts.
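
Since each "Is a" relationship only reaches the immediate parents, finding all ancestors means following those links repeatedly up to the root. The following is a minimal sketch on a toy graph: the parents of 91175000 come from this release, while the remaining edges and the helper `ancestors` are assumptions for illustration.

```python
from collections import deque

ROOT = '138875005'

# child -> parents over "Is a" relationships.
# The first entry is taken from this release; the other edges are hypothetical.
is_a = {
    '91175000': ['313287004', '299718000'],
    '313287004': ['404684003'],
    '299718000': ['404684003'],
    '404684003': [ROOT],
}

def ancestors(concept_id, parent_map):
    """Return the set of all concepts reachable by following parents upwards."""
    seen = set()
    queue = deque(parent_map.get(concept_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a concept can be reached through several parents
        seen.add(current)
        queue.extend(parent_map.get(current, []))
    return seen

result = ancestors('91175000', is_a)
print(sorted(result))
```

The `seen` set matters because SNOMED CT is a polyhierarchy: a concept may have several parents that share ancestors further up.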

In [34]:
# Get the parent lineage through 116680003|Is a (attribute)|
def get_parents(snomedcode):
    """
    Retrieves the immediate parent concepts (active "Is a" targets) of a SNOMED code
    """
    snomedcode = str(snomedcode)
    if snomedcode == '138875005':  # Root concept has no parents
        return []
    is_a = active_relat[(active_relat['sourceId'] == snomedcode) &
                        (active_relat['typeId'] == '116680003')]
    # First level holds the concept name, second level holds all parent ids
    return [[find_name(snomedcode)], is_a['destinationId'].to_list()]



In [45]:
get_parents(12738006)

[['Brain structure (body structure)'], ['389079005']]

In [42]:
top_level_concepts = active_relat[active_relat['sourceId']=='91175000']
top_level_concepts

Unnamed: 0,id,effectiveTime,active,moduleId,sourceId,destinationId,relationshipGroup,typeId,characteristicTypeId,modifierId
293429,297347025,20020131,1,900000000000207008,91175000,313287004,0,116680003,900000000000011006,900000000000451002
2676647,9181407026,20180731,1,900000000000207008,91175000,299718000,0,116680003,900000000000011006,900000000000451002
2676648,9181408020,20180731,1,900000000000207008,91175000,12738006,0,363698007,900000000000011006,900000000000451002


In [46]:
find_name(12738006)

'Brain structure (body structure)'

In [44]:
find_name('299718000')

'Finding of brain (finding)'

In [None]:
tree = []
top_level_concepts = active_relat[active_relat['sourceId']=='91175000'].copy()
top_level_concepts['conceptname'] = top_level_concepts['destinationId'].apply(find_name)
# Collect the destination concepts rather than the repeated sourceId
for parent in top_level_concepts['destinationId'].values:
    tree.append(parent)
tree

In [None]:
for a in tree:
    print(a)

In [None]:
top_level_concepts = active_relat[active_relat['destinationId']=='138875005'].copy()
top_level_concepts['conceptname'] = top_level_concepts['sourceId'].apply(find_name)
top_level_concepts[['sourceId', 'conceptname']].reset_index()

In [None]:
active_terms[active_terms['id'] == '91175000']

In [None]:
find_name(10000006)

In [None]:
find_name(29857009)

In [None]:
find_name(9972008)

In [None]:
from tqdm import tqdm

name_concepts = active_descs[active_descs['typeId'] == '900000000000003001']
name_concepts.head(50)

In [None]:
name_concepts = name_concepts.copy()  # avoid modifying a slice of active_descs
name_concepts['conceptname'] = name_concepts['conceptId'].apply(find_name)
name_concepts

In [None]:
# `in` on a Series tests the index labels, so compare the values explicitly
active_terms['id'].eq('91175000').any()

In [None]:
find_name(91175000)

In [None]:
# Relationships
# Relationships are built from sourceId linked to destinationId by typeId

def find_parents(conceptid, typeId = '116680003'):
    """
    This function will find all the parents of the input concept
    typeId: The type of relationship. Default (Is a)
    return: List of parents of the concept
    """
    matches = active_relat[(active_relat['sourceId'] == str(conceptid)) &
                           (active_relat['typeId'] == typeId)]
    return matches['destinationId'].to_list()


In [None]:
active_descs[active_descs['conceptId'] == '91175000']

In [None]:
active_descs['typeId'].unique()

In [None]:
active_descs.describe()

In [None]:
active_relat[active_relat['sourceId'] == '404684003']

In [None]:
active_relat['typeId'].unique()

In [None]:
active_descs[active_descs['conceptId'] == '116680003']

In [None]:
116680003