### Readme

`author: Alessia Peviani (The Hyve), version: 20 May 2020`

The notebook contains functions to **map a list of ontology codes to their corresponding OMOP concept_ids** (as long as the ontology is supported by OMOP). It requires a connection to a database with pre-loaded OMOP vocabularies.

Functions (see docstring for full description):
- **map_ontology_code_to_standard_and_non_standard( )**, retrieves both the ***source concept_id*** for the original ontology code, and the corresponding ***standard concept_id*** by looking up the source concept_id in the OMOP `CONCEPT_RELATIONSHIP` table; a single code can map to multiple standard concepts
- **map_ontology_code_to_any( )**, retrieves every OMOP concept_id whose concept_code matches the original ontology code, with no filter on vocabulary or validity



In [1]:
import pandas as pd
import numpy as np

In [2]:
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, aliased

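# NOTE: update with correct database connection details as needed
# NOTE: written against the SQLAlchemy 1.x API; declarative_base(engine) and the
# 'autoload' table argument used below are no longer supported in SQLAlchemy 2.0
engine = create_engine('postgresql://postgres:postgres@localhost:6000/cllear', echo=False)
Base = declarative_base(engine)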

class Concept(Base):
    __tablename__ = 'concept'
    __table_args__ = {'schema': 'vocab', 'autoload' : True}
    
class ConceptRelationship(Base):
    __tablename__ = 'concept_relationship'
    __table_args__ = {'schema': 'vocab', 'autoload' : True}
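
The connection string above is hard-coded. As a minimal sketch of one way to keep credentials out of the notebook, the URL could be assembled from environment variables. The helper name `build_postgres_url` and the `OMOP_DB_*` variable names are assumptions made for illustration; the fallback values simply mirror the ones used in this notebook.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a PostgreSQL URL from (hypothetical) OMOP_DB_* environment variables."""
    user = env.get("OMOP_DB_USER", "postgres")
    password = env.get("OMOP_DB_PASSWORD", "postgres")
    host = env.get("OMOP_DB_HOST", "localhost")
    port = env.get("OMOP_DB_PORT", "6000")
    name = env.get("OMOP_DB_NAME", "cllear")
    # URL-encode user and password so characters such as '@' or '/' do not break the URL
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{name}"


# With no variables set, the result matches the URL hard-coded in this notebook
default_url = build_postgres_url({})
print(default_url)
```

The returned string can be passed to `create_engine` in place of the literal URL.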


In [3]:
metadata = Base.metadata
Session = sessionmaker(bind=engine)
session = Session()

In [4]:
def map_ontology_code_to_standard_and_non_standard(source_code_list, vocabulary_id='ICD10', invalid_reason=None, standard_concept=None):
    
    '''
    Retrieves the non-standard OMOP concept_id representing the original ontology code,
    and the standard OMOP concept_id (typically SNOMED) by looking up the "Maps to" relationship
    in the OMOP concept_relationship table.
    
    To find invalid codes, provide invalid_reason=['D','R','U'] (all or some values)
    
    Default behavior is to look for valid non-standard concept_ids 
    from the ICD10 vocabulary (vocabulary_id='ICD10'); 
    pass vocabulary_id='ICD10CM' for the clinical modification extension.
    
    SQLAlchemy default join method is "inner" 
    (i.e. data retrieved only if concept_id present in both Concept and ConceptRelationship tables).
    This works well as non-standard concept_ids are (supposedly) 
    always mapped to standard concept_ids via the concept_relationship table.
    '''
    
    # invalid_reason is either a list of flags (e.g. ['D','R','U']) or a single string/None;
    # comparing against None compiles to "IS NULL", which selects valid concepts only
    if isinstance(invalid_reason, list):
        invalid_reason_filter = Concept.invalid_reason.in_(invalid_reason)
    else:
        invalid_reason_filter = Concept.invalid_reason == invalid_reason

    # source concepts matching the codes, joined to their "Maps to" targets
    records_sq = session \
        .query(Concept.concept_code, ConceptRelationship.concept_id_2) \
        .join(ConceptRelationship, Concept.concept_id == ConceptRelationship.concept_id_1) \
        .filter(
            Concept.concept_code.in_(source_code_list), 
            invalid_reason_filter,
            Concept.standard_concept == standard_concept,
            Concept.vocabulary_id == vocabulary_id,
            ConceptRelationship.relationship_id == 'Maps to') \
        .subquery()
        
    records = session \
        .query(records_sq, Concept) \
        .join(Concept, records_sq.c.concept_id_2 == Concept.concept_id)

    records_df = pd.DataFrame([{
        'code' : record[0], 
        # NOTE: record[1] is concept_id_2 from the subquery, i.e. the mapped (target) concept_id,
        # not the concept_id of the source code, so this column repeats target_concept_id
        'source_concept_id' : record[1], 
        'source_vocabulary_id' : vocabulary_id,
        'target_concept_id' : record.Concept.concept_id,
        'target_concept_name' : record.Concept.concept_name,
        'target_vocabulary_id' : record.Concept.vocabulary_id,
        'valid_start_date' : record.Concept.valid_start_date,
        'valid_end_date' : record.Concept.valid_end_date,
        'invalid_reason' : record.Concept.invalid_reason
    } for record in records])

    return records_df

# test
display(map_ontology_code_to_standard_and_non_standard(['S52.50','A00.0']))
display(map_ontology_code_to_standard_and_non_standard(['S52.50','A00.0'], vocabulary_id='ICD10'))
display(map_ontology_code_to_standard_and_non_standard(['S52.50','A00.0'], vocabulary_id='ICD10', invalid_reason=['D','R','U']))

Unnamed: 0,code,source_concept_id,source_vocabulary_id,target_concept_id,target_concept_name,target_vocabulary_id,valid_start_date,valid_end_date,invalid_reason
0,A00.0,4344638,ICD10,4344638,Cholera due to Vibrio cholerae O1 Classical bi...,SNOMED,1970-01-01,2099-12-31,


Unnamed: 0,code,source_concept_id,source_vocabulary_id,target_concept_id,target_concept_name,target_vocabulary_id,valid_start_date,valid_end_date,invalid_reason
0,A00.0,4344638,ICD10,4344638,Cholera due to Vibrio cholerae O1 Classical bi...,SNOMED,1970-01-01,2099-12-31,


Unnamed: 0,code,source_concept_id,source_vocabulary_id,target_concept_id,target_concept_name,target_vocabulary_id,valid_start_date,valid_end_date,invalid_reason
0,S52.50,437116,ICD10,437116,Closed fracture of distal end of radius,SNOMED,1970-01-01,2099-12-31,
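
The "Maps to" lookup can also be tried without a live OMOP database. The following is a minimal sketch, assuming an in-memory SQLite database with toy `concept` and `concept_relationship` tables; it uses raw SQL through the standard `sqlite3` module rather than the SQLAlchemy ORM used in this notebook. The function names `build_toy_vocab` and `map_codes`, as well as every code and concept_id in the toy data, are made up for illustration and are not real OMOP values. The query selects the source concept's own concept_id next to the target concept_id, so the two can be told apart.

```python
import sqlite3


def build_toy_vocab():
    """Create an in-memory SQLite database with two toy OMOP-like tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_code TEXT,
            vocabulary_id TEXT,
            standard_concept TEXT,
            invalid_reason TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT
        );
        INSERT INTO concept VALUES
            (1, 'X00.0', 'ICD10', NULL, NULL),   -- valid non-standard source concept
            (2, '1234', 'SNOMED', 'S', NULL),    -- standard target concept
            (3, 'X00.1', 'ICD10', NULL, 'D');    -- deprecated source concept
        INSERT INTO concept_relationship VALUES
            (1, 2, 'Maps to'),
            (2, 1, 'Mapped from'),
            (3, 2, 'Maps to');
    """)
    return conn


def map_codes(conn, codes, vocabulary_id="ICD10"):
    """Return (code, source_concept_id, target_concept_id, target_vocabulary_id) tuples."""
    if not codes:
        return []
    placeholders = ",".join("?" for _ in codes)
    sql = f"""
        SELECT src.concept_code, src.concept_id, tgt.concept_id, tgt.vocabulary_id
        FROM concept AS src
        JOIN concept_relationship AS cr ON src.concept_id = cr.concept_id_1
        JOIN concept AS tgt ON cr.concept_id_2 = tgt.concept_id
        WHERE src.concept_code IN ({placeholders})
          AND src.vocabulary_id = ?
          AND src.standard_concept IS NULL   -- non-standard source concepts only
          AND src.invalid_reason IS NULL     -- valid source concepts only
          AND cr.relationship_id = 'Maps to'
        ORDER BY src.concept_code
    """
    return conn.execute(sql, [*codes, vocabulary_id]).fetchall()


toy_conn = build_toy_vocab()
toy_rows = map_codes(toy_conn, ["X00.0", "X00.1"])
print(toy_rows)  # [('X00.0', 1, 2, 'SNOMED')]
```

In this toy data the deprecated code `X00.1` is dropped by the `invalid_reason IS NULL` condition, which mirrors the default `invalid_reason=None` behavior of the function above.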


In [5]:
def map_ontology_code_to_any(source_code_list):
    
    '''
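    Retrieves every OMOP concept_id whose concept_code matches the original ontology code.
    Does not attempt mapping to the standard OMOP concept_id.
    
    Any matching code in the list is retrieved with no filters applied
    (vocabulary, validity and standard status are all ignored), so one code can return several concept_ids.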
    '''

    records = session \
                .query(Concept) \
                .filter(Concept.concept_code.in_(source_code_list))
    
    records_df = pd.DataFrame([{
        'code' : record.concept_code, 
        'source_concept_id' : record.concept_id
    } for record in records])

    return records_df

# test
display(map_ontology_code_to_any(['S52.50','A00.0']))

Unnamed: 0,code,source_concept_id
0,A00.0,35205396
1,A00.0,45537707
2,S52.50,1573160
3,S52.50,45755919
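
Because no vocabulary filter is applied, each code in the output above matches more than one concept_id. A minimal sketch of how such results could be grouped per code to spot ambiguous matches, using the rows shown above as plain tuples; the helper name `group_concept_ids_by_code` is an assumption made for illustration.

```python
from collections import defaultdict


def group_concept_ids_by_code(rows):
    """Group (code, concept_id) pairs into a dict of code -> list of concept_ids."""
    grouped = defaultdict(list)
    for code, concept_id in rows:
        grouped[code].append(concept_id)
    return dict(grouped)


# (code, source_concept_id) pairs copied from the output above
rows = [
    ("A00.0", 35205396),
    ("A00.0", 45537707),
    ("S52.50", 1573160),
    ("S52.50", 45755919),
]

grouped = group_concept_ids_by_code(rows)
# Codes that resolve to more than one concept_id need a vocabulary filter to disambiguate
ambiguous = sorted(code for code, ids in grouped.items() if len(ids) > 1)
print(grouped)
print(ambiguous)
```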
