<img src="https://raw.githubusercontent.com/AmsterdamUMC/AmsterdamUMCdb/master/img/logo_amds.png" alt="Logo" style="width: 128px;"/>

# AmsterdamUMCdb Dictionaries

Copyright &copy; 2003-2020 Amsterdam UMC - Amsterdam Medical Data Science

# Dictionaries
This notebook creates lists of all available parameters, which is especially useful in the data exploration phase. The Dutch version of [SNOMED CT](https://browser.ihtsdotools.org/) can be used as a starting point for (official) translations into English medical terms.

**To do**: mapping to [SNOMED CT](https://browser.ihtsdotools.org/), [LOINC](https://search.loinc.org/searchLOINC/), etc.
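The function below is a minimal sketch of what such a parameter list could look like. It summarises the parameters in a table that has already been loaded into pandas, for example one of the `sepsis_*.csv` files exported below. It assumes the AmsterdamUMCdb column names `admissionid`, `itemid`, `item` and, optionally, `unit`. The function name `build_dictionary` and the toy item ids are hypothetical. Against the full database, the same summary would normally be computed with a `GROUP BY` query instead.

```python
import pandas as pd


def build_dictionary(items: pd.DataFrame) -> pd.DataFrame:
    """Summarise the parameters (items) present in an AmsterdamUMCdb-style table.

    Assumes 'admissionid', 'itemid' and 'item' columns, plus an optional 'unit'
    column. Returns one row per parameter with its number of rows and number of
    distinct admissions, most frequent first.
    """
    keys = ['itemid', 'item'] + (['unit'] if 'unit' in items.columns else [])
    return (
        items.groupby(keys, dropna=False)
        .agg(n_rows=('admissionid', 'size'),
             n_admissions=('admissionid', 'nunique'))
        .reset_index()
        .sort_values('n_rows', ascending=False, ignore_index=True)
    )


# Toy data with hypothetical item ids, only to show the shape of the output.
toy = pd.DataFrame({
    'admissionid': [1, 1, 2, 2, 3],
    'itemid': [101, 101, 101, 202, 202],
    'item': ['Hartfrequentie'] * 3 + ['Temperatuur'] * 2,
    'unit': ['/min'] * 3 + ['C'] * 2,
})
dictionary = build_dictionary(toy)
print(dictionary)
```

Counting distinct admissions in addition to rows helps separate parameters that are measured frequently in a few patients from parameters that are recorded for many patients.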

## Imports

In [9]:
%matplotlib inline
import psycopg2
import pandas as pd
import numpy as np
import re
from tqdm import tqdm

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib as mpl

import io
from IPython.display import display, HTML, Markdown

## Display settings

In [4]:
#matplotlib settings for image size
#needs to be in a different cell from %matplotlib inline
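try:
    plt.style.use('seaborn-v0_8-darkgrid')  # style name in matplotlib >= 3.6
except OSError:
    plt.style.use('seaborn-darkgrid')  # style name in older matplotlib versions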
plt.rcParams["figure.dpi"] = 288
plt.rcParams["figure.figsize"] = [16, 12]
plt.rcParams["font.size"] = 12

pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.max_colwidth = 1000

## Connection settings

In [5]:
#Modify config.ini in the root folder of the repository to change the settings used to connect to your PostgreSQL database
import configparser
import os
config = configparser.ConfigParser()

if os.path.isfile('../config.ini'):
    config.read('../config.ini')
else:
    config.read('../config.SAMPLE.ini')

#Open a connection to the postgres database:
con = psycopg2.connect(database=config['psycopg2']['database'], 
                       user=config['psycopg2']['username'], password=config['psycopg2']['password'], 
                       host=config['psycopg2']['host'], port=config['psycopg2']['port'])
con.set_client_encoding('WIN1252') #Uses code page for Dutch accented characters.
con.set_session(autocommit=True)

cursor = con.cursor()
cursor.execute('SET SCHEMA \'amsterdamumcdb\''); #set search_path to amsterdamumcdb schema

In [6]:
# load sepsis cohort
sepsis = pd.read_csv('../concepts/diagnosis/sepsis.csv')
sepsis_admission_ids = list(sepsis.admissionid)
# comma-separated string of ids for the SQL IN (...) clause below
sepsis_admission_ids = ','.join([str(a) for a in sepsis_admission_ids])

In [7]:

# export all rows of the sepsis admissions from each table to CSV (numericitems is queried separately, in batches)
for table in ['admissions', 'drugitems', 'freetextitems', 'listitems', 'procedureorderitems', 'processitems']:
    sql = '''
    SELECT * FROM {:s}
    WHERE admissionid in ({:s})
    '''.format(table, sepsis_admission_ids)
    print(sql[:100] + '...')
    sepsis_data = pd.read_sql(sql, con)
    print(table, len(sepsis_data))
    sepsis_data.to_csv('../data/sepsis_{:s}.csv'.format(table), index=False)


    SELECT * FROM procedureorderitems  
    WHERE admissionid in (11,20,25,44,47,50,58,66,70,76,91,...
procedureorderitems 580232

    SELECT * FROM processitems  
    WHERE admissionid in (11,20,25,44,47,50,58,66,70,76,91,94,99,1...
processitems 44405

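The cell above pastes the admission ids directly into the SQL text. A possible alternative, sketched here, is to pass the ids as a single query parameter and match them with PostgreSQL's `= ANY(...)`. This relies on psycopg2 adapting a Python list to a PostgreSQL ARRAY. Table names cannot be passed as parameters, so they are checked against a whitelist. `admissions_query` and `ALLOWED_TABLES` are hypothetical names and are not part of AmsterdamUMCdb.

```python
# Hypothetical whitelist of the AmsterdamUMCdb tables exported in this notebook.
ALLOWED_TABLES = {
    'admissions', 'drugitems', 'freetextitems', 'listitems',
    'numericitems', 'procedureorderitems', 'processitems',
}


def admissions_query(table, admission_ids):
    """Return (sql, params) selecting all rows of `table` for the given admissions.

    Table names cannot be passed as query parameters, so the name is checked
    against a whitelist. The ids are passed as a single list parameter, which
    psycopg2 adapts to a PostgreSQL ARRAY for use with `= ANY(...)`.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError('unknown table: {!r}'.format(table))
    sql = 'SELECT * FROM {} WHERE admissionid = ANY(%(ids)s)'.format(table)
    # int() also converts numpy integers, which psycopg2 cannot adapt by default
    params = {'ids': [int(a) for a in admission_ids]}
    return sql, params


sql, params = admissions_query('drugitems', [11, 20, 25])
print(sql)
# With the connection opened earlier in the notebook (not executed here):
# sepsis_data = pd.read_sql(sql, con, params=params)
```

With this approach the statement text stays short regardless of cohort size, and `pd.read_sql` forwards `params` to the cursor's `execute` call.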

In [28]:
sepsis = pd.read_csv('../concepts/diagnosis/sepsis.csv')
sepsis_admission_ids = [str(a) for a in sepsis.admissionid]
# numericitems is by far the largest table, so query it in about 100 batches of admissions
n_batches = 100
batch_size = max(1, -(-len(sepsis_admission_ids) // n_batches))  # ceiling division
table = 'numericitems'
df_list = []
for start in tqdm(range(0, len(sepsis_admission_ids), batch_size)):
    ids = ','.join(sepsis_admission_ids[start:start + batch_size])
    sql = '''
    SELECT * FROM {:s}
    WHERE admissionid in ({:s})
    '''.format(table, ids)
    df_list.append(pd.read_sql(sql, con))
sepsis_data = pd.concat(df_list, ignore_index=True)
print(len(sepsis_data))
print('saving...')
sepsis_data.to_csv('../data/sepsis_{:s}.csv'.format(table), index=False)
print('end saving.')

 98%|█████████▊| 98/100 [13:51<00:16,  8.48s/it]


194838488
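Concatenating all batches before saving keeps the full result (194,838,488 rows in the run above) in memory at once. One alternative is to append each batch to the CSV file as soon as it is fetched; this is a sketch of that idea and not the original author's method. The helper `write_batches_to_csv` is hypothetical. It assumes every batch has the same columns in the same order, which holds when each batch comes from the same `SELECT *` query.

```python
import os
import tempfile

import pandas as pd


def write_batches_to_csv(batches, path):
    """Write an iterable of DataFrames to a single CSV file, batch by batch.

    The header is written with the first batch only; later batches are appended.
    Returns the total number of rows written.
    """
    total = 0
    for i, df in enumerate(batches):
        df.to_csv(path, mode='w' if i == 0 else 'a', header=(i == 0), index=False)
        total += len(df)
    return total


# Toy demonstration with two in-memory batches. In the notebook, each batch would
# instead come from one pd.read_sql(...) call on numericitems.
toy_batches = [
    pd.DataFrame({'admissionid': [1, 2], 'value': [0.5, 1.5]}),
    pd.DataFrame({'admissionid': [3], 'value': [2.5]}),
]
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, 'sepsis_numericitems.csv')
    n_written = write_batches_to_csv(toy_batches, out_path)
    roundtrip = pd.read_csv(out_path)
print(n_written, len(roundtrip))
```

Passing a generator as `batches` (one that yields a query result per batch) means only a single batch is held in memory at a time.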
