# Querying MIMIC

#### BIME 498, Special Topics - SPR 2019
#### Exploring the Medical Information Mart for Intensive Care (MIMIC-III) database.


This tutorial walks through simple queries against MIMIC using Jupyter Notebook as the primary database interface instead of pgAdmin 4.  This streamlines the data processing and machine learning exercises: we no longer have to log in separately, query MIMIC in pgAdmin 4, download each result as a CSV, and then load the CSV files into pandas dataframes.  The tutorial assumes the following prerequisites:

1) You should have already installed Python via Anaconda.

2) You should already have PostgreSQL 10.6 or 11.1 installed.  

3) You should be connected to the remote database through the SSH tunnel in Git Bash.  

4) You should have installed pandas for data management purposes.

5) You should have installed numpy for statistical programming requirements.

6) You should have installed psycopg2 to communicate with postgres.

7) You should have completed all required CITI Program training modules to access MIMIC data.

Import the appropriate libraries:

In [2]:
import numbers
import numpy as np
import pandas as pd
import psycopg2 as ps
from matplotlib import pyplot as plt

Connect to the database, with the following parameters.  Remember, you must first connect through the SSH tunnel; otherwise the connection will fail and you will see an error message.

In [4]:
param = { 
    'host':'127.0.0.1',
    'port' : '5432',
    'dbname' : 'mimic_iii',
    'user' : 'mimic_ro',
    'password' : 'uj3&24rSD%$F'
}

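try:
    conn = ps.connect(**param)
except ps.Error as error:
    # Most likely the SSH tunnel is not open or the credentials are wrong.
    print(error)
    # Re-raise so the next line does not fail with a confusing NameError on conn.
    raise

cur = conn.cursor()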
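
If the connection fails, it can help to confirm that something is actually listening on the forwarded port before suspecting the credentials. The sketch below is one way to do that with the standard library. The helper name `tunnel_is_open` is hypothetical, and the defaults simply mirror the `host` and `port` used above. Note that it only shows that some process accepts connections on that port, not that the process is the SSH tunnel.

```python
import socket

def tunnel_is_open(host="127.0.0.1", port=5432, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it cannot tell the SSH tunnel apart from,
    say, a local PostgreSQL server listening on the same port.
    """
    try:
        # create_connection raises OSError if the port refuses or times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not tunnel_is_open():
    print("Nothing is listening on 127.0.0.1:5432; open the SSH tunnel first.")
```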

##  Getting started with MIMIC tables

There are two ways to select columns, depending on whether you are more comfortable with SQL or pandas. You can `SELECT` particular columns in the query itself, or you can read the data into a pandas dataframe with `read_sql_query()` and then remove the columns you don't need with `drop()`.  

In the event that you want to check your work and print the table you've manipulated, it is recommended that you set a row `LIMIT` in the query to prevent hundreds of thousands of rows of data from crashing your notebook. In the example below, only the first 10 rows have been imported to the dataframe.

In [6]:
sql_1 = "SELECT * FROM mimic_iii.chartevents LIMIT 10"
chartevents = pd.read_sql_query(sql_1, conn)
print(chartevents)

     row_id  subject_id  hadm_id  icustay_id  itemid           charttime  \
0  34309354           4   185777      294638      84 2191-03-16 08:00:00   
1  34309355           4   185777      294638      85 2191-03-16 08:00:00   
2  34309356           4   185777      294638      86 2191-03-16 08:00:00   
3  34309357           4   185777      294638      87 2191-03-16 08:00:00   
4  34309358           4   185777      294638      88 2191-03-16 08:00:00   
5  34334388          13   143045      263738      52 2167-01-10 08:30:00   
6  34334389          13   143045      263738      55 2167-01-10 08:30:00   
7  34334390          13   143045      263738      59 2167-01-10 08:30:00   
8  34334391          13   143045      263738      62 2167-01-10 08:30:00   
9  34334392          13   143045      263738     113 2167-01-10 08:30:00   

0 2191-03-16 09:16:00  18692       Sl. Limited       NaN     None    None   
1 2191-03-16 09:16:00  18692        Occ. Moist       NaN     None    None   
2 2191-03
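
The two ways of selecting columns described above can be compared side by side. The sketch below uses an in-memory SQLite table as a stand-in for the MIMIC connection so that it runs without the SSH tunnel. The table contents are made-up placeholder rows and `demo_conn` is a hypothetical name; against MIMIC you would pass `conn` and the `mimic_iii.chartevents` table instead.

```python
import sqlite3
import pandas as pd

# Stand-in for the MIMIC connection: an in-memory SQLite table
# filled with made-up placeholder rows (not MIMIC data).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE chartevents "
    "(row_id INTEGER, subject_id INTEGER, itemid INTEGER, value TEXT)"
)
demo_conn.executemany(
    "INSERT INTO chartevents VALUES (?, ?, ?, ?)",
    [(1, 4, 84, "a"), (2, 4, 85, "b"), (3, 13, 52, "c")],
)

# Option 1: name the columns you want in the query itself.
in_sql = pd.read_sql_query(
    "SELECT subject_id, itemid FROM chartevents LIMIT 10", demo_conn
)

# Option 2: read every column, then drop the ones you don't need.
in_pandas = pd.read_sql_query(
    "SELECT * FROM chartevents LIMIT 10", demo_conn
).drop(columns=["row_id", "value"])

print(in_sql)
print(in_pandas)
demo_conn.close()
```

Selecting in SQL transfers less data over the tunnel, which matters for wide tables; dropping in pandas is convenient when you are still exploring which columns you need.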

## 1a. Admissions criteria

Find the most common causes of admission to the ICU with a simple word counter. If you need help using the admissions table, go to: https://mimic.physionet.org/mimictables/admissions/.
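
As a starting point, a word counter can be built on `collections.Counter`. The sketch below is only an illustration: `count_words` is a hypothetical helper and the input strings are made-up placeholders, not rows from MIMIC. You would feed it whichever free-text column you pull from the admissions table.

```python
import re
from collections import Counter

def count_words(texts):
    """Count upper-cased alphabetic tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):  # skip None / NaN entries
            continue
        counts.update(re.findall(r"[A-Z]+", text.upper()))
    return counts

# Made-up placeholder strings, not MIMIC data.
examples = ["CHEST PAIN", "chest pain; fever", None, "FEVER"]
print(count_words(examples).most_common(3))
```

Upper-casing before counting merges variants such as "Fever" and "FEVER"; you may also want to filter out filler words before ranking.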

In [8]:
# your answer here

You now have an idea of the wide range of reasons for which patients are admitted. Let's get more specific.

The opioid crisis is widespread and features prominently in today's news.  What percentage of people were admitted for overdoses? Is the number what you might expect? Can you hypothesize why this percentage might be so low or high? 

In [9]:
# your answer here

## 1b. What does the Admissions table say about death?

Again, look at the column names in the Admissions table.  Find the frequency of ICU deaths as a percentage of total ICU admissions.

In [10]:
# your answer here

## 2. Diagnoses

Billable diagnoses are recorded as ICD9 codes, a coding system in which each code identifies a specific disease or condition. (Healthcare currently uses ICD10, but ICD9 was in use when the MIMIC data was collected. Do not confuse ICD10 with ICD9!) A quick Google search should give you a good idea of how the ICD9 codes are organized. For starters, V codes cover factors that influence health status or contact with health services, rather than a current disease or injury; E codes describe external causes of injury and poisoning. All other codes are purely numerical. For a quick rundown of the diagnoses table in MIMIC, go to: https://mimic.physionet.org/mimictables/diagnoses_icd/

Look up the ICD9 code conventions and design a method to count codes, and therefore diagnosis frequency. As a warning, the MIMIC database stores ICD9 codes without the decimal point. Thus, "401.9", for hypertension, becomes "4019". The codes are stored as text rather than integers, so V and E prefixes and leading zeros are preserved. 
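
Because the decimal point is missing, it can be handy to put it back when you look codes up in an ICD9 reference. The sketch below assumes the usual ICD9 diagnosis-code layout: three characters before the point, or four for E codes. The helper name `add_icd9_decimal` is hypothetical, and the rule does not apply to ICD9 procedure codes, which are formatted differently.

```python
def add_icd9_decimal(code):
    """Re-insert the decimal point into an ICD9 diagnosis code stored without one."""
    code = str(code).strip().upper()
    # E codes have four characters before the point (E812.0);
    # numeric and V codes have three (401.9, V15.82).
    split_at = 4 if code.startswith("E") else 3
    if len(code) <= split_at:
        return code  # category-level code, nothing after the point
    return code[:split_at] + "." + code[split_at:]

for raw in ["4019", "V1582", "E8120", "250"]:
    print(raw, "->", add_icd9_decimal(raw))
```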

Provide a list of the top ten ICD9 codes, and their count (the number of patients with this diagnosis). 

In [13]:
# your answer here

According to the data above, what is the number one diagnosis made upon ICU admittance?  How many people are coming in with diabetes?

In [14]:
# your answer here

## 3. Medications (Prescriptions)

What medications are given to patients who share a diagnosis? For example, how often do hypertension patients also take insulin (which would suggest that they are also diabetic)? In addition, how similar are the top 10 medications for patients with the same diagnosis? (You would expect some level of consistency if physicians are treating the same disease in the same way.) In the MIMIC database, we have a table of prescriptions (https://mimic.physionet.org/mimictables/prescriptions/), which we will use as a proxy for the medications the patient is taking. 

To carry out these queries, you'll need to use tables that have prescriptions as well as the ICD9 diagnosis table. You should choose which disease / diagnosis you will inspect: Please use one of these three: Diabetes, Hypertension, or Heart disease. For your disease, print out the top 20 prescriptions. Also, you must indicate what percentage of patients had each of these prescriptions. (E.g., a large percentage of diabetes patients will have insulin prescribed. But does the data show 80%? 95%? 99%?)
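
The "percentage of patients" calculation is easy to get wrong, because one patient can have many prescription rows for the same drug. The sketch below shows one way to count each patient once per drug in pandas. The two dataframes contain made-up placeholder rows, not MIMIC data, and their column names (`subject_id`, `icd9_code`, `drug`) are assumed to mirror the MIMIC tables. Using all patients with the diagnosis as the denominator is also an assumption you should state in your answer.

```python
import pandas as pd

# Made-up placeholder rows, not MIMIC data.
diagnoses = pd.DataFrame({
    "subject_id": [1, 2, 3, 4],
    "icd9_code": ["4019", "4019", "4019", "25000"],
})
prescriptions = pd.DataFrame({
    "subject_id": [1, 1, 2, 3, 4],
    "drug": ["Drug A", "Drug A", "Drug A", "Drug B", "Drug A"],
})

# Patients with the diagnosis of interest.
cohort = diagnoses.loc[diagnoses["icd9_code"] == "4019", "subject_id"].unique()

# Keep only the cohort's prescriptions, then count each patient once per drug.
cohort_rx = prescriptions[prescriptions["subject_id"].isin(cohort)]
patients_per_drug = cohort_rx.groupby("drug")["subject_id"].nunique()

# Percentage of cohort patients with each drug, most common first.
percent = (100 * patients_per_drug / len(cohort)).sort_values(ascending=False)
print(percent.head(20))
```

Counting rows instead of distinct patients would let a single patient with many repeat orders inflate a drug's share.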


In [14]:
# your answer here

## 4. Open choice query

See the directions in the assignment specification. Explain your query (briefly) in this Markdown cell, and provide your answer below.  

In [None]:
# your answer here

## Close the connection

Once you're done with your work, close the connection to the database, and remember to save your work!

In [None]:
cur.close()
conn.close()