<a href="https://colab.research.google.com/github/ufbfung/mimic/blob/main/exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MIMIC-III Dataset
The MIMIC-III dataset is available from the PhysioNet website (https://physionet.org/content/mimiciii/1.4/).

The dataset is freely available for non-commercial research purposes, but access requires approval.

## Setup environment


### Load environment variables
This section is a template for loading credentials from a `.env` file in future queries. It is not currently used.

In [1]:
!pip install python-dotenv
from dotenv import load_dotenv
import os

# Load variables from .env file
load_dotenv('credentials.env')

# Access the variables
username = os.getenv('mimic_username')
password = os.getenv('mimic_password')

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0
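
Because `os.getenv` returns `None` when a variable is not set, a small check can surface a missing or misnamed entry early. The following is a minimal sketch, assuming the same variable names as in the cell above; the helper `read_credentials` is hypothetical and not part of python-dotenv.

```python
import os

def read_credentials(env=None):
    """Return (username, password), raising if either variable is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try without touching the real environment.
    """
    env = os.environ if env is None else env
    names = ("mimic_username", "mimic_password")  # names used in the cell above
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

Called after `load_dotenv('credentials.env')`, `username, password = read_credentials()` would either return both values or fail with a message naming the missing variable.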


## Integrate with BigQuery
This section connects the notebook to Google BigQuery so that query results can be brought into your Colab instance.

### Authenticate and authorize Colab Notebook
This step imports the Colab authentication module and authenticates your Colab notebook so that it can connect to Google BigQuery.

In [2]:
# Import Colab's auth module, which provides functions for authenticating
# and authorizing your Google account within the Colab environment
from google.colab import auth

# Authenticate. This prompts you to sign in to your Google account so that
# the notebook can access Google Cloud services, such as BigQuery, using
# your authorized account.
auth.authenticate_user()

### Create a client
This step creates a client that accesses a specific project within BigQuery. Copy the identifier of the project you want to query from BigQuery and assign it to the `project` variable.

In [3]:
# Import the BigQuery client library
from google.cloud import bigquery

# Define project you're interested in querying from BigQuery
project = 'mimic-biomed215'

# Create a client
client = bigquery.Client(project=project)

### Write a query
This step defines the SQL query to run. Below is a simple query that can be used to quickly check the connection.

In [4]:
# Write a sample query to check connection
query = """
SELECT *
FROM `physionet-data.mimiciii_clinical.admissions`
LIMIT 5
"""
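
The same connection check can be pointed at other tables by building the query string from a table name. The following is a minimal sketch, assuming the dataset path from the query above; `sample_query` and `DATASET` are hypothetical names introduced for illustration, and the name check is only a basic guard against malformed input.

```python
import re

DATASET = "physionet-data.mimiciii_clinical"  # dataset used in the query above

def sample_query(table, limit=5):
    """Build a SELECT * ... LIMIT query for one table in DATASET."""
    # Accept only simple identifiers so nothing unexpected reaches the SQL text
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"Unexpected table name: {table!r}")
    if type(limit) is not int or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT *\nFROM `{DATASET}.{table}`\nLIMIT {limit}"

admissions_query = sample_query("admissions")
```

Here `admissions_query` holds the same statement as the hand-written query, without the surrounding blank lines.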

### Run the query & get results


In [7]:
# Run the query
query_job = client.query(query)

# Get the results
results = query_job.result()
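
Each row in the results maps column names to values, so the rows can be copied into plain dictionaries and kept in memory for later inspection. The following is a minimal sketch, assuming only that each row exposes an `items()` method, as BigQuery `Row` objects and ordinary dicts do; `rows_to_records` is a hypothetical helper name, and the demo uses stand-in rows rather than a live query.

```python
def rows_to_records(rows):
    """Copy an iterable of row-like objects into a list of plain dicts."""
    return [dict(row.items()) for row in rows]

# Example with stand-in rows (plain dicts) instead of a live query
demo = rows_to_records([{"SUBJECT_ID": 3115, "ADMISSION_TYPE": "EMERGENCY"}])
print(demo[0]["ADMISSION_TYPE"])  # prints EMERGENCY
```

With a live connection, `records = rows_to_records(query_job.result())` would give a list that can be indexed and reread freely.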

### Process and extract the data
This step iterates over the query results and prints each row.

In [6]:
# Iterate over the result rows and print each one
for row in results:
    print(row)

Row((3757, 3115, 134067, datetime.datetime(2139, 2, 13, 3, 11), datetime.datetime(2139, 2, 20, 7, 33), None, 'EMERGENCY', 'EMERGENCY ROOM ADMIT', 'SNF', 'Medicare', None, None, None, 'WHITE', datetime.datetime(2139, 2, 13, 0, 2), datetime.datetime(2139, 2, 13, 3, 22), 'STAB WOUND', 0, 1), {'ROW_ID': 0, 'SUBJECT_ID': 1, 'HADM_ID': 2, 'ADMITTIME': 3, 'DISCHTIME': 4, 'DEATHTIME': 5, 'ADMISSION_TYPE': 6, 'ADMISSION_LOCATION': 7, 'DISCHARGE_LOCATION': 8, 'INSURANCE': 9, 'LANGUAGE': 10, 'RELIGION': 11, 'MARITAL_STATUS': 12, 'ETHNICITY': 13, 'EDREGTIME': 14, 'EDOUTTIME': 15, 'DIAGNOSIS': 16, 'HOSPITAL_EXPIRE_FLAG': 17, 'HAS_CHARTEVENTS_DATA': 18})
Row((8689, 7124, 109129, datetime.datetime(2188, 7, 11, 0, 58), datetime.datetime(2188, 8, 1, 12, 4), None, 'EMERGENCY', 'EMERGENCY ROOM ADMIT', 'SNF', 'Medicare', None, None, None, 'WHITE', datetime.datetime(2188, 7, 10, 14, 17), datetime.datetime(2188, 7, 11, 1, 52), 'PENILE LACERATION-CELLULITIS', 0, 1), {'ROW_ID': 0, 'SUBJECT_ID': 1, 'HADM_ID': 