# Analyzing Data Using the FHIR Bulk Data Export API

The Bulk Data API is still very early in its development, which you can follow here: https://github.com/smart-on-fhir/bulk-data-server.git.

In this notebook, we'll implement a very simple client that can access and download [FHIR bulk data](http://wiki.hl7.org/index.php?title=201801_Bulk_Data) from the [Demo SMART Bulk Data Server](https://bulk-data.smarthealthit.org). This notebook is based on a [FHIR Connectathon Project](https://github.com/plangthorne/python-fhir/blob/master/demo/BulkDataDemo.ipynb) that was also set up to implement simple authentication according to the [SMART Authorization Guide protocol](http://docs.smarthealthit.org/authorization/backend-services/), but we're ignoring that aspect since it's a bit technical and out of scope for this class.

## Initial Setup

Clone the project from GitHub with `git clone https://github.com/uw-fhir/bulk-fhir-tutorial.git`, and then open `bulk-fhir-tutorial.ipynb` in JupyterLab or Jupyter Notebook.

## Server Configuration

We'll start by reading required config parameters, which define the FHIR server and other options. We'll be testing against the [Demo SMART Bulk Data Server](https://bulk-data.smarthealthit.org). 

In [2]:
import yaml

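# Read the server settings; safe_load avoids executing arbitrary YAML tags
# and does not need the explicit Loader argument that yaml.load requires.
with open('config.yaml') as f:
    config = yaml.safe_load(f)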

# Generating a Data Analysis Pipeline

Creating maintainable and comprehensive pipelines for data analysis, where the process is as automated as possible, is incredibly useful for rapid iteration, reproducibility, updates, clarity, and documentation.

To create this type of pipeline, we want to be able to query for the required data at the source, automatically transform this data into a suitable format, and load it into the desired analysis software.

In this app, we'll demo this approach using the nascent Bulk FHIR API and Python to:
1. Request the data we want to analyze from the FHIR server
2. Access and transform the data for analysis
3. Analyze the data using R
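
As a rough illustration of step 2, the sketch below flattens newline-delimited JSON (NDJSON, the file format Bulk Data exports use) into CSV text that R could load with `read.csv`. It is written for Python 3. The sample records, the `ndjson_to_rows` and `rows_to_csv` helpers, and the chosen columns are all made up for illustration, and the field names (`patient`, `vaccineCode`, `date`) assume the STU3 Immunization layout; real exports carry many more fields.

```python
import csv
import io
import json

# Two made-up Immunization resources, one JSON object per line (NDJSON).
SAMPLE_NDJSON = "\n".join([
    json.dumps({
        "resourceType": "Immunization",
        "id": "imm-1",
        "patient": {"reference": "Patient/p-1"},
        "vaccineCode": {"coding": [{"code": "140"}]},
        "date": "2015-11-09",
    }),
    json.dumps({
        "resourceType": "Immunization",
        "id": "imm-2",
        "patient": {"reference": "Patient/p-2"},
        "vaccineCode": {"coding": [{"code": "140"}]},
        "date": "2016-10-03",
    }),
])

COLUMNS = ["id", "patient", "vaccine_code", "date"]


def ndjson_to_rows(text):
    """Yield one flat dict per non-empty NDJSON line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        resource = json.loads(line)
        codings = resource.get("vaccineCode", {}).get("coding", [])
        yield {
            "id": resource.get("id"),
            "patient": resource.get("patient", {}).get("reference"),
            "vaccine_code": codings[0].get("code") if codings else None,
            "date": resource.get("date"),
        }


def rows_to_csv(rows):
    """Serialize flat rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = list(ndjson_to_rows(SAMPLE_NDJSON))
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping the flattening step separate from the CSV step means the same rows could just as easily be handed to another loader later on.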

## 1. Dataset Exploration

First, we're going to use the [SMART Patient Browser](https://patient-browser.smarthealthit.org/index.html?config=r3#/) tool to explore the patients and associated data. 

Click on the tool and play around with it a bit, clicking on the different listed [Patients](https://www.hl7.org/fhir/patient.html) and then exploring their associated FHIR Resources like [Immunizations](https://www.hl7.org/fhir/immunization.html) or [Encounters](https://www.hl7.org/fhir/encounter.html). 

## 2. Query Generation

Now that we have a feel for the data, we need to decide what specific resources we're interested in and generate a query by using the proper [FHIR Bulk Data query parameters](https://github.com/smart-on-fhir/fhir-bulk-data-docs/blob/master/export.md#query-parameters). 

We can first try out our downloads by using a very simple tool made by the SMART folks - [The FHIR Bulk Downloader](https://bulk-data.smarthealthit.org/sample-app/index.html?server=https%3A%2F%2Fbulk-data.smarthealthit.org%2FeyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0%2Ffhir)

We'll focus on the Patient-level export since we're only interested in data that is in some way associated with a Patient (see https://github.com/smart-on-fhir/fhir-bulk-data-docs/blob/master/export.md#query-parameters for more information), and want to download resources that will be useful for analysis. 

In this example, I'll be looking at [Immunizations](https://www.hl7.org/fhir/immunization.html). 

1. Go to [The FHIR Bulk Downloader](https://bulk-data.smarthealthit.org/sample-app/index.html?server=https%3A%2F%2Fbulk-data.smarthealthit.org%2FeyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0%2Ffhir)

2. Select the desired resources, patient groups, and time frame. 

3. Notice how each selection modifies the download link url.

4. Try modifying the url yourself using the available query parameters, and see what happens after pressing `Download`. 
   
   For example, type this in: `https://bulk-data.smarthealthit.org/eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0/fhir/Patient/$export?_type=Patient,Immunization&_typeFilter=Immunization%3Fvaccine-code%3D140,Patient%3Fgender=female`
   
5. Note this generated url string for use later in the tutorial to get the desired dataset. I will be using the query above. 
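
If you would rather build the query in code than by hand, a minimal Python 3 sketch could look like the following. The `build_export_url` helper is hypothetical (it is not part of any client library); it only assembles the `_type` and `_typeFilter` parameters used in the example above and percent-encodes each filter with `urllib.parse.quote`. Note that it encodes every `=` inside a filter as `%3D`, so its output differs slightly from the hand-typed URL in step 4.

```python
from urllib.parse import quote

# Base URL of the demo server, copied from the example query above.
BASE_URL = (
    "https://bulk-data.smarthealthit.org/"
    "eyJlcnIiOiIiLCJwYWdlIjoxMDAwMCwiZHVyIjoxMCwidGx0IjoxNSwibSI6MX0/fhir"
)


def build_export_url(base_url, resource_types, type_filters=None):
    """Assemble a Patient-level $export URL (hypothetical helper)."""
    params = ["_type=" + ",".join(resource_types)]
    if type_filters:
        # Each filter is itself a FHIR search such as
        # "Immunization?vaccine-code=140"; percent-encode it so its "?" and
        # "=" do not clash with the outer query string.
        encoded = [quote(f, safe="") for f in type_filters]
        params.append("_typeFilter=" + ",".join(encoded))
    return base_url.rstrip("/") + "/Patient/$export?" + "&".join(params)


url = build_export_url(
    BASE_URL,
    ["Patient", "Immunization"],
    ["Immunization?vaccine-code=140", "Patient?gender=female"],
)
print(url)
```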

## 3. Download Data into Python

### Consume data

Once the dataset is provisioned, we can iterate through the resources in the bulk data set. The client leaves most data on the remote server and waits to retrieve subsequent files until the previous file in the manifest is consumed.
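
To make that lazy behaviour concrete, here is a small Python 3 sketch of how such an iterator could be written as a generator. This is not the client's actual implementation: `iter_resources`, `fake_fetch`, and the in-memory `FAKE_FILES` are stand-ins invented for illustration, and a real client would fetch each file over HTTP instead.

```python
import json


def iter_resources(file_urls, fetch):
    """Yield resources one at a time, fetching each file only when needed."""
    for url in file_urls:
        # fetch() is not called for a file until every resource from the
        # previous file has been consumed.
        body = fetch(url)
        for line in body.splitlines():
            if line.strip():
                yield json.loads(line)


# Stand-in for the remote server: two tiny NDJSON files held in memory.
FAKE_FILES = {
    "file-1.ndjson": (
        '{"resourceType": "Patient", "id": "p-1"}\n'
        '{"resourceType": "Patient", "id": "p-2"}\n'
    ),
    "file-2.ndjson": '{"resourceType": "Immunization", "id": "imm-1"}\n',
}
fetched = []  # records which files have been requested so far


def fake_fetch(url):
    fetched.append(url)
    return FAKE_FILES[url]


resources = iter_resources(["file-1.ndjson", "file-2.ndjson"], fake_fetch)
first = next(resources)
fetched_after_first = list(fetched)  # only the first file has been requested
remaining = list(resources)          # draining the iterator pulls in the rest
```

Because the work happens inside a generator, asking for the first resource only triggers the first download; the second file is requested once the first one runs out.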

In [5]:
data = client.iter_json()
next(data)

{u'activity': [{u'detail': {u'code': {u'coding': [{u'code': u'226234005',
       u'display': u'Healthy diet',
       u'system': u'http://snomed.info/sct'}]},
    u'status': u'in-progress'}},
  {u'detail': {u'code': {u'coding': [{u'code': u'703993001',
       u'display': u'Colonoscopy planned',
       u'system': u'http://snomed.info/sct'}]},
    u'status': u'in-progress'}},
  {u'detail': {u'code': {u'coding': [{u'code': u'243072006',
       u'display': u'Cancer education',
       u'system': u'http://snomed.info/sct'}]},
    u'status': u'in-progress'}}],
 u'addresses': [{u'reference': u'urn:uuid:d4f83046-5f89-4378-96f4-4e15766f9d97'}],
 u'category': [{u'coding': [{u'code': u'395082007',
     u'display': u'Cancer care plan',
     u'system': u'http://snomed.info/sct'}]}],
 u'context': {u'reference': u'urn:uuid:334228ee-2228-43c4-bea3-438e83d73017'},
 u'intent': u'order',
 u'period': {u'start': u'2015-11-09'},
 u'resourceType': u'CarePlan',
 u'status': u'active',
 u'subject': {u'reference':

### Cleanup

We should close the underlying session's connection before releasing the client. Alternatively, the client can be used as a context manager for simplicity.

In [6]:
client.session.close()

with BulkDataClient(**config) as client:
    client.provision()
    print(client.provisioned)

True
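
For readers unfamiliar with context managers, the Python 3 sketch below shows the general pattern a client could follow to close its session automatically. `SketchClient` and `FakeSession` are hypothetical classes written for this illustration; they are not the `BulkDataClient` implementation, which may differ.

```python
class FakeSession:
    """Stand-in for an HTTP session; it only records whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SketchClient:
    """Hypothetical client that owns a session and closes it on exit."""

    def __init__(self, session):
        self.session = session

    def __enter__(self):
        # The object returned here is what gets bound by "with ... as name".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with-block ends, even if it ended with an exception.
        self.session.close()
        return False  # do not swallow exceptions raised inside the block


session = FakeSession()
with SketchClient(session) as sketch_client:
    closed_inside = session.closed  # still open while inside the block
closed_after = session.closed       # closed once the block has exited
```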
