# Introduction 

This Jupyter notebook illustrates how to extract a survival curve from the published literature in order to generate parametric or non-parametric survival distributions for use in cost-effectiveness analysis (CEA) or health technology assessment (HTA).

# System Information

In [1]:
import sys
print(sys.version)

3.6.1 (default, Sep  7 2017, 16:36:03) 
[GCC 6.3.0 20170406]


The next cell makes Jupyter display the result of every expression in a cell, not just the last one.

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Example Curve

The curve we will use comes from [Neelapu et al](http://www.nejm.org/doi/full/10.1056/NEJMoa1707447): Figure 2B plots the progression-free survival probabilities (Kaplan-Meier estimates).

Below is a very brief tutorial on extracting the curve using the [WebPlotDigitizer](https://apps.automeris.io/wpd/). The app is very useful but requires some manual handling (for now; I am working on a script that will generate the curves automatically).

# Curve Extraction using WebPlotDigitizer

1. Using a snipping tool, capture Figure 2B of the article from the original PDF as a PNG file and load it into the application using File -> Load image.

2. After aligning the X and Y axes and labeling them with the appropriate values, highlight the curve area with a box, then extract it using X Step interpolation (a step of 0.5 units gives one point every 0.5 months).

3. View Data, then download the .CSV or copy the two columns (via opening the graph in Plotly).


## 1.1 Create PNG

Create the PNG using [GreenShot](http://getgreenshot.org/).

![Creating PNG of Figure 2B](www/CE_1.png)

## 1.2 Upload into WPD

Upload into [WebPlotDigitizer](https://apps.automeris.io/wpd/)

![Input Into WebPlotDigitizer](www/CE_2.png)

## 2.1 Set Values on the X and Y Axes

![Set Axis](www/GP_1.png)

## 2.2 Highlight Curve and Extract

- Highlight the curve using a box and extract it using X Step interpolation with $\delta X = 0.5$ (months)

![Extract Curve](www/GP_2.png)
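WPD performs this interpolation itself. To make the idea of an X step concrete, here is a minimal sketch that linearly interpolates a few hypothetical digitized (time, S(t)) points onto a 0.5-month grid with NumPy's `np.interp` and rounds to 3 digits. It is a stand-in under stated assumptions, not WPD's own algorithm; `resample_on_x_step` and the input points are hypothetical.

```python
import numpy as np

def resample_on_x_step(times, surv, step=0.5):
    """Hypothetical helper: linearly interpolate digitized (time, S(t)) points
    onto a regular grid with spacing `step`. A rough analogue of WPD's
    X Step interpolation, not its actual implementation."""
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    order = np.argsort(times)              # np.interp needs increasing x values
    times, surv = times[order], surv[order]
    # Grid from the first to the last digitized time, in steps of `step`
    grid = np.arange(times[0], times[-1] + step / 2, step)
    # Interpolate, then keep 3 digits
    return grid, np.round(np.interp(grid, times, surv), 3)

# Hypothetical raw points (months, survival probability), for illustration only
raw_t = [0.0, 0.8, 1.3, 2.0]
raw_s = [1.0, 0.97, 0.95, 0.93]
grid, s_grid = resample_on_x_step(raw_t, raw_s, step=0.5)
print(grid)
print(s_grid)
```

Linear interpolation smooths over the vertical drops of a Kaplan-Meier step function, so the resampled values only approximate the published curve.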

## 3.1 View Data

- View the data and format it as preferred (I usually keep 3 digits)

![View Data](www/VD_1.png)

## 3.2 Copy CSV

- Copy the CSV so that it follows the Guyot guideline
- S(t) must be monotonically non-increasing: there must be no pair of times with $x_2 > x_1$ and $S(x_2) > S(x_1)$

![Guyot Example](www/guyotex.png)
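Because a sequence is non-increasing exactly when no consecutive pair increases, checking adjacent points is enough to enforce the rule above. A small pandas sketch on hypothetical values: `find_increases` is a hypothetical helper, while `Series.is_monotonic_decreasing` is a pandas attribute that is non-strict, so plateaus (equal values) pass.

```python
import pandas as pd

def find_increases(times, surv):
    """Hypothetical helper: return the rows where S(t) rises relative to the
    previous time point, i.e. the x2 > x1 with S(x2) > S(x1) violations."""
    df = pd.DataFrame({"Time": times, "S(t)": surv}).sort_values("Time")
    # diff() is NaN for the first row, and NaN > 0 is False, so it is never flagged
    rises = df["S(t)"].diff() > 0
    return df[rises]

# Hypothetical digitized values with one digitizing error at t = 1.5
times = [0.0, 0.5, 1.0, 1.5, 2.0]
surv = [1.000, 0.981, 0.971, 0.975, 0.945]

print(pd.Series(surv).is_monotonic_decreasing)  # False: 0.971 -> 0.975 is an increase
print(find_increases(times, surv))
```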

Here we illustrate how to build the Guyot template CSV file, which contains:

- Column 1: extracted coordinates, i.e. the row numbers 1 to n
- Column 2: time
- Column 3: survival probabilities extracted from the WPD app
- Column 4: a Microsoft Excel check of whether the current value is smaller than the previous one; it shows `FALSE` if not
- Column 5: S(t) after adjustment, computed with the Python code presented below
- Column 6: a final check of whether the current value is smaller than the previous one

Columns 4 and 6 are only checks that indicate whether S(x2) is smaller than S(x1); they are not needed when running the Guyot code.
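The Excel checks can also be computed directly in pandas. The sketch below builds all six columns from hypothetical values; the column names, the `build_guyot_template` helper, and the choice to count equal values (plateaus) as passing are assumptions. Column 5 applies the "raise a value to the next one when it is smaller" rule as a reverse running maximum, which gives the same result as applying that rule from the last row back to the first.

```python
import pandas as pd

def build_guyot_template(times, surv):
    """Hypothetical helper sketching the six-column template."""
    df = pd.DataFrame({
        "Extracted_Coordinates": list(range(1, len(times) + 1)),  # column 1: 1..n
        "Time": times,                                            # column 2
        "S(t)": surv,                                             # column 3: as extracted
    })

    def not_larger_than_previous(s):
        # TRUE where a value does not exceed the one before it; the first row
        # has no predecessor, so it is marked TRUE
        ok = s <= s.shift()
        ok.iloc[0] = True
        return ok

    df["Check"] = not_larger_than_previous(df["S(t)"])                 # column 4

    # Column 5: force S(0) = 1, then raise any value smaller than the next one
    # (reverse running maximum over the series)
    adjusted = df["S(t)"].copy()
    adjusted.iloc[0] = 1.0
    df["S(t)_adjusted"] = adjusted[::-1].cummax()[::-1]

    df["Final_Check"] = not_larger_than_previous(df["S(t)_adjusted"])  # column 6
    return df

# Hypothetical values with one digitizing error in row 4
template = build_guyot_template([0.0, 0.5, 1.0, 1.5, 2.0],
                                [0.995, 0.980, 0.970, 0.974, 0.944])
print(template)
```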

In [3]:
# Load pandas for data manipulation
import pandas as pd

In [4]:
example_data = pd.read_csv("example.csv", header = None)

In [5]:
example_data.head(5)

|   | 0 | 1 |
|---|---|---|
| 0 | 0.0 | 0.996 |
| 1 | 0.5 | 0.981 |
| 2 | 1.0 | 0.971 |
| 3 | 1.5 | 0.954 |
| 4 | 2.0 | 0.945 |


In [6]:
# To do
# 1. Add column 1 with the extracted-coordinate index
# 2. Add an algorithm that makes the survival probabilities monotone decreasing
# 3. Add headers for visual display
# 4. Plot the KM curve, then end

In [7]:
# Add column 1
example_data['Extracted_Coordinates'] = pd.Series(range(1,len(example_data)+1))

In [8]:
example_data.head(5)

|   | 0 | 1 | Extracted_Coordinates |
|---|---|---|---|
| 0 | 0.0 | 0.996 | 1 |
| 1 | 0.5 | 0.981 | 2 |
| 2 | 1.0 | 0.971 | 3 |
| 3 | 1.5 | 0.954 | 4 |
| 4 | 2.0 | 0.945 | 5 |


In [9]:
# Format a pandas DataFrame for the Guyot algorithm.
# Expects column 0 = time, column 1 = survival probability, plus 'Extracted_Coordinates'.
def SurvivalCurveGuyot(data):
    
    # Set the first survival probability to 1 because S(t) at time 0 must be 1
    data.loc[0, 1] = 1
    
    # Check monotonicity: if a value is smaller than the next one, replace it with the next value.
    # Walking backwards from the end lets a correction propagate through several increases in a row.
    for i in range(len(data) - 2, -1, -1):
        if data.loc[i, 1] < data.loc[i + 1, 1]:
            data.loc[i, 1] = data.loc[i + 1, 1]
            
    # Put Extracted_Coordinates first, then rename the time and probability columns
    return data[['Extracted_Coordinates', 0, 1]].rename(columns={0: 'Time', 1: 'S(t)'})

In [None]:
example_data = SurvivalCurveGuyot(example_data)

In [12]:
example_data.head(5)

|   | Extracted_Coordinates | Time | S(t) |
|---|---|---|---|
| 0 | 1 | 0.0 | 1.000 |
| 1 | 2 | 0.5 | 0.981 |
| 2 | 3 | 1.0 | 0.971 |
| 3 | 4 | 1.5 | 0.954 |
| 4 | 5 | 2.0 | 0.945 |
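Why the direction of the monotonicity pass matters: a single left-to-right pass of the "replace the previous value with the next one" rule can leave a violation behind when several increases occur in a row, while the same rule applied right-to-left does not. A minimal sketch on a hypothetical list, using plain Python; the function names are hypothetical.

```python
def forward_pass(s):
    """One left-to-right pass: replace a value by the next one if it is smaller."""
    s = list(s)
    for i in range(len(s) - 1):
        if s[i] < s[i + 1]:
            s[i] = s[i + 1]
    return s

def backward_pass(s):
    """Same replacement rule, applied right-to-left so corrections propagate."""
    s = list(s)
    for i in range(len(s) - 2, -1, -1):
        if s[i] < s[i + 1]:
            s[i] = s[i + 1]
    return s

def is_non_increasing(s):
    """True if no value is larger than the one before it."""
    return all(a >= b for a, b in zip(s, s[1:]))

# Hypothetical run of two consecutive digitizing errors
s = [1.0, 0.90, 0.95, 0.97, 0.85]
print(forward_pass(s), is_non_increasing(forward_pass(s)))    # [1.0, 0.95, 0.97, 0.97, 0.85] False
print(backward_pass(s), is_non_increasing(backward_pass(s)))  # [1.0, 0.97, 0.97, 0.97, 0.85] True
```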


## Ready to Run

Now we are ready to put this data into the Dashboard
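The expected input format of the dashboard isn't described here. Assuming it takes a plain CSV with the three Guyot columns (the check columns are not needed for the Guyot code), saving could look like the sketch below; `save_guyot_input`, the example frame, and the file name `guyot_input.csv` are all hypothetical.

```python
import pandas as pd

def save_guyot_input(df, path):
    """Hypothetical helper: keep the three Guyot columns and write them without the index."""
    out = df[["Extracted_Coordinates", "Time", "S(t)"]]
    out.to_csv(path, index=False)
    return out

# Hypothetical adjusted data and file name, for illustration only
example = pd.DataFrame({"Extracted_Coordinates": [1, 2, 3],
                        "Time": [0.0, 0.5, 1.0],
                        "S(t)": [1.0, 0.98, 0.97]})
save_guyot_input(example, "guyot_input.csv")
```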