In [12]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from lifelines import KaplanMeierFitter

In [13]:
data = pd.read_csv('datasets/echocardiogram.csv')

In [14]:
data.head()

Unnamed: 0,survival,alive,age,pericardialeffusion,fractionalshortening,epss,lvdd,wallmotion-score,wallmotion-index,mult,name,group,aliveat1
0,11.0,0.0,71.0,0.0,0.26,9.0,4.6,14.0,1.0,1.0,name,1,0.0
1,19.0,0.0,72.0,0.0,0.38,6.0,4.1,14.0,1.7,0.588,name,1,0.0
2,16.0,0.0,55.0,0.0,0.26,4.0,3.42,14.0,1.0,1.0,name,1,0.0
3,57.0,0.0,60.0,0.0,0.253,12.062,4.603,16.0,1.45,0.788,name,1,0.0
4,19.0,1.0,57.0,0.0,0.16,22.0,5.75,18.0,2.25,0.571,name,1,0.0


In [33]:
# Columns needed for the survival comparison
cols = ['survival', 'alive', 'age', 'pericardialeffusion', 'name']

# Split patients by pericardial effusion status; .copy() avoids SettingWithCopyWarning
has_pericardial_effusion = data.loc[data['pericardialeffusion'] == 1.0, cols].copy()
none_pericardial_effusion = data.loc[data['pericardialeffusion'] == 0.0, cols].copy()

# The event (death) counts as observed when alive == 0, so flip the flag:
# alive 0.0 -> observed 1.0, alive 1.0 -> observed 0.0 (right-censored)
has_pericardial_effusion['observed'] = 1.0 - has_pericardial_effusion['alive']
none_pericardial_effusion['observed'] = 1.0 - none_pericardial_effusion['alive']

# Drop rows with missing values in any of the selected columns
has_pericardial_effusion = has_pericardial_effusion.dropna()
none_pericardial_effusion = none_pericardial_effusion.dropna()

## Heart disease patient survival
You're a data scientist at a clinical research organization that studies heart disease. You wonder whether pericardial effusion, a build-up of fluid around the heart, affects the survival outcomes of heart attack patients. In this exercise, you will explore how to use two statistical methods to compare the survival distributions of patients with and without pericardial effusion.

The data is split up into two DataFrames:

* `has_pericardial_effusion`: patients with pericardial effusion
* `none_pericardial_effusion`: patients without pericardial effusion

The `pandas` package is loaded as `pd` and the `KaplanMeierFitter` class is imported from `lifelines`.

In [34]:
# Instantiate Kaplan Meier object for patients with and without pericardial effusion
kmf_has_pe = KaplanMeierFitter()
kmf_no_pe = KaplanMeierFitter()

# Fit Kaplan Meier estimators to each DataFrame
kmf_has_pe.fit(durations=has_pericardial_effusion['survival'],
               event_observed=has_pericardial_effusion['observed'])
kmf_no_pe.fit(durations=none_pericardial_effusion['survival'],
              event_observed=none_pericardial_effusion['observed'])

<lifelines.KaplanMeierFitter:"KM_estimate", fitted with 101 total observations, 28 right-censored observations>
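
For reference, the Kaplan-Meier (product-limit) estimate behind each fitted object can be written as below. The symbols are introduced here for illustration and do not appear in the exercise: $t_i$ are the distinct times at which an event was observed, $d_i$ is the number of events at $t_i$, and $n_i$ is the number of patients still at risk just before $t_i$.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored patients (`observed == 0`) never contribute to $d_i$, but they do count towards $n_i$ until their recorded duration has passed.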

In [35]:
# Print out the median survival duration of each group
print("The median survival duration (months) of patients with pericardial effusion: ", kmf_has_pe.median_survival_time_)
print("The median survival duration (months) of patients without pericardial effusion: ", kmf_no_pe.median_survival_time_)

The median survival duration (months) of patients with pericardial effusion:  27.0
The median survival duration (months) of patients without pericardial effusion:  31.0
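
To make the reported medians less of a black box, here is a minimal pure-Python sketch of the product-limit estimate and of how a median can be read off it. It assumes the median is taken as the first time at which the estimated survival probability drops to 0.5 or below, which is my understanding of what `median_survival_time_` reports. The helper names `kaplan_meier` and `median_survival_time` and the toy durations are hypothetical and are not taken from the echocardiogram data.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time."""
    # Only times at which an event was actually observed change the estimate
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events (not censorings) exactly at time t
        deaths = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_survival_time(curve):
    """First time at which the survival estimate is at or below 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    # The curve never reaches 0.5, so the median is not estimable
    return float("inf")


# Hypothetical toy data: durations in months, 1 = death observed, 0 = censored
toy_durations = [5, 8, 12, 12, 20, 27, 31, 40]
toy_observed = [1, 1, 0, 1, 1, 1, 0, 1]

curve = kaplan_meier(toy_durations, toy_observed)
median = median_survival_time(curve)
print(curve)
print("Median survival time (months):", median)
```

In this toy example one of the two patients recorded at 12 months is censored, so that patient stays in the risk set at 12 months without counting as a death.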


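Based on the fitted Kaplan-Meier estimates, patients without pericardial effusion have a longer median survival time (31 months) than patients with pericardial effusion (27 months). A difference in medians alone does not show whether the two survival distributions differ significantly.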
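
The exercise does not name a second method for comparing the groups. One common choice is the log-rank test, which `lifelines` provides as `lifelines.statistics.logrank_test`. The sketch below is a pure-Python illustration of the two-group statistic, under the assumption that this is the kind of comparison intended. The function name `logrank_sketch` and the toy data are hypothetical, and the p-value uses a chi-square distribution with one degree of freedom.

```python
import math


def logrank_sketch(durations_a, observed_a, durations_b, observed_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    # Pool the distinct times at which an event was observed in either group
    times = sorted(
        {t for t, e in zip(durations_a, observed_a) if e}
        | {t for t, e in zip(durations_b, observed_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in times:
        n_a = sum(1 for d in durations_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in durations_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(durations_a, observed_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, observed_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in group A if both groups shared one hazard
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical toy data: every event is observed, group B survives longer
stat, p_value = logrank_sketch([2, 4, 6, 8], [1, 1, 1, 1],
                               [10, 12, 14, 16], [1, 1, 1, 1])
print("statistic:", stat, "p-value:", p_value)
```

With the real DataFrames, the `survival` and `observed` columns of each group would take the place of the toy lists. A small p-value would suggest that the difference between the two survival curves is unlikely to be due to chance alone.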