# Echocardiogram 

# Content
This data was used to predict if patients would survive atleast one year after having a heart attack. All patients suffered from a heart attack at some point. 

An echocardiogram shows the size, structure, and movement of various parts of your heart. It uses high frequency sound waves (ultrasound) to make pictures of your heart. This test was used to measure the size of the heart, display fluid around the heart, conctracility around the heart,etc; which were all used to determine the survival of the patient. 


## Attributes
Number of attributes: 13
Target Column: Alive, numerical (0 dead, 1 alive)
Only the column "Group" will be ignored 


1. Survival- the number of months patient survived (has survived,if patient is still alive).

2. Still-alive - binary value; 0= dead after survival time, 1= still alive

3. age-at-heart-attack -- age in years when heart attack occurred

4. pericardial-effusion(fluid around the heart) - binary value; 0= no fluid, 1=fluid

5. fractional-shortening -- a measure of contracility around the heart;lower numbers are increasingly abnormal

6. epss -- E-point septal separation, another measure of contractility.  Larger numbers are increasingly abnormal.

7. lvdd -- left ventricular end-diastolic dimension.  This is a measure of the size of the heart at end-diastole.Large hearts tend to be sick hearts.

8. wall-motion-score -- a measure of how the segments of the left ventricle are moving

9. wall-motion-index -- equals wall-motion-score divided by number of segments seen.  Usually 12-13 segments are seen in an echocardiogram. 

10. mult -- a derivate var which can be ignored

11. name -- the name of the patient (I have replaced them with "name")

12. group -- meaningless, ignore it

13. alive-at-1 -- Boolean-valued. Derived from the first two attributes. 0 means patient was either dead after 1 year or had been followed for less than 1 year.  1 means patient was alive at 1 year.

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib
import sklearn
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import sys

print("Python version is "+sys.version)
print("Matplotlib version is "+matplotlib.__version__)
print("Scikit-Learn version is "+sklearn.__version__)
print("Numpy version is "+np.__version__)
print("Pandas version is "+pd.__version__)

Python version is 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
Matplotlib version is 2.2.2
Scikit-Learn version is 0.20.0
Numpy version is 1.14.3
Pandas version is 0.23.0


In [2]:
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Fetching Data 

In [3]:
df = pd.read_csv('echocardiogram.csv')
print(df.info())
print(df.head(7))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 133 entries, 0 to 132
Data columns (total 13 columns):
Survival                 132 non-null object
Still-alive              132 non-null object
age-at-heart-attack      132 non-null object
pericardial_effusion     132 non-null float64
fractional-shortening    132 non-null object
epss                     132 non-null object
Ivdd                     132 non-null object
wall-motion-score        132 non-null object
wall-motion-index        132 non-null object
mult                     132 non-null object
alive                    132 non-null object
name                     132 non-null object
group                    132 non-null object
dtypes: float64(1), object(12)
memory usage: 13.6+ KB
None
  Survival Still-alive age-at-heart-attack  pericardial_effusion  \
0       11           0                  71                   0.0   
1       19           0                  72                   0.0   
2       16           0                  55    

## Clean and PreProcessing Data

In [4]:
df = df.replace('?', pd.NaT)
df = df.dropna()
print(df.head(7))


  Survival Still-alive age-at-heart-attack  pericardial_effusion  \
0       11           0                  71                   0.0   
1       19           0                  72                   0.0   
2       16           0                  55                   0.0   
3       57           0                  60                   0.0   
4       19           1                  57                   0.0   
5       26           0                  68                   0.0   
6       13           0                  62                   0.0   

  fractional-shortening    epss   Ivdd wall-motion-score wall-motion-index  \
0                  0.26       9    4.6                14                 1   
1                  0.38       6    4.1                14               1.7   
2                  0.26       4   3.42                14                 1   
3                 0.253  12.062  4.603                16              1.45   
4                  0.16      22   5.75                18         

In [5]:
pericardial_effusionIndex = pd.Categorical(df.pericardial_effusion).categories       
print(pericardial_effusionIndex)  #get the categories
df.pericardial_effusion = pd.Categorical(df.pericardial_effusion).codes  #convert from category names to category code.
print(df.head(5))

Float64Index([0.0, 1.0], dtype='float64')
  Survival Still-alive age-at-heart-attack  pericardial_effusion  \
0       11           0                  71                     0   
1       19           0                  72                     0   
2       16           0                  55                     0   
3       57           0                  60                     0   
4       19           1                  57                     0   

  fractional-shortening    epss   Ivdd wall-motion-score wall-motion-index  \
0                  0.26       9    4.6                14                 1   
1                  0.38       6    4.1                14               1.7   
2                  0.26       4   3.42                14                 1   
3                 0.253  12.062  4.603                16              1.45   
4                  0.16      22   5.75                18              2.25   

    mult alive  name group  
0      1     0  name     1  
1  0.588     0  name  

Now we get the values 

In [6]:
data = df.values[:,:-2]

X = data[:,:-2] #Ignore the column -1 and -2. 
y = data[:,--3] #column -3 is the dependent variable


## Split the data set into Training Set and Testing Set

In [7]:
# rescale the data, use the traditional train/test split
y=y.astype('int')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print(X_train.shape)
print(X_test.shape)

(45, 9)
(16, 9)


## Build and Train Machine Learning Models

In [8]:

mlp = MLPClassifier(hidden_layer_sizes=(9,6), max_iter=50000, alpha=1e-8,
                    solver='sgd', verbose=False, tol=1e-5, random_state=1,
                    learning_rate_init=.03, warm_start=True)

### Use fit method to train the network.

In [9]:
mlp.fit(X_train, y_train)
print("Training set score: %f" % mlp.score(X_train, y_train))
print("Testing set score: %f" % mlp.score(X_test, y_test))

Training set score: 0.800000
Testing set score: 0.875000
