In [None]:
## Import images

## Table of contents

## Abstract

**Background**
Cardiotocography (CTG) is the most widely used technique in developed countries for monitoring fetal heart rate and uterine contractions.
The information from the CTG helps medical practitioners evaluate the fetus's wellbeing (healthy or pathological) and prevent child and maternal mortality.

**Target**
In this project, I will create a model that classifies the health status of the fetus so that medical practitioners can take immediate action, if necessary, to save the lives of the child and the mother.

**Methods**
1. **Data visualization**: find interesting patterns in fetal health status. countplot, boxplot, lmplot and relplot are used in this project.

2. **Modeling**: apply logistic regression, a KNN classifier, a decision tree and a random forest, then compare the outcomes of the models.
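
As a rough illustration of the modeling step, the sketch below fits the four model families on the same train/test split and compares their accuracy. It is only a sketch under stated assumptions: the data is synthetic (generated with `make_classification`, not the CTG dataset), and the hyperparameters, the scaling pipelines and the choice of accuracy as the metric are placeholders rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the CTG features (assumption:
# this is NOT the fetal health dataset, only a placeholder of similar shape).
X, y = make_classification(
    n_samples=600, n_features=21, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Distance- and gradient-based models get a scaler; tree models do not need one.
models = {
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping the models in one dictionary makes it easy to add or remove a candidate without touching the evaluation loop.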



ENJOY!





## Import libraries


In [1]:
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency
import scipy.stats as stats
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import classification_report
import pickle


pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')

## Load data

In [2]:
data = pd.read_csv('/Users/yuriawano/fetal_health_classification/data/fetal_health.csv')
data.head()

Unnamed: 0,baseline value,accelerations,fetal_movement,uterine_contractions,light_decelerations,severe_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,mean_value_of_long_term_variability,histogram_width,histogram_min,histogram_max,histogram_number_of_peaks,histogram_number_of_zeroes,histogram_mode,histogram_mean,histogram_median,histogram_variance,histogram_tendency,fetal_health
0,120.0,0.0,0.0,0.0,0.0,0.0,0.0,73.0,0.5,43.0,2.4,64.0,62.0,126.0,2.0,0.0,120.0,137.0,121.0,73.0,1.0,2.0
1,132.0,0.006,0.0,0.006,0.003,0.0,0.0,17.0,2.1,0.0,10.4,130.0,68.0,198.0,6.0,1.0,141.0,136.0,140.0,12.0,0.0,1.0
2,133.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.1,0.0,13.4,130.0,68.0,198.0,5.0,1.0,141.0,135.0,138.0,13.0,0.0,1.0
3,134.0,0.003,0.0,0.008,0.003,0.0,0.0,16.0,2.4,0.0,23.0,117.0,53.0,170.0,11.0,0.0,137.0,134.0,137.0,13.0,1.0,1.0
4,132.0,0.007,0.0,0.008,0.0,0.0,0.0,16.0,2.4,0.0,19.9,117.0,53.0,170.0,9.0,0.0,137.0,136.0,138.0,11.0,1.0,1.0


In [3]:
data.shape

(2126, 22)

Dataset information:
+ This dataset consists of fetal heart rate (FHR) and uterine contraction (UC) features from CTG exams.

+ The dataset contains 2126 records with 22 columns: 21 features extracted from the CTG exam plus the fetal_health label.

+ Each record was then classified by expert obstetricians into 3 classes:

1. Normal
2. Suspect
3. Pathological 


## Understand the features

### Definitions of features

| features                                               | definition                                                            |
|--------------------------------------------------------|-----------------------------------------------------------------------|
| baseline_value                                         | baseline fetal heart rate (beats per minute)                          |
| accelerations                                          | number of accelerations per second                                    |
| fetal_movement                                         | number of fetal movements per second                                  |
| uterine_contractions                                   | number of uterine contractions per second                             |
| light_decelerations                                    | number of light decelerations per second                              |
| severe_decelerations                                   | number of severe decelerations per second                             |
| prolongued_decelerations                               | number of prolonged decelerations per second                          |
| abnormal_short_term_variability                        | percentage of time with abnormal short term variability               |
| mean_value_of_short_term_variability                   | mean value of short term variability                                  |
| percentage_of_time_with_abnormal_long_term_variability | percentage of time with abnormal long term variability                |
| mean_value_of_long_term_variability                    | mean value of long term variability                                   |
| histogram_width                                        | width of fetal heart rate (FHR) histogram                             |
| histogram_min                                          | minimum (low frequency) of FHR histogram                              |
| histogram_max                                          | maximum (high frequency) of FHR histogram                             |
| histogram_number_of_peaks                              | number of histogram peaks                                             |
| histogram_number_of_zeroes                             | number of histogram zeros                                             |
| histogram_mode                                         | histogram mode                                                        |
| histogram_mean                                         | histogram mean                                                        |
| histogram_median                                       | histogram median                                                      |
| histogram_variance                                     | histogram variance                                                    |
| histogram_tendency                                     | histogram tendency                                                    |
| fetal_health                                           | health status of the fetus: 1 (Normal), 2 (Suspect), 3 (Pathological) |

### Other terminology

| terms                  | definition                                     |
|------------------------|------------------------------------------------|
| decelerations          | temporary decreases in the fetal heart rate    |
| short term variability | the beat-to-beat variation in fetal heart rate |

Definitions of medical terms
+ FHR: Fetal heart rate

+ Accelerations: Accelerations are short-term rises in the heart rate of at least 15 beats per minute, lasting at least 15 seconds.

+ Variability: fluctuations in the baseline FHR that are irregular in amplitude and frequency.

+ Short term variability: from one moment to the next, the fetal heart rate speeds up slightly and then slows down slightly, usually within a range of 3 - 5 bpm (beats per minute) around the baseline value.

+ Long term variability:

+ Prolonged decelerations: a decrease in FHR of 15 bpm or more below the baseline, lasting at least 2 minutes but less than 10 minutes from onset to return to baseline. A deceleration lasting 10 minutes or more is considered a change in baseline.
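
The thresholds in the definitions above can be written as simple rules. The sketch below is only an illustration of those definitions: the function names and the inputs (a change in bpm relative to baseline and a duration in seconds) are hypothetical, and the dataset itself already provides these events as per-second counts rather than raw traces.

```python
def is_acceleration(rise_bpm: float, duration_s: float) -> bool:
    """Rise of at least 15 bpm lasting at least 15 seconds."""
    return rise_bpm >= 15 and duration_s >= 15


def is_prolonged_deceleration(drop_bpm: float, duration_s: float) -> bool:
    """Drop of at least 15 bpm lasting at least 2 minutes but under 10 minutes."""
    return drop_bpm >= 15 and 120 <= duration_s < 600


print(is_acceleration(20, 30))             # rise of 20 bpm for 30 s
print(is_prolonged_deceleration(20, 180))  # drop of 20 bpm for 3 min
print(is_prolonged_deceleration(20, 700))  # over 10 min: treated as a baseline change
```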


## Data cleaning and wrangling

### Clean the columns

In [4]:
data.columns

Index(['baseline value', 'accelerations', 'fetal_movement',
       'uterine_contractions', 'light_decelerations', 'severe_decelerations',
       'prolongued_decelerations', 'abnormal_short_term_variability',
       'mean_value_of_short_term_variability',
       'percentage_of_time_with_abnormal_long_term_variability',
       'mean_value_of_long_term_variability', 'histogram_width',
       'histogram_min', 'histogram_max', 'histogram_number_of_peaks',
       'histogram_number_of_zeroes', 'histogram_mode', 'histogram_mean',
       'histogram_median', 'histogram_variance', 'histogram_tendency',
       'fetal_health'],
      dtype='object')

In [5]:
# Replace spaces in column names with underscores (only 'baseline value' is affected)
data.columns = [column.replace(' ', '_') for column in data.columns]
data.columns


Index(['baseline_value', 'accelerations', 'fetal_movement',
       'uterine_contractions', 'light_decelerations', 'severe_decelerations',
       'prolongued_decelerations', 'abnormal_short_term_variability',
       'mean_value_of_short_term_variability',
       'percentage_of_time_with_abnormal_long_term_variability',
       'mean_value_of_long_term_variability', 'histogram_width',
       'histogram_min', 'histogram_max', 'histogram_number_of_peaks',
       'histogram_number_of_zeroes', 'histogram_mode', 'histogram_mean',
       'histogram_median', 'histogram_variance', 'histogram_tendency',
       'fetal_health'],
      dtype='object')
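
A slightly more defensive variant of the same idea is sketched below, in case headers also carry stray whitespace or mixed case. The toy column names are hypothetical and only illustrate the string methods available on a pandas Index.

```python
import pandas as pd

# Toy frame with messy headers (hypothetical names, not from the dataset).
df = pd.DataFrame(columns=["baseline value", " Fetal Movement ", "histogram_width"])

# Trim whitespace, lower-case, and turn runs of whitespace into one underscore.
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r"\s+", "_", regex=True)
)

print(list(df.columns))
```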

### Check data types

In [6]:
data.dtypes

baseline_value                                            float64
accelerations                                             float64
fetal_movement                                            float64
uterine_contractions                                      float64
light_decelerations                                       float64
severe_decelerations                                      float64
prolongued_decelerations                                  float64
abnormal_short_term_variability                           float64
mean_value_of_short_term_variability                      float64
percentage_of_time_with_abnormal_long_term_variability    float64
mean_value_of_long_term_variability                       float64
histogram_width                                           float64
histogram_min                                             float64
histogram_max                                             float64
histogram_number_of_peaks                                 float64
histogram_

### Stats of data

In [7]:
data.describe()

Unnamed: 0,baseline_value,accelerations,fetal_movement,uterine_contractions,light_decelerations,severe_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,mean_value_of_long_term_variability,histogram_width,histogram_min,histogram_max,histogram_number_of_peaks,histogram_number_of_zeroes,histogram_mode,histogram_mean,histogram_median,histogram_variance,histogram_tendency,fetal_health
count,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0,2126.0
mean,133.303857,0.003178,0.009481,0.004366,0.001889,3e-06,0.000159,46.990122,1.332785,9.84666,8.187629,70.445908,93.579492,164.0254,4.068203,0.323612,137.452023,134.610536,138.09031,18.80809,0.32032,1.304327
std,9.840844,0.003866,0.046666,0.002946,0.00296,5.7e-05,0.00059,17.192814,0.883241,18.39688,5.628247,38.955693,29.560212,17.944183,2.949386,0.706059,16.381289,15.593596,14.466589,28.977636,0.610829,0.614377
min,106.0,0.0,0.0,0.0,0.0,0.0,0.0,12.0,0.2,0.0,0.0,3.0,50.0,122.0,0.0,0.0,60.0,73.0,77.0,0.0,-1.0,1.0
25%,126.0,0.0,0.0,0.002,0.0,0.0,0.0,32.0,0.7,0.0,4.6,37.0,67.0,152.0,2.0,0.0,129.0,125.0,129.0,2.0,0.0,1.0
50%,133.0,0.002,0.0,0.004,0.0,0.0,0.0,49.0,1.2,0.0,7.4,67.5,93.0,162.0,3.0,0.0,139.0,136.0,139.0,7.0,0.0,1.0
75%,140.0,0.006,0.003,0.007,0.003,0.0,0.0,61.0,1.7,11.0,10.8,100.0,120.0,174.0,6.0,0.0,148.0,145.0,148.0,24.0,1.0,1.0
max,160.0,0.019,0.481,0.015,0.015,0.001,0.005,87.0,7.0,91.0,50.7,180.0,159.0,238.0,18.0,10.0,187.0,182.0,186.0,269.0,1.0,3.0


### Check target variable

In [8]:
# check the imbalance of the target variable

data['fetal_health'].value_counts()

1.0    1655
2.0     295
3.0     176
Name: fetal_health, dtype: int64
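
To make the imbalance easier to read, the counts can be turned into proportions. The sketch below rebuilds the counts from the output above as a standalone Series; the variable names are illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({1.0: 1655, 2.0: 295, 3.0: 176}, name="fetal_health")

# Share of each class; data["fetal_health"].value_counts(normalize=True)
# would give the same numbers directly from the data frame.
proportions = (counts / counts.sum()).round(3)
print(proportions)

# Ratio between the largest and the smallest class.
imbalance_ratio = counts.max() / counts.min()
print(round(imbalance_ratio, 1))
```

Roughly 78% of the records are Normal, and the Normal class is about 9 times larger than the Pathological class, so plain accuracy can look good even for a model that rarely predicts the minority classes.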

### Drop duplicates

In [9]:


data = data.drop_duplicates()
data.shape

(2113, 22)
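
The shape drops from 2126 to 2113 rows, so 13 exact duplicate rows were removed. A minimal sketch of counting duplicates before dropping them, using a toy frame with hypothetical values:

```python
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate row.
df = pd.DataFrame(
    {"baseline_value": [120.0, 132.0, 132.0], "fetal_health": [2.0, 1.0, 1.0]}
)

# duplicated() marks rows that are identical to an earlier row.
n_duplicates = int(df.duplicated().sum())

# Drop them and rebuild a gap-free 0..n-1 index.
deduped = df.drop_duplicates().reset_index(drop=True)

print(n_duplicates, deduped.shape)
```

Resetting the index is optional; it only avoids gaps in the row labels after rows are removed.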

### Check null values

In [10]:
# Count the null values in each column. No null values were found
data.isna().sum()

baseline_value                                            0
accelerations                                             0
fetal_movement                                            0
uterine_contractions                                      0
light_decelerations                                       0
severe_decelerations                                      0
prolongued_decelerations                                  0
abnormal_short_term_variability                           0
mean_value_of_short_term_variability                      0
percentage_of_time_with_abnormal_long_term_variability    0
mean_value_of_long_term_variability                       0
histogram_width                                           0
histogram_min                                             0
histogram_max                                             0
histogram_number_of_peaks                                 0
histogram_number_of_zeroes                                0
histogram_mode                          
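
With this many columns, a single total can be easier to check than the full per-column listing. The sketch below uses a toy frame with hypothetical values that deliberately contains one missing entry.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one missing entry.
df = pd.DataFrame(
    {"histogram_mean": [137.0, np.nan], "histogram_mode": [120.0, 141.0]}
)

nulls_per_column = df.isna().sum()         # one count per column
total_nulls = int(nulls_per_column.sum())  # single number for the whole frame

print(total_nulls)
# A cleaning script could fail fast with: assert total_nulls == 0
```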

### Save the data to a file for data visualization

In [11]:
data.to_csv('fetal_health_visualization.csv', index = False)

### Change values in fetal_health and convert into categorical

In [12]:
# Map the numeric class codes to letters and assign the result back to the column
data["fetal_health"] = data["fetal_health"].replace({1.0: "A", 2.0: "B", 3.0: "C"})

In [13]:
data["fetal_health"] = data["fetal_health"].astype(object) 

In [14]:
data.dtypes

baseline_value                                            float64
accelerations                                             float64
fetal_movement                                            float64
uterine_contractions                                      float64
light_decelerations                                       float64
severe_decelerations                                      float64
prolongued_decelerations                                  float64
abnormal_short_term_variability                           float64
mean_value_of_short_term_variability                      float64
percentage_of_time_with_abnormal_long_term_variability    float64
mean_value_of_long_term_variability                       float64
histogram_width                                           float64
histogram_min                                             float64
histogram_max                                             float64
histogram_number_of_peaks                                 float64
histogram_
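
The cells above store the labels with the generic object dtype. If a true pandas categorical is wanted, one possible alternative is sketched below on a toy Series; treating the classes as ordered (A < B < C) is an assumption made for illustration, not something the notebook does.

```python
import pandas as pd

# Toy labels in the original numeric coding (hypothetical sample).
labels = pd.Series([1.0, 2.0, 3.0, 1.0], name="fetal_health")

# map() returns a new Series, so nothing depends on in-place modification.
letters = labels.map({1.0: "A", 2.0: "B", 3.0: "C"})

# Ordered categorical: A (Normal) < B (Suspect) < C (Pathological).
categorical = pd.Series(
    pd.Categorical(letters, categories=["A", "B", "C"], ordered=True),
    name="fetal_health",
)

print(categorical.dtype)
print(categorical.max())
```

A CSV file stores only text, so the category dtype would need to be re-applied after the file is read back.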

### Save the data frame to csv file

In [15]:
data.to_csv('fetal_health_data.csv', index = False)