# Using Deep Learning to Predict Who Died on The Titanic

Deep learning has many interesting and useful applications, and libraries such as TensorFlow have made it far more accessible to developers, particularly through TensorFlow's built-in support for Keras, one of the most widely used deep learning libraries.


In this tutorial, we will use deep learning to predict which passengers died in the sinking of the Titanic in 1912. We will use tf.keras rather than standalone Keras with a TensorFlow backend, to illustrate how much simpler the integration of Keras and TensorFlow makes things for aspiring ML enthusiasts.

*Download the data files [here](https://www.kaggle.com/c/3136/download-all), and put them into a folder called input in the root directory.*

**Step 1: Import the Dataframe**

Import all necessary libraries

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

# Install TensorFlow
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

# Use the latest TensorFlow 2.x and the built-in tf.keras APIs instead of standalone Keras

import tensorflow as tf
from tensorflow.keras import layers

print(tf.__version__)


In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Import the data for both the training set and the testing set.

Split the dataset into the *x* input features (e.g. 'Sex', 'Cabin', 'Fare') and the label we want to predict (*y*), i.e. *Survived*.

In [0]:
training_set = pd.read_csv('https://github.com/boronhub/progress-bar/tree/master/input/train.csv?raw=true')
testing_set = pd.read_csv('https://github.com/boronhub/progress-bar/tree/master/input/test.csv?raw=true')

"""We are dropping PassengerId, Name and Ticket fields, 
as they would not have impacted a passenger's chances of survival"""

x_train = training_set.drop(['PassengerId','Name','Ticket','Survived'], axis=1)
y_train = training_set['Survived']

x_test = testing_set.drop(['PassengerId','Name','Ticket'], axis=1)

In [0]:
x_train['Age'] = x_train['Age'].fillna(x_train['Age'].mean())
x_test['Age'] = x_test['Age'].fillna(x_test['Age'].mean())
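
The cell above fills each split with its own mean age. A common alternative is to compute the fill value on the training split only and reuse it for the test split, so both are imputed with the same number. A minimal sketch on made-up toy frames (`toy_train`, `toy_test` and `train_age_mean` are hypothetical names, not part of the Titanic data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real splits.
toy_train = pd.DataFrame({"Age": [20.0, np.nan, 40.0]})
toy_test = pd.DataFrame({"Age": [np.nan, 50.0]})

# Compute the fill value once, from the training split only.
train_age_mean = toy_train["Age"].mean()  # mean() skips NaN by default

# Reuse the same value for both splits.
toy_train["Age"] = toy_train["Age"].fillna(train_age_mean)
toy_test["Age"] = toy_test["Age"].fillna(train_age_mean)

print(toy_train["Age"].tolist())  # [20.0, 30.0, 40.0]
print(toy_test["Age"].tolist())   # [30.0, 50.0]
```

Whether this matters for a dataset of this size is a judgment call; the point is only that the test split does not influence its own preprocessing.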

**Step 2: Assigning numerical values to the categories we shall be using**

In [0]:
def simplify_ages(df):
    bins = (-1, 0, 5, 12, 18, 25, 35, 60, 120)
    group_names = ['Unknown', 'Baby', 'Child', 'Teenager', 'Student', 'Young Adult', 'Adult', 'Senior']
    categories = pd.cut(df['Age'], bins, labels=group_names)
    df['Age'] = categories.cat.codes 
    return df

def simplify_cabins(df):
    df['Cabin'] = df['Cabin'].fillna('N')
    df['Cabin'] = df['Cabin'].apply(lambda x: x[0])
    df['Cabin'] =  pd.Categorical(df['Cabin'])
    df['Cabin'] = df['Cabin'].cat.codes 
    return df

def simplify_fares(df):
    df['Fare'] = df.Fare.fillna(-0.5)
    bins = (-1, 0, 8, 15, 31, 1000)
    group_names = ['Unknown', '1_quartile', '2_quartile', '3_quartile', '4_quartile']
    categories = pd.cut(df['Fare'], bins, labels=group_names)
    df['Fare'] = categories.cat.codes 
    return df

def simplify_sex(df):
    df['Sex'] = pd.Categorical(df['Sex'])
    df['Sex'] = df['Sex'].cat.codes 
    return df

def simplify_embarked(df):
    df['Embarked'] = pd.Categorical(df['Embarked'])
    df['Embarked'] = df['Embarked'].cat.codes + 1
    return df

def transform_features(df):
    df = simplify_ages(df)
    df = simplify_cabins(df)
    df = simplify_fares(df)
    df = simplify_sex(df)
    df = simplify_embarked(df)
    return df

Applying transform_features to both feature splits lets us see the dataset in its transformed form, which makes it easier to follow the rest of the tutorial.

In [0]:
transform_features(x_train)
transform_features(x_test)
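
To make the binning step concrete, here is a small sketch of what `pd.cut` followed by `.cat.codes` does to a handful of ages. It reuses the bins and group names from `simplify_ages`; the sample ages themselves are made up for illustration.

```python
import pandas as pd

# Made-up sample ages, one or two per bin.
ages = pd.Series([0.5, 4.0, 10.0, 16.0, 22.0, 30.0, 45.0, 70.0])

# Same bins and labels as simplify_ages above.
bins = (-1, 0, 5, 12, 18, 25, 35, 60, 120)
group_names = ['Unknown', 'Baby', 'Child', 'Teenager', 'Student',
               'Young Adult', 'Adult', 'Senior']

# Each age falls into a right-closed interval, e.g. (0, 5] is 'Baby'.
categories = pd.cut(ages, bins, labels=group_names)

# cat.codes replaces each label with its position in group_names.
codes = categories.cat.codes

print(list(categories))
print(codes.tolist())  # [1, 1, 2, 3, 4, 5, 6, 7]
```

Because missing ages were already filled with the mean, the 'Unknown' bin (code 0) is not expected to appear in the transformed data.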

**Step 3: Build a Sequential model using tf.keras**

A Sequential model is a linear stack of layers, each configured with parameters (such as the number of units and the activation function) that determine how the model is trained on the dataset.

Since we are on Colab, GPU resources are not a major concern, so we can afford fairly wide layers. Wider layers give the model more capacity, although that does not by itself guarantee more accurate predictions. We will use 4 layers: the first 3 have 64 units each with the ReLU activation function, and the last layer has a single unit with the sigmoid activation function.
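
To get a feel for how large this network is, the number of trainable parameters can be counted by hand. The sketch below assumes 8 input features (matching the `input_shape=(8,)` used in this step) and fully connected layers with a bias per unit; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# 8 input features, three hidden layers of 64 units, 1 output unit.
layer_sizes = [8, 64, 64, 64, 1]

# Pair each layer's input size with its output size.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [576, 4160, 4160, 65]
print(total)      # 8961
```

Calling `model.summary()` after the model is built should report the same per-layer counts.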

We use the binary cross-entropy loss function because the label is binary: '0' represents passengers who did not survive, and '1' represents survivors.

In [0]:
model = tf.keras.Sequential()
model.add(layers.Dense(64, activation='relu', 
                       input_shape=(8,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
             loss=tf.keras.losses.BinaryCrossentropy(),
             metrics=['accuracy'])
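
To see what the binary cross-entropy loss rewards and penalizes, it can be computed by hand with NumPy. This is a sketch of the standard formula, not the Keras implementation; the function name and the `eps` clipping value are assumptions made here for numerical safety.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# One survivor (1) and one non-survivor (0), predicted confidently.
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

Confident correct predictions give a loss near zero, while confident wrong ones are penalized heavily, which is why this loss pairs naturally with a sigmoid output.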

**Step 4: Split the training data into training and validation sets, and run the model**

In [0]:
y_train = np.asarray(y_train)
x_train = np.asarray(x_train)
x_test = np.asarray(x_test)

validation_size = 200

x_val = x_train[:validation_size]
partial_x_train = x_train[validation_size:]

y_val = y_train[:validation_size]
partial_y_train = y_train[validation_size:]
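
The slices above use the first 200 rows as the validation set. If the row order carried any pattern, a shuffled split would avoid baking it into the validation set. A minimal sketch with NumPy on hypothetical toy arrays (all `toy_*` names and the seed are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Hypothetical toy arrays standing in for x_train and y_train.
toy_x = np.arange(20).reshape(10, 2)
toy_y = np.arange(10)

toy_validation_size = 3

# One shuffled index order, shared by features and labels so rows stay aligned.
order = rng.permutation(len(toy_x))
val_idx, train_idx = order[:toy_validation_size], order[toy_validation_size:]

toy_x_val, toy_partial_x = toy_x[val_idx], toy_x[train_idx]
toy_y_val, toy_partial_y = toy_y[val_idx], toy_y[train_idx]

print(toy_x_val.shape, toy_partial_x.shape)  # (3, 2) (7, 2)
```

The same index arrays could be applied to the real `x_train` and `y_train` once they have been converted to NumPy arrays.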

In [0]:
history = model.fit(partial_x_train, partial_y_train, epochs=30, validation_data=(x_val, y_val))

**Step 5: Plot accuracy and loss for both the training and validation sets to check how well the model has fit**

In [0]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

In [0]:
plt.plot(acc, label = "Training Accuracy")
plt.plot(val_acc, label = "Validation Accuracy")

plt.legend()
plt.xlabel('Epochs')
plt.ylabel('Acc')
plt.show()


In [0]:
plt.clf()

In [0]:
plt.plot(loss, label = "Training Loss")
plt.plot(val_loss, label = "Validation Loss")

plt.legend()
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()

Keep experimenting with the number of epochs and the model's hyperparameters to find out what works best for you.
Try different features, such as Name or Sex, to further explore who made it out alive.
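
When experimenting with the number of epochs, it helps to read the best epoch off the recorded metrics instead of judging the plots by eye. The sketch below works on a plain dictionary shaped like `history.history`; the numbers in `toy_history` are invented for illustration and are not results from this model, and `best_epoch` is a hypothetical helper.

```python
# Hypothetical values shaped like history.history; not real results.
toy_history = {
    "val_loss": [0.62, 0.51, 0.47, 0.49, 0.53],
    "val_accuracy": [0.70, 0.78, 0.81, 0.80, 0.79],
}

def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) where the metric is lowest."""
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

epoch, value = best_epoch(toy_history)
print(epoch, value)  # 3 0.47
```

A validation loss that bottoms out and then rises again, as in the toy numbers, is a typical sign that further epochs are overfitting the training data.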

**Step 6: Predict the actual test data**

We use PassengerId to easily identify each passenger.

In [0]:
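# predict_classes was removed from tf.keras in newer TensorFlow releases,
# so threshold the sigmoid output at 0.5 to get class labels instead.
predictions = (model.predict(x_test) > 0.5).astype("int32").ravel()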
ids = testing_set['PassengerId'].copy()
new_output = ids.to_frame()
new_output["Survived"]=predictions
new_output.head(10)
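
A sigmoid output layer produces one probability per passenger, which has to be turned into a 0/1 label before it can be reported. The sketch below shows that thresholding step and one way to write the result as CSV, using made-up IDs and probabilities (the `toy_*` names, the 0.5 threshold and the `submission` frame are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical IDs and sigmoid outputs with shape (n, 1), as a Dense(1) layer returns.
toy_ids = pd.Series([892, 893, 894], name="PassengerId")
toy_probs = np.array([[0.12], [0.71], [0.50]])

# Probabilities strictly above 0.5 become 1 (survived), the rest 0.
toy_classes = (toy_probs > 0.5).astype("int32").ravel()

submission = toy_ids.to_frame()
submission["Survived"] = toy_classes

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the same table to disk.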

In [0]:
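# Shell command, so it needs the "!" prefix in a notebook cell.
# It only has something to upload if training wrote logs to ./logs,
# e.g. through a tf.keras.callbacks.TensorBoard callback passed to model.fit.
!tensorboard dev upload --logdir logs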