<h1>A Neural Network Approach to the Titanic Dataset</h1>

<h3>Dependencies</h3>

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

<h3>Load The Dataset</h3>

In [2]:
df = pd.read_csv('dataset/train.csv')
test = pd.read_csv('dataset/test.csv')

<h3>Drop the Columns Not Used as Features</h3>

In [3]:
# Name and PassengerId carry no signal for this model, so drop them from both frames.
df.drop(['Name','PassengerId'], axis=1, inplace=True)
test.drop(['Name','PassengerId'], axis=1, inplace=True)

<h1>Convert the Categorical Values Into Numerical Values</h1>

<h3>Sex to Numerical Values</h3>

In [4]:
df['Sex'].replace(to_replace='male',value=0,inplace=True)
df['Sex'].replace(to_replace='female',value=1,inplace=True)

test['Sex'].replace(to_replace='male',value=0,inplace=True)
test['Sex'].replace(to_replace='female',value=1,inplace=True)
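
The same encoding can also be written as a single dictionary lookup, which avoids calling an in-place method on a selected column (a pattern that recent pandas releases warn about). A minimal sketch, assuming a small made-up frame named `toy` in place of the real data; `sex_codes` is a name introduced only for this illustration:

```python
import pandas as pd

# Toy frame standing in for the Titanic data (hypothetical values).
toy = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

# Same encoding as the cell above: male -> 0, female -> 1.
sex_codes = {'male': 0, 'female': 1}
toy['Sex'] = toy['Sex'].map(sex_codes)

print(toy['Sex'].tolist())
```

Note that `map` turns any value missing from the dictionary into NaN, so unexpected categories show up instead of passing through silently.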

<h3>Port of Embarkation, Cabin and Ticket to Numeric Values</h3>

In [5]:
port_of_embarcation = list(set(df['Embarked']))+list(set(test['Embarked']))
Cabins = list(set(df['Cabin']))+list(set(test['Cabin']))
Ticket = list(set(df['Ticket']))+list(set(test['Ticket']))

In [6]:
# Replace each category with a 1-based integer code, using the same code in both frames.
for code, port in enumerate(port_of_embarcation, start=1):
    df['Embarked'].replace(to_replace=port, value=code, inplace=True)
    test['Embarked'].replace(to_replace=port, value=code, inplace=True)

for code, cabin in enumerate(Cabins, start=1):
    df['Cabin'].replace(to_replace=cabin, value=code, inplace=True)
    test['Cabin'].replace(to_replace=cabin, value=code, inplace=True)

for code, ticket in enumerate(Ticket, start=1):
    df['Ticket'].replace(to_replace=ticket, value=code, inplace=True)
    test['Ticket'].replace(to_replace=ticket, value=code, inplace=True)

df.fillna(value=0,inplace=True)
test.fillna(value=0,inplace=True)
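
The important point of the encoding above is that train and test share one category-to-code mapping. A plain-Python sketch of that idea follows, assuming made-up port lists; `train_ports`, `test_ports` and `encode_ports` are hypothetical names, and giving missing values the code 0 is an assumption of this sketch rather than what the notebook does. Sorting the categories makes the codes reproducible between runs, which iterating over a `set` does not guarantee.

```python
# Hypothetical port codes; None stands in for a missing value.
train_ports = ['S', 'C', 'S', None, 'Q']
test_ports = ['Q', 'S', 'C']

# Union of the categories seen in either split, sorted for stable codes.
categories = sorted({p for p in train_ports + test_ports if p is not None})
port_to_code = {port: i + 1 for i, port in enumerate(categories)}


def encode_ports(ports):
    """Map each port to its integer code; missing or unseen values become 0."""
    return [port_to_code.get(p, 0) for p in ports]


train_codes = encode_ports(train_ports)
test_codes = encode_ports(test_ports)
print(port_to_code)
print(train_codes, test_codes)
```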

<h1>Normalize the Data</h1>

In [7]:
df.head()

   Survived  Pclass  Sex   Age  SibSp  Parch  Ticket     Fare  Cabin  Embarked
0         0       3    0  22.0      1      0     162     7.25      1       3.0
1         1       1    1  38.0      1      0     677  71.2833    112       2.0
2         1       3    1  26.0      0      0      70    7.925      1       3.0
3         1       1    1  35.0      1      0      99     53.1     52       3.0
4         0       3    0  35.0      0      0     583     8.05      1       3.0


In [8]:
test.head()

   Pclass  Sex   Age  SibSp  Parch  Ticket     Fare  Cabin  Embarked
0       3    0  34.5      0      0    1037   7.8292      1         1
1       3    1  47.0      1      0     738      7.0      1         3
2       2    0  62.0      0      0     963   9.6875      1         1
3       3    0  27.0      0      0     932   8.6625      1         3
4       3    1  22.0      1      1     353  12.2875      1         3


<h3>Scaler Initialization</h3>

In [9]:
sc = MinMaxScaler()

<h3>Split the dataset in train and validation</h3>

In [10]:
train = df.head(n=800)
val = df.tail(n=91)

In [11]:
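train_data = np.array(train, dtype=float)
val = np.array(val, dtype=float)
test = np.array(test, dtype=float)

# Fit the scaler on the training features only (column 0 is the Survived label),
# then apply the same scaling to the validation and test features.
# Fitting a separate scaler per split would put each split on a different scale.
train_data[:,1:] = sc.fit_transform(train_data[:,1:])
val[:,1:] = sc.transform(val[:,1:])
test = sc.transform(test)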
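
Min-max scaling maps each column to the range [0, 1] using that column's minimum and maximum. The NumPy sketch below spells out the arithmetic and shows the common practice of taking those statistics from the training rows and reusing them on new rows. The arrays `x_tr` and `x_new` are made-up values, and this is an illustration of the formula, not the scikit-learn implementation.

```python
import numpy as np

# Hypothetical training rows and one new row (two features each).
x_tr = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_new = np.array([[2.5, 40.0]])

# Column statistics come from the training rows only.
col_min = x_tr.min(axis=0)
col_range = x_tr.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns

scaled_tr = (x_tr - col_min) / col_range
scaled_new = (x_new - col_min) / col_range  # reuses the training statistics

print(scaled_tr)
print(scaled_new)
```

A new value above the training maximum lands outside [0, 1] (here the second feature of `x_new`), which is expected when the statistics are not refitted.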

<h3>Split the data and labels</h3>

In [12]:
x_train = train_data[:,1:]
y_train = train_data[:,0]

x_val = val[:,1:]
y_val = val[:,0]

<h1>Neural Network Model</h1>

<h3>Dependencies</h3>

In [13]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau

Using TensorFlow backend.


<h3>Neural Network Structure</h3>

<h3>Deep Neural Network Initialization</h3>

In [14]:
classifier = Sequential()

<h3>Input Layer of the Deep Neural Network</h3>

In [15]:
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu', input_dim = 9,bias_initializer='zeros'))
classifier.add(Dropout(0.5))

<h3>Hidden Layers of the Deep Neural Network with Dropout</h3>

In [16]:
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu',bias_initializer='zeros'))
classifier.add(Dropout(0.8))
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu',bias_initializer='zeros'))
classifier.add(Dropout(0.8))
classifier.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu',bias_initializer='zeros'))
classifier.add(Dropout(0.8))
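
To make the dropout rates concrete: a rate of 0.8 zeroes roughly 80% of a layer's activations on each training step and scales the survivors by 1 / (1 - rate), so the expected activation is unchanged. The NumPy sketch below illustrates this "inverted dropout" arithmetic on an array of ones; the seed, array shape and variable names are assumptions for the example, and this is not Keras's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
rate = 0.8                      # same rate as the hidden layers above

# Hypothetical activations: 1000 samples through an 80-unit layer.
activations = np.ones((1000, 80))

# Keep each unit with probability 1 - rate, then rescale the survivors.
keep_mask = rng.random(activations.shape) >= rate
dropped = activations * keep_mask / (1.0 - rate)

print("fraction kept:", keep_mask.mean())
print("mean activation after dropout:", dropped.mean())
```

With only about 16 of 80 units active per step, a rate this high regularizes strongly; lower rates are worth trying if training accuracy stalls.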

<h3>Output Layer of the Deep Neural Network</h3>

In [17]:
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid',bias_initializer='zeros'))
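
As a quick check on the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per output unit, and Dropout layers add none. The sketch below does the arithmetic for the layer widths used above (9 inputs, four layers of 80 units, 1 output); `layer_sizes` is a name introduced for this example.

```python
# Layer widths taken from the cells above: input_dim=9, four Dense(80), Dense(1).
layer_sizes = [9, 80, 80, 80, 80, 1]

# weights (n_in * n_out) + biases (n_out) for each Dense layer
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [800, 6480, 6480, 6480, 81]
print(total_params)      # 20321
```

That is about 20,000 parameters for 800 training rows, which helps explain why the model leans on heavy dropout.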

<h3>Optimizers</h3>

In [18]:
#adam = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
nadam = keras.optimizers.Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.002)

<h3>Deep Neural Network Model Compile</h3>

In [19]:
classifier.compile(optimizer = nadam, loss = 'binary_crossentropy', metrics = ['accuracy'])
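
For reference, binary cross-entropy averages -[y log(p) + (1 - y) log(1 - p)] over the samples, where p is the sigmoid output. A small NumPy sketch of that formula follows; the function name and the clipping constant `eps` are assumptions for this illustration, not the Keras source.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) can never occur.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())


# An uninformative prediction of 0.5 costs ln(2) per sample.
print(binary_crossentropy([1, 0], [0.5, 0.5]))
# Confident, correct predictions cost much less.
print(binary_crossentropy([1, 0], [0.9, 0.1]))
```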

<h3>Model Checkpoint Callback in order to save the best model based on val_acc</h3>

In [20]:
filepath = 'weights_best.hdf5'
checkpoint = ModelCheckpoint(filepath=filepath, monitor='val_acc', verbose=1, save_best_only=True)
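
With `save_best_only=True`, the file is overwritten only on epochs where the monitored value beats the best seen so far. The plain-Python sketch below mimics that bookkeeping; the accuracy values are made up for illustration (they are not this notebook's results) and `epochs_that_save` is a hypothetical helper, not part of Keras.

```python
def epochs_that_save(val_acc_per_epoch):
    """Return the 1-based epochs at which a 'save best only' rule would write the model."""
    best = float('-inf')
    saved = []
    for epoch, acc in enumerate(val_acc_per_epoch, start=1):
        if acc > best:  # strictly better than every earlier epoch
            best = acc
            saved.append(epoch)
    return saved, best


# Hypothetical validation accuracies for five epochs.
saved_epochs, best_acc = epochs_that_save([0.60, 0.58, 0.71, 0.70, 0.75])
print(saved_epochs, best_acc)  # [1, 3, 5] 0.75
```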

<h3>Fit the model to the data (best val_acc in the log below: 84.6%)</h3>

In [21]:
classifier.fit(x_train,y_train, epochs = 70, validation_data=(x_val,y_val), callbacks=[checkpoint],verbose=0)


Epoch 00001: val_acc improved from -inf to 0.62637, saving model to weights_best.hdf5

Epoch 00002: val_acc did not improve

Epoch 00003: val_acc did not improve

Epoch 00004: val_acc improved from 0.62637 to 0.75824, saving model to weights_best.hdf5

Epoch 00005: val_acc improved from 0.75824 to 0.78022, saving model to weights_best.hdf5

Epoch 00006: val_acc did not improve

Epoch 00007: val_acc did not improve

Epoch 00008: val_acc improved from 0.78022 to 0.79121, saving model to weights_best.hdf5

Epoch 00009: val_acc did not improve

Epoch 00010: val_acc improved from 0.79121 to 0.82418, saving model to weights_best.hdf5

Epoch 00011: val_acc did not improve

Epoch 00012: val_acc improved from 0.82418 to 0.84615, saving model to weights_best.hdf5

Epoch 00013: val_acc did not improve

Epoch 00014: val_acc did not improve

Epoch 00015: val_acc did not improve

Epoch 00016: val_acc did not improve

Epoch 00017: val_acc did not improve

Epoch 00018: val_acc did not improve

Epoch 

<keras.callbacks.History at 0x1203def28>

<h3>Use the Model on the Kaggle data test set</h3>

<h3>Load the Best Saved Model</h3>

In [22]:
from keras.models import load_model

best_model = load_model('weights_best.hdf5')

In [23]:
prediction = best_model.predict(test)

<h3>Fill the predictions with 0 and 1</h3>

In [24]:
# Threshold the sigmoid outputs at 0.5 to get hard 0/1 labels.
prediction = np.where(prediction >= 0.5, 1, 0)

<h3>Create the Final Format</h3>

In [25]:
submit = np.array(prediction)
# One row per test passenger (418 in the Kaggle test set): PassengerId, Survived.
submission = np.zeros((len(submit),2),dtype=np.int64)

<h3>Fill the array with the predicted values</h3>

In [26]:
for i in range(0,418):
    
    submission[i][1] = submit[i]

<h3>Fill the PassengerId values</h3>

In [27]:
for i in range(0,418):
    
    submission[i][0] = 892+i

<h3>Save the data in order to perform the submission to Kaggle</h3>

In [28]:
np.savetxt("submission.csv", submission, delimiter = ',', fmt = '%i', header='PassengerId,Survived', comments='')
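
The thresholding and the two fill loops can also be collapsed into a few array operations. A minimal sketch, assuming a small made-up array `probs` in place of the model output and reusing the starting PassengerId of 892 from the loop above; the variable names are introduced only for this example.

```python
import numpy as np

# Hypothetical sigmoid outputs with shape (n_samples, 1), as Keras would return.
probs = np.array([[0.91], [0.12], [0.50], [0.49]])

# Threshold at 0.5 and flatten to a 1-D vector of 0/1 labels.
labels = (probs >= 0.5).astype(np.int64).ravel()

# Consecutive PassengerIds starting at 892, one per prediction.
first_id = 892
passenger_ids = np.arange(first_id, first_id + len(labels))

# Two columns: PassengerId, Survived.
submission_rows = np.column_stack([passenger_ids, labels])
print(submission_rows)
```

The resulting array can be passed to `np.savetxt` exactly as in the last cell.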