<a href="https://colab.research.google.com/github/tdineth/TitanicSurvivabilityPredictor/blob/main/TitanicSurvivabilityPredictor_v2_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Overview: Titanic Survival Prediction Model


This project uses the data from a project created by Tensorflow — an improved version of an existing model architecture. The enhancements and optimizations for this version were implemented by **Theekshana Dineth** to improve performance and accuracy on the given dataset.


In [None]:
pip install scikit-learn



In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc

import tensorflow as tf

In [None]:
# Load dataset.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv') # training data
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv') # testing data
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

**Example Dataset from the Training Data**

The table below presents a representative sample extracted from the training dataset used in this project. This example illustrates the structure and format of the data that was utilized during the model training process.


In [None]:
dftrain.head()

Unnamed: 0,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
2,female,26.0,0,0,7.925,Third,unknown,Southampton,y
3,female,35.0,1,0,53.1,First,C,Southampton,n
4,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y


**Number of Training Samples**

The total number of samples used for training in this project is 627. This dataset was carefully prepared to ensure both quality and relevance for effective model training.


In [None]:
print(f'Number of training data = {dftrain.shape[0]}')

Number of training data = 627


**Number of Testing Data**

The total number of testing samples used for this project is 264. This testing set was employed to evaluate the model's performance and generalization capability on unseen data.


In [None]:
print(f'Number of testing data = {dfeval.shape[0]}')

Number of testing data = 264


In [None]:
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = dftrain[feature_name].unique()  # gets a list of all unique values from given feature column
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

print(feature_columns)

Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


[VocabularyListCategoricalColumn(key='sex', vocabulary_list=('male', 'female'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='n_siblings_spouses', vocabulary_list=(1, 0, 3, 4, 2, 5, 8), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='parch', vocabulary_list=(0, 1, 2, 5, 3, 4), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='class', vocabulary_list=('Third', 'First', 'Second'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='deck', vocabulary_list=('unknown', 'C', 'G', 'A', 'B', 'D', 'F', 'E'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='embark_town', vocabulary_list=('Southampton', 'Cherbourg', 'Queenstown', 'unknown'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='alone', vocabulary_list=('n', 'y'), dtype=tf.string, def

In [None]:
def make_input_fn(data_df, label_df, num_epochs=30, shuffle=True, batch_size=32):
  def input_function():  # inner function, this will be returned
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000)  # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of 32 and repeat process for number of epochs
    return ds  # return a batch of the dataset
  return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)


In [None]:
#DO NOT RUN THIS CODE SHELL TWICE
!pip install tensorflow==2.15.0



In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# We create a linear estimtor by passing the feature columns we created earlier

Instructions for updating:
Use tf.keras instead.
Instructions for updating:
Use tf.keras instead.
Instructions for updating:
Use tf.keras instead.
Instructions for updating:
Use tf.keras instead.


**Model Accuracy Evaluation**

You can check the accuracy and performance of the AI model in this section. The evaluation metrics are based on the testing dataset to ensure an unbiased assessment of the model's generalization ability.


In [None]:
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by testing on tetsing data
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by testing on tetsing data

clear_output()  # clears consoke output
print('The final result of this AI engine is in a accuracy of %.2f %%'%((result["accuracy"])*100))  # the result variable is simply a dict of stats about our model

The final result of this AI engine is in a accuracy of 78.41 %


**Result Inspection**

You can change the record number inside the square brackets to view specific samples. For each selected record, the actual result and the model’s predicted survival probability will be displayed as a percentage, allowing for easy comparison and evaluation.


In [22]:
result=list(linear_est.predict(eval_input_fn))
print(dfeval.loc[39])
print(f'The real result is (survived=1,Not survived=0)={y_eval.loc[39]}')
print(f'The prediction percentage for survival is %.2f %%'%((result[18]["probabilities"][1])*100))

sex                          male
age                          30.0
n_siblings_spouses              0
parch                           0
fare                         8.05
class                       Third
deck                      unknown
embark_town           Southampton
alone                           y
Name: 39, dtype: object
The real result is (survived=1,Not survived=0)=0
The prediction percentage for survival is 15.36 %


📋 **Titanic Survival Prediction**

The below program takes passenger details such as:
- Sex
- Age
- Number of siblings/spouses aboard
- Number of parents/children aboard
- Fare
- Passenger class
- Deck information
- Embarkation town
- Whether the passenger was alone or not

and uses a trained machine learning model to predict the probability of survival on the Titanic! 🚢

💡 After you enter the passenger details, the model will output:
✅ The predicted **probability percentage** of survival.  
✅ The predicted **class label** (Survived = 1 or Not Survived = 0).

Just run the next cell, input the passenger's info, and let the model show you the chances of survival!


In [None]:
# Define CATEGORICAL_COLUMNS and NUMERIC_COLUMNS
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

# Function to collect user input
def collect_user_input():
    user_data = {}
    print("Please provide values for the following features:")

    for feature_name in CATEGORICAL_COLUMNS:
        if feature_name in ['n_siblings_spouses', 'parch']:  # These are numeric categorical features
            while True:
                try:
                    val = int(input(f"Enter value for '{feature_name}' (integer categorical): "))
                    user_data[feature_name] = [val]
                    break
                except ValueError:
                    print("Invalid input. Please enter an integer value.")
        else:
            user_data[feature_name] = [input(f"Enter value for '{feature_name}' (categorical): ")]

    for feature_name in NUMERIC_COLUMNS:
        while True:
            try:
                val = float(input(f"Enter value for '{feature_name}' (numeric): "))
                user_data[feature_name] = [val]
                break
            except ValueError:
                print("Invalid input. Please enter a numeric value.")

    return pd.DataFrame.from_dict(user_data)

# Create a prediction-specific input function (no label_df needed)
def make_prediction_input_fn(data_df, batch_size=1):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices(dict(data_df))  # Only features are needed for prediction
        ds = ds.batch(batch_size)  # Batch size for prediction
        return ds
    return input_function
while True:
# Main workflow
  if __name__ == "__main__":
    # Collect user input
    user_input_df = collect_user_input()

    # Create the input function for the collected user data
    predict_input_fn = make_prediction_input_fn(user_input_df, batch_size=1)

    # Make predictions
    result = list(linear_est.predict(input_fn=predict_input_fn))

    print(f'The prediction percentage for survival is %.2f %%'%((result[0]['probabilities'][1])*100))
    print(f"Predicted class: {'Survived' if result[0]['class_ids'][0] == 1 else 'Did Not Survive'}")
    print('\n\n')

Please provide values for the following features:


**Here is some example input and expected classes you can try above**
\
\begin{array}{|l|l|l|l|}
\hline
\textbf{Feature} & \textbf{Passenger 1} & \textbf{Passenger 2} & \textbf{Passenger 3} \\
\hline
\text{Sex} & \text{female} & \text{female} & \text{male} \\
\text{Age} & 25.0 & 43.0 & 30.0 \\
\text{Number of Siblings/Spouses} & 2 & 1 & 0 \\
\text{Parental/Children aboard (parch)} & 0 & 1 & 0 \\
\text{Fare} & 7.05 & 15.85 & 8.05 \\
\text{Class} & \text{First} & \text{Second} & \text{Third} \\
\text{Deck} & \text{C} & \text{D} & \text{unknown} \\
\text{Embark Town} & \text{Cherbourg} & \text{Queenstown} & \text{Southampton} \\
\text{Alone} & \text{n} & \text{n} & \text{y} \\
\hline
\textbf{Actual Result} & \textbf{Survived = 1} & \textbf{Survived = 1} & \textbf{Not Survived = 0} \\
\hline
\end{array}

