# titanicdeath.com

this is the writeup for <a href="http://titanicdeath.com">titanicdeath.com</a>. besides having a truly fantastic name, this little website i created tells you whether or not you would have been likely to die on the titanic, given a small amount of personal information.

the first question you are probably asking is: <b>why did i die?</b> (or) <b>why did i live?</b>

the biggest determining factor in whether you lived or died is your gender. <b>women were much more likely to survive the sinking of the titanic than men.</b> if you have ever heard the phrase 'women and children first' you will have an intuitive understanding of why this is. still, don't take my word for it; let's examine some evidence.

In [9]:
# let us load the logistic regression model used
import pandas as pd
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
logreg = joblib.load('../titanicdeath/static/logreg.pkl')

In [59]:
# now let us have a look at the inputs
# and how strongly they predict survival
coefficients = pd.DataFrame(logreg.coef_[0])
coefficients.columns = ['coefficients']
coefficients['inputs'] = pd.Series(['class', 'sex', 'age', 'fare', 'embarked', 'title', 'is_alone', 'age*class'])
coefficients.sort_values(by='coefficients')

   coefficients  inputs
0  -0.749007     class
7  -0.3112       age*class
3  -0.08515      fare
6   0.12914      is_alone
4   0.261762     embarked
2   0.287163     age
5   0.398234     title
1   2.201527     sex


the 'sex' input has the biggest effect on survival

in our data i assigned women a value of 1 and men a value of 0. the positive coefficient of 2.2 means that <b>as the input moves in the positive direction (towards 1/female) the passenger is more likely to survive</b>, and as it moves in the negative direction (towards 0/male) the passenger is less likely to survive.

<br/>
the 'class' input has the next biggest effect on survival

in our data class was either 1 (first), 2 (second), or 3 (third class). the negative coefficient means that as a passenger's class value goes up (towards 3/third class) they become less likely to survive. <b>as a passenger's class value goes down (towards 1/first class) that passenger becomes more likely to survive</b>. this makes some sense: first class passengers often receive benefits, and one of the benefits on a sinking ship may have been easier access to lifeboats.
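one way to read these coefficients is as odds ratios. in a logistic regression, exp(coefficient) is the factor the odds of survival get multiplied by when that input goes up by one step and every other input stays put. the little sketch below just applies that to the two numbers from the table above; the variable names are mine, not part of the site's code. note that for class, "every other input stays put" includes age*class, which in practice would move along with class, so treat that number as a rough guide.

```python
import math

# coefficients copied from the table above
coef_sex = 2.201527
coef_class = -0.749007

# exp(coefficient) = multiplier on the odds of survival for a one-step
# increase in that input, with all other inputs held fixed
odds_ratio_sex = math.exp(coef_sex)      # male (0) -> female (1)
odds_ratio_class = math.exp(coef_class)  # e.g. first class (1) -> second class (2)

print(round(odds_ratio_sex, 2))    # about 9.04
print(round(odds_ratio_class, 2))  # about 0.47
```

so, according to the model, being female multiplies the odds of survival by roughly nine, and each step down in class (1 to 2, or 2 to 3) roughly halves them.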

some of the inputs were too strange to ask. for example as an inhabitant of the modern day you do not possess a 'port of embarkation.' to resolve this i simply pulled a value at random from the dataset for 'port of embarkation.'


i could have made the sampling process better by restricting it to passengers whose data points matched those given by visitors to titanicdeath.com. for example, if a user marked that they usually travel first class, i could have sampled ports of embarkation only from passengers who travelled first class on the titanic. i chose not to do this, to add some element of randomness to the site. i think of everything i do as an experimental product; most products require some variable reward (this is also a little bit more fun).
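to make the two sampling options concrete, here is a minimal sketch using only python's standard library. the rows are made-up stand-ins, not real passengers, and sample_embarked is a hypothetical helper, not the function the site uses. with no class given it samples from everyone (what the site does); with a class given it samples only from passengers in that class (the alternative described above).

```python
import random

# made-up stand-in rows, NOT real passengers: (class, embarked code)
passengers = [(1, 1), (1, 0), (2, 0), (3, 0), (3, 2), (3, 0)]

def sample_embarked(rows, passenger_class=None, rng=random):
    """pick one port of embarkation at random.

    if passenger_class is given, only rows from that class are eligible.
    """
    pool = [port for cls, port in rows
            if passenger_class is None or cls == passenger_class]
    return rng.choice(pool)

rng = random.Random(0)  # seeded so repeated runs give the same picks
from_anyone = sample_embarked(passengers, rng=rng)
from_first_class = sample_embarked(passengers, passenger_class=1, rng=rng)
print(from_anyone, from_first_class)
```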

In [60]:
'''
this bit of code takes in a passenger's input values
and tells us what probability that passenger has of
living or dying. the number shown is the probability
of survival.
'''

age = 2
fare = 0
embarkation = 2
title = 1
is_alone = 1
age_class = 6

passenger_class = 3 
sex = 0 

passenger_input = pd.DataFrame([[passenger_class, sex, age, fare, embarkation, title, is_alone, age_class]])
pred = logreg.predict_proba(passenger_input)
pred[0][1]

0.078280156772079126
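the cell above sets every input by hand, in the same order as the coefficient table. a small helper like the one sketched below (hypothetical, not part of the site's code) can keep that order straight and derive the interaction term for you. it assumes that age_class is simply age multiplied by class, which matches the values used above (2 * 3 = 6) but is my reading of the 'age*class' label rather than something the notebook states outright.

```python
def build_row(passenger_class, sex, age, fare, embarkation, title, is_alone):
    """return the inputs in the order the model expects.

    assumption: the 'age*class' input is age * passenger_class.
    """
    age_class = age * passenger_class
    return [passenger_class, sex, age, fare, embarkation, title, is_alone, age_class]

# the same third class male passenger as in the cell above
row = build_row(passenger_class=3, sex=0, age=2, fare=0,
                embarkation=2, title=1, is_alone=1)
print(row)  # [3, 0, 2, 0, 2, 1, 1, 6]
```

the resulting list can be wrapped as pd.DataFrame([row]) and passed to logreg.predict_proba just like passenger_input.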

In [61]:
'''
if we change the sex from male to female (0->1)
we see a huge improvement in the probability of survival.
'''
passenger_class = 3 
sex = 1

passenger_input = pd.DataFrame([[passenger_class, sex, age, fare, embarkation, title, is_alone, age_class]])
pred = logreg.predict_proba(passenger_input)
pred[0][1]

0.43427750040904095

In [62]:
'''
if we change the class from 3->1
we improve the odds of survival even further.
note: age_class is still 6 from the first cell, it is
not recomputed for the new class.
'''
passenger_class = 1 
sex = 1

passenger_input = pd.DataFrame([[passenger_class, sex, age, fare, embarkation, title, is_alone, age_class]])
pred = logreg.predict_proba(passenger_input)
pred[0][1]

0.77444683956853377
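the jumps between these three probabilities can be checked by hand against the coefficient table. a logistic regression adds up coefficient * input on the log-odds scale, so changing one input by some amount shifts the log-odds by coefficient * amount. the sketch below starts from the first probability and applies the two shifts; the helper names are mine, and the coefficients are the rounded values printed above, so the results match the cells only to a few decimal places.

```python
import math

def logit(p):
    """probability -> log-odds"""
    return math.log(p / (1 - p))

def sigmoid(z):
    """log-odds -> probability"""
    return 1 / (1 + math.exp(-z))

# values copied from the cells and the coefficient table above
p_male_third = 0.078280156772079126
coef_sex = 2.201527
coef_class = -0.749007

# male -> female (0 -> 1) adds one sex coefficient to the log-odds
p_female_third = sigmoid(logit(p_male_third) + coef_sex)

# third -> first class moves the class input by 1 - 3 = -2,
# with age_class left at 6 exactly as in the cells above
p_female_first = sigmoid(logit(p_female_third) + coef_class * (1 - 3))

print(p_female_third)  # close to 0.434
print(p_female_first)  # close to 0.774
```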

if you would like to explore this data yourself, the kaggle competition page and a really nice tutorial are here:

https://www.kaggle.com/c/titanic

https://www.kaggle.com/startupsci/titanic/titanic-data-science-solutions

if you are interested in machine learning and want a very good overview of how an algorithm like logistic regression works, i highly recommend the first 3 lectures of andrew ng's coursera course:

https://www.coursera.org/learn/machine-learning