In [1]:
import pandas as pd

In [2]:
titanic = pd.read_csv('titanic.csv')

In [3]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


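Below, we use `.groupby()` and `.mean()` to calculate the survival rate for six distinct groups of passengers (each combination of `sex` and `pclass`). Because `survived` is coded as 0 or 1, its mean within a group is the fraction of that group that survived.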

In [4]:
titanic.groupby(['sex', 'pclass']).survived.mean()

sex     pclass
female  1         0.968085
        2         0.921053
        3         0.500000
male    1         0.368852
        2         0.157407
        3         0.135447
Name: survived, dtype: float64
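As a quick check on why the mean of a 0/1 column can be read as a rate, here is a minimal sketch on a made-up toy frame (the rows and the name `toy` are hypothetical, not taken from `titanic.csv`). It compares `.mean()` with survivors divided by group size.

```python
import pandas as pd

# Hypothetical toy data, not rows from titanic.csv.
toy = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 0, 1],
})

# Mean of the 0/1 column within each group.
rate_by_mean = toy.groupby("sex")["survived"].mean()

# The same quantity computed by hand: survivors / group size.
counts = toy.groupby("sex")["survived"].agg(["sum", "count"])
rate_by_hand = counts["sum"] / counts["count"]

print(rate_by_mean)
print(rate_by_hand)
```

Both Series should hold the same values, one per group.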

And here, we use `.transform()` to turn that into a Series with one value per row of the original dataframe: each passenger gets the mean of the group they belong to, and the result keeps the dataframe's index.

In [5]:
titanic.groupby(['sex', 'pclass']).survived.transform('mean')

0      0.135447
1      0.968085
2      0.500000
3      0.968085
4      0.135447
         ...   
886    0.157407
887    0.968085
888    0.500000
889    0.368852
890    0.135447
Name: survived, Length: 891, dtype: float64
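To make the difference in shape concrete, the sketch below runs both calls on a small made-up frame (again hypothetical data, with the names `toy`, `group_means`, and `row_means` chosen only for illustration). `.mean()` reduces to one value per group, while `.transform('mean')` returns one value per row.

```python
import pandas as pd

# Hypothetical toy data, not rows from titanic.csv.
toy = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "male"],
    "pclass": [3, 3, 1, 1, 1],
    "survived": [0, 1, 1, 1, 0],
})

grouped = toy.groupby(["sex", "pclass"])["survived"]

# Aggregation: one value per (sex, pclass) group, indexed by the group keys.
group_means = grouped.mean()

# Transformation: one value per original row, indexed like `toy`.
row_means = grouped.transform("mean")

print(len(group_means), len(row_means))
print(row_means)
```

Because `row_means` shares the index of `toy`, it can be assigned straight back as a column.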

And then we can assign that Series to a new column in our original dataframe. The values line up because the Series shares the dataframe's index.

In [6]:
titanic['group_surv_rate'] = titanic.groupby(['sex', 'pclass']).survived.transform('mean')

In [7]:
titanic.head(10)

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck,group_surv_rate
0,0,3,male,22.0,1,0,7.25,S,,0.135447
1,1,1,female,38.0,1,0,71.2833,C,C,0.968085
2,1,3,female,26.0,0,0,7.925,S,,0.5
3,1,1,female,35.0,1,0,53.1,S,C,0.968085
4,0,3,male,35.0,0,0,8.05,S,,0.135447
5,0,3,male,,0,0,8.4583,Q,,0.135447
6,0,1,male,54.0,0,0,51.8625,S,E,0.368852
7,0,3,male,2.0,3,1,21.075,S,,0.135447
8,1,3,female,27.0,0,2,11.1333,S,,0.5
9,1,2,female,14.0,1,0,30.0708,C,,0.921053
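For comparison, the same column could be built without `.transform()` by computing the group means and merging them back on the group keys. This is only a sketch of that alternative on hypothetical toy data, not something the notebook above does. It assumes a default integer index, since `merge` returns a fresh one.

```python
import pandas as pd

# Hypothetical toy data, not rows from titanic.csv.
toy = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "male"],
    "pclass": [3, 3, 1, 1, 1],
    "survived": [0, 1, 1, 1, 0],
})
keys = ["sex", "pclass"]

# One step: broadcast the group mean to every row.
via_transform = toy.groupby(keys)["survived"].transform("mean")

# Two steps: aggregate to one row per group, then left-merge onto the rows.
rates = (
    toy.groupby(keys)["survived"]
    .mean()
    .rename("group_surv_rate")
    .reset_index()
)
via_merge = toy.merge(rates, on=keys, how="left")["group_surv_rate"]

print(via_transform.tolist())
print(via_merge.tolist())
```

The `.transform()` version is shorter and keeps the original index, which is why it is convenient for adding a column.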


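And then, below, we use that survival rate to calculate a value that can help us identify outliers: passengers who survived when most of their group did not, or vice versa. The value is the absolute difference between a passenger's outcome (0 or 1) and their group's survival rate, so values close to 1 mark the most unusual outcomes.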

In [8]:
titanic['outliers'] = abs(titanic.survived - titanic.group_surv_rate)

In [9]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck,group_surv_rate,outliers
0,0,3,male,22.0,1,0,7.25,S,,0.135447,0.135447
1,1,1,female,38.0,1,0,71.2833,C,C,0.968085,0.031915
2,1,3,female,26.0,0,0,7.925,S,,0.5,0.5
3,1,1,female,35.0,1,0,53.1,S,C,0.968085,0.031915
4,0,3,male,35.0,0,0,8.05,S,,0.135447,0.135447


In [10]:
titanic[titanic.outliers > .85]

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck,group_surv_rate,outliers
36,1,3,male,,0,0,7.2292,C,,0.135447,0.864553
41,0,2,female,27.0,1,0,21.0,S,,0.921053,0.921053
65,1,3,male,,1,1,15.2458,C,,0.135447,0.864553
74,1,3,male,32.0,0,0,56.4958,S,,0.135447,0.864553
81,1,3,male,29.0,0,0,9.5,S,,0.135447,0.864553
107,1,3,male,,0,0,7.775,S,,0.135447,0.864553
125,1,3,male,12.0,1,0,11.2417,C,,0.135447,0.864553
127,1,3,male,24.0,0,0,7.1417,S,,0.135447,0.864553
146,1,3,male,27.0,0,0,7.7958,S,,0.135447,0.864553
165,1,3,male,9.0,0,2,20.525,S,,0.135447,0.864553
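The whole pattern (group rate, absolute difference, threshold filter) can be traced by hand on a tiny made-up example. The frame, the column name `deviation`, and the 0.7 cutoff below are all hypothetical and chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical toy data: group "a" mostly died, group "b" mostly survived.
toy = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "survived": [0, 0, 0, 1, 1, 1, 1, 0],
})

# Per-row group survival rate: 0.25 for "a", 0.75 for "b".
rate = toy.groupby("group")["survived"].transform("mean")

# Distance between each outcome and what was typical for the group.
toy["deviation"] = (toy["survived"] - rate).abs()

# Keep only rows whose outcome went against most of their group.
unusual = toy[toy["deviation"] > 0.7]
print(unusual)
```

Here the lone survivor in group "a" and the lone non-survivor in group "b" are the two rows that pass the filter. The same logic explains the mix of third-class men who survived and a second-class woman who did not in the output above.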
