# Hands on: Visualization


### Overview

- [The Data](#ch1)

- [Import the file titanicData.csv to a dataframe](#ch1_1)

- [Obtain a summary of the data.](#ch1_2)

- [4.1. Visually inspect whether the dataset is imbalanced.](#ch4_1)

- [4.2. Show the number of passengers by gender. Which gender had the higher survival rate?](#ch4_2)

- [4.3. Show the number of passengers per class and gender.](#ch4_3)

- [4.4. Show the distribution of passengers by age. Suggestion: create bins of width equal to 10. Comment on the results. Which age group had the most survivors?](#ch4_4)

- [4.5. Plot the number of minors by survival status and passenger class.](#ch4_5)

- [4.6. Show the distribution of the age of the minors according to the survival status.](#ch4_6)

- [4.7. Add the density estimate to the previous graph.](#ch4_7)

- [4.8. Did passenger class make any difference to survival?](#ch4_8)

- [Does passenger survival depend on ticket class?](#ch4_8_1)

- [4.9. Show male and female survival per class and by age.](#ch4_9)

- [4.10. Did a person travelling with others have a better chance of survival?](#ch4_10)

- [Now let's review these by survival status](#ch4_10_1)

- [4.11. How does the port of embarkation vary across age?](#ch4_11)

- [4.12. Show the distribution of ticket fare w.r.t. ticket class and gender.](#ch4_12)

- [4.13. Inspect the association between passenger class and fare.](#ch4_13)

- [4.14. Show the relationship between the attributes age and fare.](#ch4_14)

- [4.15. Which features correlate most strongly with survival status?](#ch4_15)


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
%matplotlib inline 

# The Data <a name="ch1"></a>
### The Titanic dataset contains information on the survival status of individual passengers on the Titanic.

**Pass_class** - Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)

**name** - Passenger name

**sex** - Passenger sex

**age** - Age in years

**sibsp** - Number of siblings/spouses aboard

**parch** - Number of parents/children aboard

**ticket** - Ticket number

**fare** - Passenger fare

**cabin** - Cabin number

**embarked** - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

**survived** - Survival status (0 = No, 1 = Yes)


### Import the file titanicData.csv to a dataframe (csv file is available in eLearning). <a name="ch1_1"></a>

In [2]:
df = pd.read_csv("titanicData.csv", sep=';')
df


### Insert new attributes

In [None]:
# Readable copy of the ticket class (1 -> '1st', ...), inserted as the second column.
# map() builds a string column directly, avoiding writing strings into an int column.
df.insert(1, "Pclass", df.Pass_class.map({1: '1st', 2: '2nd', 3: '3rd'}))

# Readable copy of the survival status, appended as the last column
df.insert(len(df.columns), "survival", df.survived.map({0: 'not survived', 1: 'survived'}))

### Obtain a summary of the data  <a name="ch1_2"></a>
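
This step has no code in the notebook. Here is a minimal sketch using standard pandas calls. It assumes `df` is the dataframe loaded above. The helper name `summarize` is hypothetical. It collects, per column, the type, the number of missing values and the number of distinct values.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),  # NaN is not counted as a distinct value
    })
```

In the notebook you could call `summarize(df)` together with `df.info()` and `df.describe(include='all')`. The last one gives count, mean, quartiles, etc. for numeric columns and most frequent values for text columns.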

## 4.1. Visually inspect whether the dataset is imbalanced. <a name="ch4_1"></a>

In [None]:
data= df.survival.value_counts()

colors = sns.color_palette('pastel')[0:7]
sns.set_theme(palette=colors, font="arial", font_scale= 2.5)
plt.figure(figsize=(5,5))
plt.pie(data, labels = data.index, colors = colors, autopct='%.0f%%')
plt.show()

## 4.2. Show the number of passengers by gender. Which gender had the higher survival rate? <a name="ch4_2"></a>

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(20, 5))
fig.suptitle('Titanic Data',fontweight='bold', fontsize=40)

g = sns.countplot(ax=axes[0], x='sex', data=df)
axes[0].bar_label(g.containers[0],fontweight='bold', fontsize=30)
axes[0].set_xlabel("gender",fontweight='bold', fontsize=30)
axes[0].set_ylabel("count",fontweight='bold', fontsize=30)
axes[0].tick_params(labelsize=30)


df1 = df.groupby('sex')['survival'].value_counts(normalize=True)
df1 = df1.mul(100)
df1 = df1.rename('percent').reset_index()

g = sns.barplot(ax=axes[1], x='sex', y='percent',hue='survival',data=df1)
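# fmt='%.1f' rounds the percentages printed on top of the bars to one decimal
axes[1].bar_label(g.containers[0], fmt='%.1f', fontweight='bold', fontsize=30)
axes[1].bar_label(g.containers[1], fmt='%.1f', fontweight='bold', fontsize=30)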
axes[1].set_xlabel("gender",fontweight='bold', fontsize=30)
axes[1].set_ylabel("percentage",fontweight='bold', fontsize=30)
axes[1].set_ylim(0,100)
axes[1].tick_params(labelsize=30)

plt.show()    

##	4.3.	Show the number of passengers per class and gender. <a name="ch4_3"></a>
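
This section has no code in the notebook. A possible sketch follows. It assumes `df` already contains the `Pclass` column created in "Insert new attributes". The helper name `class_gender_counts` is hypothetical. It returns the counts as a table and draws one bar per class and sex.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def class_gender_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count passengers per ticket class (rows) and sex (columns) and plot the counts."""
    counts = pd.crosstab(df["Pclass"], df["sex"])
    fig, ax = plt.subplots(figsize=(10, 5))
    # order= keeps the classes as 1st, 2nd, 3rd on the x axis
    sns.countplot(data=df, x="Pclass", hue="sex",
                  order=sorted(df["Pclass"].dropna().unique()), ax=ax)
    for container in ax.containers:
        ax.bar_label(container)  # print the count on top of each bar
    ax.set_xlabel("passenger class")
    ax.set_ylabel("count")
    return counts
```

Usage: `class_gender_counts(df)` shows the figure and returns the same numbers as a class × sex table.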

## 4.4. Show the distribution of passengers by age. Suggestion: create bins of width equal to 10. Comment on the results. Which age group had the most survivors? <a name="ch4_4"></a>

In [None]:
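maxAge = df.age.max()
# Bin edges of width 10 starting at 0 (0-10, 10-20, ...) that cover the oldest passenger
ageBins = np.arange(0, maxAge + 10, 10)
# Stacked histogram: each bar splits an age group into survivors and non-survivors
sns.histplot(data=df, x='age', hue='survival', bins=ageBins, multiple='stack')
plt.show()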
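
To answer "which age group had the most survivors" with numbers rather than by eye, one option is to tabulate survivors and survival rate per 10-year group with `pd.cut`. This is a sketch that assumes the columns `age` and `survived` from the data description. The helper name `survival_by_age_group` is hypothetical. Groups here are left-closed (`[0, 10)`, `[10, 20)`, ...), so a boundary age such as exactly 10 may fall in a different group than in the histogram.

```python
import numpy as np
import pandas as pd


def survival_by_age_group(df: pd.DataFrame, width: int = 10) -> pd.DataFrame:
    """Passengers, survivors and survival rate per age group of the given width."""
    # Edges 0, width, 2*width, ...; the last edge lies strictly above the oldest age
    upper = (df["age"].max() // width + 1) * width
    edges = np.arange(0, upper + width, width)
    # right=False gives left-closed groups [0, 10), [10, 20), ...; rows without age are dropped
    groups = pd.cut(df["age"], bins=edges, right=False)
    table = df.groupby(groups, observed=False)["survived"].agg(["count", "sum", "mean"])
    table.columns = ["passengers", "survivors", "survival_rate"]
    return table
```

Comparing the `survivors` column (absolute numbers) with `survival_rate` (fraction) helps separate "many survivors because the group was large" from "a high chance of survival".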

## 4.5. Plot the number of minors by survival status and passenger class. <a name="ch4_5"></a>

In [None]:
df_minors = df[df.age < 18]
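
The cell above only selects the minors. One way to plot them is a stacked bar chart per class, split by survival status, built from a crosstab. This is a sketch that assumes `df_minors` has the `Pclass` and `survival` columns created earlier. The helper name `minors_by_class` is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt


def minors_by_class(df_minors: pd.DataFrame) -> pd.DataFrame:
    """Count minors per ticket class and survival status and draw a stacked bar chart."""
    # Rows: ticket class, columns: survival status, cells: number of minors
    counts = pd.crosstab(df_minors["Pclass"], df_minors["survival"])
    ax = counts.plot.bar(stacked=True, figsize=(10, 5), rot=0)
    ax.set_xlabel("passenger class")
    ax.set_ylabel("number of minors")
    return counts
```

Stacking makes the total height of each bar the total number of minors in that class. The coloured segments show how many of them survived.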



## 4.6. Show the distribution of the age of the minors according to the survival status. <a name="ch4_6"></a>

In [None]:
g = sns.FacetGrid(df_minors,col='survived',  margin_titles=True, height=12)
g.map(sns.histplot,'age', bins = 10)
g.fig.suptitle('Titanic Data: minors',fontweight='bold', fontsize=50)
g.axes[0,0].set_title('not survived',fontweight='bold', fontsize=40)
g.axes[0,1].set_title('survived',fontweight='bold', fontsize=40)
g.axes[0,0].set_xlabel('age',fontweight='bold', fontsize=30)
g.axes[0,1].set_xlabel('age',fontweight='bold', fontsize=30)
g.axes[0,0].tick_params(labelsize=30)
g.axes[0,1].tick_params(labelsize=30)
g.fig.subplots_adjust(top=0.9)  # leave room at the top for the suptitle

## 4.7. Add the density estimate to the previous graph. <a name="ch4_7"></a>

## 4.8. Did passenger class make any difference to a passenger's survival? <a name="ch4_8"></a>
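
A direct way to look at this question is the survival rate per class. Because `survived` is coded 0/1, its mean within a group is the fraction of survivors. This is a sketch that assumes the `Pclass` and `survived` columns. The helper name `survival_rate_by_class` is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt


def survival_rate_by_class(df: pd.DataFrame) -> pd.Series:
    """Survival rate (in %) per ticket class, drawn as a labelled bar chart."""
    # Mean of the 0/1 column `survived` = fraction of survivors in each class
    rate = df.groupby("Pclass")["survived"].mean().mul(100)
    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(rate.index, rate.values)
    ax.bar_label(bars, fmt="%.1f")
    ax.set_xlabel("passenger class")
    ax.set_ylabel("survival rate (%)")
    ax.set_ylim(0, 100)
    return rate
```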

##	4.9. Show male and female survival per class and by age.  <a name="ch4_9"></a>

In [None]:
# One panel per class (rows) and sex (columns); colour encodes the survival status
g = sns.FacetGrid(df, col='sex', row='Pclass', row_order=['1st', '2nd', '3rd'], hue='survival', margin_titles=True, height=10)
# Shared bin edges of width 10 so every panel and survival group is binned identically
g.map(sns.histplot, 'age', bins=np.arange(0, df.age.max() + 10, 10))
g.fig.suptitle('Titanic Data: survival by sex, class and age', fontweight='bold', fontsize=50)
g.set_axis_labels('age', 'count', fontweight='bold', fontsize=30)
for ax in g.axes.flat:
    ax.tick_params(labelsize=30)
g.fig.subplots_adjust(top=0.9)  # leave room at the top for the suptitle
g.add_legend()
plt.show()

## 4.10. Did a person travelling with others have a better chance of survival? <a name="ch4_10"></a>
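
One possible approach uses the two family columns. A passenger with `sibsp + parch == 0` has no recorded relatives aboard. Note that these columns only count siblings/spouses and parents/children, so companions who are not relatives are not captured. This is a sketch under that assumption. The helper name `survival_alone_vs_with_others` and the group labels are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def survival_alone_vs_with_others(df: pd.DataFrame) -> pd.DataFrame:
    """Passengers and survival rate for people travelling alone vs. with relatives."""
    # sibsp + parch = relatives aboard; 0 is treated as "alone"
    relatives = df["sibsp"] + df["parch"]
    company = pd.Series(np.where(relatives == 0, "alone", "with relatives"),
                        index=df.index, name="company")
    table = df.groupby(company)["survived"].agg(["count", "mean"])
    table.columns = ["passengers", "survival_rate"]
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(table.index, table["survival_rate"] * 100)
    ax.bar_label(bars, fmt="%.1f")
    ax.set_ylabel("survival rate (%)")
    ax.set_ylim(0, 100)
    return table
```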

## 4.11. How does the port of embarkation vary across age? <a name="ch4_11"></a>

In [None]:
g = sns.FacetGrid(df,col='embarked', margin_titles=True, height=10)
g.map(sns.histplot, 'age')
g.fig.subplots_adjust(top=0.9)  # leave room at the top for the suptitle
g.fig.suptitle('Titanic Data',fontweight='bold', fontsize=30)
plt.show()

## 4.12. Show the distribution of ticket fare w.r.t. ticket class and gender. <a name="ch4_12"></a>
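
Box plots are one common way to compare a numeric distribution across two categorical variables. This sketch draws the fare per class, split by sex, and also returns the median fare of each class/sex combination. It assumes the `Pclass`, `sex` and `fare` columns. The helper name `fare_by_class_and_sex` is hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def fare_by_class_and_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Box plots of the fare per ticket class and sex; returns the median fare table."""
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.boxplot(data=df, x="Pclass", y="fare", hue="sex",
                order=sorted(df["Pclass"].dropna().unique()), ax=ax)
    ax.set_xlabel("passenger class")
    ax.set_ylabel("fare")
    # Medians are less sensitive to extreme fares than means
    return df.pivot_table(index="Pclass", columns="sex", values="fare", aggfunc="median")
```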

##	4.13.	Inspect the association between passenger class and fare. <a name="ch4_13"></a>
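
Ticket class is ordinal (1 < 2 < 3), so a rank-based measure is a reasonable choice here. This sketch reports the median fare per class and the Spearman rank correlation between `Pass_class` and `fare`. It computes Spearman as the Pearson correlation of the ranks, so SciPy is not needed. A negative value means that lower class numbers (1st class) go together with higher fares. The helper name `class_fare_association` is hypothetical.

```python
import pandas as pd


def class_fare_association(df: pd.DataFrame):
    """Median fare per class and Spearman rank correlation between class number and fare."""
    medians = df.groupby("Pass_class")["fare"].median()
    valid = df[["Pass_class", "fare"]].dropna()
    # Spearman = Pearson correlation of the ranks (ties receive their average rank)
    rho = valid["Pass_class"].rank().corr(valid["fare"].rank())
    return medians, rho
```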

##	4.14. Show the relationship between the attributes age and fare. <a name="ch4_14"></a>
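
A scatter plot is the usual starting point for two numeric attributes. This sketch colours the points by survival status and returns the Pearson correlation between age and fare. `Series.corr` skips rows where either value is missing. It assumes the `age`, `fare` and `survival` columns. The helper name `age_fare_relationship` is hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def age_fare_relationship(df: pd.DataFrame) -> float:
    """Scatter plot of fare against age (coloured by survival) and their Pearson correlation."""
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.scatterplot(data=df, x="age", y="fare", hue="survival", alpha=0.6, ax=ax)
    ax.set_xlabel("age")
    ax.set_ylabel("fare")
    # Pearson correlation; rows with a missing age or fare are ignored
    return df["age"].corr(df["fare"])
```

A correlation close to 0 only rules out a *linear* relationship. The scatter plot is needed to spot other patterns.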

## 4.15. Which features correlate most strongly with survival status? <a name="ch4_15"></a>
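
A simple first pass is to encode `sex` as a number and compute the Pearson correlation of each numeric feature with `survived`. For a 0/1 target, Pearson correlation with a numeric feature is the point-biserial correlation. It only captures linear association, and `Pass_class` is treated as if it were numeric. This is a sketch under those assumptions. The helper name `correlation_with_survival` and the derived column `is_female` are hypothetical.

```python
import pandas as pd


def correlation_with_survival(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric feature with `survived`, sorted by magnitude."""
    features = df[["Pass_class", "age", "sibsp", "parch", "fare", "survived"]].copy()
    # Encode sex numerically so it can enter the correlation (1 = female, 0 = male)
    features["is_female"] = (df["sex"] == "female").astype(int)
    corr = features.corr()["survived"].drop("survived")  # pairwise, ignores missing values
    # Strongest association first, whatever its sign
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

The full matrix `features.corr()` can also be drawn with `sns.heatmap(..., annot=True, center=0)` to see how the features correlate with each other.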