# Titanic Dataset Exploratory Data Analysis
*Date: 2025-06-09*

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")  # sns.set is an older alias of set_theme
plt.rcParams["figure.figsize"] = (10, 6)

df = pd.read_csv("train.csv")
df.info()

In [None]:
df.describe(include='all')

In [None]:
sns.histplot(df['Age'].dropna(), kde=True, bins=30)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

**Observation:** Among passengers with a recorded age (missing values are dropped before plotting), the Age distribution is mildly right-skewed, with most passengers between 20 and 40 years old.
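
The skew and the 20 to 40 concentration can also be checked numerically rather than read off the histogram. The following is a minimal sketch; the helper name `age_summary` and its `low`/`high` parameters are illustrative and not part of the original notebook.

```python
import pandas as pd

def age_summary(frame: pd.DataFrame, low: float = 20, high: float = 40) -> dict:
    """Summarize the Age column (hypothetical helper, not from the notebook)."""
    age = frame["Age"].dropna()
    return {
        # Sample skewness: a value above 0 indicates a longer right tail
        "skew": age.skew(),
        # Share of recorded ages inside [low, high], both ends inclusive
        "share_in_range": age.between(low, high).mean(),
        # Rows excluded from the histogram because Age is missing
        "n_missing": int(frame["Age"].isna().sum()),
    }
```

Calling `age_summary(df)` on the loaded data returns the three figures in one dictionary.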

In [None]:
sns.boxplot(x='Survived', y='Age', data=df)
plt.title('Age Distribution by Survival')
plt.xlabel('Survived')
plt.ylabel('Age')
plt.show()

**Observation:** Survivors tended to be slightly younger on average than non-survivors, although the two distributions overlap heavily. Outliers exist in both groups.
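
Because a boxplot shows medians rather than means, it can help to tabulate both per group. This is a sketch under the assumption that `Survived` is coded 0/1 as in the plot above; `age_by_survival` is an illustrative name.

```python
import pandas as pd

def age_by_survival(frame: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, and median of Age per Survived group (hypothetical helper)."""
    # count ignores missing ages, so it also shows how many rows back each statistic
    return frame.groupby("Survived")["Age"].agg(["count", "mean", "median"])
```

`age_by_survival(df)` gives one row per survival outcome, which makes the size of the age gap explicit.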

In [None]:
sns.countplot(x='Sex', hue='Survived', data=df)
plt.title('Survival Count by Sex')
plt.xlabel('Sex')
plt.ylabel('Count')
plt.show()

**Observation:** Most female passengers survived, whereas most male passengers did not, so the female survival rate was substantially higher than the male rate.
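
The count plot shows raw counts, so the rates themselves are worth computing directly. A minimal sketch follows; `survival_rate_by` is an illustrative helper, and it relies on `Survived` being 0/1 so that the group mean equals the survival rate.

```python
import pandas as pd

def survival_rate_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Survival rate and group size per level of `column` (hypothetical helper)."""
    # The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate
    return frame.groupby(column)["Survived"].agg(rate="mean", n="size")
```

For example, `survival_rate_by(df, "Sex")` returns one row per sex with its rate and passenger count.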

In [None]:
sns.heatmap(df[['Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

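**Observation:** Fare is moderately positively correlated with survival, and Pclass is moderately negatively correlated (a lower class number, meaning a higher class, goes with survival). SibSp and Parch show weaker relationships.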
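
To read the survival column of the heatmap as a ranked list, the correlations can be pulled out and sorted by magnitude. This is a sketch only; `correlation_with_target` and its default feature list are assumptions chosen to mirror the columns used in the heatmap.

```python
import pandas as pd

def correlation_with_target(
    frame: pd.DataFrame,
    target: str = "Survived",
    features: tuple = ("Pclass", "Age", "SibSp", "Parch", "Fare"),
) -> pd.Series:
    """Pearson correlation of each feature with `target`, strongest first."""
    cols = [target, *features]
    # Take the target's column of the correlation matrix and drop its self-correlation
    corr = frame[cols].corr()[target].drop(target)
    # Sort by absolute value so strong negative correlations rank alongside positive ones
    return corr.sort_values(key=abs, ascending=False)
```

`correlation_with_target(df)` keeps the sign of each coefficient, so the direction of each relationship stays visible.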

In [None]:
df['Survived'] = pd.to_numeric(df['Survived'], errors='coerce')
df['Pclass'] = pd.to_numeric(df['Pclass'], errors='coerce')
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
df['Fare'] = pd.to_numeric(df['Fare'], errors='coerce')
# Take an explicit copy so the assignment below cannot trigger a SettingWithCopyWarning
clean_df = df[['Survived', 'Pclass', 'Age', 'Fare']].dropna().copy()
clean_df['Survived'] = clean_df['Survived'].astype(int)

sns.pairplot(clean_df, hue='Survived', palette='husl', diag_kind='hist')
plt.suptitle('Pairplot of Selected Features by Survival', y=1.02)
plt.show()

**Observation:** Survivors cluster in the higher classes and at higher fares, so Pclass and Fare partially separate the two groups; Age separates them only slightly.
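
The visual separation by class and fare can be backed with survival rates per class and per fare band. The sketch below assumes quantile bands are an acceptable way to bin `Fare`; the helper name `survival_by_class_and_fare` and the default of four bands are illustrative choices, not something the notebook specifies.

```python
import pandas as pd

def survival_by_class_and_fare(frame: pd.DataFrame, q: int = 4):
    """Return (rate per Pclass, rate per Fare quantile band); hypothetical helper."""
    data = frame[["Survived", "Pclass", "Fare"]].dropna()
    by_class = data.groupby("Pclass")["Survived"].mean()
    # qcut builds equal-frequency bands; duplicates="drop" merges bands with tied edges
    fare_band = pd.qcut(data["Fare"], q=q, duplicates="drop")
    by_fare = data.groupby(fare_band, observed=True)["Survived"].mean()
    return by_class, by_fare
```

Calling `by_class, by_fare = survival_by_class_and_fare(df)` yields two series that can be printed or plotted as bar charts.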

## Summary of Findings

- **Age:** Survivors were slightly younger on average than non-survivors.
- **Sex:** Females had a much better survival outcome than males.
- **Pclass & Fare:** Higher class and fare were associated with survival.
- **Correlations:** Among the numeric features, `Pclass` and `Fare` showed the clearest relationships with survival; `Age` showed only a slight one.

This analysis offers foundational insights useful for modeling Titanic survival.