# Titanic Survival Prediction

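This project walks through an end-to-end machine learning workflow on the Kaggle Titanic dataset: exploratory analysis, preprocessing, training a Random Forest classifier, validation, and creating a submission file.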

**Author:** Aikerim Turgynbek  
**Dataset:** Titanic – Machine Learning from Disaster (Kaggle)

## 1. Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

plt.style.use('default')

## 2. Load Dataset

In [None]:
train = pd.read_csv('../data/train.csv')
test = pd.read_csv('../data/test.csv')

train.head()

## 3. Exploratory Data Analysis (EDA)

In [None]:
train.info()

train.describe()
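
`info()` already reports non-null counts per column. Counting the gaps directly makes it easier to decide which columns need imputation in the next section. On the notebook's data this would be `train.isnull().sum()`. The sketch below runs the same calls on a tiny invented frame (not real passenger records) so that it works without the CSV files.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train.csv (values invented for illustration)
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Fare': [7.25, 71.28, np.nan, 53.1],
    'Sex': ['male', 'female', 'female', 'male'],
})

# Count missing values per column, most-missing column first
missing = toy.isnull().sum().sort_values(ascending=False)
print(missing)
```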

In [None]:
sns.countplot(x='Survived', data=train)
plt.title('Survival Distribution')
plt.show()

sns.countplot(x='Survived', hue='Sex', data=train)
plt.title('Survival by Gender')
plt.show()
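
The count plots show raw counts, but the survival *rate* per group is often easier to compare. Because `Survived` is coded 0/1, its group mean is the fraction of passengers who survived. Here is a minimal sketch on hypothetical toy rows; on the real data, substitute `train` for `toy`.

```python
import pandas as pd

# Hypothetical toy rows in the same layout as train.csv (not real passengers)
toy = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 1, 0],
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'Pclass': [3, 1, 3, 1, 2, 3],
})

# Mean of a 0/1 target = survival rate per group
rate_by_sex = toy.groupby('Sex')['Survived'].mean()
print(rate_by_sex)

# Two-way breakdown: rows are sexes, columns are passenger classes
rate_by_sex_class = toy.pivot_table(
    index='Sex', columns='Pclass', values='Survived', aggfunc='mean'
)
print(rate_by_sex_class)
```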

## 4. Data Preprocessing

In [None]:
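# Encode Sex as a numeric feature: male -> 0, female -> 1
train['Sex'] = train['Sex'].map({'male': 0, 'female': 1})
test['Sex'] = test['Sex'].map({'male': 0, 'female': 1})

# Fill missing values with medians from the training data (reused for test),
# assigning the result back: chained fillna(..., inplace=True) on a column
# warns in pandas 2.x and no longer updates the DataFrame under Copy-on-Write.
age_median = train['Age'].median()
fare_median = train['Fare'].median()

train['Age'] = train['Age'].fillna(age_median)
test['Age'] = test['Age'].fillna(age_median)

test['Fare'] = test['Fare'].fillna(fare_median)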
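
The training-set medians used here are computed before the train/validation split in Section 6, so the rows later held out for validation still influence the fill values. One way to keep imputation inside the training data is a scikit-learn pipeline that chains `SimpleImputer(strategy='median')` with the classifier, so the medians are learned during `fit()`. This is a swapped-in alternative to the manual `fillna` calls, sketched on invented toy rows with `Sex` already encoded.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Hypothetical toy training rows, Sex already mapped to 0/1 (values invented)
X_train_toy = pd.DataFrame({
    'Pclass': [3, 1, 2, 3],
    'Sex': [0, 1, 1, 0],
    'Age': [22.0, np.nan, 30.0, 40.0],
    'Fare': [7.25, 71.3, np.nan, 8.05],
})
y_train_toy = pd.Series([0, 1, 1, 0])

# The imputer learns per-column medians inside fit(), from these rows only
pipe = make_pipeline(
    SimpleImputer(strategy='median'),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
pipe.fit(X_train_toy, y_train_toy)
print(pipe.named_steps['simpleimputer'].statistics_)

# A new row with missing Age and Fare is imputed with the stored medians, then classified
X_new = pd.DataFrame({'Pclass': [1], 'Sex': [1], 'Age': [np.nan], 'Fare': [np.nan]})
print(pipe.predict(X_new))
```

Calling `pipe.fit(X_train, y_train)` and `pipe.score(X_val, y_val)` in the notebook would then compute the medians from the training split alone.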

## 5. Feature Selection

In [None]:
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']

X = train[features]
y = train['Survived']

X_test = test[features]

## 6. Model Training

In [None]:
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
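
After fitting, a `RandomForestClassifier` exposes `feature_importances_`: impurity-based scores, one per feature, that are non-negative and sum to 1. They give a first look at which of the six features the forest relies on. The sketch below uses synthetic `make_classification` data labelled with the notebook's feature names, so its ranking says nothing about the real Titanic data. In the notebook, the equivalent is `pd.Series(model.feature_importances_, index=features)`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for X_train / y_train; the column names are only labels here
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
X_arr, y_toy = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
X_toy = pd.DataFrame(X_arr, columns=features)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_toy, y_toy)

# Impurity-based importances, highest first
importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favour continuous, high-cardinality features such as `Age` and `Fare`, so permutation importance on the validation split is a useful cross-check.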

## 7. Evaluation

In [None]:
accuracy = model.score(X_val, y_val)
print(f'Validation Accuracy: {accuracy * 100:.2f}%')
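
A single 80/20 split yields one accuracy figure that can shift noticeably with `random_state`. K-fold cross-validation averages over several splits instead. Here is a sketch using `StratifiedKFold` and `cross_val_score` on synthetic data; in the notebook, pass `X` and `y` in place of `X_toy` and `y_toy`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the notebook's X and y
X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=0)

# Stratified folds keep the survived/not-survived ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_toy, y_toy, cv=cv, scoring='accuracy',
)
print(f'CV accuracy: {scores.mean() * 100:.2f}% ± {scores.std() * 100:.2f}%')
```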

## 8. Prediction and Submission

In [None]:
predictions = model.predict(X_test)

submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})

submission.to_csv('submission.csv', index=False)
submission.head()
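
Kaggle expects exactly the `PassengerId` and `Survived` columns, with one row per test passenger. A small hypothetical helper, `submission_problems`, can catch format mistakes before uploading. The IDs and predictions below are toy values.

```python
import pandas as pd

def submission_problems(submission, test):
    """Return a list of format problems (empty if none). Hypothetical helper."""
    if list(submission.columns) != ['PassengerId', 'Survived']:
        # Later checks need both columns, so stop here
        return ['columns must be exactly PassengerId, Survived']
    problems = []
    if len(submission) != len(test):
        problems.append('row count differs from test set')
    elif not submission['PassengerId'].reset_index(drop=True).equals(
        test['PassengerId'].reset_index(drop=True)
    ):
        problems.append('PassengerId values or order differ from test set')
    if not submission['Survived'].isin([0, 1]).all():
        problems.append('Survived must contain only 0 and 1')
    return problems

# Toy example: one valid submission and one with an invalid label
test_toy = pd.DataFrame({'PassengerId': [892, 893, 894]})
good = pd.DataFrame({'PassengerId': test_toy['PassengerId'], 'Survived': [0, 1, 0]})
bad = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': [0, 2, 0]})
print(submission_problems(good, test_toy))
print(submission_problems(bad, test_toy))
```

In the notebook, `submission_problems(submission, test)` should return an empty list before `submission.csv` is uploaded.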

## 9. Conclusion

A Random Forest classifier trained on six features (`Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`) was used to predict survival on the Titanic dataset with minimal feature engineering. Its accuracy on the 20% hold-out split, printed in Section 7, serves as a baseline for further experimentation.