# Task 1: Titanic Survival Prediction
This notebook builds a machine learning model that predicts whether Titanic passengers survived.

**Task:** Predict whether a passenger survived the Titanic disaster.

**Dataset Source:**  
https://www.kaggle.com/competitions/titanic/data  
(publicly available from Kaggle)

---


```python
import pandas as pd

# Load the dataset
df = pd.read_csv("Titanic-Dataset.csv")
df.head()
```

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |
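
The preview already shows empty `Cabin` cells. Before choosing how to handle missing data, it helps to count the gaps in every column. The helper below is a minimal sketch; `missing_summary` is a hypothetical name that does not appear in the notebook, and it is demonstrated on a small frame copied from the five preview rows so that it runs without the CSV.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first.

    Hypothetical helper: only columns with at least one missing value are kept.
    """
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Small frame built from the five preview rows shown above
preview = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
    "Embarked": ["S", "C", "S", "S", "S"],
})

print(missing_summary(preview))
```

Calling `missing_summary(df)` on the loaded data lists every column with gaps. If the file matches Kaggle's `train.csv`, those are `Age`, `Cabin`, and `Embarked`; the preprocessing step below imputes `Age` and `Embarked` and leaves `Cabin` out of the feature list.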


## Data Preprocessing
Select the model features, fill in missing `Age` and `Embarked` values, and encode the categorical columns `Sex` and `Embarked` as integers.

```python
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
target = 'Survived'
df_model = df[features + [target]].copy()

# Impute missing values: most frequent port for Embarked, mean for Age.
# Note: both imputers are fit on all rows, before the train/test split below.
embarked_imputer = SimpleImputer(strategy='most_frequent')
df_model['Embarked'] = embarked_imputer.fit_transform(df_model[['Embarked']]).ravel()

age_imputer = SimpleImputer(strategy='mean')
df_model['Age'] = age_imputer.fit_transform(df_model[['Age']]).ravel()

# Encode categorical variables as integers (LabelEncoder assigns codes in
# sorted order of the labels, e.g. 'female' -> 0, 'male' -> 1)
label_encoders = {}
for col in ['Sex', 'Embarked']:
    le = LabelEncoder()
    df_model[col] = le.fit_transform(df_model[col])
    label_encoders[col] = le  # keep the fitted encoder to transform new data later
```
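
`LabelEncoder` assigns integer codes in sorted order of the labels it sees, and the notebook never prints the resulting mapping. The sketch below repeats the same imputation and encoding steps on a tiny hand-made frame (illustrative values, not rows from the dataset) and records each column's mapping. In the notebook, the same information is available from the fitted encoders, for example `label_encoders['Sex'].classes_`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Illustrative rows with one missing Age and one missing Embarked value
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, np.nan, 26.0, 35.0, 27.0],
    "Embarked": ["S", "C", np.nan, "S", "Q"],
})

# Same steps as the notebook: most frequent port ('S' here), then mean Age
toy["Embarked"] = SimpleImputer(strategy="most_frequent").fit_transform(toy[["Embarked"]]).ravel()
toy["Age"] = SimpleImputer(strategy="mean").fit_transform(toy[["Age"]]).ravel()

mappings = {}
for col in ["Sex", "Embarked"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    # classes_ is sorted; a label's code is its position in classes_
    mappings[col] = {label: code for code, label in enumerate(enc.classes_)}

print(mappings)
print(toy)
```

The codes are ordinal integers (`C` < `Q` < `S`), which tree ensembles such as Random Forest tolerate reasonably well; for linear models, one-hot encoding is usually the safer choice.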


## Model Training and Evaluation
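Split the data 80/20 into training and test sets (with a fixed `random_state` for reproducibility), train a Random Forest classifier with default hyperparameters, and evaluate it on the held-out test set.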

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Split data
X = df_model[features]
y = df_model[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Classification Report:\n", report)
```

```
Accuracy: 0.8100558659217877
Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.87      0.84       105
           1       0.79      0.73      0.76        74

    accuracy                           0.81       179
   macro avg       0.81      0.80      0.80       179
weighted avg       0.81      0.81      0.81       179
```
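
The report's numbers can be traced back to a confusion matrix. With 105 non-survivors and 74 survivors in the test set, a recall that rounds to 0.87 on 105 rows can only come from 91 correct predictions (91/105 ≈ 0.867), and 0.73 on 74 rows only from 54 (54/74 ≈ 0.730). So the model misclassified 14 non-survivors as survivors and missed 20 survivors. The sketch below rebuilds label arrays from these inferred counts (derived from the rounded report, not printed by the notebook) and checks that scikit-learn reproduces the reported metrics. In the notebook itself, `confusion_matrix(y_test, y_pred)` from `sklearn.metrics` would print the matrix directly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Counts inferred from the report: 105 actual non-survivors (0), 74 survivors (1)
tn, fp = 91, 105 - 91  # class 0: predicted correctly / predicted as survived
fn, tp = 74 - 54, 54   # class 1: predicted as died / predicted correctly

# Rebuild label arrays with exactly those counts (new names, so the
# notebook's own y_test / y_pred are not overwritten)
true_labels = np.array([0] * (tn + fp) + [1] * (fn + tp))
pred_labels = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

print(confusion_matrix(true_labels, pred_labels))
print("Accuracy:", accuracy_score(true_labels, pred_labels))  # (91 + 54) / 179
print(classification_report(true_labels, pred_labels))
```

The accuracy of 0.8100558659217877 is exactly 145/179, and the lower recall for class 1 reflects the 20 survivors the model predicted as non-survivors.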



**Note:** The dataset file (`Titanic-Dataset.csv`) must be in the same folder as this notebook, because it is loaded with a relative path.
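
The imputers in the preprocessing step are fit on every row before `train_test_split`, so the mean `Age` used to fill test rows is partly computed from the test set itself. One way to keep all fitting inside the training data is a scikit-learn `Pipeline` with a `ColumnTransformer`. The sketch below is an alternative to the notebook's approach, not what produced the 0.81 accuracy above. It swaps `LabelEncoder` for `OrdinalEncoder` (the transformer intended for feature columns) and runs on a small synthetic frame with the same column names so that it works without the CSV; `build_model` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OrdinalEncoder

FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare"]
CATEGORICAL = ["Sex", "Embarked"]


def build_model(random_state: int = 42) -> Pipeline:
    """Imputation, encoding and the classifier, all fit on training rows only."""
    preprocess = ColumnTransformer([
        # Mean imputation for numeric columns (only Age has gaps here)
        ("num", SimpleImputer(strategy="mean"), NUMERIC),
        # Most-frequent imputation, then integer codes for categorical columns
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OrdinalEncoder()), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(random_state=random_state)),
    ])


# Synthetic stand-in for df: random values, NOT the Titanic data
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": pd.Series(rng.choice(["male", "female"], n), dtype=object),
    "Age": np.where(rng.random(n) < 0.2, np.nan, rng.uniform(1, 80, n)),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
    # About 5% of ports set to NaN via Series.mask
    "Embarked": pd.Series(rng.choice(["S", "C", "Q"], n, p=[0.6, 0.25, 0.15]),
                          dtype=object).mask(rng.random(n) < 0.05),
    "Survived": rng.integers(0, 2, n),
})

# Split the raw (unimputed) data first, then fit the whole pipeline on the training part
X_tr, X_te, y_tr, y_te = train_test_split(
    demo[FEATURES], demo["Survived"], test_size=0.2, random_state=42)
pipe = build_model()
pipe.fit(X_tr, y_tr)  # imputers and encoder see only the training rows
print("Held-out accuracy on synthetic data:", pipe.score(X_te, y_te))
```

With the notebook's data, the split would be applied to `df[features]` and `df[target]` directly, before any imputation. `cross_val_score(build_model(), df[features], df[target], cv=5)` from `sklearn.model_selection` would then give a less split-dependent estimate than a single 179-row test set.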
