# Titanic Survival Prediction System

This notebook develops a machine learning model that predicts Titanic passenger survival from five features: Pclass, Sex, Age, SibSp, and Fare. The steps are loading the dataset, exploring its structure, preprocessing, model training, evaluation, and model persistence.

## 1. Load Dataset from Model Folder

Load the Titanic dataset from the 'model' folder using pandas.

In [5]:
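import pandas as pd

# Read the CSV from the 'model' folder. The file must contain the 'Survived'
# column, because section 3 uses it as the prediction target.
df = pd.read_csv('../model/test.csv')
df.head()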

ModuleNotFoundError: No module named 'pandas'

## 2. Explore Dataset Structure

Display basic information about the dataset, such as shape, columns, and sample rows.

In [None]:
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())
df.info()
df.sample(5)
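
Since the preprocessing step fills missing values in `Age`, `Fare`, and `Sex`, it can help to count the gaps per column during exploration. The following is a minimal sketch on a small made-up frame (the name `toy` and its values are illustrative assumptions, not rows from the dataset); the same `isna().sum()` call would apply to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data; values are made up.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
    "Sex": ["male", "female", "female", "male"],
})

# Count the missing entries in each column.
missing_counts = toy.isna().sum()
print(missing_counts)
```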

## 3. Preprocess Data

Fill missing values, select the model features, encode the categorical Sex column as integers, and standardize Age and Fare.

In [None]:
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Select features
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Fare']
target = 'Survived'

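# Handle missing values
# Fill Age and Fare with the median, Sex with the mode.
# Assign the result back instead of calling fillna(..., inplace=True) on a
# column selection, which may not update df in recent pandas versions.
for col in ['Age', 'Fare']:
    df[col] = df[col].fillna(df[col].median())
df['Sex'] = df['Sex'].fillna(df['Sex'].mode()[0])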

# Encode categorical variables
le_sex = LabelEncoder()
df['Sex'] = le_sex.fit_transform(df['Sex'])

# Feature scaling
scaler = StandardScaler()
df[['Age', 'Fare']] = scaler.fit_transform(df[['Age', 'Fare']])

X = df[features]
y = df[target]
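
The cell above fits the scaler on the full dataset, before the train/test split in the next section, so statistics from held-out rows influence the scaled features. One possible alternative, sketched here under assumptions, wraps the preprocessing and the classifier in a scikit-learn `Pipeline` so the scaler and encoder are fitted on the training split only. The frame `toy` and its values are made up for illustration, and `OrdinalEncoder` is swapped in for `LabelEncoder` because it is designed for feature columns; this is not the notebook's own method.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Made-up rows for illustration only; not real Titanic records.
toy = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 3, 1, 3, 2],
    "Sex": ["female", "male", "female", "male", "male",
            "female", "male", "female", "male", "female"],
    "Age": [29.0, 40.0, 18.0, 33.0, 54.0, 27.0, 21.0, 38.0, 45.0, 30.0],
    "SibSp": [0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
    "Fare": [80.0, 7.9, 13.0, 8.1, 52.0, 21.0, 7.3, 71.0, 8.0, 26.0],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})
features = ["Pclass", "Sex", "Age", "SibSp", "Fare"]
X = toy[features]
y = toy["Survived"]

# Scale Age and Fare, encode Sex as integers, pass the other columns through.
preprocess = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), ["Age", "Fare"]),
        ("encode", OrdinalEncoder(), ["Sex"]),
    ],
    remainder="passthrough",
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split first, then fit: the scaler and encoder only see the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(predictions)
```

A side benefit of this design is that the fitted pipeline accepts raw rows (with `Sex` as a string), so the preprocessing does not have to be repeated by hand at prediction time.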

## 4. Implement Model Training

Train a Logistic Regression model using the preprocessed data.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
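
When the two classes are not equally frequent, a plain random split can leave the test set with a different class ratio than the training set. As a hedged sketch, `train_test_split` accepts a `stratify` argument that preserves the ratio in both parts; the arrays `X_toy` and `y_toy` below are made-up placeholders, not the Titanic data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 5 positives and 15 negatives (25% positive).
X_toy = np.arange(20).reshape(-1, 1)
y_toy = np.array([1] * 5 + [0] * 15)

# stratify=y_toy keeps the positive rate the same in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```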

## 5. Evaluate Model Performance

Assess the trained model's performance using a classification report and the accuracy score.

In [None]:
from sklearn.metrics import classification_report, accuracy_score

y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
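
Accuracy alone does not show which kind of error the model makes. A confusion matrix breaks the predictions down into true and false positives and negatives. The sketch below uses short hand-written label lists (`y_true` and `y_hat` are illustrative, not model output); with the notebook's variables the call would be `confusion_matrix(y_test, y_pred)`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration; 1 = survived, 0 = did not survive.
y_true = [0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("True negatives:", tn, "False positives:", fp)
print("False negatives:", fn, "True positives:", tp)
print("Accuracy:", accuracy_score(y_true, y_hat))
```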

## 6. Save Trained Model

Persist the trained model to disk using joblib.

In [None]:
import joblib

joblib.dump(model, '../model/titanic_survival_model.pkl')
print('Model saved as titanic_survival_model.pkl')
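
The saved file contains only the classifier, while the model expects `Sex` to be encoded and `Age` and `Fare` to be scaled exactly as during training. One option, sketched here as an assumption rather than the notebook's approach, is to persist the fitted encoder and scaler together with the model in a single dictionary. The training values, the three-feature layout, and the file name `titanic_bundle.pkl` are hypothetical and chosen only to keep the example short.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up training values for illustration only.
sex_raw = np.array(["male", "female", "female", "male", "female", "male"])
age_fare_raw = np.array([
    [22.0, 7.25], [38.0, 71.28], [26.0, 7.93],
    [35.0, 8.05], [27.0, 11.13], [54.0, 51.86],
])
survived = np.array([0, 1, 1, 0, 1, 0])

# Fit the preprocessing objects and the model.
le_sex = LabelEncoder().fit(sex_raw)
scaler = StandardScaler().fit(age_fare_raw)
X_toy = np.column_stack([le_sex.transform(sex_raw), scaler.transform(age_fare_raw)])
model = LogisticRegression(max_iter=1000).fit(X_toy, survived)

# Persist everything needed for prediction in one file.
bundle = {"model": model, "le_sex": le_sex, "scaler": scaler}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "titanic_bundle.pkl")  # hypothetical file name
    joblib.dump(bundle, path)
    restored = joblib.load(path)

# Apply the restored preprocessing to a raw, unscaled passenger record.
new_sex = restored["le_sex"].transform(["female"])
new_age_fare = restored["scaler"].transform([[30.0, 20.0]])
new_row = np.column_stack([new_sex, new_age_fare])
prediction = restored["model"].predict(new_row)
print("Predicted class:", prediction[0])
```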

## 7. Reload Model and Predict

Demonstrate that the saved model can be reloaded and used for prediction without retraining.

In [None]:
loaded_model = joblib.load('../model/titanic_survival_model.pkl')
# Predict using the loaded model
sample = X_test.iloc[0:1]
pred = loaded_model.predict(sample)
print('Sample input:', sample.values)
print('Predicted survival:', 'Survived' if pred[0] == 1 else 'Did Not Survive')