## GitHub: https://github.com/williamclark13/P3Analytics

# Data Prediction Algorithm: Linear Regression


### How It Works
* Linear regression is a statistical method for regression problems: it fits a weighted sum of the input features to predict a continuous numeric outcome. Here the target is the 0/1 `Survived` column, so the model outputs a numeric score that is read as the likelihood of a passenger's survival. The score is not guaranteed to stay within [0, 1].
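
To make the "weighted sum of features" idea concrete, here is a minimal sketch using NumPy's least-squares solver. The toy data and the `predict` helper are made up for illustration and are not part of the notebook's pipeline.

```python
import numpy as np

# Toy data (hypothetical): two features, target generated as 2*x1 - 1*x2 + 0.5.
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0], [5.0, 5.0]])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Append a column of ones so the intercept is learned as an extra weight.
X_design = np.column_stack([X, np.ones(len(X))])

# Least-squares solution: the weights that minimize the squared prediction error.
weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def predict(features):
    """Return the weighted sum of the features plus the intercept."""
    return np.asarray(features) @ weights[:-1] + weights[-1]

print(weights)
print(predict([1.0, 1.0]))
```

Because the toy target is an exact linear function of the features, the solver recovers the generating weights. On real data such as the Titanic set, the fit is only an approximation.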

### What Type of Tasks Does It Fit?
* Primarily regression tasks, where the outcome is numeric. In this scenario, it predicts a survival score from the given passenger features, and that score is compared against 0.5 to decide between "Survived" and "Not Survived".

### Advantages
* Can handle a large number of features.
* Simple and interpretable.

### Disadvantages
* May not perform well when the relationship between features and outcome is non-linear.
* Assumes features are not strongly correlated with one another (low multicollinearity).
* Sensitive to outliers.
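
The sensitivity to outliers can be illustrated with a small sketch. The data below is synthetic and the size of the corruption is an arbitrary choice. It only shows the direction of the effect, not anything measured on the Titanic data.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0          # points lie exactly on y = 2x + 1

# Hypothetical corruption: one target value is recorded far too high.
y_outlier = y_clean.copy()
y_outlier[-1] += 100.0

# Fit a straight line to each version of the data.
slope_clean, intercept_clean = np.polyfit(x, y_clean, deg=1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with one outlier: {slope_outlier:.2f}")
```

A single corrupted point out of ten pulls the fitted slope well away from the true value, because squared error penalizes large residuals heavily.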

### Import Libraries

In [43]:
import joblib
import pandas as pd
from sklearn.metrics import mean_absolute_error, accuracy_score
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from flask import Flask, request, jsonify, render_template

df = pd.read_csv("Project3Css/Titanic-Dataset.csv")

### Data Wrangling
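* `Family_Size` is the number of siblings/spouses (`SibSp`) plus the number of parents/children (`Parch`) aboard.
* `Title` is extracted from the passenger's name (Mr., Mrs., etc.).
* Ages are binned into child, teen, young adult, adult, and senior.
* `Name`, `Ticket`, and `Cabin` are dropped because they are not used to predict survival.
* Missing ages are filled with the median age, and the categorical columns are one-hot encoded.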

In [44]:
df['Family_Size'] = df['SibSp'] + df['Parch']
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
df['Age_Category'] = pd.cut(df['Age'], bins=[0, 12, 18, 30, 50, 100], labels=['Child', 'Teen', 'Young Adult', 'Adult', 'Senior'])
df = df.drop(['Name', 'Ticket', 'Cabin'], axis=1)
df['Age'] = df['Age'].fillna(df['Age'].median())
df = pd.get_dummies(df, columns=['Sex', 'Embarked', 'Title', 'Age_Category'], drop_first=True)
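
The Flask form later in this notebook reads `Title_Miss`, `Title_Mr`, `Title_Mrs`, and `Title_Other` and has no `PassengerId` field. This suggests that rare titles were grouped and the ID column dropped before encoding, although neither step appears in the cell above. A minimal sketch on a toy frame, assuming the common titles are Master, Miss, Mr, and Mrs (the names and the `common_titles` list are hypothetical):

```python
import pandas as pd

# Toy rows (hypothetical), shaped like the Titanic columns used above.
toy = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 5],
    'Name': ['Smith, Mr. John', 'Smith, Mrs. Jane', 'Brown, Miss. Ann',
             'Brown, Master. Tom', 'Jones, Dr. Sam'],
})

toy['Title'] = toy['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)

# Assumption: any title outside this set is collapsed into 'Other'.
common_titles = ['Master', 'Miss', 'Mr', 'Mrs']
toy['Title'] = toy['Title'].where(toy['Title'].isin(common_titles), 'Other')

# PassengerId is a row identifier rather than a passenger attribute.
toy = toy.drop(['PassengerId', 'Name'], axis=1)

encoded = pd.get_dummies(toy, columns=['Title'], drop_first=True)
print(list(encoded.columns))
```

With `drop_first=True` the alphabetically first title (`Master`) becomes the baseline, which leaves exactly the four title columns the form expects.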

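### Training Model/Evaluation
* The data is split into training (80%) and testing (20%) sets.
* Features are standardized with `StandardScaler`, which is fitted on the training set only.
* The model is fitted to the scaled training data.
* Predictions are made on the test set and scored with mean absolute error and with accuracy (after rounding each prediction to the nearest integer).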

In [45]:
X = df.drop('Survived', axis=1)
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

mae = mean_absolute_error(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred.round())
print(f'Mean Absolute Error (MAE): {mae}')
print(f'Model Accuracy: {accuracy * 100:.2f}%')

Mean Absolute Error (MAE): 0.2863490681190359
Model Accuracy: 81.01%
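
The evaluation above rounds each prediction, while the Flask route below uses a `> 0.5` threshold. The two rules can disagree when a score falls outside the 0 to 1 range, which a linear regression does not rule out. A small sketch with made-up scores (not taken from the model above):

```python
import numpy as np

# Hypothetical raw outputs from a linear regression on a 0/1 target.
raw_scores = np.array([-0.6, 0.2, 0.5, 0.7, 1.6])

rounded = raw_scores.round()                    # can leave the 0/1 label set
thresholded = (raw_scores > 0.5).astype(int)    # always 0 or 1

print(rounded)
print(thresholded)
```

Using the same threshold for evaluation and for serving keeps the reported accuracy consistent with what the webpage displays.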


### Flask Webpage
* The model above predicts the likelihood of survival, and the result is displayed on the following webpage.

In [46]:
joblib.dump(model, 'titanic_survival_model.pkl')

['titanic_survival_model.pkl']
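
Only the model is saved here, while the fitted `StandardScaler` stays in memory. One way to keep the two together is scikit-learn's `make_pipeline`, sketched below on toy data. The file name, the temporary directory, and the toy arrays are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data (hypothetical) standing in for X_train / y_train.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = (X_toy[:, 0] > 0).astype(float)

# The pipeline scales and then predicts, so both steps are saved together.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_toy, y_toy)

# Hypothetical file name; written to a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'titanic_survival_pipeline.pkl')
joblib.dump(pipeline, path)

# Reload and confirm the restored pipeline predicts the same values.
restored = joblib.load(path)
same = np.allclose(restored.predict(X_toy), pipeline.predict(X_toy))
print(same)
```

A web app that loads such a pipeline can call `predict` on unscaled inputs directly, without needing a separate scaler object.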

In [None]:
app = Flask(__name__)

model = joblib.load('titanic_survival_model.pkl')

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    features = [float(request.form['Pclass']),
                float(request.form['Age']),
                float(request.form['SibSp']),
                float(request.form['Parch']),
                float(request.form['Fare']),
                float(request.form['Family_Size']),
                float(request.form['Sex_male']),
                float(request.form['Embarked_Q']),
                float(request.form['Embarked_S']),
                float(request.form['Title_Miss']),
                float(request.form['Title_Mr']),
                float(request.form['Title_Mrs']),
                float(request.form['Title_Other']),
                float(request.form['Age_Category_Teen']),
                float(request.form['Age_Category_Young Adult']),
                float(request.form['Age_Category_Adult']),
                float(request.form['Age_Category_Senior'])]

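    # scaler is the StandardScaler fitted in the training cell. It is not saved
    # to disk, so this only works while that object is still in memory.
    features = [features]
    features_scaled = scaler.transform(features)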

    prediction = model.predict(features_scaled)
    result = "Survived" if prediction[0] > 0.5 else "Not Survived"

    return render_template('index.html', result=result)

if __name__ == '__main__':
    app.run(debug=True)
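
The route above builds the feature list by hand, so its order and length must match the columns the model was trained on. A hedged sketch of one way to enforce that with pandas `reindex` follows. The `form_to_row` helper, the shortened `training_columns` list, and the sample form are hypothetical.

```python
import pandas as pd

# Assumption: the column order the model was trained on, saved at training time
# (for example list(X_train.columns)). Shortened here for illustration.
training_columns = ['Pclass', 'Age', 'Fare', 'Sex_male', 'Embarked_Q', 'Embarked_S']

def form_to_row(form, columns):
    """Build a one-row DataFrame in training order; missing fields become 0.0."""
    values = {name: float(value) for name, value in form.items()}
    row = pd.DataFrame([values])
    return row.reindex(columns=columns, fill_value=0.0)

# Hypothetical form submission, with fields in a different order and one absent.
form = {'Sex_male': '1', 'Age': '29', 'Pclass': '3', 'Fare': '7.25', 'Embarked_S': '1'}
row = form_to_row(form, training_columns)
print(row.to_dict(orient='records')[0])
```

Defaulting missing fields to 0.0 suits one-hot columns. For numeric fields such as `Age`, rejecting the request may be the safer choice.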

### Summary
* Predicts passenger survival on the Titanic and serves the result through a simple Flask webpage.