## GitHub: https://github.com/williamclark13/P3Analytics

# Data Prediction Algorithm: Logistic Regression

### How Does It Work?
* Logistic regression is a statistical method for binary classification problems. It models the probability that a given instance belongs to a particular category by applying the logistic (sigmoid) function to a weighted sum of the features. In this case, it predicts whether a passenger survived (1) or not (0) based on various features.
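
As a minimal sketch of the idea above, the snippet below turns a weighted sum of features (the log-odds) into a probability with the sigmoid function and then thresholds it at 0.5. The weights, bias, and feature values are hypothetical numbers chosen only for illustration; they are not taken from the trained Titanic model.

```python
import math

def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z: float, threshold: float = 0.5) -> int:
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical weights, bias, and inputs (illustration only, not the fitted model).
weights = [-1.0, 0.5]
bias = 0.25
features = [2.0, 1.0]

# Log-odds: bias plus the weighted sum of the features.
z = bias + sum(w * x for w, x in zip(weights, features))
probability = sigmoid(z)
label = predict_label(z)

print(f"log-odds={z}, probability={probability:.4f}, label={label}")
```

A log-odds of 0 corresponds to a probability of exactly 0.5, which is why 0.5 is the usual default decision threshold.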

### What Types of Tasks Does It Fit?
* Primarily binary classification: predicting one of two possible outcomes. In this case, it predicts whether a passenger survived or not.

### Advantages
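* Requires fewer computational resources than more complex models such as ensembles or neural networks.
* Can handle a large number of features.
* Simple and interpretable: each coefficient shows the direction and strength of a feature's effect on the log-odds.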
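
One way to read the interpretability claim: exponentiating a coefficient gives an odds ratio, the factor by which the odds of the positive class change when that feature increases by one unit (one standard deviation, if the features were standardized). The sketch below uses hypothetical coefficient values and feature names, not the ones fitted in this notebook.

```python
import math

# Hypothetical coefficients (illustration only, not from the fitted model).
coefficients = {"feature_a": 0.7, "feature_b": -1.2}

# exp(coefficient) is the multiplicative change in the odds per unit increase.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.3f} ({direction} the odds of the positive class)")
```

An odds ratio above 1 means the feature pushes predictions toward class 1; a ratio below 1 pushes them toward class 0.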

### Disadvantages
* Assumes a linear relationship between the features and the log-odds of the response.
* May not perform well when the relationships are complex or non-linear.
* Assumes features are not highly correlated with one another (little multicollinearity).
* Sensitive to outliers.

In [34]:
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from flask import Flask, request, jsonify, render_template

# Raw string keeps the backslash in the Windows path from being read as an escape sequence
df = pd.read_csv(r"Project3Css\Titanic-Dataset.csv")

In [35]:
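# Family size = siblings/spouses aboard + parents/children aboard
df['Family_Size'] = df['SibSp'] + df['Parch']
# Pull the honorific (Mr, Mrs, Miss, ...) out of the name; the raw string avoids an invalid-escape warning
df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
# Note: Age is binned before missing ages are filled, so rows with a missing Age get no Age_Category
df['Age_Category'] = pd.cut(df['Age'], bins=[0, 12, 18, 30, 50, 100], labels=['Child', 'Teen', 'Young Adult', 'Adult', 'Senior'])
df = df.drop(['Name', 'Ticket', 'Cabin'], axis=1)
# Assign back rather than using inplace=True on a column selection, which newer pandas versions warn about
df['Age'] = df['Age'].fillna(df['Age'].median())
df = pd.get_dummies(df, columns=['Sex', 'Embarked', 'Title', 'Age_Category'], drop_first=True)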

In [36]:
X = df.drop('Survived', axis=1)
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

mae = mean_absolute_error(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(f'Mean Absolute Error (MAE): {mae}')
print(f'Model Accuracy: {accuracy * 100:.2f}%')

Mean Absolute Error (MAE): 0.18994413407821228
Model Accuracy: 81.01%
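
The two numbers above carry the same information: for 0/1 labels, each wrong prediction contributes an absolute error of exactly 1, so MAE equals 1 minus accuracy (0.1899 + 0.8101 = 1). The sketch below checks this on a small made-up label set and also counts the four confusion-matrix cells, which say more about where the errors fall. The labels are hypothetical, not taken from the Titanic test split.

```python
# Hypothetical true labels and predictions (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

n = len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_hat)) / n
accuracy = sum(t == p for t, p in zip(y_true, y_hat)) / n

# Confusion-matrix cells for the positive class (1 = survived).
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_hat))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_hat))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_hat))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_hat))

print(f"MAE={mae}, accuracy={accuracy}, MAE + accuracy={mae + accuracy}")
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```

Because MAE adds nothing beyond accuracy here, metrics such as precision and recall (derivable from the counts above) may be more informative for a classifier.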


In [37]:
joblib.dump(scaler, 'scaler.pkl')  # the /predict route loads this fitted scaler
joblib.dump(model, 'titanic_survival_model.pkl')

['titanic_survival_model.pkl']

In [38]:
app = Flask(__name__)

model = joblib.load('titanic_survival_model.pkl')

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
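    # scaler.pkl must hold the StandardScaler that was fitted on the training data
    scaler = joblib.load('scaler.pkl')

    # Caution: the scaler and model were fitted on every column of X, so passing only
    # Pclass and Age makes scaler.transform raise a feature-count ValueError
    features = [float(request.form['Pclass']),
                float(request.form['Age'])]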
    
    features = [features]
    features_scaled = scaler.transform(features)
    
    prediction = model.predict(features_scaled)
    result = "Survived" if prediction[0] == 1 else "Not Survived"
    
    return render_template('index.html', result=result)

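if __name__ == '__main__':
    # Inside a notebook, the debug reloader tries to restart the process and ends in SystemExit;
    # passing use_reloader=False to app.run avoids that
    app.run(debug=True)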

 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with watchdog (windowsapi)


SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
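
The `/predict` handler reads only `Pclass` and `Age`, while the scaler and model were fitted on every column of `X`. One possible way to close that gap is to build each input row in the training column order and fill anything the form does not supply with a default. The helper below, `build_feature_row`, is a hypothetical sketch; the column list and default values are placeholders, not the real training columns or statistics.

```python
from typing import List, Mapping, Sequence

def build_feature_row(form: Mapping[str, str],
                      columns: Sequence[str],
                      defaults: Mapping[str, float]) -> List[float]:
    """Return one feature row ordered like `columns`.

    Values present in `form` are converted to float; missing or empty
    fields fall back to `defaults`, and to 0.0 when no default exists
    (0.0 is the natural "off" value for one-hot dummy columns).
    """
    row = []
    for name in columns:
        raw = form.get(name, "")
        if raw != "":
            row.append(float(raw))
        else:
            row.append(float(defaults.get(name, 0.0)))
    return row

# Placeholder column order and defaults (assumptions for illustration).
columns = ["Pclass", "Age", "Fare", "Sex_male"]
defaults = {"Fare": 10.0}
form = {"Pclass": "3", "Age": "22"}

row = build_feature_row(form, columns, defaults)
print(row)
```

In the notebook, the column order could be captured at training time (for example from `X_train.columns`) and saved alongside the model, so the handler passes `[row]` to `scaler.transform` with the expected number of features.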
