## Model Comparison & Selection

This notebook compares all trained models using fraud-focused evaluation metrics.
The goal is to select a final model suitable for real-world deployment, where
recall, precision, and business impact matter more than raw accuracy.


In [None]:
# import libraries
import pandas as pd


## Manually logged model results

In [2]:
# Summary of the results from the previous notebooks: precision, recall, and
# F1-score for the fraud class, plus ROC-AUC, for each model.
# The Random Forest with the optimized threshold has the highest fraud F1-score.
results = [
    {
        "Model": "Logistic Regression",
        "Fraud Precision": 0.0578,
        "Fraud Recall": 0.9184,
        "Fraud F1": 0.1088,
        "ROC-AUC": 0.9708
    },
    {
        "Model": "Logistic Regression + SMOTE",
        "Fraud Precision": 0.3010,
        "Fraud Recall": 0.8878,
        "Fraud F1": 0.4496,
        "ROC-AUC": 0.9708
    },
    {
        "Model": "Random Forest (threshold=0.5)",
        "Fraud Precision": 0.8283,
        "Fraud Recall": 0.8367,
        "Fraud F1": 0.8325,
        "ROC-AUC": 0.9685
    },
    {
        "Model": "Random Forest (optimized threshold)",
        "Fraud Precision": 0.9186,
        "Fraud Recall": 0.8061,
        "Fraud F1": 0.8587,
        "ROC-AUC": 0.9685
    }
]

df_results = pd.DataFrame(results)
df_results


|   | Model | Fraud Precision | Fraud Recall | Fraud F1 | ROC-AUC |
|---|-------|-----------------|--------------|----------|---------|
| 0 | Logistic Regression | 0.0578 | 0.9184 | 0.1088 | 0.9708 |
| 1 | Logistic Regression + SMOTE | 0.3010 | 0.8878 | 0.4496 | 0.9708 |
| 2 | Random Forest (threshold=0.5) | 0.8283 | 0.8367 | 0.8325 | 0.9685 |
| 3 | Random Forest (optimized threshold) | 0.9186 | 0.8061 | 0.8587 | 0.9685 |
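
Because these numbers were copied by hand from earlier notebooks, a quick consistency check can catch transcription mistakes. The sketch below recomputes each fraud F1-score as the harmonic mean of the logged precision and recall and flags any row that disagrees. The `reported` dictionary, the `f1_score` helper, and the tolerance are illustrative assumptions, not part of the original notebooks.

```python
# Values transcribed from the results table (fraud class only):
# (precision, recall, reported F1)
reported = {
    "Logistic Regression": (0.0578, 0.9184, 0.1088),
    "Logistic Regression + SMOTE": (0.3010, 0.8878, 0.4496),
    "Random Forest (threshold=0.5)": (0.8283, 0.8367, 0.8325),
    "Random Forest (optimized threshold)": (0.9186, 0.8061, 0.8587),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Assumed tolerance: the logged values are rounded to 4 decimals,
# so small differences are expected.
TOLERANCE = 5e-4

# Collect rows whose logged F1 disagrees with the recomputed value.
mismatches = {}
for model, (precision, recall, logged_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    if abs(recomputed - logged_f1) > TOLERANCE:
        mismatches[model] = (logged_f1, round(recomputed, 4))

print(mismatches or "All logged F1 scores agree with precision and recall.")
```

An empty `mismatches` dictionary means the logged F1 values are consistent with the logged precision and recall; it does not verify the precision and recall values themselves.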


## Highlight best model

In [3]:
df_results.sort_values(by="Fraud F1", ascending=False)


|   | Model | Fraud Precision | Fraud Recall | Fraud F1 | ROC-AUC |
|---|-------|-----------------|--------------|----------|---------|
| 3 | Random Forest (optimized threshold) | 0.9186 | 0.8061 | 0.8587 | 0.9685 |
| 2 | Random Forest (threshold=0.5) | 0.8283 | 0.8367 | 0.8325 | 0.9685 |
| 1 | Logistic Regression + SMOTE | 0.3010 | 0.8878 | 0.4496 | 0.9708 |
| 0 | Logistic Regression | 0.0578 | 0.9184 | 0.1088 | 0.9708 |


## Business interpretation

## Model Selection Decision

The Random Forest model with optimized classification threshold was selected
as the final model.

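Reasons:
- Highest fraud F1-score of the four candidates (0.8587)
- Strong balance between precision (0.9186) and recall (0.8061)
- Far fewer false positives than the logistic regression models, including the SMOTE variant (fraud precision 0.9186 vs. 0.3010)
- ROC-AUC of 0.9685, close to the 0.9708 of the logistic regression models
- Interpretable feature importance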

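Of the models compared, this one is best suited for deployment in a real-world fraud detection system. Its fraud recall (0.8061) is the lowest of the four, so roughly one in five fraud cases would still go undetected.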
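
To make the false-positive argument concrete, precision and recall can be converted into approximate counts per 100 actual fraud cases. The sketch below is a rough illustration: since precision = TP / (TP + FP), the number of false alerts is TP × (1 / precision − 1). The function name `outcomes_per_100_frauds` is hypothetical, and the result assumes the class balance of the test set on which the metrics were measured.

```python
def outcomes_per_100_frauds(precision, recall):
    """Approximate outcomes for every 100 actual fraud cases.

    Returns (caught, missed, false_alerts). Assumes precision and recall
    were measured on the same test set at the same threshold.
    """
    caught = 100 * recall                        # true positives
    missed = 100 - caught                        # false negatives
    false_alerts = caught * (1 / precision - 1)  # false positives
    return caught, missed, false_alerts


# (fraud precision, fraud recall) from the results table
models = {
    "Logistic Regression": (0.0578, 0.9184),
    "Logistic Regression + SMOTE": (0.3010, 0.8878),
    "Random Forest (threshold=0.5)": (0.8283, 0.8367),
    "Random Forest (optimized threshold)": (0.9186, 0.8061),
}

summary = {
    name: outcomes_per_100_frauds(precision, recall)
    for name, (precision, recall) in models.items()
}

for name, (caught, missed, false_alerts) in summary.items():
    print(
        f"{name}: caught {caught:.1f}, missed {missed:.1f}, "
        f"false alerts {false_alerts:.1f}"
    )
```

On these figures the selected model raises roughly 7 false alerts per 100 actual fraud cases, against roughly 1,500 for plain logistic regression, at the price of missing about 19 frauds instead of about 8. Attaching monetary costs to missed frauds and false alerts would turn this into a business-impact estimate, but those costs are not given in this notebook.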
