# Tree-Based Modeling: Random Forest & XGBoost

In this notebook, we:
- Train a Random Forest Classifier as a tree-based baseline
- Fine-tune an XGBoost Classifier
- Evaluate models using ROC-AUC, precision, recall, and F1-score
- Compare performance to logistic regression models from the previous notebook

In [3]:
# Core
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Model selection & evaluation
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report, roc_auc_score, roc_curve

In [4]:
# Load processed data
import joblib
X_train = joblib.load('X_train_smote.pkl')
y_train = joblib.load('y_train_smote.pkl')
X_test = joblib.load('X_test.pkl')
y_test = joblib.load('y_test.pkl')

## 1. Baseline Random Forest Classifier

In [5]:
# Baseline Random Forest
rf_baseline = RandomForestClassifier(random_state=42, n_jobs=-1)
rf_baseline.fit(X_train, y_train)

In [6]:
# Predictions
y_pred_rf = rf_baseline.predict(X_test)
y_proba_rf = rf_baseline.predict_proba(X_test)[:, 1]

In [7]:
# Evaluation
print("Classification Report (Random Forest - Baseline):")
print(classification_report(y_test, y_pred_rf))
print("ROC-AUC (Random Forest - Baseline):", roc_auc_score(y_test, y_proba_rf))

Classification Report (Random Forest - Baseline):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     85295
           1       0.87      0.77      0.82       148

    accuracy                           1.00     85443
   macro avg       0.93      0.89      0.91     85443
weighted avg       1.00      1.00      1.00     85443

ROC-AUC (Random Forest - Baseline): 0.9652455389324491


We trained a Random Forest Classifier with default settings. On the fraud class it is far more precise than the logistic regression models from the previous notebook, while still recovering 77% of the fraud cases in the test set.

**Evaluation Metrics (Class 1 - Fraud):**
- **Precision:** 0.87
- **Recall:** 0.77
- **F1-Score:** 0.82
- **ROC-AUC:** 0.9652

These results suggest that Random Forest separates fraud from legitimate transactions better than the linear models did. Next, we'll tune its hyperparameters to see whether performance improves further.
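
The 1.00 accuracy in the report says little on its own, because only 148 of the 85,443 test rows are fraud. As a quick arithmetic check, the sketch below uses only the support counts printed above to show what a trivial model that labels every transaction as legitimate would score. The variable names are illustrative and not part of the notebook.

```python
# Support counts taken from the classification report above.
n_legit = 85295
n_fraud = 148
n_total = n_legit + n_fraud

# Accuracy of a trivial model that predicts "legitimate" (class 0) for every row.
majority_accuracy = n_legit / n_total

# Its fraud recall is zero, because it never flags a single fraud case.
majority_fraud_recall = 0 / n_fraud

print(f"Majority-class accuracy: {majority_accuracy:.4f}")
print(f"Majority-class fraud recall: {majority_fraud_recall:.2f}")
```

This is why the fraud-class precision, recall, and F1-score, together with ROC-AUC, are the numbers we compare across models.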


## 2. Hyperparameter Tuning - Random Forest

In [8]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Small parameter grid to keep the search fast
rf_param_grid = {
    'n_estimators': [100],        # Fixed at the default to limit runtime
    'max_depth': [None, 10],      # Unlimited depth vs. a cap of 10
    'min_samples_split': [2, 5]   # Minimum samples required to split a node
}

# 3-fold cross-validation
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Grid search setup
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_grid=rf_param_grid,
    cv=cv,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

# Run the search
rf_grid.fit(X_train, y_train)

Fitting 3 folds for each of 4 candidates, totalling 12 fits


In [9]:
print("Best Params:", rf_grid.best_params_)
print("Best ROC-AUC (CV):", rf_grid.best_score_)

Best Params: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
Best ROC-AUC (CV): 0.9999988251900699
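
A CV ROC-AUC this close to 1.0 deserves a second look. If `X_train_smote.pkl` holds data that was oversampled before cross-validation (an assumption based on the file name), each validation fold contains synthetic points derived from rows in the training folds, which tends to inflate the score. The sketch below illustrates the effect on synthetic data. It uses plain random oversampling as a stand-in for SMOTE so that it needs only scikit-learn and NumPy, and the helpers `oversample` and `cv_auc` are hypothetical names, not part of this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced data standing in for the real transactions (an assumption).
X, y = make_classification(
    n_samples=3000, n_features=10, weights=[0.97, 0.03], flip_y=0.02, random_state=42
)

def oversample(X_part, y_part, rng):
    """Duplicate minority rows until both classes are the same size."""
    minority = np.flatnonzero(y_part == 1)
    majority = np.flatnonzero(y_part == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X_part[idx], y_part[idx]

def cv_auc(X_cv, y_cv, resample_inside_fold):
    """Mean ROC-AUC over 3 stratified folds."""
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    rng = np.random.default_rng(42)
    scores = []
    for train_idx, val_idx in cv.split(X_cv, y_cv):
        X_tr, y_tr = X_cv[train_idx], y_cv[train_idx]
        if resample_inside_fold:
            # Only the training fold is resampled; the validation fold stays untouched.
            X_tr, y_tr = oversample(X_tr, y_tr, rng)
        model = RandomForestClassifier(n_estimators=50, random_state=42)
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_cv[val_idx])[:, 1]
        scores.append(roc_auc_score(y_cv[val_idx], proba))
    return float(np.mean(scores))

# Leaky: resample first, then cross-validate on the resampled data.
X_res, y_res = oversample(X, y, np.random.default_rng(0))
leaky_auc = cv_auc(X_res, y_res, resample_inside_fold=False)

# Clean: cross-validate on the original data, resampling each training fold only.
clean_auc = cv_auc(X, y, resample_inside_fold=True)

print(f"Resample before CV: {leaky_auc:.4f}")
print(f"Resample inside CV: {clean_auc:.4f}")
```

The first score is typically the higher of the two, because duplicated minority rows appear on both sides of each split. With SMOTE itself, the usual remedy is to place the resampler inside a pipeline so that it is refit on each training fold.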


## 3. Evaluate Tuned Random Forest on Test Set

In [10]:
# Get the best model
rf_best = rf_grid.best_estimator_

# Predict on test data
y_pred_rf_best = rf_best.predict(X_test)
y_proba_rf_best = rf_best.predict_proba(X_test)[:, 1]

# Evaluation
print("Classification Report (Tuned Random Forest):")
print(classification_report(y_test, y_pred_rf_best))
print("ROC-AUC (Tuned Random Forest):", roc_auc_score(y_test, y_proba_rf_best))

Classification Report (Tuned Random Forest):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     85295
           1       0.87      0.77      0.82       148

    accuracy                           1.00     85443
   macro avg       0.93      0.89      0.91     85443
weighted avg       1.00      1.00      1.00     85443

ROC-AUC (Tuned Random Forest): 0.9652455389324491


We searched over `max_depth` and `min_samples_split`, with `n_estimators` fixed at 100, using 3-fold stratified cross-validation. The best hyperparameters were:

- **n_estimators:** 100
- **max_depth:** None
- **min_samples_split:** 2

**Evaluation on Test Set:**
- **Precision (fraud):** 0.87
- **Recall (fraud):** 0.77
- **F1-score (fraud):** 0.82
- **ROC-AUC:** 0.9652

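The tuned model performs identically to the baseline because the selected hyperparameters are scikit-learn's defaults, so both cells fit the same model with the same `random_state`. This shows only that the defaults were the best of the four combinations in this small grid, not that they are optimal in general. The gap between the near-perfect CV ROC-AUC (0.99999) and the test ROC-AUC (0.9652) also suggests that the cross-validation score is optimistic, likely because the folds were drawn from the SMOTE-resampled training data.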
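
One way to see why the tuned and baseline results coincide is to compare the best parameters from the grid search with the estimator's constructor defaults. The sketch below assumes scikit-learn 0.22 or later, where `n_estimators` defaults to 100. The `best_params` dictionary is copied from the printed output above rather than read from `rf_grid`.

```python
from sklearn.ensemble import RandomForestClassifier

# Best parameters reported by the grid search above.
best_params = {"max_depth": None, "min_samples_split": 2, "n_estimators": 100}

# Default constructor arguments of the installed scikit-learn version.
defaults = RandomForestClassifier().get_params()

# Compare each selected parameter with its default value.
matches = {name: defaults[name] == value for name, value in best_params.items()}
print(matches)
print("Identical to defaults:", all(matches.values()))
```

In the notebook itself, the same check can be run against `rf_grid.best_params_` directly.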

## 4. Baseline XGBoost Classifier

In [12]:
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Baseline XGBoost (defaults)
# use_label_encoder is dropped: this XGBoost version reports it as unused
xgb_baseline = XGBClassifier(eval_metric='logloss', random_state=42)
xgb_baseline.fit(X_train, y_train)

# Predict on test set
y_pred_xgb = xgb_baseline.predict(X_test)
y_proba_xgb = xgb_baseline.predict_proba(X_test)[:, 1]

# Evaluation
print("Classification Report (XGBoost - Baseline):")
print(classification_report(y_test, y_pred_xgb))
print("ROC-AUC (XGBoost - Baseline):", roc_auc_score(y_test, y_proba_xgb))


Classification Report (XGBoost - Baseline):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     85295
           1       0.76      0.81      0.78       148

    accuracy                           1.00     85443
   macro avg       0.88      0.91      0.89     85443
weighted avg       1.00      1.00      1.00     85443

ROC-AUC (XGBoost - Baseline): 0.9755530487988429


XGBoost is a gradient boosting model known for high performance on structured/tabular data. We trained a baseline model using default parameters.

**Evaluation on Test Set:**
- **Precision (fraud):** 0.76
- **Recall (fraud):** 0.81
- **F1-score (fraud):** 0.78
- **ROC-AUC:** 0.9756

Compared to Random Forest, XGBoost achieved higher recall (0.81 vs. 0.77) and ROC-AUC (0.9756 vs. 0.9652), making it a strong candidate for our final model. Precision fell from 0.87 to 0.76 and F1-score from 0.82 to 0.78, so the extra fraud caught comes with more false alarms.
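
To make the trade-off concrete, the reported precision and recall can be turned back into approximate counts of caught fraud, missed fraud, and false alarms. Because the report rounds to two decimals, the counts below are estimates that may be off by one; `implied_counts` is a hypothetical helper written for this illustration.

```python
def implied_counts(precision, recall, n_positive):
    """Approximate TP, FP, FN from rounded precision and recall."""
    tp = round(recall * n_positive)    # fraud cases caught
    fn = n_positive - tp               # fraud cases missed
    fp = round(tp / precision) - tp    # legitimate transactions flagged
    return tp, fp, fn

n_fraud = 148  # fraud support in the test set
reported = {
    "Random Forest (baseline)": (0.87, 0.77),
    "XGBoost (baseline)": (0.76, 0.81),
}
for name, (precision, recall) in reported.items():
    tp, fp, fn = implied_counts(precision, recall, n_fraud)
    print(f"{name}: ~{tp} caught, ~{fn} missed, ~{fp} false alarms")
```

Under these estimates, XGBoost catches about six more fraud cases than Random Forest while roughly doubling the number of false alarms.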


## 5. Hyperparameter Tuning - XGBoost

In [13]:
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Parameter grid
xgb_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [3, 5],
    'learning_rate': [0.05, 0.1]
}

# 3-fold stratified cross-validation
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# GridSearchCV
xgb_grid = GridSearchCV(
    XGBClassifier(eval_metric='logloss', random_state=42, n_jobs=-1),
    param_grid=xgb_param_grid,
    scoring='roc_auc',
    cv=cv,
    n_jobs=-1,
    verbose=1
)

# Fit
xgb_grid.fit(X_train, y_train)

Fitting 3 folds for each of 8 candidates, totalling 24 fits

In [14]:
print("Best Params:", xgb_grid.best_params_)
print("Best ROC-AUC (CV):", xgb_grid.best_score_)

Best Params: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200}
Best ROC-AUC (CV): 0.9999754201829543


## 6. Evaluate Tuned XGBoost on Test Set

In [15]:
# Get best model
xgb_best = xgb_grid.best_estimator_

# Predict on test set
y_pred_xgb_best = xgb_best.predict(X_test)
y_proba_xgb_best = xgb_best.predict_proba(X_test)[:, 1]

# Evaluate
print("Classification Report (Tuned XGBoost):")
print(classification_report(y_test, y_pred_xgb_best))
print("ROC-AUC (Tuned XGBoost):", roc_auc_score(y_test, y_proba_xgb_best))

Classification Report (Tuned XGBoost):
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     85295
           1       0.51      0.82      0.63       148

    accuracy                           1.00     85443
   macro avg       0.76      0.91      0.82     85443
weighted avg       1.00      1.00      1.00     85443

ROC-AUC (Tuned XGBoost): 0.9728013111886727


We tuned `n_estimators`, `max_depth`, and `learning_rate` using 3-fold stratified cross-validation. The best hyperparameters were:

- **n_estimators:** 200  
- **max_depth:** 5  
- **learning_rate:** 0.1  

**Evaluation on Test Set:**
- **Precision (fraud):** 0.51
- **Recall (fraud):** 0.82
- **F1-score (fraud):** 0.63
- **ROC-AUC:** 0.9728

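This model detected slightly more fraud cases than the baseline XGBoost (recall 0.82 vs. 0.81), but at the cost of far more false positives (precision 0.51 vs. 0.76), and the F1-score dropped from 0.78 to 0.63. Test ROC-AUC also slipped slightly (0.9728 vs. 0.9756) despite a CV ROC-AUC of 0.99998, so the grid search did not improve generalization. These results reflect the classic trade-off in fraud detection: catching more fraud comes with more false alarms.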
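
The precision/recall balance does not have to be accepted at the default 0.5 cutoff that `predict` applies. It can also be adjusted by choosing a threshold on the predicted probabilities. The sketch below illustrates the idea on synthetic data with a Random Forest. The data, the 0.80 precision target, and all variable names are assumptions for illustration, and the threshold is picked on a held-out validation split rather than the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the real transactions (an assumption).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_fit, y_fit)
val_proba = model.predict_proba(X_val)[:, 1]

# precision/recall have one more entry than thresholds; drop the final (1, 0) point.
precision, recall, thresholds = precision_recall_curve(y_val, val_proba)
precision, recall = precision[:-1], recall[:-1]

# Hypothetical requirement: precision of at least 0.80, then maximize recall.
min_precision = 0.80
meets_target = precision >= min_precision
if meets_target.any():
    best = int(np.argmax(np.where(meets_target, recall, -1.0)))
    chosen_threshold = float(thresholds[best])
    print(f"threshold={chosen_threshold:.3f} "
          f"precision={precision[best]:.2f} recall={recall[best]:.2f}")
else:
    chosen_threshold = None
    print("No threshold reaches the requested precision.")
```

Applied to this notebook, the same approach would use `y_proba_xgb` or `y_proba_xgb_best` from a validation split, with the precision target set by how many false alarms the review process can absorb.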