# Building Your First Fraud Detection Models 🤖

## 🎯 Learning Objectives
By the end of this tutorial, you will:
- Build multiple machine learning models for fraud detection
- Understand the ML pipeline from data prep to evaluation
- Handle class imbalance with proper techniques
- Evaluate models using appropriate metrics for fraud detection
- Compare different algorithms and choose the best one

## 📋 What This File Does
The `fraud_detection_models.py` file implements a complete machine learning pipeline including:

**🔧 Data Preprocessing:**
- Feature scaling and normalization
- Train/validation/test splits with stratification
- Class imbalance handling

**🤖 Multiple ML Algorithms:**
- Logistic Regression (baseline)
- Random Forest (ensemble)
- Neural Networks (deep learning)
- Isolation Forest (anomaly detection)
- One-Class SVM (anomaly detection)

**⚖️ Imbalance Handling:**
- Class weights adjustment
- SMOTE (Synthetic Minority Over-sampling)
- Anomaly detection approaches

**📊 Comprehensive Evaluation:**
- Confusion matrices
- ROC curves and AUC scores
- Precision-Recall curves
- F1-scores and business metrics

## 1. Setting Up the Environment

Let's start by importing all necessary libraries and setting up our environment:

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn imports
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (
    classification_report, confusion_matrix, 
    roc_auc_score, precision_recall_curve, roc_curve,
    average_precision_score, f1_score, precision_score, recall_score
)

# Imbalanced-learn for SMOTE
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline as ImbPipeline

# For saving models
import joblib

# Set visualization style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ Libraries imported successfully!")
print(f"📅 Current time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Loading and Preparing the Data

We'll load the data and create proper train/test splits while maintaining the class distribution:

In [None]:
# Load the dataset
df = pd.read_csv('../creditcard.csv')

print("📊 Dataset Overview:")
print(f"Total transactions: {len(df):,}")
print(f"Features: {df.shape[1]}")
print(f"Fraud transactions: {df['Class'].sum():,} ({df['Class'].mean()*100:.3f}%)")
print(f"Normal transactions: {(1-df['Class']).sum():,} ({(1-df['Class']).mean()*100:.3f}%)")

# Feature engineering
print("\n🔧 Feature Engineering...")
# Add log-transformed amount (helps with skewed distribution)
# log1p(x) computes log(1 + x) and stays accurate for amounts near zero
df['Amount_log'] = np.log1p(df['Amount'])

# Add hour of day (cyclical pattern)
df['Hour'] = (df['Time'] % (24 * 3600)) // 3600

# Create time-based features
df['Time_sin'] = np.sin(2 * np.pi * df['Hour'] / 24)
df['Time_cos'] = np.cos(2 * np.pi * df['Hour'] / 24)

print("✅ Added engineered features: Amount_log, Hour, Time_sin, Time_cos")
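
To see why the hour is encoded as a sine/cosine pair rather than left as a raw number, here is a small sketch. The helper name `encode_hour` is hypothetical and not part of `fraud_detection_models.py`; it applies the same formula as the cell above to three sample hours.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour in [0, 24) to a point on the unit circle (hypothetical helper)."""
    angle = 2 * np.pi * np.asarray(hour, dtype=float) / 24
    return np.column_stack([np.sin(angle), np.cos(angle)])

points = encode_hour([0, 12, 23])

# Euclidean distances in the encoded space
dist_23_to_0 = np.linalg.norm(points[2] - points[0])
dist_12_to_0 = np.linalg.norm(points[1] - points[0])

print(f"23:00 vs 00:00: {dist_23_to_0:.3f}")
print(f"12:00 vs 00:00: {dist_12_to_0:.3f}")
```

As raw integers, 23 and 0 look far apart even though they are one hour from each other. In the encoded space they sit close together, while noon and midnight are at opposite ends of the circle.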

In [None]:
# Prepare features and target
# Select features (exclude Time and Class, include engineered features)
feature_cols = [col for col in df.columns if col not in ['Class', 'Time']]
X = df[feature_cols]
y = df['Class']

print(f"\n📊 Feature matrix shape: {X.shape}")
print(f"Selected features ({len(feature_cols)}): {', '.join(feature_cols[:5])}...")

# Train-test split with stratification to maintain class distribution
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\n📂 Train set: {len(X_train):,} samples")
print(f"   - Fraud: {y_train.sum():,} ({y_train.mean()*100:.3f}%)")
print(f"📂 Test set: {len(X_test):,} samples")
print(f"   - Fraud: {y_test.sum():,} ({y_test.mean()*100:.3f}%)")

# Verify stratification worked
print(f"\n✅ Stratification check: Train fraud rate = {y_train.mean():.4f}, Test fraud rate = {y_test.mean():.4f}")
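
The overview mentions train/validation/test splits, while the cell above creates only a train and a test set. A validation set can be carved out with a second stratified split. The sketch below runs on synthetic data, and the 60/20/20 ratio and the `_demo` variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 10,000 rows, 1% positives
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(10_000, 5))
y_demo = np.zeros(10_000, dtype=int)
y_demo[:100] = 1

# First split: hold out 20% as the final test set
X_trainval, X_test_demo, y_trainval, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Second split: 25% of the remaining 80% gives a 20% validation set
X_tr, X_val, y_tr, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval
)

for name, labels in [("train", y_tr), ("validation", y_val), ("test", y_test_demo)]:
    print(f"{name}: {len(labels):,} rows, positive rate = {labels.mean():.4f}")
```

Reserving the validation set for model selection and threshold tuning keeps the test set untouched for the final estimate.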

## 3. Feature Scaling

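Distance- and gradient-based algorithms (Logistic Regression, neural networks, One-Class SVM) are sensitive to feature scale. Tree-based models such as Random Forest are largely unaffected, but scaling does them no harm. Let's standardize our data, fitting the scaler on the training set only so that no test-set statistics leak into training: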

In [None]:
# Initialize and fit the scaler
scaler = StandardScaler()

# Fit on training data and transform both sets
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert back to DataFrames for easier handling
X_train_scaled = pd.DataFrame(X_train_scaled, columns=feature_cols, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=feature_cols, index=X_test.index)

print("✅ Features scaled using StandardScaler")
print(f"Mean of scaled training features: {X_train_scaled.mean().mean():.6f} (should be ~0)")
print(f"Std of scaled training features: {X_train_scaled.std().mean():.6f} (should be ~1)")

# Visualize the effect of scaling
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Before scaling
ax1.boxplot([X_train['Amount'].values, X_train['V1'].values, X_train['V2'].values])
# Set tick labels separately: boxplot's `labels` argument was renamed in Matplotlib 3.9
ax1.set_xticklabels(['Amount', 'V1', 'V2'])
ax1.set_title('Before Scaling')
ax1.set_ylabel('Value')

# After scaling
ax2.boxplot([X_train_scaled['Amount'].values, X_train_scaled['V1'].values, X_train_scaled['V2'].values])
ax2.set_xticklabels(['Amount', 'V1', 'V2'])
ax2.set_title('After Scaling')
ax2.set_ylabel('Standardized Value')

plt.tight_layout()
plt.show()
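
The imports above include `cross_val_score` and `StratifiedKFold`. When cross-validating, the scaler should be refit inside each fold, and one way to do that is to wrap it in a scikit-learn `Pipeline`. The following is a minimal sketch on synthetic data from `make_classification`. The dataset, the fold count, and the choice of average precision as the score are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (roughly 2% positives), not the credit card dataset
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=42
)

# The scaler is fit on the training part of each fold only
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring="average_precision")

print(f"Average precision per fold: {np.round(cv_scores, 3)}")
print(f"Mean: {cv_scores.mean():.3f}")
```

The spread across folds gives a rough sense of how stable a score is, which a single train/test split cannot show.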

## 4. Building Baseline Models

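Let's start with simple models to establish baselines. Logistic Regression and Random Forest use class weights to handle imbalance. `MLPClassifier` has no `class_weight` parameter, so the neural network below is trained on the original class distribution: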

In [None]:
# Initialize models dictionary to store results
models = {}
results = {}

print("🤖 Training Baseline Models...")
print("=" * 50)

# 1. Logistic Regression
print("\n1️⃣ Logistic Regression")
lr = LogisticRegression(
    random_state=42, 
    max_iter=1000, 
    class_weight='balanced'  # Automatically adjust weights
)
lr.fit(X_train_scaled, y_train)
models['Logistic Regression'] = lr

# Make predictions
y_pred_lr = lr.predict(X_test_scaled)
y_proba_lr = lr.predict_proba(X_test_scaled)[:, 1]

# Calculate metrics
lr_metrics = {
    'precision': precision_score(y_test, y_pred_lr),
    'recall': recall_score(y_test, y_pred_lr),
    'f1': f1_score(y_test, y_pred_lr),
    'roc_auc': roc_auc_score(y_test, y_proba_lr)
}
results['Logistic Regression'] = lr_metrics

print(f"✅ Precision: {lr_metrics['precision']:.4f}")
print(f"✅ Recall: {lr_metrics['recall']:.4f}")
print(f"✅ F1-Score: {lr_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {lr_metrics['roc_auc']:.4f}")
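
To make `class_weight='balanced'` less of a black box, this sketch reproduces the weights it assigns, using scikit-learn's `compute_class_weight` and the documented formula `n_samples / (n_classes * np.bincount(y))`. The toy label vector is an assumption for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels: 998 normal, 2 fraud
y_demo = np.array([0] * 998 + [1] * 2)
classes = np.array([0, 1])

# Weights as computed by scikit-learn for class_weight='balanced'
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# The same weights from the formula
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))

print(f"sklearn weights: {weights}")
print(f"manual weights:  {manual}")
```

Each fraud example counts far more than each normal example in the loss, so the two classes contribute equally in total.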

In [None]:
# 2. Random Forest
print("\n2️⃣ Random Forest")
rf = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
    class_weight='balanced',
    n_jobs=-1  # Use all CPU cores
)
rf.fit(X_train_scaled, y_train)
models['Random Forest'] = rf

# Make predictions
y_pred_rf = rf.predict(X_test_scaled)
y_proba_rf = rf.predict_proba(X_test_scaled)[:, 1]

# Calculate metrics
rf_metrics = {
    'precision': precision_score(y_test, y_pred_rf),
    'recall': recall_score(y_test, y_pred_rf),
    'f1': f1_score(y_test, y_pred_rf),
    'roc_auc': roc_auc_score(y_test, y_proba_rf)
}
results['Random Forest'] = rf_metrics

print(f"✅ Precision: {rf_metrics['precision']:.4f}")
print(f"✅ Recall: {rf_metrics['recall']:.4f}")
print(f"✅ F1-Score: {rf_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {rf_metrics['roc_auc']:.4f}")

In [None]:
# 3. Neural Network
print("\n3️⃣ Neural Network")
nn = MLPClassifier(
    hidden_layer_sizes=(100, 50),  # Two hidden layers
    random_state=42,
    max_iter=500,
    early_stopping=True,
    validation_fraction=0.1
)
nn.fit(X_train_scaled, y_train)
models['Neural Network'] = nn

# Make predictions
y_pred_nn = nn.predict(X_test_scaled)
y_proba_nn = nn.predict_proba(X_test_scaled)[:, 1]

# Calculate metrics
nn_metrics = {
    'precision': precision_score(y_test, y_pred_nn),
    'recall': recall_score(y_test, y_pred_nn),
    'f1': f1_score(y_test, y_pred_nn),
    'roc_auc': roc_auc_score(y_test, y_proba_nn)
}
results['Neural Network'] = nn_metrics

print(f"✅ Precision: {nn_metrics['precision']:.4f}")
print(f"✅ Recall: {nn_metrics['recall']:.4f}")
print(f"✅ F1-Score: {nn_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {nn_metrics['roc_auc']:.4f}")

print("\n📊 Summary of Baseline Models:")
baseline_df = pd.DataFrame(results).T
print(baseline_df.round(4))
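
The three cells above repeat the same metric calculations. One possible refactor is a small helper such as the one sketched below. The name `evaluate_predictions` is hypothetical and not part of `fraud_detection_models.py`.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

def evaluate_predictions(y_true, y_pred, y_score):
    """Return the four metrics used in this tutorial as a dict (hypothetical helper)."""
    return {
        # zero_division=0 avoids a warning when a model predicts no positives
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_score),
    }

# Tiny worked example: one false positive, no false negatives
demo_metrics = evaluate_predictions(
    y_true=[0, 0, 1, 1],
    y_pred=[0, 1, 1, 1],
    y_score=[0.1, 0.6, 0.7, 0.9],
)
print(demo_metrics)
```

In the notebook this could be used as `results['Random Forest'] = evaluate_predictions(y_test, y_pred_rf, y_proba_rf)`.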

## 5. Anomaly Detection Approaches

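Since fraud is rare, we can treat it as an anomaly detection problem. These models learn what "normal" looks like from legitimate transactions alone and flag anything that deviates from it. The cell below trains an Isolation Forest: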

In [None]:
print("🔍 Training Anomaly Detection Models...")
print("=" * 50)

# Get only normal transactions for training
X_train_normal = X_train_scaled[y_train == 0]
print(f"Training on {len(X_train_normal):,} normal transactions only")

# 1. Isolation Forest
print("\n1️⃣ Isolation Forest")
iso_forest = IsolationForest(
    contamination=0.002,  # Fraction of training rows treated as outliers; sets the decision threshold
    random_state=42,
    n_jobs=-1
)
iso_forest.fit(X_train_normal)
models['Isolation Forest'] = iso_forest

# Predict (-1 for anomaly, 1 for normal)
y_pred_iso = iso_forest.predict(X_test_scaled)
# Convert to binary (0 for normal, 1 for fraud)
y_pred_iso_binary = np.where(y_pred_iso == -1, 1, 0)

# Get anomaly scores
y_scores_iso = -iso_forest.score_samples(X_test_scaled)  # Higher score = more anomalous

# Calculate metrics
iso_metrics = {
    'precision': precision_score(y_test, y_pred_iso_binary),
    'recall': recall_score(y_test, y_pred_iso_binary),
    'f1': f1_score(y_test, y_pred_iso_binary),
    'roc_auc': roc_auc_score(y_test, y_scores_iso)
}
results['Isolation Forest'] = iso_metrics

print(f"✅ Precision: {iso_metrics['precision']:.4f}")
print(f"✅ Recall: {iso_metrics['recall']:.4f}")
print(f"✅ F1-Score: {iso_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {iso_metrics['roc_auc']:.4f}")
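
The overview also lists One-Class SVM, which follows the same "fit on normal data only" pattern. The sketch below shows it on synthetic 2-D data. The values `nu=0.05` and `gamma='scale'` and the `_demo` data are assumptions for illustration, not tuned settings from `fraud_detection_models.py`.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
normal_demo = rng.normal(0, 1, size=(500, 2))     # "normal" cluster around the origin
outliers_demo = rng.normal(8, 0.5, size=(10, 2))  # points far from that cluster

# nu is an upper bound on the fraction of training points treated as outliers
oc_svm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
oc_svm.fit(normal_demo)

# predict() returns -1 for anomalies and 1 for normal points
flagged = np.where(oc_svm.predict(outliers_demo) == -1, 1, 0)

# Negate decision_function so that higher means more anomalous
anomaly_scores = -oc_svm.decision_function(outliers_demo)

print(f"Flagged {flagged.sum()} of {len(flagged)} far-away points")
```

Kernel SVM training cost grows quickly with the number of rows, so on the full set of normal transactions it is common to fit on a random subsample.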

## 6. Handling Imbalance with SMOTE

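SMOTE (Synthetic Minority Over-sampling Technique) creates synthetic fraud examples by interpolating between existing fraud cases and their nearest fraud neighbours. We apply it to the training set only, so the test set keeps the real class distribution: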

In [None]:
print("⚖️ Balancing Data with SMOTE...")
print("=" * 50)

# Apply SMOTE
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train_scaled, y_train)

print(f"Original training set:")
print(f"  - Total: {len(y_train):,}")
print(f"  - Fraud: {y_train.sum():,} ({y_train.mean()*100:.3f}%)")

print(f"\nAfter SMOTE:")
print(f"  - Total: {len(y_train_smote):,}")
print(f"  - Fraud: {y_train_smote.sum():,} ({y_train_smote.mean()*100:.3f}%)")

# Visualize the effect of SMOTE
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Original distribution
ax1.bar(['Normal', 'Fraud'], [len(y_train) - y_train.sum(), y_train.sum()], 
        color=['#2ecc71', '#e74c3c'])
ax1.set_title('Original Training Set')
ax1.set_ylabel('Count')

# After SMOTE
ax2.bar(['Normal', 'Fraud'], [len(y_train_smote) - y_train_smote.sum(), y_train_smote.sum()], 
        color=['#2ecc71', '#e74c3c'])
ax2.set_title('After SMOTE')
ax2.set_ylabel('Count')

plt.tight_layout()
plt.show()
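
To make the interpolation idea behind SMOTE concrete, here is a simplified sketch of how a synthetic point can be generated. It is not the `imblearn` implementation. The function name `smote_like_samples` and its parameters are hypothetical, and it uses scikit-learn's `NearestNeighbors` for the neighbour search.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like_samples(X_minority, n_new, k=5, seed=42):
    """Create n_new points on line segments between minority points and their neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)

    # Ask for k + 1 neighbours because each point is its own nearest neighbour
    knn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, neighbor_idx = knn.kneighbors(X_minority)

    base = rng.integers(0, len(X_minority), size=n_new)              # random starting points
    chosen = neighbor_idx[base, rng.integers(1, k + 1, size=n_new)]  # one random neighbour each
    lam = rng.random((n_new, 1))                                     # position along the segment

    return X_minority[base] + lam * (X_minority[chosen] - X_minority[base])

demo_rng = np.random.default_rng(0)
X_min_demo = demo_rng.normal(size=(20, 3))
synthetic_demo = smote_like_samples(X_min_demo, n_new=50)
print(synthetic_demo.shape)
```

Because every synthetic point lies between two real minority points, SMOTE fills in the region the minority class already occupies rather than inventing new regions.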

In [None]:
# Train models on SMOTE-balanced data
print("\n🤖 Training Models with SMOTE Data...")

# Random Forest with SMOTE
print("\n1️⃣ Random Forest (SMOTE)")
rf_smote = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf_smote.fit(X_train_smote, y_train_smote)
models['Random Forest SMOTE'] = rf_smote

y_pred_rf_smote = rf_smote.predict(X_test_scaled)
y_proba_rf_smote = rf_smote.predict_proba(X_test_scaled)[:, 1]

rf_smote_metrics = {
    'precision': precision_score(y_test, y_pred_rf_smote),
    'recall': recall_score(y_test, y_pred_rf_smote),
    'f1': f1_score(y_test, y_pred_rf_smote),
    'roc_auc': roc_auc_score(y_test, y_proba_rf_smote)
}
results['Random Forest SMOTE'] = rf_smote_metrics

print(f"✅ Precision: {rf_smote_metrics['precision']:.4f}")
print(f"✅ Recall: {rf_smote_metrics['recall']:.4f}")
print(f"✅ F1-Score: {rf_smote_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {rf_smote_metrics['roc_auc']:.4f}")

# Logistic Regression with SMOTE
print("\n2️⃣ Logistic Regression (SMOTE)")
lr_smote = LogisticRegression(random_state=42, max_iter=1000)
lr_smote.fit(X_train_smote, y_train_smote)
models['Logistic Regression SMOTE'] = lr_smote

y_pred_lr_smote = lr_smote.predict(X_test_scaled)
y_proba_lr_smote = lr_smote.predict_proba(X_test_scaled)[:, 1]

lr_smote_metrics = {
    'precision': precision_score(y_test, y_pred_lr_smote),
    'recall': recall_score(y_test, y_pred_lr_smote),
    'f1': f1_score(y_test, y_pred_lr_smote),
    'roc_auc': roc_auc_score(y_test, y_proba_lr_smote)
}
results['Logistic Regression SMOTE'] = lr_smote_metrics

print(f"✅ Precision: {lr_smote_metrics['precision']:.4f}")
print(f"✅ Recall: {lr_smote_metrics['recall']:.4f}")
print(f"✅ F1-Score: {lr_smote_metrics['f1']:.4f}")
print(f"✅ ROC-AUC: {lr_smote_metrics['roc_auc']:.4f}")

## 7. Model Comparison

Let's compare all our models to see which performs best:

In [None]:
# Create comparison DataFrame
comparison_df = pd.DataFrame(results).T
comparison_df = comparison_df.sort_values('f1', ascending=False)

print("📊 Model Performance Comparison")
print("=" * 60)
print(comparison_df.round(4))

# Visualize model comparison
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. F1-Score comparison
ax1 = axes[0, 0]
comparison_df['f1'].plot(kind='bar', ax=ax1, color='#3498db')
ax1.set_title('F1-Score by Model', fontsize=14, fontweight='bold')
ax1.set_ylabel('F1-Score')
ax1.set_xticklabels(comparison_df.index, rotation=45, ha='right')
ax1.axhline(y=0.5, color='r', linestyle='--', alpha=0.5)

# 2. Precision vs Recall
ax2 = axes[0, 1]
ax2.scatter(comparison_df['recall'], comparison_df['precision'], s=100)
for idx, model in enumerate(comparison_df.index):
    ax2.annotate(model, (comparison_df.iloc[idx]['recall'], comparison_df.iloc[idx]['precision']),
                xytext=(5, 5), textcoords='offset points', fontsize=8)
ax2.set_xlabel('Recall')
ax2.set_ylabel('Precision')
ax2.set_title('Precision vs Recall Trade-off', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

# 3. ROC-AUC comparison
ax3 = axes[1, 0]
comparison_df['roc_auc'].plot(kind='bar', ax=ax3, color='#e74c3c')
ax3.set_title('ROC-AUC by Model', fontsize=14, fontweight='bold')
ax3.set_ylabel('ROC-AUC')
ax3.set_xticklabels(comparison_df.index, rotation=45, ha='right')
ax3.axhline(y=0.9, color='g', linestyle='--', alpha=0.5)

# 4. All metrics heatmap
ax4 = axes[1, 1]
sns.heatmap(comparison_df.T, annot=True, fmt='.3f', cmap='YlOrRd', ax=ax4)
ax4.set_title('All Metrics Heatmap', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# Find best model
best_model_name = comparison_df['f1'].idxmax()
print(f"\n🏆 Best Model (by F1-Score): {best_model_name}")
print(f"   - F1-Score: {comparison_df.loc[best_model_name, 'f1']:.4f}")
print(f"   - Precision: {comparison_df.loc[best_model_name, 'precision']:.4f}")
print(f"   - Recall: {comparison_df.loc[best_model_name, 'recall']:.4f}")
print(f"   - ROC-AUC: {comparison_df.loc[best_model_name, 'roc_auc']:.4f}")

## 8. Detailed Analysis of Best Model

Let's dive deeper into our best performing model:

In [None]:
# Get best model
best_model = models[best_model_name]

# Make predictions with best model
if best_model_name == 'Isolation Forest':
    y_pred_best = np.where(best_model.predict(X_test_scaled) == -1, 1, 0)
    y_scores_best = -best_model.score_samples(X_test_scaled)
else:
    y_pred_best = best_model.predict(X_test_scaled)
    y_scores_best = best_model.predict_proba(X_test_scaled)[:, 1]

# Create detailed visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Confusion Matrix
ax1 = axes[0, 0]
cm = confusion_matrix(y_test, y_pred_best)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax1,
            xticklabels=['Normal', 'Fraud'], yticklabels=['Normal', 'Fraud'])
ax1.set_title(f'Confusion Matrix - {best_model_name}', fontsize=14, fontweight='bold')
ax1.set_ylabel('True Label')
ax1.set_xlabel('Predicted Label')

# Add text annotations
tn, fp, fn, tp = cm.ravel()
ax1.text(2.5, -0.5, f'True Negatives: {tn:,}', ha='left')
ax1.text(2.5, -0.3, f'False Positives: {fp:,}', ha='left')
ax1.text(2.5, -0.1, f'False Negatives: {fn:,}', ha='left')
ax1.text(2.5, 0.1, f'True Positives: {tp:,}', ha='left')

# 2. ROC Curve
ax2 = axes[0, 1]
fpr, tpr, _ = roc_curve(y_test, y_scores_best)
auc = roc_auc_score(y_test, y_scores_best)
ax2.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {auc:.3f})')
ax2.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
ax2.set_xlim([0.0, 1.0])
ax2.set_ylim([0.0, 1.05])
ax2.set_xlabel('False Positive Rate')
ax2.set_ylabel('True Positive Rate')
ax2.set_title('ROC Curve', fontsize=14, fontweight='bold')
ax2.legend(loc="lower right")
ax2.grid(True, alpha=0.3)

# 3. Precision-Recall Curve
ax3 = axes[1, 0]
precision, recall, _ = precision_recall_curve(y_test, y_scores_best)
avg_precision = average_precision_score(y_test, y_scores_best)
ax3.plot(recall, precision, color='red', lw=2, label=f'PR curve (AP = {avg_precision:.3f})')
ax3.set_xlabel('Recall')
ax3.set_ylabel('Precision')
ax3.set_title('Precision-Recall Curve', fontsize=14, fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. Feature Importance (if available)
ax4 = axes[1, 1]
if hasattr(best_model, 'feature_importances_'):
    importances = best_model.feature_importances_
    indices = np.argsort(importances)[-10:]  # Top 10 features
    
    ax4.barh(range(len(indices)), importances[indices], color='#2ecc71')
    ax4.set_yticks(range(len(indices)))
    ax4.set_yticklabels([feature_cols[i] for i in indices])
    ax4.set_title('Top 10 Feature Importances', fontsize=14, fontweight='bold')
    ax4.set_xlabel('Importance')
else:
    ax4.text(0.5, 0.5, 'Feature importance not available\nfor this model type', 
             ha='center', va='center', transform=ax4.transAxes)
    ax4.set_title('Feature Importance', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# Print classification report
print(f"\n📋 Classification Report for {best_model_name}:")
print("=" * 60)
print(classification_report(y_test, y_pred_best, target_names=['Normal', 'Fraud'], digits=4))

## 9. Business Impact Analysis

Let's analyze the business impact of our model:

In [None]:
# Business metrics calculation
print("💼 Business Impact Analysis")
print("=" * 50)

# Confusion matrix values
tn, fp, fn, tp = confusion_matrix(y_test, y_pred_best).ravel()

# Calculate business metrics
total_transactions = len(y_test)
avg_transaction_amount = df['Amount'].mean()
avg_fraud_amount = df[df['Class'] == 1]['Amount'].mean()

# Assumptions for business impact (illustrative values; adjust for your organization)
false_positive_cost = 5  # Assumed cost ($) of investigating one false positive
fraud_loss_prevented = avg_fraud_amount  # Assumes the full amount is saved per caught fraud

# Calculate costs and savings
investigation_cost = fp * false_positive_cost
fraud_prevented_value = tp * fraud_loss_prevented
fraud_losses = fn * avg_fraud_amount
net_benefit = fraud_prevented_value - investigation_cost - fraud_losses

print(f"\n📊 Model Performance on Test Set:")
print(f"   - Total transactions: {total_transactions:,}")
print(f"   - Frauds caught: {tp}/{tp+fn} ({tp/(tp+fn)*100:.1f}%)")
print(f"   - False alarms: {fp:,} ({fp/total_transactions*100:.2f}% of all transactions)")

print(f"\n💰 Financial Impact (Estimated):")
print(f"   - Average fraud amount: ${avg_fraud_amount:.2f}")
print(f"   - Fraud prevented value: ${fraud_prevented_value:,.2f}")
print(f"   - Investigation costs: ${investigation_cost:,.2f}")
print(f"   - Fraud losses (missed): ${fraud_losses:,.2f}")
print(f"   - Net benefit: ${net_benefit:,.2f}")

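# Guard against division by zero when the model flags nothing
flagged_count = tp + fp
precision_pct = tp / flagged_count * 100 if flagged_count else 0.0

print("\n📈 Efficiency Metrics:")
print(f"   - Precision (investigation efficiency): {precision_pct:.1f}%")
print(f"   - For every 100 investigations, {precision_pct:.0f} are actual frauds")
print(f"   - Workload increase: {flagged_count/total_transactions*100:.2f}% of transactions flagged")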

# Visualize business impact
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Cost breakdown
ax1.bar(['Fraud\nPrevented', 'Investigation\nCost', 'Fraud\nLosses', 'Net\nBenefit'],
        [fraud_prevented_value, -investigation_cost, -fraud_losses, net_benefit],
        color=['#2ecc71', '#e74c3c', '#e74c3c', '#3498db'])
ax1.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax1.set_title('Financial Impact Breakdown', fontsize=14, fontweight='bold')
ax1.set_ylabel('Amount ($)')

# Detection rates
ax2.pie([tp, fn], labels=['Caught', 'Missed'], autopct='%1.1f%%',
        colors=['#2ecc71', '#e74c3c'], startangle=90)
ax2.set_title('Fraud Detection Rate', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()
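
The net benefit above depends heavily on the assumed investigation cost. The sketch below wraps the same formula in a function and recomputes it for a few cost assumptions. The function name and all the counts and amounts are hypothetical values for illustration, not results from this notebook.

```python
def net_benefit(tp, fp, fn, avg_fraud_amount, false_positive_cost):
    """Same formula as the cell above: prevented fraud minus investigations minus missed fraud."""
    prevented = tp * avg_fraud_amount
    investigation = fp * false_positive_cost
    missed = fn * avg_fraud_amount
    return prevented - investigation - missed

# Hypothetical confusion-matrix counts and average fraud amount
demo_counts = {"tp": 80, "fp": 20, "fn": 18}
demo_avg_fraud = 100.0

for cost in (5, 25, 100):
    benefit = net_benefit(**demo_counts, avg_fraud_amount=demo_avg_fraud, false_positive_cost=cost)
    print(f"Investigation cost ${cost:>3}: net benefit = ${benefit:,.2f}")
```

Running the same check with your own cost estimates shows how sensitive the model ranking is to those assumptions.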

## 10. Saving the Models

Let's save our trained models for future use:

In [None]:
# Save models and preprocessing objects
print("💾 Saving models and preprocessing objects...")

# Save all models
joblib.dump(models, '../fraud_models.joblib')
print("✅ Saved all models to 'fraud_models.joblib'")

# Save the scaler
joblib.dump(scaler, '../scaler.joblib')
print("✅ Saved scaler to 'scaler.joblib'")

# Save results for comparison
joblib.dump(results, '../model_results.joblib')
print("✅ Saved results to 'model_results.joblib'")

# Save best model separately
joblib.dump(best_model, f'../best_model_{best_model_name.replace(" ", "_").lower()}.joblib')
print(f"✅ Saved best model to 'best_model_{best_model_name.replace(' ', '_').lower()}.joblib'")

# Create a model summary
model_summary = {
    'best_model_name': best_model_name,
    'best_model_metrics': results[best_model_name],
    'feature_columns': feature_cols,
    'training_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'test_set_size': len(y_test),
    'fraud_rate': y_test.mean()
}

joblib.dump(model_summary, '../model_summary.joblib')
print("✅ Saved model summary to 'model_summary.joblib'")

print("\n📦 All models and artifacts saved successfully!")
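
Saved artifacts are only useful if they can be reloaded and applied in the same order as during training: scale first, then predict. The sketch below shows that round trip with a small stand-in model and a temporary directory. The `_demo` objects and file names are assumptions, not the files written above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Small stand-in for the trained scaler and model
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)

scaler_demo = StandardScaler().fit(X_demo)
model_demo = LogisticRegression(max_iter=1000).fit(scaler_demo.transform(X_demo), y_demo)

# Save and reload both artifacts
with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, "scaler.joblib")
    model_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(scaler_demo, scaler_path)
    joblib.dump(model_demo, model_path)
    loaded_scaler = joblib.load(scaler_path)
    loaded_model = joblib.load(model_path)

# Score new rows: apply the loaded scaler before the loaded model
new_rows = rng.normal(size=(5, 4))
original_proba = model_demo.predict_proba(scaler_demo.transform(new_rows))[:, 1]
reloaded_proba = loaded_model.predict_proba(loaded_scaler.transform(new_rows))[:, 1]

print(np.round(reloaded_proba, 4))
```

New data must have the same columns in the same order as `feature_cols`, which is why the model summary stores that list. Only load joblib files from sources you trust, since loading can execute arbitrary code.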

## 11. Key Takeaways and Conclusions

### 🎯 What We Learned:

1. **Class Imbalance is Critical**: 
   - With only 0.172% fraud rate, accuracy is misleading
   - Precision, Recall, and F1-score are better metrics
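   - ROC-AUC measures overall ranking ability, but it can look optimistic under heavy imbalance; the Precision-Recall curve and average precision are more sensitive to performance on the fraud class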

2. **Different Approaches Work**:
   - **Class Weights**: Adjust algorithm focus on minority class
   - **SMOTE**: Create synthetic fraud examples
   - **Anomaly Detection**: Learn normal behavior patterns

3. **Model Performance Insights**:
   - Simple models (Logistic Regression) can be surprisingly effective
   - Ensemble methods (Random Forest) often perform well on tabular data like this; confirm against the comparison table from your own run
   - SMOTE can improve recall but may reduce precision

4. **Business Impact Matters**:
   - High recall catches more fraud but increases false positives
   - Balance depends on investigation costs vs fraud losses
   - Model threshold can be adjusted based on business needs
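
As a sketch of the threshold adjustment mentioned in the last point, the helper below picks the lowest score threshold that still meets a minimum precision, which keeps recall as high as possible under that constraint. The function name `threshold_for_min_precision` is hypothetical, and the scores are toy values.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_min_precision(y_true, y_score, min_precision):
    """Return (threshold, precision, recall) for the lowest threshold meeting min_precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # The final precision/recall pair has no matching threshold, so drop it
    candidates = np.flatnonzero(precision[:-1] >= min_precision)
    if candidates.size == 0:
        return None  # No threshold reaches the requested precision
    best = candidates[0]  # Thresholds ascend, so the first hit keeps the most recall
    return float(thresholds[best]), float(precision[best]), float(recall[best])

y_true_demo = [0, 0, 1, 0, 1, 1]
y_score_demo = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]

chosen = threshold_for_min_precision(y_true_demo, y_score_demo, min_precision=0.9)
print(chosen)
```

With a chosen threshold `thr`, predictions become `(y_scores_best >= thr).astype(int)`. Ideally the threshold is chosen on a validation set rather than the test set, so the final test metrics stay unbiased.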

### 💡 Best Practices:

1. **Always use stratified splits** to maintain class distribution
2. **Scale features** for distance- and gradient-based algorithms, fitting the scaler on the training set only
3. **Try multiple approaches** - no single solution fits all
4. **Evaluate with multiple metrics** - not just accuracy
5. **Consider business constraints** when selecting models
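
A quick worked example of why accuracy alone is misleading at a 0.172% fraud rate: a "model" that labels every transaction as normal scores very high accuracy while catching no fraud. The label vector below is synthetic, sized to match that rate.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 100,000 synthetic transactions with 172 frauds (0.172%)
y_true_demo = np.zeros(100_000, dtype=int)
y_true_demo[:172] = 1

# A trivial "model" that predicts normal for everything
y_pred_all_normal = np.zeros_like(y_true_demo)

demo_accuracy = accuracy_score(y_true_demo, y_pred_all_normal)
demo_recall = recall_score(y_true_demo, y_pred_all_normal, zero_division=0)

print(f"Accuracy: {demo_accuracy:.5f}")
print(f"Recall:   {demo_recall:.5f}")
```

Any real model has to beat this trivial baseline on recall and precision, not on accuracy.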

### 🚀 Next Steps:

In the next tutorials, you'll learn:
- **Enhanced Models**: XGBoost, LightGBM, and advanced ensembles
- **Deep Learning**: Autoencoders and neural networks for fraud
- **Real-time Systems**: Building production-ready APIs
- **Advanced Techniques**: Graph neural networks and active learning

### 📝 Practice Exercises:

1. Try adjusting the `class_weight` parameter to see its effect
2. Experiment with different SMOTE ratios
3. Create a custom threshold for the best model to optimize for precision or recall
4. Try combining predictions from multiple models (ensemble)

## 🎉 Congratulations!

You've successfully built your first fraud detection models! You now understand:
- How to handle severely imbalanced datasets
- Multiple approaches to fraud detection
- How to evaluate models appropriately
- The importance of business context in ML

Ready for more advanced techniques? Check out `enhanced_fraud_models.ipynb`!