# BNPL ML Model Development

**Objective**: Develop a production-ready ML model for BNPL default risk prediction.

**Performance Targets**:
- Beat current 3.5x risk discrimination baseline
- Achieve >40% precision on high-risk segment
- Maintain <100ms inference latency
- Production deployment ready

**Focus**: Primary target is `will_default` prediction (binary classification).

## Implementation Steps

### **Step 1: Algorithm Research & Selection** 
- Evaluate ML algorithms for BNPL risk assessment
- Consider inference latency requirements (<100ms)
- Assess interpretability for regulatory compliance
- Select candidate algorithms for testing

### **Step 2: Data Loading & Feature Engineering**
- Load engineered features using `BNPLFeatureEngineer` class
- Validate feature quality and distribution
- Prepare data for modeling

### **Step 3: Baseline Performance Establishment**
- Implement current underwriting baseline (3.5x discrimination)
- Train candidate models on engineered features
- Establish performance benchmarks

### **Step 4: Model Training & Evaluation**
- Cross-validation framework
- Business metrics: discrimination ratio, precision/recall
- Technical metrics: latency, model size, memory usage
- Feature importance analysis

### **Step 5: Production Readiness Assessment**
- Inference latency benchmarking
- Model serialization and deployment format
- Edge case handling and fallback strategies
- A/B testing framework preparation

### **Step 6: Model Selection & Recommendations**
- Compare algorithms across business and technical dimensions
- Select final model for production deployment
- Document trade-offs and deployment considerations
- Prepare ML engineering handoff

In [None]:
# Add project root to Python path for imports
import sys
import os
project_root = os.path.abspath(os.path.join(os.getcwd(), '../..'))
if project_root not in sys.path:
    sys.path.insert(0, project_root)
    
print(f"Project root added to path: {project_root}")

In [None]:
# Environment setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
import joblib
import time
import warnings
warnings.filterwarnings('ignore')

# Import our custom feature engineering class
from flit_ml.features.bnpl_feature_engineering import BNPLFeatureEngineer

# Configuration
pd.set_option('display.max_columns', 50)
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

print("ML development environment ready!")
print("Available ML libraries loaded successfully")

## Step 1: Algorithm Research & Selection

### ML Algorithm Comparison for BNPL Risk Assessment

| Algorithm | Inference Speed | Interpretability | Performance | Memory Usage | Production Ready | Pros | Cons |
|-----------|----------------|------------------|-------------|--------------|------------------|------|------|
| **LogisticRegression** | Fastest (~1ms) | High (coefficients) | Good baseline | Minimal | Excellent | Fast inference, Interpretable, Stable, Low memory | Linear assumptions, May miss complex patterns |
| **RandomForestClassifier** | Fast (~5-10ms) | Medium (feature importance) | Strong for tabular | Moderate | Good | Handles non-linearity, Feature importance, Robust | Larger model size, Less interpretable |
| **XGBoost** | Fast (~10-20ms) | Medium (SHAP values) | Often best for tabular | Moderate | Good | High performance, Feature importance, Handles missing values | Hyperparameter tuning, Model complexity |
| **LightGBM** | Very Fast (~5ms) | Medium (SHAP values) | Excellent for tabular | Low | Excellent | Fast training/inference, Memory efficient, High performance | Can overfit small datasets, Less interpretable than linear |

### Selection Criteria for BNPL Production System

1. **Inference latency**: <100ms (preferably <20ms)
2. **Performance**: Beat 3.5x discrimination ratio
3. **Interpretability**: Regulatory compliance requirements
4. **Production stability**: Minimal maintenance overhead
5. **Memory efficiency**: Scalable serving architecture
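The latency criterion can be checked empirically rather than taken from rough per-algorithm estimates. The following is a minimal sketch, assuming single-row scoring through `predict_proba` is representative of online inference; `benchmark_latency` is a hypothetical helper, and the data is synthetic rather than the BNPL feature set.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def benchmark_latency(model, X, n_calls=200):
    """Time single-row predict_proba calls and return latency percentiles in ms."""
    timings_ms = []
    for i in range(n_calls):
        # Score one transaction at a time, as an online service would
        row = X[i % len(X)].reshape(1, -1)
        start = time.perf_counter()
        model.predict_proba(row)
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": float(np.percentile(timings_ms, 50)),
        "p95_ms": float(np.percentile(timings_ms, 95)),
        "p99_ms": float(np.percentile(timings_ms, 99)),
    }


# Synthetic stand-in data (assumption: the real feature matrix is numeric and 2-D)
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=42)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

latency = benchmark_latency(demo_model, X_demo)
print(latency)
```

Tail percentiles (p95, p99) are usually more informative than the mean for a latency budget, since a few slow calls are what breach a <100ms target. Measured values depend on hardware and do not include feature computation or network overhead.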

### Recommended Starting Models

Given these BNPL constraints, we will evaluate all four models in the table above before considering anything more complex.

## Step 2: Data Loading & Feature Engineering

Load engineered features using our production-ready `BNPLFeatureEngineer` class.

In [None]:
# Initialize feature engineer (verbose output enabled during development)
feature_engineer = BNPLFeatureEngineer(verbose=True)

# Load and engineer features
print("🚀 Loading and engineering features for model development...")
df_features, feature_metadata = feature_engineer.engineer_features(
    sample_size=1000,  # Sample size for development
    random_seed=42
)

print("\n📊 Feature Engineering Complete:")
print(f"   Dataset shape: {df_features.shape}")
print(f"   Features available: {len(feature_metadata['all_features'])}")
print(f"   Target variable: {feature_metadata['target_variable']}")
print(f"   Default rate: {df_features['will_default'].mean():.1%}")

In [None]:
# Validate feature quality and distribution
print("🔍 Feature Quality Assessment:")
print(f"   Data shape: {df_features.shape}")
print(f"   Memory usage: {df_features.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
print(f"   Default rate: {df_features['will_default'].mean():.1%}")

# Check for any remaining issues
missing_values = df_features.isnull().sum().sum()
print(f"   Missing values: {missing_values}")

# Feature type summary
print(f"\n📋 Feature Types:")
for feature_type, features in feature_metadata.items():
    if isinstance(features, list) and feature_type.endswith('_features'):
        print(f"   {feature_type}: {len(features)} features")

# Display sample of features
print(f"\n📋 Sample of engineered features:")
display_cols = df_features.columns[:10].tolist()
if 'will_default' not in display_cols:
    display_cols.append('will_default')
    
df_features[display_cols].head()

In [None]:
# Prepare data for modeling
print("🔧 Preparing data for model training...")

# Separate features and targets. Both target columns are excluded from the feature set;
# 'days_to_first_missed_payment' is a secondary target reserved for later, more advanced models.
exclude_cols = ['will_default', 'days_to_first_missed_payment']
feature_cols = [col for col in df_features.columns if col not in exclude_cols]
numeric_features = ['amount', 'risk_score', 'payment_credit_limit', 'price_comparison_time', 'customer_tenure_days']
categorical_features = [col for col in feature_cols if col not in numeric_features]
primary_target_col = exclude_cols[0]  # 'will_default'
secondary_target_col = exclude_cols[1]  # 'days_to_first_missed_payment'

# For now, we will focus on the primary target
target_col = primary_target_col

X = df_features[feature_cols]
y = df_features[target_col]

print(f"   Features shape: {X.shape}")
print(f"   Target shape: {y.shape}")
print(f"   Target distribution: {y.value_counts().to_dict()}")
print(f"   Class balance: {y.mean():.1%} positive class")

# Train-test split with stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42, 
    stratify=y
)

print(f"\n📊 Train-Test Split:")
print(f"   Training set: {X_train.shape}")
print(f"   Test set: {X_test.shape}")
print(f"   Train default rate: {y_train.mean():.1%}")
print(f"   Test default rate: {y_test.mean():.1%}")

In [None]:
numeric_features

In [None]:
X_train.columns.sort_values()

In [None]:
# Feature scaling for algorithms that need it.
# The ColumnTransformer scales only the numeric columns and passes the rest through.
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_features),      # Scale numeric features
    ('cat', 'passthrough', categorical_features)      # Leave categorical unchanged
])

# Fit on the training set only, then reuse those statistics on the test set
# so that no test-set information leaks into the scaling.
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)


print(f"\n✅ Data preparation complete")
print(f"   Ready for model training and evaluation")

## Step 3: Baseline Performance Establishment

Implement current underwriting baseline and train candidate models.
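When candidate models are trained with cross-validation, wrapping the scaler and the classifier in a single scikit-learn `Pipeline` refits the scaler inside each fold, so validation rows never influence the scaling statistics. The sketch below illustrates the idea on synthetic data; the column `is_new_customer` and the way the target is generated are illustrative assumptions, not part of the BNPL dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered feature table (assumption)
rng = np.random.default_rng(42)
n_rows = 400
demo = pd.DataFrame({
    "amount": rng.gamma(2.0, 50.0, n_rows),
    "risk_score": rng.uniform(0, 1, n_rows),
    "is_new_customer": rng.integers(0, 2, n_rows),  # hypothetical encoded flag
})
# Synthetic target: default probability rises with risk_score
default_prob = 0.05 + 0.3 * demo["risk_score"]
demo["will_default"] = (rng.uniform(0, 1, n_rows) < default_prob).astype(int)

numeric_cols = ["amount", "risk_score"]
passthrough_cols = ["is_new_customer"]

# Preprocessing and model in one estimator, so each CV fold fits its own scaler
pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", "passthrough", passthrough_cols),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = cross_val_score(
    pipeline,
    demo[numeric_cols + passthrough_cols],
    demo["will_default"],
    cv=cv,
    scoring="roc_auc",
)
print(f"ROC AUC per fold: {np.round(auc_scores, 3)}")
print(f"Mean ROC AUC: {auc_scores.mean():.3f}")
```

The same pipeline object can be fitted on the full training set afterwards and serialized as one artifact, which keeps preprocessing and model in sync at inference time.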

In [None]:
# Current underwriting baseline analysis
print("📊 Current Underwriting Baseline Analysis")
print("=" * 45)

# Analyze current risk_score and risk_level performance
if 'risk_score' in df_features.columns and 'risk_level_encoded' in df_features.columns:
    
    # Risk level performance
    risk_performance = df_features.groupby('risk_level_encoded')['will_default'].agg(['count', 'mean']).round(3)
    risk_performance.columns = ['transaction_count', 'default_rate']
    
    print(f"\n🎯 Current Risk Level Performance:")
    risk_level_mapping = {0: 'Low', 1: 'Medium', 2: 'High'}
    for idx, row in risk_performance.iterrows():
        level_name = risk_level_mapping.get(idx, f'Level_{idx}')
        # iterrows() casts the row to float, so convert the count back to int before formatting
        print(f"   {level_name} Risk: {row['default_rate']:.1%} default rate ({int(row['transaction_count']):,} transactions)")
    
    # Calculate discrimination ratio
    high_risk_rate = risk_performance.loc[2, 'default_rate']  # High risk (encoded as 2)
    low_risk_rate = risk_performance.loc[0, 'default_rate']   # Low risk (encoded as 0)
    current_discrimination = high_risk_rate / low_risk_rate if low_risk_rate > 0 else 0
    
    print(f"\n📈 Current System Discrimination:")
    print(f"   High risk default rate: {high_risk_rate:.1%}")
    print(f"   Low risk default rate: {low_risk_rate:.1%}")
    print(f"   Discrimination ratio: {current_discrimination:.1f}x")
    print(f"   Target to beat: >{current_discrimination:.1f}x")
    
    # Risk score distribution analysis
    print(f"\n📊 Risk Score Distribution:")
    risk_score_stats = df_features['risk_score'].describe()
    print(f"   Range: {risk_score_stats['min']:.2f} - {risk_score_stats['max']:.2f}")
    print(f"   Mean: {risk_score_stats['mean']:.2f}")
    print(f"   Std: {risk_score_stats['std']:.2f}")
    
else:
    print("⚠️  Risk score/level features not available in dataset")
    current_discrimination = 3.5  # Use known baseline from context
    print(f"   Using known baseline discrimination ratio: {current_discrimination}x")

# Set performance targets
target_discrimination = max(current_discrimination * 1.1, 4.0)  # 10% above the current baseline, with a floor of 4.0x
target_precision = 0.40  # 40% precision on high-risk segment

print(f"\n🎯 Performance Targets for ML Models:")
print(f"   Discrimination ratio: >{target_discrimination:.1f}x")
print(f"   High-risk precision: >{target_precision:.0%}")
print(f"   Inference latency: <100ms")
print(f"   Model size: <50MB (for fast loading)")

---
## ⚠️ IMPLEMENTATION GATE

**NEXT**: Ready to proceed with Step 4 - Model Training & Evaluation

**Current Status**:
- ✅ Features engineered and validated
- ✅ Algorithms researched and selected
- ✅ Baseline performance established
- ✅ Data prepared for training

**Ready to implement**:
- Model training with cross-validation
- Business metrics evaluation
- Latency benchmarking
- Feature importance analysis
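For the business metrics evaluation, the discrimination ratio used for the underwriting baseline (high-risk default rate divided by low-risk default rate) can also be computed from model scores. The sketch below is one possible formulation, assuming the high-risk and low-risk segments are defined as the top and bottom 20% of predicted scores; that cutoff and the helper name `discrimination_ratio` are assumptions, not the definition used by the current risk levels. Under this definition, the default rate in the high-risk segment is also the precision of the high-risk flag.

```python
import numpy as np


def discrimination_ratio(y_true, scores, high_q=0.8, low_q=0.2):
    """Return (ratio, high_rate, low_rate) for score-defined risk segments.

    high_rate is the default rate among rows scoring at or above the high_q
    quantile, which equals the precision of a 'high risk' flag on that segment.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    high_mask = scores >= np.quantile(scores, high_q)
    low_mask = scores <= np.quantile(scores, low_q)
    high_rate = float(y_true[high_mask].mean())
    low_rate = float(y_true[low_mask].mean())
    # Undefined when the low-risk segment has no defaults
    ratio = high_rate / low_rate if low_rate > 0 else float("nan")
    return ratio, high_rate, low_rate


# Toy example: 100 evenly spaced scores with hand-placed defaults
scores_demo = np.arange(100) / 100
y_demo = np.zeros(100, dtype=int)
y_demo[:2] = 1    # 2 defaults among the 20 lowest scores
y_demo[90:] = 1   # 10 defaults among the 20 highest scores

ratio, high_rate, low_rate = discrimination_ratio(y_demo, scores_demo)
print(f"High-risk default rate (precision): {high_rate:.1%}")
print(f"Low-risk default rate: {low_rate:.1%}")
print(f"Discrimination ratio: {ratio:.1f}x")
```

Returning `nan` when the low-risk segment has no defaults makes the undefined case explicit, which matters on small development samples where a segment may contain very few positives.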