# Model Performance Monitoring

## Overview
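This notebook monitors the performance of deployed models. It computes accuracy, precision, recall, and F1 from predictions and ground-truth labels, detects drift as a drop in accuracy against a baseline, summarizes prediction confidence, and builds a retraining request when drift is detected. All inputs used here (predictions, labels, confidences, and the performance history) are simulated.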

## Prerequisites
- Completed: `prometheus-metrics-monitoring.ipynb`
- Trained models available
- Inference pipeline deployed
- Historical performance data available

## Learning Objectives
- Track model accuracy over time
- Detect model drift
- Monitor prediction confidence
- Trigger automated retraining
- Implement model versioning

## Key Concepts
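- **Model Drift**: A drop in model accuracy relative to a baseline, measured here as `baseline_accuracy - current_accuracy`
- **Accuracy Tracking**: Accuracy, precision, recall, and F1 computed from predictions and ground-truth labels
- **Confidence Scoring**: Summary statistics of prediction confidences and the share at or above a threshold
- **Automated Retraining**: A retraining request generated when the accuracy drop exceeds the drift threshold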
- **Model Versioning**: Track model versions and performance

## Setup Section

In [None]:
import sys
import os
import json
import logging
from pathlib import Path
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from scipy import stats
from typing import Dict, List, Any

# Setup path for utils module - works from any directory
def find_utils_path():
    """Find utils path regardless of current working directory"""
    possible_paths = [
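        # dir() inside a function lists local names only, so check module globals for __file__
        Path(globals()['__file__']).parent.parent / 'utils' if '__file__' in globals() else None,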
        Path.cwd() / 'notebooks' / 'utils',
        Path.cwd().parent / 'utils',
        Path('/workspace/repo/notebooks/utils'),
        Path('/opt/app-root/src/notebooks/utils'),
        Path('/opt/app-root/src/openshift-aiops-platform/notebooks/utils'),
    ]
    for p in possible_paths:
        if p and p.exists() and (p / 'common_functions.py').exists():
            return str(p)
    current = Path.cwd()
    for _ in range(5):
        utils_path = current / 'notebooks' / 'utils'
        if utils_path.exists():
            return str(utils_path)
        current = current.parent
    return None

utils_path = find_utils_path()
if utils_path:
    sys.path.insert(0, utils_path)
    print(f"✅ Utils path found: {utils_path}")
else:
    print("⚠️ Utils path not found - will use fallback implementations")

# Try to import common functions, with fallback
try:
    from common_functions import setup_environment
    print("✅ Common functions imported")
except ImportError as e:
    print(f"⚠️ Common functions not available: {e}")
    def setup_environment():
        os.makedirs('/opt/app-root/src/data/processed', exist_ok=True)
        os.makedirs('/opt/app-root/src/models', exist_ok=True)
        return {'data_dir': '/opt/app-root/src/data', 'models_dir': '/opt/app-root/src/models'}

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Setup environment
env_info = setup_environment()
logger.info(f"Environment ready: {env_info}")

# Define paths
DATA_DIR = Path('/opt/app-root/src/data')
PROCESSED_DIR = DATA_DIR / 'processed'
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
MODELS_DIR = Path('/opt/app-root/src/models')
MODELS_DIR.mkdir(parents=True, exist_ok=True)

# Configuration
NAMESPACE = 'self-healing-platform'
ACCURACY_THRESHOLD = 0.80  # Minimum acceptable accuracy
DRIFT_THRESHOLD = 0.05     # Absolute accuracy drop (5 percentage points) that triggers retraining
CONFIDENCE_THRESHOLD = 0.75

logger.info("Model performance monitoring initialized")

## Implementation Section

### 1. Track Model Accuracy

In [None]:
def track_model_accuracy(predictions: np.ndarray, ground_truth: np.ndarray, model_name: str) -> Dict[str, Any]:
    """
    Track model accuracy metrics.
    
    Args:
        predictions: Model predictions
        ground_truth: Ground truth labels
        model_name: Name of the model
    
    Returns:
        Accuracy metrics
    """
    try:
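        # Count the terms needed for the positive class (label 1)
        true_positives = np.sum((predictions == 1) & (ground_truth == 1))
        predicted_positives = np.sum(predictions == 1)
        actual_positives = np.sum(ground_truth == 1)

        # Guard each ratio: NumPy yields NaN with a warning on 0/0 instead of raising,
        # so the except block below would not catch it
        accuracy = float(np.mean(predictions == ground_truth))
        precision = float(true_positives / predicted_positives) if predicted_positives else 0.0
        recall = float(true_positives / actual_positives) if actual_positives else 0.0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) else 0.0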
        
        metrics = {
            'timestamp': datetime.now().isoformat(),
            'model_name': model_name,
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'f1_score': f1,
            'samples': len(predictions)
        }
        
        logger.info(f"Model {model_name} accuracy: {accuracy:.2%}")
        return metrics
    except Exception as e:
        logger.error(f"Accuracy tracking error: {e}")
        return {'error': str(e)}

# Simulate predictions and ground truth.
# The two arrays are drawn independently, so expected accuracy is about 58% (0.3*0.3 + 0.7*0.7).
predictions = np.random.choice([0, 1], size=1000, p=[0.3, 0.7])
ground_truth = np.random.choice([0, 1], size=1000, p=[0.3, 0.7])

accuracy_metrics = track_model_accuracy(predictions, ground_truth, 'ensemble-anomaly-detector')
print(json.dumps(accuracy_metrics, indent=2, default=str))
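
The precision, recall, and F1 values in the cell above all derive from four confusion-matrix counts. The sketch below exposes those counts on a tiny hand-checkable input, which can help when debugging unexpected metric values. It assumes binary 0/1 labels; `confusion_metrics` is a hypothetical helper and not part of the notebook's utils.

```python
import numpy as np

def confusion_metrics(predictions, ground_truth):
    """Return confusion counts and derived metrics for binary 0/1 labels."""
    predictions = np.asarray(predictions)
    ground_truth = np.asarray(ground_truth)

    tp = int(np.sum((predictions == 1) & (ground_truth == 1)))
    fp = int(np.sum((predictions == 1) & (ground_truth == 0)))
    fn = int(np.sum((predictions == 0) & (ground_truth == 1)))
    tn = int(np.sum((predictions == 0) & (ground_truth == 0)))
    total = tp + fp + fn + tn

    # Ratios with an empty denominator fall back to 0.0 (an assumed convention)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    return {'tp': tp, 'fp': fp, 'fn': fn, 'tn': tn,
            'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1_score': f1}

# Five samples: two true positives, one false positive, one false negative, one true negative
example = confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# A model that never predicts the positive class
no_positives = confusion_metrics([0, 0, 0], [0, 1, 0])
print(example)
print(no_positives)
```

In the second call no sample is predicted positive, so precision has an empty denominator and is reported as 0.0 under the convention assumed here.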

### 2. Detect Model Drift

In [None]:
def detect_model_drift(current_accuracy: float, baseline_accuracy: float, drift_threshold: float = 0.05) -> Dict[str, Any]:
    """
    Detect model drift by comparing current and baseline accuracy.
    
    Args:
        current_accuracy: Current model accuracy
        baseline_accuracy: Baseline model accuracy
        drift_threshold: Threshold for drift detection
    
    Returns:
        Drift detection result
    """
    try:
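        # Positive values mean the model got worse relative to the baseline
        accuracy_drop = baseline_accuracy - current_accuracy
        drift_detected = accuracy_drop > drift_threshold
        # Severity: 'critical' above a 10-point drop, 'high' above drift_threshold, otherwise 'low'
        if accuracy_drop > 0.10:
            drift_severity = 'critical'
        elif drift_detected:
            drift_severity = 'high'
        else:
            drift_severity = 'low'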
        
        drift_result = {
            'timestamp': datetime.now().isoformat(),
            'baseline_accuracy': baseline_accuracy,
            'current_accuracy': current_accuracy,
            'accuracy_drop': accuracy_drop,
            'drift_detected': drift_detected,
            'drift_severity': drift_severity,
            'retraining_required': drift_detected
        }
        
        if drift_detected:
            logger.warning(f"Model drift detected: {accuracy_drop:.2%} drop ({drift_severity})")
        else:
            logger.info(f"No model drift detected")
        
        return drift_result
    except Exception as e:
        logger.error(f"Drift detection error: {e}")
        return {'error': str(e)}

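# Test drift detection.
# The nominal drop (0.92 - 0.87) equals DRIFT_THRESHOLD, so the strict `>` comparison
# is decided by floating-point rounding; use values further apart for a stable test.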
baseline_acc = 0.92
current_acc = 0.87

drift_result = detect_model_drift(current_acc, baseline_acc, DRIFT_THRESHOLD)
print(json.dumps(drift_result, indent=2, default=str))
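
A single baseline-versus-current comparison, as in the cell above, can react to noise in one evaluation batch. One common alternative is to compare window means over a time-ordered accuracy series. The sketch below is only an illustration under that assumption; `windowed_accuracy_drop`, the window sizes, and the sample history are all hypothetical.

```python
import numpy as np

def windowed_accuracy_drop(accuracies, reference_size=12, recent_size=6):
    """Mean of the oldest `reference_size` points minus mean of the newest `recent_size` points."""
    values = np.asarray(accuracies, dtype=float)
    if len(values) < reference_size + recent_size:
        raise ValueError("not enough history for both windows")
    reference_mean = values[:reference_size].mean()
    recent_mean = values[-recent_size:].mean()
    return float(reference_mean - recent_mean)

# Hypothetical history, oldest first: stable at 0.92, then degrading
history = [0.92] * 12 + [0.90, 0.88, 0.86, 0.85, 0.84, 0.83]
drop = windowed_accuracy_drop(history)
drift_detected = drop > 0.05  # same threshold value as DRIFT_THRESHOLD

# A flat history should show no meaningful drop
stable_drop = windowed_accuracy_drop([0.92] * 18)
print(f"drop={drop:.3f}, drift_detected={drift_detected}, stable_drop={stable_drop:.3f}")
```

The resulting drop can be passed through the same threshold and severity logic as `detect_model_drift`; larger windows smooth more noise but react more slowly.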

### 3. Monitor Prediction Confidence

In [None]:
def monitor_confidence(confidences: np.ndarray, threshold: float = 0.75) -> Dict[str, Any]:
    """
    Monitor prediction confidence distribution.
    
    Args:
        confidences: Array of prediction confidences
        threshold: Confidence threshold
    
    Returns:
        Confidence monitoring result
    """
    try:
        high_confidence = np.sum(confidences >= threshold) / len(confidences)
        low_confidence = np.sum(confidences < threshold) / len(confidences)
        
        confidence_result = {
            'timestamp': datetime.now().isoformat(),
            'mean_confidence': float(np.mean(confidences)),
            'median_confidence': float(np.median(confidences)),
            'std_confidence': float(np.std(confidences)),
            'high_confidence_rate': high_confidence,
            'low_confidence_rate': low_confidence,
            'min_confidence': float(np.min(confidences)),
            'max_confidence': float(np.max(confidences))
        }
        
        logger.info(f"Confidence monitoring: {high_confidence:.1%} high confidence predictions")
        return confidence_result
    except Exception as e:
        logger.error(f"Confidence monitoring error: {e}")
        return {'error': str(e)}

# Simulate confidences
confidences = np.random.uniform(0.5, 0.99, size=1000)

confidence_result = monitor_confidence(confidences, CONFIDENCE_THRESHOLD)
print(json.dumps(confidence_result, indent=2, default=str))
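
Summary statistics such as the mean can hide a change in the shape of the confidence distribution. The setup cell imports `scipy.stats`, and one way to put it to use is a two-sample Kolmogorov-Smirnov test between a reference sample and the current sample. This is a sketch, not the notebook's established method: `confidence_distribution_shift`, the `alpha` value, and both simulated samples are assumptions.

```python
import numpy as np
from scipy import stats

def confidence_distribution_shift(reference, current, alpha=0.01):
    """Two-sample KS test between a reference and a current confidence sample."""
    result = stats.ks_2samp(reference, current)
    return {
        'ks_statistic': float(result.statistic),
        'p_value': float(result.pvalue),
        # A small p-value suggests the two samples come from different distributions
        'shift_detected': bool(result.pvalue < alpha),
    }

# Simulated samples: the current model produces more low-confidence predictions
rng = np.random.default_rng(42)
reference_confidences = rng.uniform(0.70, 0.99, size=1000)
current_confidences = rng.uniform(0.50, 0.99, size=1000)

shift_result = confidence_distribution_shift(reference_confidences, current_confidences)
print(shift_result)
```

A detected shift in confidences does not by itself prove an accuracy drop, but it needs no ground-truth labels, so it can serve as an early signal while labels are still being collected.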

### 4. Trigger Automated Retraining

In [None]:
def trigger_retraining(drift_result: Dict[str, Any], model_name: str) -> Dict[str, Any]:
    """
    Trigger automated model retraining if drift detected.
    
    Args:
        drift_result: Drift detection result
        model_name: Name of the model
    
    Returns:
        Retraining trigger result
    """
    try:
        if not drift_result.get('retraining_required', False):
            logger.info(f"No retraining required for {model_name}")
            return {'triggered': False, 'reason': 'No drift detected'}
        
        # Build the retraining request (this function only returns it; it does not start a training job)
        retraining_config = {
            'model_name': model_name,
            'trigger_time': datetime.now().isoformat(),
            'reason': f"Accuracy drop: {drift_result['accuracy_drop']:.2%}",
            'severity': drift_result['drift_severity'],
            'training_config': {
                'epochs': 50,
                'batch_size': 32,
                'learning_rate': 0.001,
                'early_stopping': True
            }
        }
        
        logger.info(f"Retraining triggered for {model_name}")
        return {'triggered': True, 'config': retraining_config}
    except Exception as e:
        logger.error(f"Retraining trigger error: {e}")
        return {'error': str(e)}

# Test retraining trigger
retraining_result = trigger_retraining(drift_result, 'ensemble-anomaly-detector')
print(json.dumps(retraining_result, indent=2, default=str))
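
If drift persists across several monitoring runs, `trigger_retraining` returns a new request on every run. A cooldown guard is one way to avoid duplicate requests. The sketch below assumes the caller stores the time of the last trigger; `should_trigger` and `RETRAINING_COOLDOWN` are hypothetical names, and the six-hour value is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Optional

RETRAINING_COOLDOWN = timedelta(hours=6)  # assumed value, tune per model

def should_trigger(retraining_required: bool,
                   last_triggered: Optional[datetime],
                   now: datetime,
                   cooldown: timedelta = RETRAINING_COOLDOWN) -> bool:
    """Allow a new retraining request only if drift is flagged and the cooldown has elapsed."""
    if not retraining_required:
        return False
    if last_triggered is None:
        return True  # no earlier trigger on record
    return now - last_triggered >= cooldown

now = datetime(2024, 1, 1, 12, 0)  # fixed timestamp so the example is reproducible
first_run = should_trigger(True, None, now)
too_soon = should_trigger(True, now - timedelta(hours=1), now)
after_cooldown = should_trigger(True, now - timedelta(hours=7), now)
no_drift = should_trigger(False, None, now)
print(first_run, too_soon, after_cooldown, no_drift)
```

The `retraining_required` flag from `detect_model_drift` can be passed in directly as the first argument.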

### 5. Track Model Performance History

In [None]:
# Build a simulated 24-hour performance history (random placeholder values, not real measurements)
performance_tracking = pd.DataFrame([
    {
        'timestamp': datetime.now() - timedelta(hours=i),
        'model_name': 'ensemble-anomaly-detector',
        'accuracy': np.random.uniform(0.85, 0.95),
        'precision': np.random.uniform(0.80, 0.95),
        'recall': np.random.uniform(0.80, 0.95),
        'f1_score': np.random.uniform(0.80, 0.95),
        'mean_confidence': np.random.uniform(0.75, 0.95),
        'drift_detected': np.random.choice([True, False], p=[0.1, 0.9]),
        'retraining_triggered': np.random.choice([True, False], p=[0.05, 0.95])
    }
    for i in range(24)  # 24 hours of data
])

# Save tracking data (to_parquet requires the pyarrow or fastparquet package)
tracking_file = PROCESSED_DIR / 'model_performance_tracking.parquet'
performance_tracking.to_parquet(tracking_file)

logger.info(f"Saved model performance tracking data")
print(performance_tracking.to_string())
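
The learning objectives list model versioning, which the cells above do not yet cover. One possible shape is a small in-memory registry that records accuracy per version and reports the best one. Everything in this sketch is hypothetical, including `ModelVersion`, `ModelRegistry`, the version labels, and the accuracy values; a production setup would more likely rely on a dedicated model registry service.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional
import json

@dataclass
class ModelVersion:
    model_name: str
    version: str
    accuracy: float
    registered_at: str  # ISO 8601 timestamp

class ModelRegistry:
    """In-memory registry keyed by version label."""

    def __init__(self) -> None:
        self._versions: Dict[str, ModelVersion] = {}

    def register(self, entry: ModelVersion) -> None:
        # Re-registering an existing version label overwrites the earlier entry
        self._versions[entry.version] = entry

    def best(self) -> Optional[ModelVersion]:
        """Return the version with the highest recorded accuracy, or None if empty."""
        if not self._versions:
            return None
        return max(self._versions.values(), key=lambda v: v.accuracy)

    def to_json(self) -> str:
        return json.dumps([asdict(v) for v in self._versions.values()], indent=2)

registry = ModelRegistry()
registry.register(ModelVersion('ensemble-anomaly-detector', 'v1', 0.92, '2024-01-01T00:00:00'))
registry.register(ModelVersion('ensemble-anomaly-detector', 'v2', 0.87, '2024-02-01T00:00:00'))

best_version = registry.best()
print(best_version)
print(registry.to_json())
```

Keeping accuracy next to each version makes it possible to roll back to an earlier version when a retrained model performs worse.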

## Validation Section

In [None]:
# Verify outputs
assert tracking_file.exists(), "Model performance tracking file not created"
assert 'accuracy' in accuracy_metrics, "No accuracy metric"
assert 'drift_detected' in drift_result, "No drift detection result"

avg_accuracy = performance_tracking['accuracy'].mean()
drift_rate = performance_tracking['drift_detected'].sum() / len(performance_tracking)
retraining_rate = performance_tracking['retraining_triggered'].sum() / len(performance_tracking)

logger.info(f"✅ All validations passed")
print(f"\nModel Performance Monitoring Summary:")
print(f"  Performance Records: {len(performance_tracking)}")
print(f"  Average Accuracy: {avg_accuracy:.2%}")
print(f"  Drift Detection Rate: {drift_rate:.1%}")
print(f"  Retraining Trigger Rate: {retraining_rate:.1%}")
print(f"  Accuracy Threshold: {ACCURACY_THRESHOLD:.0%}")
print(f"  Drift Threshold: {DRIFT_THRESHOLD:.0%}")
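
`ACCURACY_THRESHOLD` is defined in the setup cell but only printed in the summary above. A minimal sketch of how it could be applied, assuming performance records are available as dictionaries with an `accuracy` field; `below_threshold` and the sample records are hypothetical.

```python
from typing import Any, Dict, List

ACCURACY_THRESHOLD = 0.80  # same value as in the setup cell

def below_threshold(records: List[Dict[str, Any]],
                    threshold: float = ACCURACY_THRESHOLD) -> List[Dict[str, Any]]:
    """Return the records whose accuracy is strictly below the threshold."""
    return [record for record in records if record['accuracy'] < threshold]

# Hypothetical hourly records
sample_records = [
    {'hour': 0, 'accuracy': 0.91},
    {'hour': 1, 'accuracy': 0.78},
    {'hour': 2, 'accuracy': 0.85},
]
violations = below_threshold(sample_records)
print(violations)
```

With a DataFrame such as `performance_tracking`, the same check can be written as a boolean filter on the `accuracy` column.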

## Integration Section

This notebook integrates with:
- **Input**: Model predictions and ground truth
- **Output**: Performance metrics and retraining triggers
- **Monitoring**: Accuracy, drift, and confidence tracking
- **Next**: Healing success tracking

## Next Steps

1. Monitor model performance continuously
2. Proceed to `healing-success-tracking.ipynb`
3. Track remediation success rates
4. Analyze failure patterns
5. Complete Phase 7 implementation

## References

- ADR-003: Self-Healing Platform Architecture
- ADR-012: Notebook Architecture for End-to-End Workflows
- [Model Drift Detection](https://en.wikipedia.org/wiki/Concept_drift)
- [Model Monitoring Best Practices](https://ml-ops.systems/)