# AI-Driven Decision Making

## Overview
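This notebook implements AI-driven remediation decisions on top of the ensemble anomaly-detection models. Each decision carries a confidence score. The score selects an action level (aggressive, moderate, or conservative), and low-confidence decisions fall back to human approval, rule-based remediation, or continued monitoring. Model predictions are simulated with random values in this notebook, so the confidence and accuracy figures it reports are placeholders rather than measured results.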

## Prerequisites
- Completed: All Phase 2 notebooks and the preceding Phase 3 notebooks
- Trained ensemble models available
- Inference pipeline deployed
- Coordination engine accessible

## Learning Objectives
- Use ML models for remediation decisions
- Implement confidence-based decision making
- Handle uncertainty in predictions
- Optimize remediation success rates
- Track decision accuracy and outcomes

## Key Concepts
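- **Confidence Scoring**: Attach a confidence value between 0 and 1 to every prediction
- **Decision Thresholds**: Require at least 75% confidence to act and 90% to act immediately (`CONFIDENCE_THRESHOLD`, `HIGH_CONFIDENCE_THRESHOLD`)
- **Uncertainty Handling**: Apply a fallback strategy (human approval, rule-based remediation, or monitor and wait) when confidence is below the threshold
- **Decision Optimization**: Tune thresholds and fallback strategies to maximize remediation success rates
- **Outcome Tracking**: Record each decision and its outcome to monitor decision accuracy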

## Setup Section

In [None]:
import sys
import os
import json
import logging
import pickle
from pathlib import Path
from datetime import datetime, timedelta
import pandas as pd
import numpy as np

# Setup path for utils module - works from any directory
def find_utils_path():
    """Find utils path regardless of current working directory"""
    possible_paths = [
        # dir() inside a function lists only local names, so check globals() instead
        Path(__file__).parent.parent / 'utils' if '__file__' in globals() else None,
        Path.cwd() / 'notebooks' / 'utils',
        Path.cwd().parent / 'utils',
        Path('/workspace/repo/notebooks/utils'),
        Path('/opt/app-root/src/notebooks/utils'),
    ]
    for p in possible_paths:
        if p and p.exists() and (p / 'common_functions.py').exists():
            return str(p)
    return None

utils_path = find_utils_path()
if utils_path:
    sys.path.insert(0, utils_path)
    print(f"✅ Utils path found: {utils_path}")

# Try to import common functions, with fallback
try:
    from common_functions import setup_environment
    print("✅ Common functions imported")
except ImportError as e:
    print(f"⚠️ common_functions not importable ({e}); using fallback setup_environment")
    def setup_environment():
        os.makedirs('/opt/app-root/src/data/processed', exist_ok=True)
        os.makedirs('/opt/app-root/src/models', exist_ok=True)
        return {'data_dir': '/opt/app-root/src/data', 'models_dir': '/opt/app-root/src/models'}

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Setup environment
env_info = setup_environment()
logger.info(f"Environment ready: {env_info}")

# Define paths
MODELS_DIR = Path('/opt/app-root/src/models')
MODELS_DIR.mkdir(parents=True, exist_ok=True)
DATA_DIR = Path('/opt/app-root/src/data')
PROCESSED_DIR = DATA_DIR / 'processed'
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)

# Configuration
CONFIDENCE_THRESHOLD = 0.75  # Minimum confidence for action
HIGH_CONFIDENCE_THRESHOLD = 0.90  # High confidence threshold
NAMESPACE = 'self-healing-platform'

logger.info("AI-driven decision making initialized")

## Implementation Section

### 1. Load Ensemble Models

In [None]:
# Load or create ensemble configuration
ensemble_config_file = MODELS_DIR / 'ensemble_config.pkl'

if ensemble_config_file.exists():
    try:
        with open(ensemble_config_file, 'rb') as f:
            ensemble_config = pickle.load(f)
        logger.info(f"Loaded ensemble config: {ensemble_config.get('best_method', 'ensemble')}")
    except Exception as e:
        logger.error(f"Error loading ensemble: {e}")
        ensemble_config = None
else:
    logger.info("Ensemble config not found - creating default configuration")
    ensemble_config = None

# Create default config if needed
if ensemble_config is None:
    ensemble_config = {
        'best_method': 'ensemble_weighted',
        'methods': ['isolation_forest', 'arima', 'prophet', 'lstm'],
        'weights': [0.25, 0.25, 0.25, 0.25],
        'threshold': 0.5,
        'performance': [
            {'Method': 'Ensemble', 'Precision': 0.92, 'Recall': 0.88, 'F1': 0.90}
        ]
    }
    # Save for future use
    with open(ensemble_config_file, 'wb') as f:
        pickle.dump(ensemble_config, f)
    logger.info("Created and saved default ensemble configuration")

logger.info(f"Best method: {ensemble_config['best_method']}")
logger.info(f"Ensemble methods: {ensemble_config.get('methods', [])}")

### 2. Implement Confidence-Based Decision Making

In [None]:
def make_ai_decision(anomaly_data, confidence_threshold=CONFIDENCE_THRESHOLD):
    """
    Make AI-driven remediation decision with confidence scoring.

    The prediction and confidence are simulated with random values;
    anomaly_data is accepted but not yet passed to a model.

    Args:
        anomaly_data: Detected anomaly data (currently unused)
        confidence_threshold: Minimum confidence for action

    Returns:
        Decision dict with prediction, confidence, and action level
    """
    try:
        # Simulate model prediction with confidence.
        # Cast NumPy scalars to built-in types so json.dumps emits numbers and booleans.
        prediction = int(np.random.choice([0, 1]))  # 0=normal, 1=anomaly
        confidence = float(np.random.uniform(0.6, 0.99))

        # Determine action based on confidence
        if confidence >= HIGH_CONFIDENCE_THRESHOLD:
            action_level = 'aggressive'  # Execute immediately
        elif confidence >= confidence_threshold:
            action_level = 'moderate'  # Execute with monitoring
        else:
            action_level = 'conservative'  # Require approval

        # Note: gating uses confidence only; the predicted class is recorded but not checked
        decision = {
            'prediction': prediction,
            'confidence': confidence,
            'action_level': action_level,
            'should_execute': bool(confidence >= confidence_threshold),
            'timestamp': datetime.now().isoformat()
        }
        
        logger.info(f"Decision: {action_level} (confidence: {confidence:.2%})")
        return decision
    except Exception as e:
        logger.error(f"Decision error: {e}")
        return {'error': str(e)}

# Test decision making
sample_anomaly = {'metric_0': 85, 'metric_1': 92, 'metric_2': 78}
decision = make_ai_decision(sample_anomaly, CONFIDENCE_THRESHOLD)
print(json.dumps(decision, indent=2, default=str))
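The cell above draws the prediction and confidence at random. One way they could instead be derived from the ensemble is sketched below. This is an illustration under stated assumptions, not the notebook's method: it assumes every model emits an anomaly score in [0, 1], combines the scores with the configured weights, and defines confidence as the combined score for an anomaly prediction and one minus the combined score otherwise. The function name `ensemble_decision_inputs` and the example scores are hypothetical.

```python
import numpy as np


def ensemble_decision_inputs(scores, weights, threshold=0.5):
    """Combine per-model anomaly scores into (prediction, confidence).

    Assumptions (not from the original notebook):
      * each score is an anomaly probability in [0, 1]
      * confidence is the weighted score for class 1, or 1 - score for class 0
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if scores.shape != weights.shape:
        raise ValueError("scores and weights must have the same length")

    # Weighted average; dividing by the weight sum tolerates unnormalized weights
    combined = float(np.dot(scores, weights) / weights.sum())
    prediction = int(combined >= threshold)  # 0=normal, 1=anomaly
    confidence = combined if prediction == 1 else 1.0 - combined
    return prediction, confidence


# Hypothetical scores for isolation_forest, arima, prophet, lstm
example_scores = [0.90, 0.80, 0.95, 0.85]
example_weights = [0.25, 0.25, 0.25, 0.25]
prediction, confidence = ensemble_decision_inputs(example_scores, example_weights)
print(f"prediction={prediction}, confidence={confidence:.3f}")
```

With a threshold of 0.5 this definition never yields a confidence below 0.5, which is why scores near the threshold map to the conservative action level.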

### 3. Implement Uncertainty Handling

In [None]:
def handle_uncertainty(decision, fallback_strategy='conservative'):
    """
    Handle uncertainty in predictions with fallback strategies.
    
    Args:
        decision: AI decision with confidence
        fallback_strategy: Strategy for low confidence
    
    Returns:
        Final decision with fallback applied
    """
    try:
        confidence = decision.get('confidence', 0)
        
        if confidence < CONFIDENCE_THRESHOLD:
            # Low confidence - apply fallback
            if fallback_strategy == 'conservative':
                # Require human approval
                decision['fallback_applied'] = True
                decision['fallback_action'] = 'require_approval'
                decision['should_execute'] = False
            elif fallback_strategy == 'rule_based':
                # Fall back to rule-based remediation
                decision['fallback_applied'] = True
                decision['fallback_action'] = 'use_rule_based'
                decision['should_execute'] = True
            elif fallback_strategy == 'monitor':
                # Monitor and wait for more data
                decision['fallback_applied'] = True
                decision['fallback_action'] = 'monitor_and_wait'
                decision['should_execute'] = False
        
        logger.info(f"Uncertainty handling: {decision.get('fallback_action', 'none')}")
        return decision
    except Exception as e:
        logger.error(f"Uncertainty handling error: {e}")
        return decision

# Apply uncertainty handling
final_decision = handle_uncertainty(decision, 'conservative')
print(json.dumps(final_decision, indent=2, default=str))
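The three fallback strategies differ only in the action they record and in whether execution is allowed, so they can also be expressed as a lookup table. The sketch below is an alternative formulation, not a replacement mandated by the notebook: `FALLBACKS` and `apply_fallback` are hypothetical names, and rejecting unknown strategies and returning a copy instead of mutating the input are design assumptions.

```python
# Strategy name -> (fallback_action, should_execute); values mirror the cell above
FALLBACKS = {
    'conservative': ('require_approval', False),
    'rule_based': ('use_rule_based', True),
    'monitor': ('monitor_and_wait', False),
}


def apply_fallback(decision, strategy='conservative', threshold=0.75):
    """Return a copy of `decision` with the fallback applied when confidence is low.

    Hypothetical helper: raises ValueError for an unknown strategy rather than
    silently leaving the decision unchanged.
    """
    if strategy not in FALLBACKS:
        raise ValueError(f"unknown fallback strategy: {strategy!r}")

    result = dict(decision)  # shallow copy so the caller's dict is untouched
    if result.get('confidence', 0.0) < threshold:
        action, should_execute = FALLBACKS[strategy]
        result.update(
            fallback_applied=True,
            fallback_action=action,
            should_execute=should_execute,
        )
    return result


low_confidence = {'confidence': 0.62, 'should_execute': False}
print(apply_fallback(low_confidence, 'rule_based'))
```

Adding a new strategy then means adding one entry to the table, and a typo in the strategy name fails loudly.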

### 4. Execute AI-Driven Remediation

In [None]:
def execute_ai_remediation(decision, namespace):
    """
    Execute remediation based on AI decision.
    
    Args:
        decision: AI decision with confidence
        namespace: Kubernetes namespace
    
    Returns:
        Execution result
    """
    try:
        if not decision.get('should_execute', False):
            logger.info(f"Remediation not executed: {decision.get('fallback_action', 'low confidence')}")
            return {'executed': False, 'reason': decision.get('fallback_action', 'low confidence')}
        
        # Execute remediation
        action_level = decision.get('action_level', 'moderate')
        
        remediation_actions = {
            'aggressive': {'action': 'immediate_restart', 'timeout': 5},
            'moderate': {'action': 'monitored_restart', 'timeout': 10},
            'conservative': {'action': 'staged_restart', 'timeout': 30}
        }
        
        action_config = remediation_actions.get(action_level, remediation_actions['moderate'])
        
        logger.info(f"Executing {action_config['action']} (confidence: {decision['confidence']:.2%})")
        
        return {
            'executed': True,
            'action': action_config['action'],
            'confidence': decision['confidence'],
            'timestamp': datetime.now().isoformat()
        }
    except Exception as e:
        logger.error(f"Execution error: {e}")
        return {'executed': False, 'error': str(e)}

# Execute remediation
execution_result = execute_ai_remediation(final_decision, NAMESPACE)
print(json.dumps(execution_result, indent=2, default=str))

### 5. Track Decision Accuracy

In [None]:
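# Create decision tracking dataframe by running the decision pipeline 10 times
tracking_rows = []
for _ in range(10):  # Simulate 10 decisions
    sim_decision = handle_uncertainty(
        make_ai_decision(sample_anomaly, CONFIDENCE_THRESHOLD), 'conservative'
    )
    sim_result = execute_ai_remediation(sim_decision, NAMESPACE)
    tracking_rows.append({
        'timestamp': datetime.now().isoformat(),
        'confidence': sim_decision.get('confidence', 0),
        'action_level': sim_decision.get('action_level', 'unknown'),
        'executed': sim_result.get('executed', False),
        'action': sim_result.get('action', 'none'),
        # Every executed remediation is assumed to succeed in this simulation
        'outcome': 'success' if sim_result.get('executed') else 'not_executed',
        # Placeholder: random value, not a measured accuracy
        'decision_accuracy': np.random.uniform(0.85, 0.99)
    })
decision_tracking = pd.DataFrame(tracking_rows)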

# Save tracking data
tracking_file = PROCESSED_DIR / 'ai_decision_tracking.parquet'
decision_tracking.to_parquet(tracking_file)

logger.info(f"Saved decision tracking data to {tracking_file}")
print(decision_tracking.to_string())
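Once real outcomes are recorded, accuracy can be broken down by confidence tier to check whether higher confidence actually corresponds to more successful remediations. The sketch below uses a small hand-written table. The `succeeded` column and its values are hypothetical illustration data, not results from this notebook, and the tier boundaries reuse the 0.75 and 0.90 thresholds.

```python
import pandas as pd

# Hypothetical outcome records: 1 = remediation succeeded, 0 = it did not
records = pd.DataFrame({
    'confidence': [0.62, 0.70, 0.78, 0.81, 0.88, 0.91, 0.95, 0.97],
    'succeeded':  [0,    1,    1,    0,    1,    1,    1,    1],
})

# Left-closed bins: [0, 0.75), [0.75, 0.90), [0.90, 1.0]; the small epsilon
# keeps a confidence of exactly 1.0 inside the last bin
bins = [0.0, 0.75, 0.90, 1.0 + 1e-9]
labels = ['conservative', 'moderate', 'aggressive']
records['tier'] = pd.cut(records['confidence'], bins=bins, labels=labels, right=False)

# Count of decisions and observed success rate per tier
summary = (
    records.groupby('tier', observed=True)['succeeded']
    .agg(['count', 'mean'])
    .rename(columns={'mean': 'success_rate'})
)
print(summary)
```

If the success rate does not rise from the conservative tier to the aggressive tier, the confidence scores are poorly calibrated and the thresholds deserve a second look.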

## Validation Section

In [None]:
# Verify outputs
assert 'confidence' in final_decision, "No confidence score in decision"
assert 'action_level' in final_decision, "No action level in decision"
assert tracking_file.exists(), "Decision tracking file not created"

avg_confidence = decision_tracking['confidence'].mean()
execution_rate = decision_tracking['executed'].mean()  # fraction of decisions executed

logger.info("✅ All validations passed")
print("\nAI-Driven Decision Making Summary:")
print(f"  Average Confidence: {avg_confidence:.2%}")
print(f"  Execution Rate: {execution_rate:.1%}")
print(f"  Average Decision Accuracy: {decision_tracking['decision_accuracy'].mean():.2%}")
print(f"  Confidence Threshold: {CONFIDENCE_THRESHOLD:.0%}")
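The execution rate reported above depends directly on the confidence threshold. A threshold sweep makes that trade-off visible. The following is a rough sketch on synthetic data: it assumes confidence is uniformly distributed between 0.6 and 0.99 (as in the simulation) and, purely for illustration, that a remediation succeeds with probability equal to its confidence. `sweep_thresholds` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
confidence = rng.uniform(0.6, 0.99, size=1000)
# Illustrative assumption: success probability equals the confidence score
succeeded = rng.uniform(size=1000) < confidence


def sweep_thresholds(confidence, succeeded, thresholds):
    """Return execution rate and success rate among executed decisions per threshold."""
    rows = []
    for t in thresholds:
        executed = confidence >= t
        n_executed = int(executed.sum())
        rows.append({
            'threshold': float(t),
            'execution_rate': n_executed / len(confidence),
            # NaN when nothing clears the threshold
            'success_rate': float(succeeded[executed].mean()) if n_executed else float('nan'),
        })
    return rows


rows = sweep_thresholds(confidence, succeeded, [0.70, 0.75, 0.80, 0.90])
for row in rows:
    print(f"threshold={row['threshold']:.2f}  "
          f"executed={row['execution_rate']:.1%}  "
          f"success={row['success_rate']:.1%}")
```

Raising the threshold can only lower the execution rate. Whether it raises the success rate depends on how well calibrated the confidence scores are, which the synthetic data here simply assumes.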

## Integration Section

This notebook integrates with:
- **Input**: Ensemble model predictions from Phase 2
- **Output**: AI-driven remediation decisions
- **Monitoring**: Decision accuracy and confidence metrics
- **Next**: Hybrid healing workflows

## Next Steps

1. Monitor decision accuracy over time
2. Proceed to `hybrid-healing-workflows.ipynb`
3. Combine AI-driven and rule-based approaches
4. Optimize remediation success rates
5. Complete Phase 3 implementation

## References

- ADR-003: Self-Healing Platform Architecture
- ADR-012: Notebook Architecture for End-to-End Workflows
- [Confidence Scoring in ML](https://en.wikipedia.org/wiki/Confidence_interval)
- [Decision Making Under Uncertainty](https://en.wikipedia.org/wiki/Decision_theory)