# Security Incident Response Automation

## Overview
This notebook walks through automated security incident response using simulated events. It scores each event against a set of threat indicators, maps the resulting threat level to a list of response actions, and builds a prioritized remediation plan.

## Prerequisites
- Completed: `predictive-scaling-capacity-planning.ipynb`
- Security monitoring tools deployed
- Incident response procedures defined
- Network policies configured

## Learning Objectives
- Detect security incidents
- Implement automated responses
- Coordinate incident remediation
- Track security events
- Generate incident reports

## Key Concepts
- **Threat Detection**: Score security events against known threat indicators
- **Incident Response**: Map each incident's threat level to a set of automated actions
- **Containment**: Isolate affected resources
- **Investigation**: Analyze the incident's root cause
- **Recovery**: Restore normal operations

## Setup Section

In [None]:
import sys
import os
import json
import logging
from pathlib import Path
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from typing import Dict, List, Any

# Setup path for utils module - works from any directory
def find_utils_path():
    """Find utils path regardless of current working directory"""
    possible_paths = [
        # dir() inside a function lists only local names, so check globals() instead
        Path(__file__).parent.parent / 'utils' if '__file__' in globals() else None,
        Path.cwd() / 'notebooks' / 'utils',
        Path.cwd().parent / 'utils',
        Path('/workspace/repo/notebooks/utils'),
        Path('/opt/app-root/src/notebooks/utils'),
        Path('/opt/app-root/src/openshift-aiops-platform/notebooks/utils'),
    ]
    for p in possible_paths:
        if p and p.exists() and (p / 'common_functions.py').exists():
            return str(p)
    current = Path.cwd()
    for _ in range(5):
        utils_path = current / 'notebooks' / 'utils'
        if utils_path.exists():
            return str(utils_path)
        current = current.parent
    return None

utils_path = find_utils_path()
if utils_path:
    sys.path.insert(0, utils_path)
    print(f"✅ Utils path found: {utils_path}")
else:
    print("⚠️ Utils path not found - will use fallback implementations")

# Try to import common functions, with fallback
try:
    from common_functions import setup_environment
    print("✅ Common functions imported")
except ImportError as e:
    print(f"⚠️ Common functions not available: {e}")
    def setup_environment():
        os.makedirs('/opt/app-root/src/data/processed', exist_ok=True)
        os.makedirs('/opt/app-root/src/models', exist_ok=True)
        return {'data_dir': '/opt/app-root/src/data', 'models_dir': '/opt/app-root/src/models'}

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Setup environment
env_info = setup_environment()
logger.info(f"Environment ready: {env_info}")

# Define paths
DATA_DIR = Path('/opt/app-root/src/data')
PROCESSED_DIR = DATA_DIR / 'processed'
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
REPORTS_DIR = DATA_DIR / 'reports'
REPORTS_DIR.mkdir(parents=True, exist_ok=True)

# Configuration
NAMESPACE = 'self-healing-platform'
THREAT_LEVELS = ['low', 'medium', 'high', 'critical']

logger.info("Security incident response automation initialized")
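
The setup cell hard-codes `/opt/app-root/src/data`, which may not exist or be writable outside the workbench image. One possible way to make the location configurable is to read it from an environment variable and fall back to the notebook's default. This is a minimal sketch under that assumption: the variable name `AIOPS_DATA_DIR` and the helper `resolve_data_dir` are hypothetical and are not defined anywhere in the platform.

```python
import os
import tempfile
from pathlib import Path

def resolve_data_dir(default: str = '/opt/app-root/src/data') -> Path:
    """Return the data directory, preferring the hypothetical AIOPS_DATA_DIR variable."""
    return Path(os.environ.get('AIOPS_DATA_DIR', default))

# Usage: point the notebook at a scratch directory for a local run.
os.environ['AIOPS_DATA_DIR'] = tempfile.mkdtemp()
data_dir = resolve_data_dir()
processed_dir = data_dir / 'processed'
processed_dir.mkdir(parents=True, exist_ok=True)
print(processed_dir.exists())
```

With the variable unset, `resolve_data_dir()` returns the same path the setup cell uses today.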

## Implementation Section

### 1. Detect Security Incidents

In [None]:
def detect_security_incident(security_events: List[Dict]) -> Dict[str, Any]:
    """
    Detect security incidents from security events.
    
    Args:
        security_events: List of security events
    
    Returns:
        Incident detection result
    """
    try:
        incidents = []
        
        for event in security_events:
            # Analyze event for threat indicators
            threat_score = 0
            threat_indicators = []
            
            if event.get('failed_auth_attempts', 0) > 5:
                threat_score += 0.3
                threat_indicators.append('Multiple failed auth attempts')
            
            if event.get('suspicious_network_activity', False):
                threat_score += 0.4
                threat_indicators.append('Suspicious network activity')
            
            if event.get('privilege_escalation', False):
                threat_score += 0.5
                threat_indicators.append('Privilege escalation detected')
            
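            # Strict comparison: a score of exactly 0.5 (for example,
            # privilege escalation alone) does not raise an incident.
            if threat_score > 0.5:
                if threat_score > 0.8:
                    threat_level = 'critical'
                elif threat_score > 0.6:
                    threat_level = 'high'
                else:
                    threat_level = 'medium'
                incident = {
                    'timestamp': datetime.now().isoformat(),
                    'source': event.get('source', 'unknown'),
                    'threat_score': threat_score,
                    'threat_level': threat_level,
                    'threat_indicators': threat_indicators,
                    'incident_detected': True
                }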
                incidents.append(incident)
        
        logger.info(f"Detected {len(incidents)} security incidents")
        return {'incidents': incidents, 'total_detected': len(incidents)}
    except Exception as e:
        logger.error(f"Incident detection error: {e}")
        return {'error': str(e)}

# Simulate security events
security_events = [
    {'source': 'pod-1', 'failed_auth_attempts': 10, 'suspicious_network_activity': True},
    {'source': 'pod-2', 'failed_auth_attempts': 2, 'suspicious_network_activity': False},
    {'source': 'pod-3', 'failed_auth_attempts': 8, 'privilege_escalation': True}
]

detection_result = detect_security_incident(security_events)
print(json.dumps(detection_result, indent=2, default=str))
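
To see how the indicator weights and level thresholds in `detect_security_incident` interact, the sketch below enumerates every combination of the three indicators and prints the resulting score and level. `WEIGHTS`, `classify`, and `score_table` are illustrative names introduced here; they copy the constants from the function above rather than importing them.

```python
from itertools import product

# Weights copied from detect_security_incident; the names WEIGHTS, classify,
# and score_table are illustrative and not part of the notebook.
WEIGHTS = {
    'failed_auth': 0.3,
    'suspicious_network': 0.4,
    'privilege_escalation': 0.5,
}

def classify(score):
    """Mirror the notebook's thresholds; None means no incident is raised."""
    if score <= 0.5:
        return None
    if score > 0.8:
        return 'critical'
    if score > 0.6:
        return 'high'
    return 'medium'

def score_table():
    """Return (active indicators, score, level) for all 8 combinations."""
    rows = []
    for flags in product([False, True], repeat=len(WEIGHTS)):
        active = [name for name, on in zip(WEIGHTS, flags) if on]
        score = 0
        for name in active:  # same addition order as the notebook
            score += WEIGHTS[name]
        rows.append((active, score, classify(score)))
    return rows

for active, score, level in score_table():
    print(f"{score:.1f}  {str(level):<8}  {', '.join(active) or '(none)'}")
```

With these weights, no combination falls between 0.5 and 0.6, so the `medium` branch is never reached, and any single indicator on its own stays at or below the 0.5 cutoff. Both simulated incidents (`pod-1` at 0.7 and `pod-3` at 0.8) are classified as `high`.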

### 2. Implement Automated Response

In [None]:
def execute_incident_response(incident: Dict[str, Any]) -> Dict[str, Any]:
    """
    Execute automated incident response.
    
    Args:
        incident: Security incident
    
    Returns:
        Response execution result
    """
    try:
        threat_level = incident.get('threat_level', 'low')
        
        response_actions = []
        
        # Determine response based on threat level
        if threat_level == 'critical':
            response_actions = [
                'isolate_pod',
                'revoke_credentials',
                'capture_logs',
                'alert_security_team'
            ]
        elif threat_level == 'high':
            response_actions = [
                'restrict_network_access',
                'increase_monitoring',
                'capture_logs'
            ]
        elif threat_level == 'medium':
            response_actions = [
                'increase_monitoring',
                'log_event'
            ]
        
        response_result = {
            'timestamp': datetime.now().isoformat(),
            'incident_source': incident.get('source', 'unknown'),
            'threat_level': threat_level,
            'response_actions': response_actions,
            'actions_executed': len(response_actions),
            'status': 'success',
            # Simulated value; cast to int so json.dumps emits a number, not a string
            'execution_time_ms': int(np.random.randint(100, 1000))
        }
        
        logger.info(f"Incident response executed: {len(response_actions)} actions")
        return response_result
    except Exception as e:
        logger.error(f"Incident response error: {e}")
        return {'error': str(e)}

# Test response execution
if detection_result.get('incidents'):
    response = execute_incident_response(detection_result['incidents'][0])
    print(json.dumps(response, indent=2, default=str))
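
`execute_incident_response` only records action names; nothing is run against the cluster. One way to connect those names to behaviour is a dispatch table that maps each name to a handler. The sketch below is an assumption about how that could look, not the notebook's implementation: the handlers, `HANDLERS`, and `run_actions` are hypothetical, and each handler returns a description of what it would do instead of calling any cluster API.

```python
from typing import Callable, Dict, List

# Hypothetical handlers: each returns a description instead of touching a cluster.
def isolate_pod(source: str) -> str:
    return f"would isolate {source}"

def capture_logs(source: str) -> str:
    return f"would capture logs from {source}"

HANDLERS: Dict[str, Callable[[str], str]] = {
    'isolate_pod': isolate_pod,
    'capture_logs': capture_logs,
}

def run_actions(source: str, actions: List[str]) -> Dict[str, List[str]]:
    """Run known actions in order and collect names that have no handler."""
    done, skipped = [], []
    for name in actions:
        handler = HANDLERS.get(name)
        if handler is None:
            skipped.append(name)
        else:
            done.append(handler(source))
    return {'done': done, 'skipped': skipped}

result = run_actions('pod-1', ['isolate_pod', 'revoke_credentials', 'capture_logs'])
print(result)
```

Reporting skipped actions separately keeps an unimplemented action from being counted as executed.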

### 3. Coordinate Incident Remediation

In [None]:
def coordinate_remediation(incidents: List[Dict]) -> Dict[str, Any]:
    """
    Coordinate remediation for multiple incidents.
    
    Args:
        incidents: List of incidents
    
    Returns:
        Remediation coordination result
    """
    try:
        remediation_plan = {
            'timestamp': datetime.now().isoformat(),
            'total_incidents': len(incidents),
            'remediation_steps': [],
            'estimated_time_minutes': 0
        }
        
        # Group by threat level
        critical_incidents = [i for i in incidents if i.get('threat_level') == 'critical']
        high_incidents = [i for i in incidents if i.get('threat_level') == 'high']
        
        # Create remediation steps
        if critical_incidents:
            remediation_plan['remediation_steps'].append({
                'priority': 1,
                'action': 'Isolate critical incidents',
                'count': len(critical_incidents),
                'estimated_time_minutes': 5
            })
        
        if high_incidents:
            remediation_plan['remediation_steps'].append({
                'priority': 2,
                'action': 'Restrict high-threat resources',
                'count': len(high_incidents),
                'estimated_time_minutes': 10
            })
        
        remediation_plan['estimated_time_minutes'] = sum(
            s['estimated_time_minutes'] for s in remediation_plan['remediation_steps']
        )
        
        logger.info(f"Remediation plan created: {len(remediation_plan['remediation_steps'])} steps")
        return remediation_plan
    except Exception as e:
        logger.error(f"Remediation coordination error: {e}")
        return {'error': str(e)}

# Test remediation coordination
remediation = coordinate_remediation(detection_result.get('incidents', []))
print(json.dumps(remediation, indent=2, default=str))
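
The plan above groups incidents by level but does not order the individual incidents inside each group. A minimal sketch of one possible ordering follows, assuming severity follows the position of each level in `THREAT_LEVELS` and ties are broken by score. The `prioritize` helper and the sample incidents are hypothetical.

```python
THREAT_LEVELS = ['low', 'medium', 'high', 'critical']  # same order as the setup cell

def prioritize(incidents):
    """Sort incidents from most to least severe, then by descending score."""
    rank = {level: i for i, level in enumerate(THREAT_LEVELS)}
    return sorted(
        incidents,
        # Unknown levels rank below 'low'; missing scores count as 0.
        key=lambda inc: (rank.get(inc.get('threat_level'), -1), inc.get('threat_score', 0)),
        reverse=True,
    )

sample = [
    {'source': 'pod-a', 'threat_level': 'high', 'threat_score': 0.7},
    {'source': 'pod-b', 'threat_level': 'critical', 'threat_score': 0.9},
    {'source': 'pod-c', 'threat_level': 'high', 'threat_score': 0.8},
]
ordered = prioritize(sample)
print([inc['source'] for inc in ordered])
```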

### 4. Track Security Events

In [None]:
# Create a simulated security event tracking DataFrame (one random record per hour)
security_tracking = pd.DataFrame([
    {
        'timestamp': datetime.now() - timedelta(hours=i),
        'incident_type': np.random.choice(['auth_failure', 'network_anomaly', 'privilege_escalation']),
        'threat_level': np.random.choice(THREAT_LEVELS),
        'detected': np.random.choice([True, False], p=[0.95, 0.05]),
        'response_time_seconds': np.random.randint(1, 60),
        'remediation_successful': np.random.choice([True, False], p=[0.92, 0.08])
    }
    for i in range(168)  # 7 days of data
])

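# Save tracking data (to_parquet needs pyarrow or fastparquet installed)
tracking_file = PROCESSED_DIR / 'security_incident_tracking.parquet'
security_tracking.to_parquet(tracking_file)

logger.info(f"Saved security incident tracking data to {tracking_file}")
# Show a preview rather than all 168 rows
print(security_tracking.head(10).to_string())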
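
Tracking data of this shape is easier to read when broken down per threat level. The sketch below rebuilds a comparable simulated frame with a seeded generator and aggregates it with pandas named aggregation. The `summarize` helper and the output column names are illustrative choices, not part of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
levels = ['low', 'medium', 'high', 'critical']

# Simulated stand-in for security_tracking with the same column names.
tracking = pd.DataFrame({
    'threat_level': rng.choice(levels, size=168),
    'detected': rng.random(168) < 0.95,
    'response_time_seconds': rng.integers(1, 60, size=168),
    'remediation_successful': rng.random(168) < 0.92,
})

def summarize(df):
    """Aggregate event count, rates, and mean response time per threat level."""
    summary = df.groupby('threat_level').agg(
        events=('detected', 'size'),
        detection_rate=('detected', 'mean'),
        avg_response_s=('response_time_seconds', 'mean'),
        remediation_rate=('remediation_successful', 'mean'),
    )
    # Order rows by severity instead of alphabetically.
    return summary.reindex(levels)

summary = summarize(tracking)
print(summary.round(2))
```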

## Validation Section

In [None]:
# Verify outputs
assert tracking_file.exists(), "Security tracking file not created"
assert 'incidents' in detection_result, f"Incident detection failed: {detection_result.get('error')}"

detection_rate = security_tracking['detected'].sum() / len(security_tracking)
remediation_success = security_tracking['remediation_successful'].sum() / len(security_tracking)
avg_response_time = security_tracking['response_time_seconds'].mean()

logger.info("✅ All validations passed")
print("\nSecurity Incident Response Automation Summary:")
print(f"  Tracking Records: {len(security_tracking)}")
print(f"  Detection Rate: {detection_rate:.1%}")
print(f"  Remediation Success Rate: {remediation_success:.1%}")
print(f"  Average Response Time: {avg_response_time:.1f}s")
print(f"  Threat Levels Monitored: {len(THREAT_LEVELS)}")
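
The remediation success rate above divides by every tracking record, including events that were never detected. If the intended metric is success among detected incidents only, the denominator changes. A small sketch of the difference on a hand-made frame (the four rows are illustrative, not notebook data):

```python
import pandas as pd

# Four illustrative events; the last one was never detected.
df = pd.DataFrame({
    'detected': [True, True, True, False],
    'remediation_successful': [True, False, True, True],
})

# Rate over all records, as in the validation cell.
overall = df['remediation_successful'].mean()

# Rate restricted to events that were actually detected.
among_detected = df.loc[df['detected'], 'remediation_successful'].mean()

print(f"overall={overall:.1%}, among detected={among_detected:.1%}")
```

Which denominator is appropriate depends on how the platform defines remediation for undetected events.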

## Integration Section

This notebook integrates with:
- **Input**: Security events and threat indicators
- **Output**: Incident responses and remediation plans
- **Monitoring**: Security events and response metrics
- **Next**: Cost optimization and resource efficiency

## Next Steps

1. Deploy security incident response
2. Proceed to `cost-optimization-resource-efficiency.ipynb`
3. Implement cost optimization
4. Optimize resource usage
5. Complete advanced scenarios

## References

- ADR-003: Self-Healing Platform Architecture
- ADR-012: Notebook Architecture for End-to-End Workflows
- [NIST Incident Response](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r3.pdf)
- [Kubernetes Security](https://kubernetes.io/docs/concepts/security/)