# 5G Network Anomaly Detection Demo

This notebook demonstrates the complete pipeline for detecting anomalies in 5G network metrics using autoencoders and generating natural language reports with LLMs.

## Overview
1. **Data Generation**: Create synthetic 5G network metrics
2. **Model Training**: Train an autoencoder for anomaly detection
3. **Anomaly Detection**: Detect anomalies in new data
4. **LLM Reporting**: Generate natural language reports
5. **Visualization**: Create comprehensive plots and analysis

---

## 1. Setup and Imports

In [None]:
# Standard imports
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
sys.path.append('.')

# Import our custom modules
from data.generate_synthetic_data import generate_synthetic_5g_data
from models.anomaly_detector import AnomalyDetector, plot_training_history, plot_reconstruction_errors
from models.llm_reporter import NetworkAnomalyReporter

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

print("✅ All imports successful!")
print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 2. Data Generation and Exploration

Let's generate synthetic 5G network metrics with known anomalies for demonstration.

In [None]:
# Generate synthetic 5G network data
print("📊 Generating synthetic 5G network metrics...")

# Create training data (fewer anomalies, so the autoencoder mostly sees normal behavior)
train_data = generate_synthetic_5g_data(num_samples=5000, anomaly_rate=0.05)

# Create test data (more anomalies, so the evaluation has more positive examples)
test_data = generate_synthetic_5g_data(num_samples=2000, anomaly_rate=0.15)

print(f"Training data: {len(train_data)} samples, {train_data['is_anomaly'].sum()} anomalies ({train_data['is_anomaly'].mean():.1%})")
print(f"Test data: {len(test_data)} samples, {test_data['is_anomaly'].sum()} anomalies ({test_data['is_anomaly'].mean():.1%})")

# Define feature columns
feature_columns = [
    'prb_utilization',
    'active_ue_count', 
    'throughput_mbps',
    'latency_ms',
    'handover_success_rate',
    'snr_db',
    'packet_loss_rate'
]

print(f"Feature columns: {feature_columns}")
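The internals of `generate_synthetic_5g_data` are not shown in this notebook. For orientation, here is a minimal sketch of one way such a generator can work: Gaussian "normal" traffic with a fixed fraction of rows deliberately degraded and labelled. `make_synthetic_metrics`, its three columns, and all baseline values are hypothetical illustrations, not the project's generator.

```python
import numpy as np
import pandas as pd


def make_synthetic_metrics(num_samples: int, anomaly_rate: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator: Gaussian 'normal' traffic plus injected degradations."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        # Illustrative baselines, not calibrated to any real 5G cell
        "throughput_mbps": rng.normal(500.0, 50.0, num_samples),
        "latency_ms": rng.normal(10.0, 2.0, num_samples),
        "packet_loss_rate": rng.uniform(0.0, 0.01, num_samples),
    })
    df["is_anomaly"] = False

    # Choose exactly round(num_samples * anomaly_rate) distinct rows to corrupt
    n_anomalies = int(round(num_samples * anomaly_rate))
    idx = rng.choice(num_samples, size=n_anomalies, replace=False)

    # Degrade the chosen rows: throughput collapses, latency and loss spike
    df.loc[idx, "throughput_mbps"] *= 0.2
    df.loc[idx, "latency_ms"] *= 5.0
    df.loc[idx, "packet_loss_rate"] += 0.1
    df.loc[idx, "is_anomaly"] = True
    return df


demo = make_synthetic_metrics(num_samples=1000, anomaly_rate=0.05)
print(demo["is_anomaly"].sum(), demo["is_anomaly"].mean())
```

Because the labels are known by construction, the detector's output can later be scored against them, which is the main reason to start from synthetic data.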

In [None]:
# Explore the training data
print("📈 Training Data Summary:")
display(train_data[feature_columns + ['is_anomaly']].describe())

# Show sample of the data
print("\n📋 Sample Data:")
display(train_data[feature_columns + ['is_anomaly']].head(10))

In [None]:
# Visualize the training data distribution
fig, axes = plt.subplots(3, 3, figsize=(20, 15))
axes = axes.ravel()

for i, col in enumerate(feature_columns):
    # Plot normal vs anomaly distributions
    normal_data = train_data[train_data['is_anomaly'] == False][col]
    anomaly_data = train_data[train_data['is_anomaly'] == True][col]
    
    axes[i].hist(normal_data, bins=50, alpha=0.7, label='Normal', density=True, color='blue')
    axes[i].hist(anomaly_data, bins=50, alpha=0.7, label='Anomaly', density=True, color='red')
    axes[i].set_title(f'{col.replace("_", " ").title()}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

# Hide the last two empty subplots
axes[7].axis('off')
axes[8].axis('off')

plt.tight_layout()
plt.suptitle('Distribution of 5G Network Metrics: Normal vs Anomalous', fontsize=16, y=1.02)
plt.show()

print("🎯 The red histograms show anomalous values that deviate significantly from normal patterns.")

## 3. Model Training

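Now let's train the autoencoder on the training data. That data is predominantly normal (about 5% anomalies), so the model should learn to reconstruct normal network behavior well and anomalous behavior poorly.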

In [None]:
# Initialize the anomaly detector
print("🤖 Initializing anomaly detector...")

hidden_dim = 32    # Smaller for demo
encoding_dim = 16  # Smaller for demo; if this is the bottleneck width, 16 > 7 inputs means no compression

detector = AnomalyDetector(
    input_dim=len(feature_columns),
    encoding_dim=encoding_dim,
    hidden_dim=hidden_dim
)

print("Model architecture:")
print(f"- Input dimension: {len(feature_columns)}")
print(f"- Hidden dimension: {hidden_dim}")
print(f"- Encoding dimension: {encoding_dim}")
print(f"- Device: {detector.device}")
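In the cell above, `encoding_dim=16` is larger than the 7 input features. If `encoding_dim` is the bottleneck width (the class internals are not shown here), nothing in the reconstruction objective forces the network to compress, and it can drift toward an identity-like map that reconstructs anomalies as well as normal samples. The sketch below illustrates the point with a linear stand-in: PCA, whose top-k subspace is the one an MSE-trained linear autoencoder recovers. It is not the project's `AnomalyDetector`; `pca_reconstruction_errors` and the toy data are assumptions for illustration.

```python
import numpy as np


def pca_reconstruction_errors(train: np.ndarray, X: np.ndarray, k: int) -> np.ndarray:
    """Per-sample MSE after projecting X onto the top-k principal components of `train`."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:k]                   # (k, d): plays the role of encoder weights
    codes = (X - mean) @ components.T     # encode to k dimensions
    X_hat = codes @ components + mean     # decode back to d dimensions
    return np.mean((X - X_hat) ** 2, axis=1)


rng = np.random.default_rng(0)
# 7 correlated features driven by 2 latent factors, loosely mimicking the 7 metrics above
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 7))
demo_train = latent @ mixing + 0.05 * rng.normal(size=(500, 7))

demo_normal = demo_train[:5]
# Shift two features in opposite directions, breaking the learned correlations
demo_outlier = demo_normal + np.array([3.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0])

print(pca_reconstruction_errors(demo_train, demo_normal, k=2))   # small
print(pca_reconstruction_errors(demo_train, demo_outlier, k=2))  # much larger
print(pca_reconstruction_errors(demo_train, demo_outlier, k=7))  # k = d: outlier reconstructed almost exactly
```

With `k=2` the outliers stand out clearly; with `k=7` (no bottleneck) their error collapses to floating-point noise. A nonlinear autoencoder with a wide bottleneck is not guaranteed to behave this badly, but it loses the structural pressure that makes reconstruction error informative.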

In [None]:
# Prepare training data
print("📚 Preparing training data...")

train_loader, val_loader = detector.prepare_data(
    train_data, 
    feature_columns, 
    test_size=0.2
)

print(f"Training samples: {len(train_loader.dataset)}")
print(f"Validation samples: {len(val_loader.dataset)}")
print("✅ Data preparation complete!")
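What `detector.prepare_data` does internally is not shown. A common pattern for this step, sketched here under the assumption of z-score scaling, is to split first and then fit the scaling statistics on the training split only, so that no information from the validation samples leaks into training. `split_and_scale` is a hypothetical helper.

```python
import numpy as np


def split_and_scale(X: np.ndarray, test_size: float = 0.2, seed: int = 42):
    """Shuffle, split, and z-score using statistics from the training split only."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = int(len(X) * test_size)
    val_idx, train_idx = order[:n_val], order[n_val:]

    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0) + 1e-8   # guard against constant features
    X_tr = (X[train_idx] - mean) / std
    X_va = (X[val_idx] - mean) / std        # same transform, no refitting on validation
    return X_tr, X_va, mean, std


demo_X = np.random.default_rng(0).normal(loc=50.0, scale=10.0, size=(5000, 7))
X_tr, X_va, demo_mean, demo_std = split_and_scale(demo_X, test_size=0.2)
print(X_tr.shape, X_va.shape)
```

The returned `mean` and `std` must be reused unchanged at inference time; scaling live samples with their own statistics would shift the reconstruction errors the threshold was calibrated on.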

In [None]:
# Train the model
print("🚀 Starting model training...")

history = detector.train(
    train_loader,
    val_loader,
    epochs=50,  # Fewer epochs for demo
    learning_rate=0.001,
    patience=10
)

print(f"\n✅ Training completed!")
print(f"Epochs trained: {history['epochs_trained']}")
print(f"Final training loss: {history['train_losses'][-1]:.6f}")
print(f"Final validation loss: {history['val_losses'][-1]:.6f}")
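`detector.train` is called with `patience=10`, but its stopping rule is not shown here. A minimal sketch of the common patience convention, assuming it tracks validation loss: training stops once `patience` consecutive epochs fail to improve on the best validation loss seen so far. `early_stopping_epoch` is hypothetical, and `min_delta` is an extra parameter not present in the original call.

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int,
                         min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (epochs_run, best_epoch) for patience-based early stopping (1-indexed)."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Validation loss improves for 3 epochs, then plateaus
demo_losses = [0.9, 0.5, 0.3] + [0.31] * 20
print(early_stopping_epoch(demo_losses, patience=10))
```

Under this convention, the run above stops at epoch 13 with the best weights from epoch 3, which is why `history['epochs_trained']` can be well below the `epochs=50` budget. Implementations typically also restore the best-epoch weights.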

In [None]:
# Plot training history
plot_training_history(history)
plt.title('Autoencoder Training Progress')
plt.show()

print("📊 The training loss should decrease over time, indicating the model is learning the normal patterns.")

## 4. Anomaly Detection

Let's calculate the anomaly threshold and test our model on new data.

In [None]:
# Calculate anomaly threshold
print("🎯 Calculating anomaly threshold...")

threshold = detector.calculate_threshold(train_data, feature_columns, percentile=95)
print(f"Anomaly threshold set to: {threshold:.6f}")
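A percentile threshold means "flag any sample whose reconstruction error is higher than `percentile`% of the errors seen on training data". By construction, roughly 5% of the training samples themselves end up above a 95th-percentile threshold, which happens to match the 5% anomaly rate used for `train_data`. The sketch assumes `calculate_threshold` follows this definition; `percentile_threshold` and the exponential error distribution are illustrative.

```python
import numpy as np


def percentile_threshold(train_errors: np.ndarray, percentile: float = 95.0) -> float:
    """Error value below which `percentile`% of training reconstruction errors fall."""
    return float(np.percentile(train_errors, percentile))


rng = np.random.default_rng(0)
demo_train_errors = rng.exponential(scale=0.01, size=5000)  # illustrative error distribution
demo_threshold = percentile_threshold(demo_train_errors, 95)

demo_flagged = demo_train_errors > demo_threshold
print(f"threshold={demo_threshold:.4f}, flagged fraction on training data={demo_flagged.mean():.3f}")
```

Raising the percentile trades recall for precision: fewer normal samples cross the line, but so do more of the milder anomalies.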

In [None]:
# Evaluate model on test data
print("🔍 Evaluating model on test data...")

metrics = detector.evaluate_model(test_data, feature_columns)

print("\n📊 Model Performance Metrics:")
for metric, value in metrics.items():
    print(f"- {metric.capitalize()}: {value:.4f}")
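The internals of `evaluate_model` are not shown, but the keys used later (`precision`, `recall`, `f1_score`) suggest the standard definitions for the anomaly (positive) class. A minimal sketch of those definitions, useful for cross-checking the printed values; `binary_metrics` is a hypothetical helper.

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Precision, recall, and F1 for the positive (anomaly) class."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # anomalies correctly flagged
    fp = np.sum(~y_true & y_pred)   # normal samples wrongly flagged
    fn = np.sum(y_true & ~y_pred)   # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": float(precision), "recall": float(recall), "f1_score": float(f1)}


demo_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
demo_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
print(binary_metrics(demo_true, demo_pred))
```

Here 2 of 3 flags are correct (precision 2/3) and 2 of 3 anomalies are caught (recall 2/3), so F1, their harmonic mean, is also 2/3.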

In [None]:
# Get reconstruction errors for visualization
reconstruction_errors = detector.detect_anomalies(test_data, feature_columns)
true_labels = test_data['is_anomaly'].values

# Plot reconstruction errors
plot_reconstruction_errors(reconstruction_errors, true_labels, threshold)
plt.suptitle('Reconstruction Error Analysis', fontsize=16)
plt.show()

print("🎯 Left plot shows the distribution - anomalies should have higher reconstruction errors.")
print("📈 Right plot shows errors over time - red dots are true anomalies.")

In [None]:
# Detailed anomaly analysis
predictions = reconstruction_errors > threshold
anomaly_indices = np.where(predictions)[0]

print(f"🔍 Anomaly Detection Results:")
print(f"- Total samples analyzed: {len(test_data)}")
print(f"- True anomalies: {true_labels.sum()}")
print(f"- Predicted anomalies: {predictions.sum()}")
# Note: accuracy is inflated by the majority (normal) class; prefer precision/recall/F1
print(f"- Accuracy: {(predictions == true_labels).mean():.3f}")

# Show some detected anomalies
if len(anomaly_indices) > 0:
    print(f"\n📋 Sample of detected anomalies:")
    # .copy() avoids modifying a view of test_data (SettingWithCopyWarning)
    anomaly_samples = test_data.iloc[anomaly_indices[:5]][feature_columns + ['is_anomaly']].copy()
    anomaly_samples['reconstruction_error'] = reconstruction_errors[anomaly_indices[:5]]
    display(anomaly_samples)
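The accuracy printed above can look strong even for a useless detector, because most test samples are normal. With the test set's 15% anomaly rate (about 300 of 2,000 samples, depending on the generator), a detector that never flags anything is already 85% accurate while catching nothing:

```python
import numpy as np

# 2,000 samples with a 15% anomaly rate, as in the test set above
demo_labels = np.zeros(2000, dtype=bool)
demo_labels[:300] = True

# A "detector" that never flags anything
never_flag = np.zeros_like(demo_labels)

demo_accuracy = (never_flag == demo_labels).mean()
demo_recall = (never_flag & demo_labels).sum() / demo_labels.sum()
print(f"accuracy={demo_accuracy:.2f}, recall={demo_recall:.2f}")
```

This is why the precision, recall, and F1 values from `evaluate_model` are the more informative numbers here.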

## 5. LLM Report Generation

Now let's generate natural language reports about the detected anomalies using our LLM reporter.

In [None]:
# Initialize the LLM reporter
print("🤖 Initializing LLM reporter...")

reporter = NetworkAnomalyReporter(model_name="google/flan-t5-small")

print("✅ LLM reporter initialized successfully!")

In [None]:
# Generate comprehensive anomaly report
print("📝 Generating comprehensive anomaly report...")

report = reporter.create_detailed_report(
    test_data, 
    anomaly_indices,
    timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
)

print("✅ Report generation complete!")
print(f"Report covers {len(anomaly_indices)} detected anomalies out of {len(test_data)} samples")
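How `NetworkAnomalyReporter` builds its prompts is not shown. Small instruction-tuned seq2seq models such as `google/flan-t5-small` tend to work best with short, factual inputs, so one reasonable approach is to compress the detector output into a few numbers before asking for a summary. The sketch below is a hypothetical prompt builder; the function name, wording, and the example numbers are illustrative, not taken from the project.

```python
def build_summary_prompt(total_samples: int, anomalies: int, metric_shifts: dict) -> str:
    """Hypothetical prompt builder: compact facts in, one instruction out.

    metric_shifts maps metric name -> relative change of its mean during anomalies
    (e.g. 2.4 means +240% versus normal samples).
    """
    rate = anomalies / total_samples if total_samples else 0.0
    # Largest relative shifts first, so they survive any truncation of the model input
    shifts = sorted(metric_shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)
    facts = "; ".join(f"{name} {pct:+.0%} vs normal" for name, pct in shifts)
    return (
        "Summarize this 5G network anomaly report for an operations engineer. "
        f"{anomalies} of {total_samples} samples ({rate:.1%}) were anomalous. "
        f"Metric changes during anomalies: {facts}."
    )


prompt = build_summary_prompt(
    total_samples=2000,
    anomalies=310,  # illustrative numbers only
    metric_shifts={"latency_ms": 2.4, "throughput_mbps": -0.55, "snr_db": -0.1},
)
print(prompt)
```

The resulting string could then be passed to a Hugging Face `transformers` pipeline, for example `pipeline("text2text-generation", model="google/flan-t5-small")`. Keeping the numbers in the prompt also makes it easier to check the generated text against them.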

In [None]:
# Display the generated report
print("="*80)
print("📊 5G NETWORK ANOMALY REPORT")
print("="*80)

print(f"\n🕐 Timestamp: {report['timestamp']}")
print(f"📈 Statistics: {report['anomaly_statistics']['total_anomalies']} anomalies out of {report['anomaly_statistics']['total_samples']} samples ({report['anomaly_statistics']['anomaly_rate']})")

print(f"\n📋 EXECUTIVE SUMMARY:")
print("-" * 40)
print(report['executive_summary'])

print(f"\n🔧 TECHNICAL DETAILS:")
print("-" * 40)
print(report['technical_details'])

print(f"\n💡 RECOMMENDATIONS:")
print("-" * 40)
print(report['recommendations'])

print("\n" + "="*80)

In [None]:
# Generate real-time alerts for critical issues
print("🚨 Generating real-time alerts for critical issues...")

alerts = reporter.generate_real_time_alerts(test_data, anomaly_indices)

if alerts:
    print(f"\n⚠️  CRITICAL ALERTS ({len(alerts)} alerts):")
    print("=" * 60)
    
    for i, alert in enumerate(alerts, 1):
        print(f"\n🚨 Alert #{i} - {alert['metric'].upper()}:")
        print(f"   Severity: {alert['severity'].upper()}")
        print(f"   Affected samples: {alert['affected_samples']}")
        print(f"   Message: {alert['message']}")
        print(f"   Timestamp: {alert['timestamp']}")
else:
    print("✅ No critical alerts generated for the detected anomalies.")
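The rule `generate_real_time_alerts` uses to decide which metrics deserve an alert, and at what severity, is not shown. One simple, outlier-resistant option is sketched below: compare each metric's median over the flagged samples against the normal samples using a robust z-score (median and scaled MAD; the 1.4826 factor makes the MAD match the standard deviation for Gaussian data). `metric_alerts` is hypothetical, and the cutoffs `z_warn=3` and `z_crit=6` are arbitrary placeholders to be tuned.

```python
import numpy as np
import pandas as pd


def metric_alerts(df: pd.DataFrame, anomaly_idx, metrics, z_warn=3.0, z_crit=6.0):
    """Hypothetical per-metric alerting based on a robust z-score of the anomalous rows."""
    is_anom = np.zeros(len(df), dtype=bool)
    is_anom[np.asarray(anomaly_idx, dtype=int)] = True
    alerts = []
    for col in metrics:
        baseline = df.loc[~is_anom, col]
        median = baseline.median()
        mad = (baseline - median).abs().median() * 1.4826 + 1e-12
        # Median of the flagged rows, expressed in robust standard deviations
        z = (df.loc[is_anom, col].median() - median) / mad
        if abs(z) >= z_warn:
            alerts.append({
                "metric": col,
                "severity": "critical" if abs(z) >= z_crit else "warning",
                "robust_z": round(float(z), 1),
                "affected_samples": int(is_anom.sum()),
            })
    return alerts


rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "latency_ms": rng.normal(10, 1, 200),
    "snr_db": rng.normal(20, 2, 200),
})
demo_idx = np.arange(10)
demo_df.loc[demo_idx, "latency_ms"] += 15  # latency spike on 10 rows; SNR untouched
for alert in metric_alerts(demo_df, demo_idx, ["latency_ms", "snr_db"]):
    print(alert)
```

Only the latency spike produces an alert; metrics that look normal within the flagged samples stay quiet, which keeps alert volume proportional to what actually changed.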

## 6. Advanced Visualizations

Let's create comprehensive visualizations to better understand our results.

In [None]:
# Create a correlation matrix of features
plt.figure(figsize=(12, 10))
correlation_matrix = test_data[feature_columns].corr()
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='coolwarm', center=0,
            square=True, fmt='.2f')
plt.title('Correlation Matrix of 5G Network Metrics')
plt.tight_layout()
plt.show()

print("🔍 This shows how different network metrics correlate with each other.")

In [None]:
# Anomaly detection performance visualization
from sklearn.metrics import confusion_matrix, roc_curve, auc

# Side-by-side confusion matrix and ROC curve
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Plot 1: Confusion Matrix
cm = confusion_matrix(true_labels, predictions)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0])
axes[0].set_title('Confusion Matrix')
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_xticklabels(['Normal', 'Anomaly'])
axes[0].set_yticklabels(['Normal', 'Anomaly'])

# Plot 2: ROC Curve
fpr, tpr, _ = roc_curve(true_labels, reconstruction_errors)
roc_auc = auc(fpr, tpr)

axes[1].plot(fpr, tpr, color='darkorange', lw=2, 
             label=f'ROC curve (AUC = {roc_auc:.3f})')
axes[1].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', alpha=0.8)
axes[1].set_xlim([0.0, 1.0])
axes[1].set_ylim([0.0, 1.05])
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('ROC Curve')
axes[1].legend(loc="lower right")
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("🎯 The confusion matrix shows classification results at the chosen threshold.")
print(f"📈 ROC AUC of {roc_auc:.3f} indicates {'excellent' if roc_auc > 0.9 else 'good' if roc_auc > 0.8 else 'fair' if roc_auc > 0.7 else 'poor'} performance.")
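The `auc(fpr, tpr)` value above has a direct probabilistic reading: it equals the probability that a randomly chosen anomaly receives a higher reconstruction error than a randomly chosen normal sample, with ties counted as one half (the normalized Mann-Whitney U statistic). The sketch computes it by brute-force pairwise comparison; `auc_by_pairs` is a hypothetical helper, and its memory use grows with (anomalies × normal samples), which is fine at this notebook's scale.

```python
import numpy as np


def auc_by_pairs(y_true, scores) -> float:
    """ROC AUC as P(score of random anomaly > score of random normal), ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true][:, None]    # (n_pos, 1): anomaly scores
    neg = scores[~y_true][None, :]   # (1, n_neg): normal scores
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))


demo_y = np.array([0, 0, 1, 1, 0, 1])
demo_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
print(auc_by_pairs(demo_y, demo_scores))
```

Of the 9 anomaly/normal pairs, 8 are ordered correctly (only 0.35 < 0.4 is not), giving 8/9. Because AUC only depends on this ordering, it is independent of the chosen threshold, unlike the confusion matrix.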

In [None]:
# Time series visualization of anomalies
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

# Use the sample index as the x-axis
time_index = np.arange(len(test_data))

# Plot 1: Key metrics over time with anomalies highlighted
axes[0].plot(time_index, test_data['throughput_mbps'], alpha=0.7, label='Throughput (Mbps)')
axes[0].plot(time_index, test_data['latency_ms'] * 10, alpha=0.7, label='Latency (ms × 10)')
anomaly_mask = test_data['is_anomaly'] == True
axes[0].scatter(np.array(time_index)[anomaly_mask], test_data['throughput_mbps'][anomaly_mask], 
                color='red', alpha=0.8, s=50, label='True Anomalies', zorder=5)
axes[0].set_title('Network Performance Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Reconstruction errors with threshold
axes[1].plot(time_index, reconstruction_errors, alpha=0.8, color='blue', label='Reconstruction Error')
axes[1].axhline(y=threshold, color='red', linestyle='--', alpha=0.8, 
                label=f'Threshold ({threshold:.4f})')
axes[1].fill_between(time_index, 0, threshold, alpha=0.2, color='green', label='Normal Range')
axes[1].fill_between(time_index, threshold, reconstruction_errors.max(), 
                     alpha=0.2, color='red', label='Anomaly Range')
axes[1].set_title('Reconstruction Errors and Anomaly Threshold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Plot 3: Multiple metrics comparison
# Normalize metrics for comparison
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(test_data[feature_columns[:4]])

for i, col in enumerate(feature_columns[:4]):
    axes[2].plot(time_index, normalized_data[:, i], alpha=0.7, label=col)

axes[2].scatter(np.array(time_index)[anomaly_mask], 
                np.ones(anomaly_mask.sum()) * 1.1, 
                color='red', alpha=0.8, s=30, marker='v', label='Anomalies')
axes[2].set_title('Normalized Metrics Comparison')
axes[2].legend()
axes[2].grid(True, alpha=0.3)
axes[2].set_ylim(-0.1, 1.2)

plt.tight_layout()
plt.show()

print("📊 These plots show how anomalies manifest in different metrics over time.")

In [None]:
# Feature importance analysis
# Calculate which features contribute most to anomaly detection
feature_anomaly_correlation = []

for col in feature_columns:
    # Absolute correlation between raw feature values and per-sample reconstruction error
    correlation = np.corrcoef(test_data[col], reconstruction_errors)[0, 1]
    feature_anomaly_correlation.append(abs(correlation))

# Create feature importance plot
plt.figure(figsize=(12, 8))
feature_importance_df = pd.DataFrame({
    'Feature': [col.replace('_', ' ').title() for col in feature_columns],
    'Importance': feature_anomaly_correlation
}).sort_values('Importance', ascending=True)

bars = plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'], 
                color=plt.cm.viridis(np.linspace(0, 1, len(feature_columns))))
plt.xlabel('Correlation with Reconstruction Error (Absolute Value)')
plt.title('Feature Importance for Anomaly Detection')
plt.grid(True, alpha=0.3, axis='x')

# Add value labels on bars
for i, (bar, value) in enumerate(zip(bars, feature_importance_df['Importance'])):
    plt.text(value + 0.001, bar.get_y() + bar.get_height()/2, 
             f'{value:.3f}', va='center', fontsize=10)

plt.tight_layout()
plt.show()

print("📊 Features whose values correlate more strongly with reconstruction error are more associated with the flagged anomalies (correlation, not causal attribution).")
print(f"Most important feature: {feature_importance_df.iloc[-1]['Feature']}")
print(f"Least important feature: {feature_importance_df.iloc[0]['Feature']}")
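Correlating raw feature values with the total error is only a rough proxy: a feature can matter in both directions or only for some anomalies. A more direct alternative is to split each sample's squared reconstruction error across features. This needs the reconstructed outputs `X_hat`; whether `AnomalyDetector` exposes them is not shown, so the sketch takes them as an input. `per_feature_error_share` is a hypothetical helper.

```python
import numpy as np


def per_feature_error_share(X: np.ndarray, X_hat: np.ndarray) -> np.ndarray:
    """Fraction of each sample's squared reconstruction error contributed by each feature."""
    sq = (X - X_hat) ** 2                            # (n_samples, n_features)
    totals = sq.sum(axis=1, keepdims=True) + 1e-12   # guard against all-zero rows
    return sq / totals                               # each non-zero row sums to ~1


# Illustrative: 3 samples x 4 features, perfect reconstruction except a few cells
demo_X = np.tile([1.0, 2.0, 3.0, 4.0], (3, 1))
demo_X_hat = demo_X.copy()
demo_X_hat[1, 2] = 0.0   # sample 1: feature 2 badly reconstructed
demo_X_hat[2, 0] = 0.0   # sample 2: error split evenly between features 0 and 3
demo_X_hat[2, 3] = 3.0

shares = per_feature_error_share(demo_X, demo_X_hat)
print(np.round(shares, 2))
```

Averaging these shares over the flagged samples gives a per-feature ranking tied to what the model actually failed to reconstruct, and the per-row shares can explain individual alerts.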

## 7. Real-world Application Simulation

Let's simulate how this system would work in a real-world monitoring scenario.

In [None]:
# Simulate real-time monitoring
print("🔄 Simulating real-time network monitoring...")

# Generate a small batch of "live" data
live_data = generate_synthetic_5g_data(num_samples=50, anomaly_rate=0.2)

# Process each sample as if it's arriving in real-time
live_results = []

for i, (idx, sample) in enumerate(live_data.iterrows()):
    # Select a single-row DataFrame with .loc; iterrows() returns each row as a
    # Series, which does not preserve the per-column dtypes
    sample_df = live_data.loc[[idx], feature_columns]
    
    # Detect anomaly
    error, prediction = detector.predict_anomalies(sample_df, feature_columns)
    
    result = {
        'timestamp': sample['timestamp'],
        'is_anomaly': bool(prediction[0]),
        'true_anomaly': sample['is_anomaly'],
        'reconstruction_error': float(error[0]),
        # Relative distance from the threshold (a heuristic score, not a calibrated probability)
        'confidence': abs(float(error[0]) - threshold) / threshold
    }
    
    # Add key metrics
    for col in feature_columns[:3]:  # Just show first 3 for brevity
        result[col] = sample[col]
    
    live_results.append(result)
    
    # Show progress
    if (i + 1) % 10 == 0:
        print(f"Processed {i + 1}/{len(live_data)} samples...")

live_results_df = pd.DataFrame(live_results)
print("\n✅ Real-time simulation complete!")
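The simulation above calls `predict_anomalies` once per sample. In a monitoring service it is often cheaper to buffer incoming samples into small micro-batches and score each batch with a single model call. A minimal sketch, assuming a `score_fn` that maps a 2-D array of samples to per-row reconstruction errors (for instance, a thin wrapper around the detector); `score_stream`, `_score_batch`, and the toy scorer are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

import numpy as np

ScoreFn = Callable[[np.ndarray], np.ndarray]


def _score_batch(batch: List[np.ndarray], score_fn: ScoreFn, threshold: float):
    errors = score_fn(np.vstack(batch))  # one scorer call per micro-batch
    for row, err in zip(batch, errors):
        yield row, bool(err > threshold), float(err)


def score_stream(rows: Iterable[np.ndarray], score_fn: ScoreFn, threshold: float,
                 batch_size: int = 10) -> Iterator[Tuple[np.ndarray, bool, float]]:
    """Buffer incoming rows into micro-batches; yield (row, is_anomaly, error) in order."""
    buffer: List[np.ndarray] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            yield from _score_batch(buffer, score_fn, threshold)
            buffer = []
    if buffer:  # flush the final partial batch
        yield from _score_batch(buffer, score_fn, threshold)


demo_baseline = np.zeros(3)
demo_calls = []


def demo_score(batch: np.ndarray) -> np.ndarray:
    demo_calls.append(len(batch))  # record batch sizes to show the micro-batching
    return np.mean((batch - demo_baseline) ** 2, axis=1)


rng = np.random.default_rng(0)
demo_stream = [rng.normal(0, 0.1, 3) for _ in range(25)]
demo_stream[7] = np.array([5.0, 5.0, 5.0])  # injected anomaly

demo_results = list(score_stream(demo_stream, demo_score, threshold=1.0, batch_size=10))
print(demo_calls)
print([i for i, (_, flag, _) in enumerate(demo_results) if flag])
```

The trade-off is latency: a sample is only scored once its batch fills (or the stream ends), so `batch_size` should be small enough, or paired with a time-based flush, for alerting to stay timely.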

In [None]:
# Display real-time monitoring results
detected_anomalies = live_results_df[live_results_df['is_anomaly'] == True]

print(f"🎯 Real-time Monitoring Summary:")
print(f"- Total samples processed: {len(live_results_df)}")
print(f"- Anomalies detected: {len(detected_anomalies)}")
print(f"- True anomalies: {live_results_df['true_anomaly'].sum()}")
print(f"- Accuracy: {((live_results_df['is_anomaly'] == live_results_df['true_anomaly']).sum() / len(live_results_df)):.2%}")

if len(detected_anomalies) > 0:
    print(f"\n🚨 Detected Anomalies:")
    display(detected_anomalies[['timestamp', 'reconstruction_error', 'confidence', 
                               'prb_utilization', 'throughput_mbps', 'latency_ms']].head())
else:
    print("✅ No anomalies detected in this batch.")

In [None]:
# Generate a quick report for the live monitoring session
if len(detected_anomalies) > 0:
    print("📝 Generating quick report for live monitoring session...")
    
    # Positional indices of the live samples flagged as anomalous
    live_anomaly_indices = np.where(live_results_df['is_anomaly'].values)[0]
    
    # Generate alerts the same way as for the test set: full DataFrame plus anomaly indices
    live_alerts = reporter.generate_real_time_alerts(live_data, live_anomaly_indices)
    
    if live_alerts:
        print("\n🚨 LIVE MONITORING ALERTS:")
        print("=" * 50)
        
        for alert in live_alerts:
            print(f"\n⚠️  {alert['metric'].upper()}: {alert['message'][:80]}...")
    else:
        print("\n✅ No critical alerts for this monitoring session.")
else:
    print("✅ No anomalies detected - system operating normally.")

## 8. Summary and Next Steps

This demo has shown the complete pipeline for 5G network anomaly detection:

In [None]:
# Final summary
print("🎉 DEMO COMPLETION SUMMARY")
print("=" * 60)

print(f"\n✅ Successfully demonstrated:")
print(f"   📊 Synthetic 5G data generation with {len(feature_columns)} metrics")
print(f"   🤖 Autoencoder training with {history['epochs_trained']} epochs")
print(f"   🎯 Anomaly detection with {metrics['f1_score']:.3f} F1-score")
print(f"   📝 LLM report generation with natural language insights")
print(f"   📈 Comprehensive visualizations and analysis")
print(f"   🔄 Real-time monitoring simulation")

print(f"\n📈 Model Performance:")
print(f"   - F1 Score: {metrics['f1_score']:.4f}")
print(f"   - Precision: {metrics['precision']:.4f}")
print(f"   - Recall: {metrics['recall']:.4f}")
print(f"   - AUC: {metrics['auc_score']:.4f}")

print(f"\n🚀 Next Steps for Production:")
print(f"   1. Train on real network data from your infrastructure")
print(f"   2. Deploy using the FastAPI service (src/app.py)")
print(f"   3. Use Kubernetes manifests for scalable deployment")
print(f"   4. Integrate with monitoring systems (Prometheus, Grafana)")
print(f"   5. Set up automated alerting workflows")
print(f"   6. Fine-tune thresholds based on operational requirements")

print(f"\n💡 Key Insights:")
print(f"   - The autoencoder separates the injected anomalies in this synthetic data (see F1 and AUC above)")
print(f"   - LLM reports turn detector output into readable summaries; review them before acting")
print(f"   - Per-sample scoring supports a simple real-time monitoring loop")
print(f"   - Visualizations help explain model behavior")

print("\n" + "=" * 60)
print("🎯 Demo completed successfully! Validate on real network data before production deployment.")

---

## Additional Resources

- **Training Script**: Use `src/train.py` for full model training
- **Inference Script**: Use `src/inference.py` for batch processing
- **API Service**: Run `src/app.py` for REST API deployment
- **Kubernetes**: Deploy using manifests in `kubernetes/` folder
- **Docker**: Build container with the provided `Dockerfile`

For questions or issues, refer to the project documentation or create an issue in the repository.