<a href="https://colab.research.google.com/github/zbovaird/GPT5_vs_Sonnet4.5/blob/main/UHG_IDS_v4_8_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install PyTorch Geometric (matches your current torch/cuda)
!pip -q install --upgrade pip
import torch
pt = torch.__version__.split('+')[0]
cuda = torch.version.cuda
if torch.cuda.is_available() and cuda:
  idx = f"https://data.pyg.org/whl/torch-{pt}+cu{cuda.replace('.','')}.html"
else:
  idx = f"https://data.pyg.org/whl/torch-{pt}+cpu.html"

!pip -q install torch_scatter torch_sparse torch_cluster torch_spline_conv -f {idx}
!pip -q install torch_geometric scikit-learn scipy pandas tqdm

# Install PyNNDescent for ultra-fast approximate KNN
!pip -q install pynndescent


In [None]:
"""
Intrusion Detection using Universal Hyperbolic Geometry (UHG) v4.8.1
🔬 PyNNDescent + Class Weighting @ 10% Data (APPLES-TO-APPLES vs v4.5!)

v4.8.1 TESTING CONFIGURATION:
- 🎯 PURPOSE: Isolate PyNNDescent's impact on training quality
- PyNNDescent for approximate KNN (vs sklearn in v4.5)
- 10% data sampling (283k samples) - SAME as v4.5 for comparison!
- CLASS WEIGHTED LOSS - SAME as v4.5!
- k=2, PCA, all settings IDENTICAL to v4.5 except KNN method

Why v4.8.1 @ 10%:
- v4.8 @ 20% showed extreme class weights broke BENIGN detection
- v4.5 @ 10% with weighting worked well (95.51% accuracy)
- Need to test: Does PyNNDescent (approximate KNN) affect accuracy vs sklearn (exact)?
- This is the ONLY way to fairly compare KNN methods!

Expected vs v4.5:
- v4.5 @ 10%: sklearn KNN (~70s), good accuracy (95.51%)
- v4.8.1 @ 10%: PyNN KNN (~4-5s), SAME accuracy? (test this!)
- If accuracy matches: PyNNDescent is 10-15x faster with NO quality loss! 🚀
- If accuracy drops: Approximate KNN trades accuracy for speed ⚠️

PyNNDescent Details:
- Algorithm: Nearest Neighbor Descent (approximate)
- Neighbor recall: 95-98% of exact KNN (per library docs)
- Speedup: 10-50x faster than sklearn
- Question: Does 2-5% KNN approximation error affect GNN training?

Class Weighting Strategy:
- Inverse frequency weighting (same as v4.5)
- At 10% scale, weights are less extreme than 20%
- Expected to work well based on v4.5 results

v4.8.1 Features:
- CLASS WEIGHTED LOSS for minority class detection
- PyNNDescent for ultra-fast approximate KNN ⭐ TESTING!
- PCA dimensionality reduction (77→20 dims, 89% variance)
- k=2 neighbors (same as v4.5)
- Fixed evaluation for missing classes in test set
- Comprehensive timing instrumentation and bottleneck analysis
- GPU detection and memory usage tracking
- UHG constraint verification post-training

Expected Performance @ 10%:
- Data Loading:     ~20s
- KNN (PyNNDescent): ~4-5s 🚀 (vs ~70s sklearn in v4.5!)
- Training:         ~30-40s (fewer samples)
- Total:            ~60-75s (1-1.5 min vs ~2 min in v4.5)

Expected Accuracy @ 10%:
- Overall: ~95-96% (matching v4.5 if PyNN doesn't hurt)
- BENIGN: ~90-91% (matching v4.5)
- Bot: ~85-90% recall
- FTP-Patator: ~95-99%
- SSH-Patator: ~95-99%
- Macro F1: no baseline available (v4.5 did not report it); recorded here for future comparisons

Compare to Previous Versions:
v4.5 @ 10% (weighted, sklearn):  95.51% acc, ~70s KNN, ~2 min total
v4.8.1 @ 10% (weighted, PyNN):   ???% acc, ~5s KNN, ~1 min total ← TESTING!
v4.8 @ 20% (weighted):           80.56% acc (BROKEN by extreme weights)
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.decomposition import PCA
from scipy.sparse import coo_matrix
from tqdm import tqdm
from torch.utils.data import DataLoader
from torch_geometric.data import Data
from typing import Tuple
import os
import sys
import time
import json
import traceback
import platform
from datetime import datetime

# Import PyNNDescent
from pynndescent import NNDescent

# Optional: Drive mount (only in Colab)
try:
    from google.colab import drive
    print("Mounting Google Drive...")
    drive.mount('/content/drive')
except Exception:
    pass

# Device configuration with detailed GPU info
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print("\n" + "="*80)
print("🖥️  HARDWARE CONFIGURATION")
print("="*80)

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3  # GB
    cuda_version = torch.version.cuda
    gpu_capability = torch.cuda.get_device_capability(0)

    print(f"✅ GPU Detected:")
    print(f"   • Model: {gpu_name}")
    print(f"   • Memory: {gpu_memory:.1f} GB")
    print(f"   • CUDA Version: {cuda_version}")
    print(f"   • Compute Capability: {gpu_capability[0]}.{gpu_capability[1]}")
    print(f"   • Device: cuda:0")
else:
    print(f"⚠️  No GPU available - using CPU")
    print(f"   • This will be significantly slower for training")

print(f"\n🔬 Configuration (v4.8.1 - Apples-to-Apples vs v4.5):")
print(f"   • KNN Method: PyNNDescent (Approximate) ⭐ TESTING!")
print(f"   • Loss: CLASS-WEIGHTED CrossEntropyLoss ⚖️ (same as v4.5)")
print(f"   • Data: 10% sampling (283k samples) - SAME AS v4.5!")
print(f"   • k=2, PCA, all settings IDENTICAL to v4.5 except KNN")
print(f"   • Expected KNN time: ~4-5s (vs ~70s sklearn in v4.5!)")
print(f"   • Expected total time: ~60-75s (1-1.5 min)")
print(f"   • Goal: Test if PyNN's 2-5% KNN error affects accuracy")
print("="*80 + "\n")

# ======================
# Configuration
# ======================
FILE_PATH = '/content/drive/MyDrive/CIC_data.csv'
MODEL_SAVE_PATH = '/content/drive/MyDrive/uhg_ids_model_best.pth'
RESULTS_PATH = '/content/drive/MyDrive/uhg_ids_results/'
os.makedirs(RESULTS_PATH, exist_ok=True)

def get_env_info():
    return {
        'python': platform.python_version(),
        'torch': torch.__version__,
        'cuda_available': torch.cuda.is_available(),
        'device': str(device),
    }

# ===================
# UHG Geometry Helpers
# ===================

def minkowski_inner_product(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Minkowski inner product: ⟨x,y⟩_M = ∑x_i*y_i - x_t*y_t"""
    spatial = (x[..., :-1] * y[..., :-1]).sum(dim=-1)
    time = x[..., -1] * y[..., -1]
    return spatial - time

def projective_normalize(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Project onto the hyperboloid ⟨x,x⟩_M = -1 (in 2D: x² + y² - z² = -1).

    The incoming last coordinate is discarded and recomputed from the spatial part.
    """
    spatial_norm_sq = (x[..., :-1] ** 2).sum(dim=-1, keepdim=True)
    z = torch.sqrt(torch.clamp(spatial_norm_sq + 1.0, min=eps))
    return torch.cat([x[..., :-1], z], dim=-1)

def uhg_quadrance_vectorized(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Element-wise UHG quadrance 1 - ⟨x,y⟩² / (⟨x,x⟩⟨y,y⟩), with the ratio clamped to (-1, 1)"""
    numerator = minkowski_inner_product(x, y)
    denom_x = torch.clamp(-minkowski_inner_product(x, x), min=eps)
    denom_y = torch.clamp(-minkowski_inner_product(y, y), min=eps)
    cos_val = numerator / torch.sqrt(denom_x * denom_y)
    cos_val = torch.clamp(cos_val, min=-1.0+eps, max=1.0-eps)
    return 1 - cos_val**2

def verify_uhg_constraints(x: torch.Tensor, name: str = "embeddings"):
    """Verify Minkowski norm constraints"""
    norm_sq = minkowski_inner_product(x, x)
    violation = torch.abs(norm_sq + 1.0)
    max_viol = violation.max().item()
    mean_viol = violation.mean().item()
    print(f"UHG Constraint Check ({name}):")
    print(f"  Max violation: {max_viol:.6f}")
    print(f"  Mean violation: {mean_viol:.6f}")
    if max_viol > 0.01:
        print(f"  ⚠️ WARNING: Constraints violated!")
    else:
        print(f"  ✅ Constraints satisfied")
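
# --- Illustrative sketch (not used by the pipeline) ---------------------------
# A minimal pure-Python check of what the unclamped quadrance looks like for
# points on the hyperboloid <x,x>_M = -1. For such points <x,y>_M = -cosh(d),
# where d is the hyperbolic distance, so 1 - <x,y>^2 / (<x,x><y,y>) equals
# -sinh(d)^2 and is never positive. Since the ratio has magnitude >= 1, the
# clamp in uhg_quadrance_vectorized pins its result near 0 for normalized
# inputs, which is worth keeping in mind when reading the message weights.
# The sketch_* names are hypothetical helpers introduced only for this example.
import math

def sketch_minkowski(x, y):
    """Minkowski inner product with the last coordinate treated as time-like."""
    return sum(a * b for a, b in zip(x[:-1], y[:-1])) - x[-1] * y[-1]

def sketch_lift(spatial):
    """Append z = sqrt(|spatial|^2 + 1), mirroring projective_normalize."""
    return list(spatial) + [math.sqrt(sum(s * s for s in spatial) + 1.0)]

def sketch_quadrance_unclamped(x, y):
    """Quadrance 1 - <x,y>^2 / (<x,x><y,y>) with no clamping applied."""
    return 1.0 - sketch_minkowski(x, y) ** 2 / (sketch_minkowski(x, x) * sketch_minkowski(y, y))

# Usage: two points at hyperbolic distance 1 give a quadrance of -sinh(1)^2:
#   sketch_quadrance_unclamped(sketch_lift([0.0, 0.0]), sketch_lift([math.sinh(1.0), 0.0]))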

# ====================
# Data Loading
# ====================

def load_and_preprocess_data(file_path: str = FILE_PATH, sample_frac: float = 0.10) -> Tuple[torch.Tensor, torch.Tensor, dict, torch.Tensor, dict]:
    """
    v4.8.1: Load and preprocess data (WITH class weighting, 10% sampling for v4.5 comparison)

    Returns:
        node_features: torch.Tensor of shape (n_samples, n_features)
        labels_tensor: torch.Tensor of shape (n_samples,)
        label_mapping: dict mapping label names to indices
        class_weights: torch.Tensor of shape (num_classes,) - inverse frequency weights
        timings: dict of timing information
    """
    timings = {}

    print(f"\nLoading data from: {file_path}")
    t0 = time.perf_counter()
    data = pd.read_csv(file_path, low_memory=False)
    timings['csv_read'] = time.perf_counter() - t0
    print(f"  ⏱️  CSV read: {timings['csv_read']:.2f}s")

    # Strip whitespace from column names and label values (matching v4.6)
    t0 = time.perf_counter()
    data.columns = data.columns.str.strip()
    data['Label'] = data['Label'].str.strip()
    timings['column_cleanup'] = time.perf_counter() - t0

    unique_labels = data['Label'].unique()
    print(f"\nUnique labels in the dataset: {unique_labels}")
    label_counts = data['Label'].value_counts()
    print("\nLabel distribution in the dataset:")
    print(label_counts)

    # Simple random sampling
    print(f"\nApplying random sampling (frac={sample_frac})...")
    t0 = time.perf_counter()
    data_sampled = data.sample(frac=sample_frac, random_state=42)
    timings['sampling'] = time.perf_counter() - t0
    print(f"  ⏱️  Sampling: {timings['sampling']:.2f}s")

    print(f"\nSampled label distribution:")
    sampled_label_counts = data_sampled['Label'].value_counts()
    print(sampled_label_counts)

    # Convert to numeric and handle missing values
    t0 = time.perf_counter()
    data_numeric = data_sampled.apply(pd.to_numeric, errors='coerce')
    timings['to_numeric'] = time.perf_counter() - t0
    print(f"  ⏱️  Convert to numeric: {timings['to_numeric']:.2f}s")

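    # Fill NaN and inf: NaN -> column mean, ±inf -> column max.
    # Both statistics are computed on finite values only; a mean taken while
    # inf is still present would itself be inf (or NaN) and poison the fill.
    t0 = time.perf_counter()
    data_finite = data_numeric.replace([np.inf, -np.inf], np.nan)
    data_filled = data_numeric.fillna(data_finite.mean())
    data_filled = data_filled.replace([np.inf, -np.inf], np.nan)
    data_filled = data_filled.fillna(data_finite.max())
    if data_filled.isnull().values.any():
        data_filled = data_filled.fillna(0)
    timings['fillna'] = time.perf_counter() - t0
    print(f"  ⏱️  Fill NaN/inf: {timings['fillna']:.2f}s")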

    labels = data_sampled['Label']
    features = data_filled.drop(columns=['Label'])

    t0 = time.perf_counter()
    scaler = StandardScaler()
    features_scaled = scaler.fit_transform(features)
    timings['scaling'] = time.perf_counter() - t0
    print(f"  ⏱️  Scaling: {timings['scaling']:.2f}s")

    unique_labels = sorted(labels.unique())
    label_mapping = {label: idx for idx, label in enumerate(unique_labels)}
    labels_numeric = labels.map(label_mapping).values

    t0 = time.perf_counter()
    node_features = torch.tensor(features_scaled, dtype=torch.float32)
    labels_tensor = torch.tensor(labels_numeric, dtype=torch.long)
    timings['to_tensors'] = time.perf_counter() - t0
    print(f"  ⏱️  Convert to tensors: {timings['to_tensors']:.2f}s")

    print("\nPreprocessing complete.")
    print(f"Feature shape: {node_features.shape}")
    print(f"Number of unique labels: {len(unique_labels)}")

    # Show class distribution for reference
    class_counts = np.bincount(labels_numeric)
    print("\nClass distribution in processed data:")
    for label, idx in sorted(label_mapping.items(), key=lambda x: x[1]):
        count = class_counts[idx]
        pct = (count / len(labels_numeric)) * 100
        print(f"  {label:30s}: {count:7d} samples ({pct:5.2f}%)")

    # Compute class weights (inverse frequency)
    total_samples = len(labels_numeric)
    num_classes = len(label_mapping)
    class_weights = total_samples / (num_classes * class_counts)
    class_weights_tensor = torch.tensor(class_weights, dtype=torch.float32)

    print("\n⚖️  Class Weights (inverse frequency):")
    for label, idx in sorted(label_mapping.items(), key=lambda x: x[1]):
        weight = class_weights[idx]
        print(f"  {label:30s}: {weight:7.4f}")

    timings['total'] = sum(timings.values())
    print(f"\n⏱️  Total data loading time: {timings['total']:.2f}s")

    return node_features, labels_tensor, label_mapping, class_weights_tensor, timings
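
# --- Illustrative sketch (not used by the pipeline) ---------------------------
# The inverse-frequency weighting computed above is
#   weight_c = total_samples / (num_classes * count_c).
# A minimal pure-Python version on a toy label list, to make the arithmetic
# concrete. The helper name and the toy labels are hypothetical.
from collections import Counter

def sketch_inverse_frequency_weights(labels):
    """Map each label to total / (num_classes * count); rarer labels weigh more."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}

# Usage: with 8 'BENIGN' and 2 'Bot' samples the weights are 0.625 and 2.5.
# Each class then contributes count * weight = total / num_classes to the loss,
# which is why very rare classes end up with very large weights.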

# =========================
# Graph construction (KNN)
# =========================

def create_graph_data(node_features: torch.Tensor, labels: torch.Tensor, k: int = 2, use_pca: bool = True, pca_components: int = 20) -> Tuple[Data, dict]:
    """v4.8.1: PyNNDescent approximate KNN with PCA dimensionality reduction"""
    timings = {}

    print("\nCreating graph structure...")
    t0 = time.perf_counter()
    features_np = node_features.cpu().numpy()
    timings['to_numpy'] = time.perf_counter() - t0

    # PCA for faster KNN
    if use_pca and features_np.shape[1] > pca_components:
        print(f"\nApplying PCA for faster KNN...")
        print(f"  • Original features: {features_np.shape[1]}")
        t0 = time.perf_counter()
        pca = PCA(n_components=pca_components)
        features_reduced = pca.fit_transform(features_np)
        timings['pca'] = time.perf_counter() - t0
        explained_var = pca.explained_variance_ratio_.sum()
        print(f"  • Reduced features: {features_reduced.shape[1]}")
        print(f"  • Explained variance: {explained_var:.4f} ({explained_var*100:.2f}%)")
        print(f"  ⏱️  PCA: {timings['pca']:.2f}s")
        features_for_knn = features_reduced
    else:
        features_for_knn = features_np
        timings['pca'] = 0.0

    # ===== v4.8.1: PyNNDescent Approximate KNN =====
    print(f"\n🚀 Computing KNN graph with PyNNDescent (k={k})...")
    print(f"  • Input shape: {features_for_knn.shape}")
    print(f"  • Number of samples: {features_for_knn.shape[0]:,}")
    print(f"  • Number of features: {features_for_knn.shape[1]}")
    print(f"  • Using PyNNDescent (approximate)")
    print(f"  • Algorithm: Nearest Neighbor Descent")
    print(f"  • Expected: 10-20x faster than sklearn @ 10%!")
    print(f"  • Accuracy: 95-98% of exact KNN")

    sys.stdout.flush()

    t0 = time.perf_counter()

    # Build PyNNDescent index
    print(f"  • Building NNDescent index...")
    index = NNDescent(
        features_for_knn,
        n_neighbors=k+1,  # +1 to include self, will remove later
        metric='euclidean',
        n_jobs=-1,  # Use all CPU cores
        verbose=False
    )

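    # Get neighbor indices (each row normally includes the point itself)
    indices, distances = index.neighbor_graph

    # Remove self-connections. The point itself is usually in column 0, but with
    # duplicate rows or an approximate search it can land elsewhere or be missing,
    # so mask it by value; if it is missing, drop the farthest neighbor instead.
    is_self = indices == np.arange(indices.shape[0])[:, None]
    is_self[~is_self.any(axis=1), -1] = True
    indices = indices[~is_self].reshape(-1, k)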

    # Create edge index
    num_nodes = features_for_knn.shape[0]
    row = np.repeat(np.arange(num_nodes), k)
    col = indices.flatten()

    edge_index = torch.from_numpy(
        np.vstack([row, col])
    ).long().to(device)

    timings['knn_computation'] = time.perf_counter() - t0
    print(f"  ✅ PyNNDescent KNN computation: {timings['knn_computation']:.2f}s")
    print(f"  💡 Speedup vs sklearn (est ~70s @ v4.5): ~{70/timings['knn_computation']:.1f}x!")

    timings['edge_index_creation'] = 0.0  # Already included in knn_computation
    print(f"Edge index shape: {edge_index.shape}")

    # Add homogeneous coordinate (projective)
    t0 = time.perf_counter()
    node_features_uhg = torch.cat([
        node_features.to(device),
        torch.ones(node_features.size(0), 1, device=device)
    ], dim=1)
    timings['add_homogeneous'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    node_features_uhg = projective_normalize(node_features_uhg)
    timings['projective_normalize'] = time.perf_counter() - t0
    print(f"  ⏱️  UHG projection: {timings['projective_normalize']:.2f}s")

    print(f"Feature shape with homogeneous coordinate: {node_features_uhg.shape}")

    # Verify UHG constraints
    t0 = time.perf_counter()
    verify_uhg_constraints(node_features_uhg, name="initial features")
    timings['constraint_verification'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    total_samples = len(node_features_uhg)
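    # Seed the split so repeated runs (and version comparisons) use the same partition
    indices_split = torch.randperm(total_samples, generator=torch.Generator().manual_seed(42))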
    train_size = int(0.7 * total_samples)
    val_size = int(0.15 * total_samples)

    train_mask = torch.zeros(total_samples, dtype=torch.bool, device=device)
    val_mask = torch.zeros(total_samples, dtype=torch.bool, device=device)
    test_mask = torch.zeros(total_samples, dtype=torch.bool, device=device)

    train_mask[indices_split[:train_size]] = True
    val_mask[indices_split[train_size:train_size+val_size]] = True
    test_mask[indices_split[train_size+val_size:]] = True
    timings['split_creation'] = time.perf_counter() - t0

    print(f"\nTrain size: {train_mask.sum()}, Val size: {val_mask.sum()}, Test size: {test_mask.sum()}")

    timings['total'] = sum(timings.values())
    print(f"\n⏱️  Total graph construction time: {timings['total']:.2f}s")

    return Data(
        x=node_features_uhg,
        edge_index=edge_index,
        y=labels.to(device),
        train_mask=train_mask,
        val_mask=val_mask,
        test_mask=test_mask
    ).to(device), timings
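
# --- Illustrative sketch (not used by the pipeline) ---------------------------
# One way to quantify the approximation error of the KNN graph is neighbor
# recall: the fraction of each node's exact neighbors that the approximate
# search also returned. The exact lists could come, for instance, from
# sklearn's NearestNeighbors on a small subsample; that choice and the helper
# name below are assumptions for illustration only.
def sketch_knn_recall(exact_neighbors, approx_neighbors):
    """Mean per-row overlap between exact and approximate neighbor lists."""
    if not exact_neighbors:
        raise ValueError("need at least one row of neighbors")
    total = 0.0
    for exact_row, approx_row in zip(exact_neighbors, approx_neighbors):
        exact_set = set(exact_row)
        total += len(exact_set & set(approx_row)) / len(exact_set)
    return total / len(exact_neighbors)

# Usage: sketch_knn_recall([[1, 2], [0, 2]], [[1, 3], [2, 0]]) gives 0.75,
# because the first row recovers one of two neighbors and the second both.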

# ==============================
# UHG GraphSAGE Message Passing
# ==============================

from torch_scatter import scatter_add
from torch_geometric.nn.conv import MessagePassing

class UHGMessagePassing(MessagePassing):
    def __init__(self, in_features: int, out_features: int):
        super().__init__(aggr='add')
        self.in_features = in_features
        self.out_features = out_features
        self.weight_msg = nn.Parameter(torch.Tensor(in_features, out_features))
        self.weight_node = nn.Parameter(torch.Tensor(in_features, out_features))
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weight_msg)
        nn.init.xavier_uniform_(self.weight_node)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x includes homogeneous coord
        # Transform node features (spatial only)
        features = x[:, :-1]
        z = x[:, -1:]
        transformed_features = features @ self.weight_node
        # Propagate using full projective vectors for weight computation
        # Pass explicit size to handle all nodes (including isolated ones)
        out = self.propagate(edge_index, x=x, size=(x.size(0), x.size(0)))
        # Combine
        out = out + transformed_features
        # Recompute time-like to maintain Minkowski norm -1
        out_full = torch.cat([out, z], dim=1)
        out_full = projective_normalize(out_full)
        return out_full

    def message(self, x_i: torch.Tensor, x_j: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x_i, x_j are full projective vectors
        weights = torch.exp(-uhg_quadrance_vectorized(x_i, x_j))
        # Transform neighbor features (spatial only)
        messages = (x_j[:, :-1]) @ self.weight_msg
        return messages * weights.view(-1, 1)

    def aggregate(self, inputs: torch.Tensor, index: torch.Tensor, ptr=None, dim_size=None) -> torch.Tensor:
        # Sum messages per destination (with explicit dim_size to handle all nodes)
        numerator = scatter_add(inputs, index, dim=0, dim_size=dim_size)
        # Count messages per destination; dividing by this count gives an unweighted
        # mean of the (already quadrance-weighted) messages
        weights_sum = scatter_add(torch.ones_like(inputs), index, dim=0, dim_size=dim_size)
        return numerator / torch.clamp(weights_sum, min=1e-6)

class UHGGraphSAGE(nn.Module):
    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int, num_layers: int, dropout: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList()
        self.dropout = nn.Dropout(dropout)
        # in_channels includes homogeneous coord
        actual_in = in_channels - 1
        self.layers.append(UHGMessagePassing(actual_in, hidden_channels))
        for _ in range(num_layers - 2):
            self.layers.append(UHGMessagePassing(hidden_channels, hidden_channels))
        self.layers.append(UHGMessagePassing(hidden_channels, out_channels))

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = x
        for layer in self.layers[:-1]:
            h = layer(h, edge_index)
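            # Apply ReLU and dropout to the spatial part only, then recompute the
            # time-like coordinate so the Minkowski norm stays at -1
            spatial = self.dropout(F.relu(h[:, :-1]))
            h = projective_normalize(torch.cat([spatial, h[:, -1:]], dim=1))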
        h = self.layers[-1](h, edge_index)
        return h[:, :-1]  # logits on spatial part

# =====================
# Training / Evaluation
# =====================

def train_epoch(model: nn.Module, graph_data: Data, optimizer: torch.optim.Optimizer, criterion: nn.Module, detailed_timing: bool = False) -> Tuple[float, dict]:
    """Train one epoch with optional detailed timing"""
    model.train()
    timings = {}

    try:
        t0 = time.perf_counter()
        optimizer.zero_grad(set_to_none=True)
        if detailed_timing:
            timings['zero_grad'] = time.perf_counter() - t0

        # Single full-batch forward/backward on the static graph
        t0 = time.perf_counter()
        out = model(graph_data.x, graph_data.edge_index)
        if detailed_timing:
            timings['forward_pass'] = time.perf_counter() - t0

        t0 = time.perf_counter()
        loss = criterion(out[graph_data.train_mask], graph_data.y[graph_data.train_mask])
        if detailed_timing:
            timings['loss_computation'] = time.perf_counter() - t0

        t0 = time.perf_counter()
        loss.backward()
        if detailed_timing:
            timings['backward_pass'] = time.perf_counter() - t0

        t0 = time.perf_counter()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        if detailed_timing:
            timings['grad_clipping'] = time.perf_counter() - t0

        t0 = time.perf_counter()
        optimizer.step()
        if detailed_timing:
            timings['optimizer_step'] = time.perf_counter() - t0
            timings['total'] = sum(timings.values())

        return float(loss.item()), timings
    except Exception as e:
        print(f"Train step failure: {e}")
        traceback.print_exc()
        raise

@torch.no_grad()
def evaluate(model: nn.Module, graph_data: Data, mask: torch.Tensor, detailed_timing: bool = False) -> Tuple[float, dict]:
    """Evaluate with optional detailed timing"""
    timings = {}
    model.eval()

    t0 = time.perf_counter()
    out = model(graph_data.x, graph_data.edge_index)
    if detailed_timing:
        timings['forward_pass'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    pred = out[mask].argmax(dim=1)
    acc = (pred == graph_data.y[mask]).float().mean().item()
    if detailed_timing:
        timings['prediction'] = time.perf_counter() - t0
        timings['total'] = sum(timings.values())

    return acc, timings

@torch.no_grad()
def evaluate_detailed(model: nn.Module, graph_data: Data, mask: torch.Tensor, label_mapping: dict, phase: str = "Test") -> dict:
    """Detailed per-class evaluation (fixed for missing classes)"""
    model.eval()
    out = model(graph_data.x, graph_data.edge_index)
    pred = out[mask].argmax(dim=1).cpu().numpy()
    true = graph_data.y[mask].cpu().numpy()

    # Reverse label mapping
    idx_to_label = {v: k for k, v in label_mapping.items()}

    # Only include classes that actually appear in test set
    unique_classes = np.unique(np.concatenate([true, pred]))
    target_names = [idx_to_label[i] for i in unique_classes]

    # Show which classes are missing
    all_classes = set(range(len(label_mapping)))
    present_classes = set(unique_classes)
    missing_classes = all_classes - present_classes

    print(f"\n{'='*80}")
    print(f"{phase} Set - Detailed Performance Report")
    print(f"{'='*80}")

    if missing_classes:
        print(f"\n⚠️  WARNING: {len(missing_classes)} classes not present in {phase.lower()} set:")
        for class_idx in sorted(missing_classes):
            print(f"  • {idx_to_label[class_idx]}")
        print(f"  (This is normal with small sample sizes and rare classes)")

    # Overall accuracy
    overall_acc = (pred == true).mean()
    print(f"\nOverall Accuracy: {overall_acc:.4f}")
    print(f"Classes evaluated: {len(unique_classes)}/{len(label_mapping)}")

    # Per-class metrics (only for classes present in test set)
    print("\nPer-Class Classification Report:")
    print(classification_report(true, pred, labels=unique_classes, target_names=target_names, zero_division=0, digits=4))

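    # Confusion matrix restricted to the evaluated classes, so that row i
    # lines up with target_names[i] in the loop below
    cm = confusion_matrix(true, pred, labels=unique_classes)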
    print("\nPer-Class Accuracy:")
    for i, label in enumerate(target_names):
        class_acc = cm[i, i] / cm[i].sum() if cm[i].sum() > 0 else 0.0
        class_samples = cm[i].sum()
        print(f"  {label:30s}: {class_acc:.4f} ({int(class_samples)} samples)")

    # Macro and weighted F1
    f1_macro = f1_score(true, pred, average='macro', zero_division=0)
    f1_weighted = f1_score(true, pred, average='weighted', zero_division=0)
    print(f"\nF1 Score (Macro):    {f1_macro:.4f}")
    print(f"F1 Score (Weighted): {f1_weighted:.4f}")

    return {
        'accuracy': float(overall_acc),
        'f1_macro': float(f1_macro),
        'f1_weighted': float(f1_weighted),
        'confusion_matrix': cm.tolist(),
    }
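
# --- Illustrative sketch (not used by the pipeline) ---------------------------
# Macro F1, as reported above, is the plain average of per-class F1 scores, so
# a rare class counts as much as BENIGN. A minimal pure-Python version over the
# classes seen in either the true or the predicted labels; the helper name is
# hypothetical and the toy inputs in the usage note are made up.
def sketch_macro_f1(true, pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(true) | set(pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(true, pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(true, pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(true, pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Usage: sketch_macro_f1([0, 0, 1, 1], [0, 1, 1, 1]) averages the class-0 score
# 2/3 with the class-1 score 4/5.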

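# --- Illustrative sketch (not used by the pipeline) ---------------------------
# main() below stops training once validation accuracy has gone `patience`
# epochs without a strict improvement. The same rule as a small pure function,
# replayed over a recorded list of validation accuracies. The helper name is
# hypothetical; epochs are 1-based as in the training loop.
def sketch_early_stopping(val_accs, patience):
    """Return (best_epoch, last_epoch_run) under patience-based early stopping."""
    best_val, best_epoch, stale = 0.0, 0, 0
    for epoch, val_acc in enumerate(val_accs, start=1):
        if val_acc > best_val:
            best_val, best_epoch, stale = val_acc, epoch, 0
        else:
            stale += 1
        if stale >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_accs)

# Usage: sketch_early_stopping([0.5, 0.6, 0.6, 0.55, 0.7], patience=2) stops at
# epoch 4 with epoch 2 as the best, so the later 0.7 is never reached.
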
def main():
    run_started = time.perf_counter()
    run_id = datetime.now().strftime('%Y%m%dT%H%M%S')
    metrics = {
        'version': 'v4.8.1',
        'run_id': run_id,
        'env': get_env_info(),
        'paths': {
            'file_path': FILE_PATH,
            'model_save_path': MODEL_SAVE_PATH,
            'results_path': RESULTS_PATH,
        },
        'improvements': [
            'v4.8.1: PyNNDescent @ 10% data for apples-to-apples vs v4.5!',
            'v4.8.1: Testing if approximate KNN affects training quality',
            'v4.8.1: ALL settings identical to v4.5 except KNN method',
            'v4.8.1: CLASS WEIGHTED LOSS (same as v4.5)',
            'v4.8.1: 10% data sampling (283k samples, same as v4.5)',
            'PyNNDescent for 10-15x faster KNN (~5s vs ~70s)',
            'PCA dimensionality reduction for faster KNN (77 → 20 dims)',
            'k=2 neighbors (same as v4.5)',
            'Detailed per-class metrics (fixed for missing classes)',
            'UHG constraint verification',
            'Comprehensive timing instrumentation',
            'GPU detection and memory tracking',
        ],
        'data': {},
        'graph': {},
        'model': {},
        'train': {
            'epochs': [],
            'best_val': 0.0,
            'best_epoch': None,
        },
        'errors': None,
        'timing': {},
        'gpu_memory': {},
    }

    try:
        # Data loading with detailed timing (10% for v4.5 comparison, WITH class weighting)
        node_features, labels, label_mapping, class_weights, data_timings = load_and_preprocess_data(FILE_PATH, sample_frac=0.10)

        metrics['data'] = {
            'num_nodes': int(node_features.size(0)),
            'num_features': int(node_features.size(1)),
            'num_classes': int(len(label_mapping)),
            'sample_fraction': 0.10,
            'comparison_target': 'v4.5',
            'class_weighted_loss': True,
        }
        metrics['timing']['data_load'] = data_timings

        # Graph construction with detailed timing, PCA, and PyNNDescent
        graph_data, graph_timings = create_graph_data(node_features, labels, k=2, use_pca=True, pca_components=20)

        metrics['timing']['graph_build'] = graph_timings
        metrics['graph'] = {
            'num_nodes': int(graph_data.x.size(0)),
            'num_edges': int(graph_data.edge_index.size(1)),
            'k_neighbors': 2,
            'pca_enabled': True,
            'pca_components': 20,
            'knn_method': 'pynndescent',
            'knn_approximate': True,
            'train_nodes': int(graph_data.train_mask.sum().item()),
            'val_nodes': int(graph_data.val_mask.sum().item()),
            'test_nodes': int(graph_data.test_mask.sum().item()),
        }

        in_channels = graph_data.x.size(1)
        hidden_channels = 64
        out_channels = len(label_mapping)
        num_layers = 2

        model = UHGGraphSAGE(in_channels, hidden_channels, out_channels, num_layers).to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-5)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=5)

        # CLASS-WEIGHTED CrossEntropyLoss for minority class detection
        criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))
        print(f"\n✅ Using CLASS-WEIGHTED CrossEntropyLoss (same as v4.5)")
        print(f"   • Inverse frequency weighting for minority class detection")
        print(f"   • 10% data (283k samples) - IDENTICAL to v4.5 for comparison")
        print(f"   • Goal: Test if PyNNDescent (approximate) affects accuracy")

        metrics['model'] = {
            'in_channels': in_channels,
            'hidden_channels': hidden_channels,
            'out_channels': out_channels,
            'num_layers': num_layers,
            'class_weighted_loss': True,
        }

        # Track GPU memory before training
        if torch.cuda.is_available():
            torch.cuda.reset_peak_memory_stats()
            torch.cuda.empty_cache()
            mem_allocated_before = torch.cuda.memory_allocated() / 1024**3
            mem_reserved_before = torch.cuda.memory_reserved() / 1024**3
            print(f"\n💾 GPU Memory (before training):")
            print(f"   • Allocated: {mem_allocated_before:.2f} GB")
            print(f"   • Reserved:  {mem_reserved_before:.2f} GB")

        print("\nStarting training...")

        best_val_acc = 0.0
        best_epoch = 0
        patience = 10
        epochs_without_improvement = 0
        max_epochs = 200

        epoch_times = []
        train_losses = []
        val_accs = []
        test_accs = []

        for epoch in range(1, max_epochs + 1):
            t_epoch_start = time.perf_counter()

            # Detailed timing for epochs 1, 2, 50, 100
            detailed = (epoch in [1, 2, 50, 100])

            loss, train_timings = train_epoch(model, graph_data, optimizer, criterion, detailed_timing=detailed)
            val_acc, val_timings = evaluate(model, graph_data, graph_data.val_mask, detailed_timing=detailed)
            test_acc, test_timings = evaluate(model, graph_data, graph_data.test_mask, detailed_timing=detailed)

            scheduler.step(val_acc)
            current_lr = optimizer.param_groups[0]['lr']

            epoch_time = time.perf_counter() - t_epoch_start
            epoch_times.append(epoch_time)
            train_losses.append(loss)
            val_accs.append(val_acc)
            test_accs.append(test_acc)

            improved = val_acc > best_val_acc
            if improved:
                best_val_acc = val_acc
                best_epoch = epoch
                epochs_without_improvement = 0
                torch.save(model.state_dict(), MODEL_SAVE_PATH)
                improved_str = "(saved)"
            else:
                epochs_without_improvement += 1
                improved_str = ""

            # Print detailed timing for specific epochs
            if detailed:
                print(f"\n⏱️  Epoch {epoch} Detailed Timing:")
                print(f"    Train: Forward={train_timings.get('forward_pass', 0):.3f}s, "
                      f"Backward={train_timings.get('backward_pass', 0):.3f}s, "
                      f"Optimizer={train_timings.get('optimizer_step', 0):.3f}s")
                print(f"    Val:   Forward={val_timings.get('forward_pass', 0):.3f}s")
                print(f"    Test:  Forward={test_timings.get('forward_pass', 0):.3f}s")

            # Print every epoch if improved, or every 10 epochs
            if improved or epoch % 10 == 0 or epoch == max_epochs:
                print(f"Epoch {epoch:03d} | Loss {loss:.4f} | Val {val_acc:.4f} | "
                      f"Test {test_acc:.4f} | LR {current_lr:.5f} | {epoch_time:.2f}s | {improved_str}")

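            if epochs_without_improvement >= patience:
                print(f"Early stopping at epoch {epoch}: no validation improvement "
                      f"for {patience} epochs (best epoch {best_epoch}, val {best_val_acc:.4f}).")
                break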

        print(f"\n⏱️  Average epoch time: {np.mean(epoch_times):.2f}s")

        # Track GPU memory after training
        if torch.cuda.is_available():
            mem_allocated_peak = torch.cuda.max_memory_allocated() / 1024**3
            mem_allocated_final = torch.cuda.memory_allocated() / 1024**3
            mem_reserved_final = torch.cuda.memory_reserved() / 1024**3

            print(f"\n💾 GPU Memory Usage Summary:")
            print(f"   • Peak Allocated: {mem_allocated_peak:.2f} GB")
            print(f"   • Final Allocated: {mem_allocated_final:.2f} GB")
            print(f"   • Final Reserved: {mem_reserved_final:.2f} GB")

            metrics['gpu_memory'] = {
                'peak_allocated_gb': float(mem_allocated_peak),
                'final_allocated_gb': float(mem_allocated_final),
                'final_reserved_gb': float(mem_reserved_final),
            }

        metrics['train']['epochs'] = list(range(1, len(train_losses) + 1))
        metrics['train']['losses'] = [float(x) for x in train_losses]
        metrics['train']['val_accs'] = [float(x) for x in val_accs]
        metrics['train']['test_accs'] = [float(x) for x in test_accs]
        metrics['train']['best_val'] = float(best_val_acc)
        metrics['train']['best_epoch'] = int(best_epoch)
        metrics['train']['total_epochs'] = len(train_losses)
        metrics['train']['avg_epoch_time'] = float(np.mean(epoch_times))

        # Load best model and evaluate
        print("\nLoading best model for final evaluation...")
        model.load_state_dict(torch.load(MODEL_SAVE_PATH))

        # Verify UHG constraints after training
        print("\n" + "="*80)
        print("POST-TRAINING UHG CONSTRAINT VERIFICATION")
        print("="*80)
        model.eval()
        with torch.no_grad():
            # Check constraints after first layer
            h = graph_data.x
            h = model.layers[0](h, graph_data.edge_index)
            verify_uhg_constraints(h, name="after layer 1")

        # Final test evaluation with detailed metrics
        test_results = evaluate_detailed(model, graph_data, graph_data.test_mask, label_mapping, phase="Test")
        metrics['test'] = test_results

        print(f"\nFinal Test Accuracy: {test_results['accuracy']:.4f}")

        # Comprehensive timing breakdown
        run_ended = time.perf_counter()
        total_runtime = run_ended - run_started

        data_time = data_timings['total']
        graph_time = graph_timings['total']
        train_time = sum(epoch_times)

        print("\n" + "="*80)
        print("⏱️  COMPREHENSIVE TIMING BREAKDOWN")
        print("="*80)

        print(f"\n📊 DATA LOADING ({data_time:.2f}s total):")
        for key, val in data_timings.items():
            if key != 'total':
                pct = (val / total_runtime) * 100
                print(f"  • {key.replace('_', ' ').title():20s} {val:7.2f}s ({pct:5.1f}%)")

        print(f"\n🕸️  GRAPH CONSTRUCTION ({graph_time:.2f}s total):")
        for key, val in graph_timings.items():
            if key != 'total':
                pct = (val / total_runtime) * 100
                if key == 'pca':
                    print(f"  • PCA (77→20 dims): {val:7.2f}s ({pct:5.1f}%)")
                elif key == 'knn_computation':
                    print(f"  • KNN Computation: {val:7.2f}s ({pct:5.1f}%) 🚀 PyNNDescent")
                else:
                    print(f"  • {key.replace('_', ' ').title():20s} {val:7.2f}s ({pct:5.1f}%)")

        print(f"\n🎓 TRAINING ({train_time:.2f}s total, {(train_time/total_runtime)*100:.1f}% of runtime):")
        print(f"  • Avg Epoch Time:      {np.mean(epoch_times):.2f}s")
        print(f"  • Total Epochs:      {len(epoch_times)}")

        print(f"\n📈 HIGH-LEVEL SUMMARY:")
        print(f"  • Data Loading:       {(data_time/total_runtime)*100:5.1f}% of total runtime")
        print(f"  • Graph Building:     {(graph_time/total_runtime)*100:5.1f}% of total runtime")
        print(f"  • Training:           {(train_time/total_runtime)*100:5.1f}% of total runtime")
        print(f"  • Total Runtime:     {total_runtime:7.2f}s ({total_runtime/60:.1f} min)")
        if torch.cuda.is_available():
            print(f"  • Peak GPU Memory:   {mem_allocated_peak:.2f} GB")

        # v4.5 baseline (sklearn exact KNN @ 10% data), hardcoded from the earlier run
        baseline_knn_time = 70.0  # seconds (approximate)
        baseline_acc = 0.9551
        knn_time = graph_timings['knn_computation']
        knn_speedup = baseline_knn_time / max(knn_time, 1e-9)  # guard against a zero timing
        acc_delta_pp = (test_results['accuracy'] - baseline_acc) * 100  # percentage points

        print(f"\n🚀 PYNNDESCENT PERFORMANCE:")
        print(f"  • KNN Time (PyNNDescent): {knn_time:.1f}s")
        print(f"  • Est. sklearn Time (v4.5): ~{baseline_knn_time:.0f}s")
        print(f"  • Speedup:                ~{knn_speedup:.1f}x faster 🚀")

        print(f"\n🔬 COMPARISON vs v4.5:")
        print(f"  • v4.5 @ 10%: sklearn KNN (~{baseline_knn_time:.0f}s), {baseline_acc*100:.2f}% accuracy")
        print(f"  • v4.8.1 @ 10%: PyNN KNN ({knn_time:.1f}s), {test_results['accuracy']*100:.2f}% accuracy")
        print(f"  • Speedup: {knn_speedup:.1f}x faster KNN")
        print(f"  • Accuracy delta: {acc_delta_pp:+.2f} pp")
        if abs(acc_delta_pp) < 0.5:
            print(f"  ✅ Accuracy is within ±0.5 pp of v4.5: no measurable impact from PyNNDescent in this run")
        elif acc_delta_pp > 0:
            print(f"  ✅ Accuracy is higher than the v4.5 sklearn baseline in this run")
        else:
            print(f"  ⚠️ PyNNDescent trades {-acc_delta_pp:.2f} pp accuracy for {knn_speedup:.1f}x KNN speed")

        # Bottleneck analysis
        all_timings = []
        for category, timing_dict in [('Data', data_timings), ('Graph', graph_timings)]:
            for key, val in timing_dict.items():
                if key != 'total' and val > 0.5:  # Only show > 0.5s
                    all_timings.append((f"{category}: {key}", val))

        all_timings.sort(key=lambda x: x[1], reverse=True)
        print(f"\n🔍 BOTTLENECK ANALYSIS:")
        for i, (name, time_val) in enumerate(all_timings[:3], 1):
            pct = (time_val / total_runtime) * 100
            print(f"  {i}. {name:40s} {time_val:7.2f}s ({pct:5.1f}%)")

        metrics['timing']['total_runtime'] = float(total_runtime)
        metrics['timing']['data_time'] = float(data_time)
        metrics['timing']['graph_time'] = float(graph_time)
        metrics['timing']['train_time'] = float(train_time)

        # Save metrics
        metrics_file = os.path.join(RESULTS_PATH, f'metrics_v4.8.1_{run_id}.json')
        with open(metrics_file, 'w') as f:
            json.dump(metrics, f, indent=2)
        print(f"Saved metrics to: {metrics_file}")

        print("\n" + "="*80)
        print("UHG IDS Model v4.8.1 - Training Complete")
        print("="*80)
        print(f"\n🔬 APPLES-TO-APPLES COMPARISON COMPLETE")
        print(f"   • Total samples: {metrics['data']['num_nodes']:,}")
        print(f"   • KNN time: {graph_timings['knn_computation']:.1f}s (vs ~70s sklearn)")
        print(f"   • Total time: {total_runtime:.0f}s ({total_runtime/60:.1f} min)")
        print(f"   • PyNNDescent: {70/max(graph_timings['knn_computation'], 1e-9):.1f}x faster KNN than sklearn 🚀")
        print(f"   • Accuracy: {test_results['accuracy']*100:.2f}% (v4.5: 95.51%)")
        print(f"\n🎯 Now compare this with v4.5 Results 3 to see PyNN's true impact!")

    except Exception as e:
        print(f"\n❌ Training failed with error: {e}")
        traceback.print_exc()
        metrics['errors'] = str(e)
        raise

if __name__ == "__main__":
    main()
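
The comparison against v4.5 printed at the end of `main()` reduces to two numbers: the KNN speedup and the accuracy change in percentage points. A minimal sketch of that logic as a standalone function follows. The function name `compare_to_baseline`, its parameters, and the returned keys are illustrative and not part of the notebook; the defaults (70 s, 95.51%) are the v4.5 figures quoted in the script, and the ±0.5 pp tolerance mirrors the threshold it uses.

```python
def compare_to_baseline(knn_time_s, accuracy,
                        baseline_knn_time_s=70.0,
                        baseline_accuracy=0.9551,
                        tolerance_pp=0.5):
    """Summarize a run against a baseline (hypothetical helper, not in the notebook).

    Returns the KNN speedup, the accuracy delta in percentage points,
    and a coarse verdict string.
    """
    if knn_time_s <= 0:
        raise ValueError("knn_time_s must be positive")

    # Ratio of baseline KNN time to this run's KNN time
    speedup = baseline_knn_time_s / knn_time_s
    # Accuracies are fractions in [0, 1]; convert the difference to percentage points
    delta_pp = (accuracy - baseline_accuracy) * 100.0

    if abs(delta_pp) < tolerance_pp:
        verdict = "within tolerance"
    elif delta_pp > 0:
        verdict = "higher"
    else:
        verdict = "lower"

    return {"speedup": speedup, "delta_pp": delta_pp, "verdict": verdict}


if __name__ == "__main__":
    # Example values are made up for illustration, not measured results.
    summary = compare_to_baseline(knn_time_s=5.0, accuracy=0.9530)
    print(f"Speedup: {summary['speedup']:.1f}x, "
          f"delta: {summary['delta_pp']:+.2f} pp, verdict: {summary['verdict']}")
```

Keeping the baseline values as named parameters avoids repeating the literals `70` and `0.9551` in every print statement. A single run cannot show that the two KNN methods are equivalent; repeating the comparison over several random seeds would give a better sense of run-to-run variation.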



Mounting Google Drive...
Mounted at /content/drive

🖥️  HARDWARE CONFIGURATION
✅ GPU Detected:
   • Model: NVIDIA L4
   • Memory: 22.2 GB
   • CUDA Version: 12.6
   • Compute Capability: 8.9
   • Device: cuda:0

🔬 Configuration (v4.8.1 - Apples-to-Apples vs v4.5):
   • KNN Method: PyNNDescent (Approximate) ⭐ TESTING!
   • Loss: CLASS-WEIGHTED CrossEntropyLoss ⚖️ (same as v4.5)
   • Data: 10% sampling (283k samples) - SAME AS v4.5!
   • k=2, PCA, all settings IDENTICAL to v4.5 except KNN
   • Expected KNN time: ~4-5s (vs ~70s sklearn in v4.5!)
   • Expected total time: ~60-75s (1-1.5 min)
   • Goal: Test if PyNN's 2-5% KNN error affects accuracy


Loading data from: /content/drive/MyDrive/CIC_data.csv
  ⏱️  CSV read: 35.89s

Unique labels in the dataset: ['BENIGN' 'DDoS' 'PortScan' 'Bot' 'Infiltration'
 'Web Attack – Brute Force' 'Web Attack – XSS'
 'Web Attack – Sql Injection' 'FTP-Patator' 'SSH-Patator' 'DoS slowloris'
 'DoS Slowhttptest' 'DoS Hulk' 'DoS GoldenEye' 'Heartbleed']

Labe