# Static IDS Failure: Concept Drift & Catastrophic Forgetting

**Experiment**: Demonstrate the failure of a static IDS when it faces distribution shifts and rare attacks in the NSL-KDD test set.

---

## Notebook Structure:

### **BASIC PART**: Overall Evaluation
- Train on **FULL TRAIN SET** (125,973 samples - Old Data 2015)
- Test on **FULL TEST SET** (22,544 samples - New Data 2016)
- Measure accuracy drop, F1-score, and confusion matrix
- Analyze per-class performance

### **ADVANCED PART**: Period-Based Evaluation
- Split the test set into 5 periods (DoS, Probe, R2L, U2R, Mixed)
- Compute the **Forgetting Measure (FM)** for each period
- Detailed visualization: performance degradation over periods
- Analysis: Catastrophic forgetting patterns
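
The notebook does not spell out its formula for the Forgetting Measure at this point. As a point of reference, one common continual-learning definition (best earlier score on a task minus the current score) could be sketched as follows. The function name `forgetting_measure` and the example accuracies are hypothetical, and the notebook's own period-based computation may differ.

```python
def forgetting_measure(score_history):
    """Best earlier score on a task minus its most recent score.

    score_history: scores (e.g. accuracy) measured on the SAME task at
    successive evaluation points, oldest first. This is an assumed
    definition, not necessarily the one used later in the notebook.
    """
    if len(score_history) < 2:
        return 0.0  # nothing earlier to compare against
    return max(score_history[:-1]) - score_history[-1]


# Hypothetical accuracies on one class across three evaluation points
fm_example = forgetting_measure([0.95, 0.90, 0.60])
print(f"FM = {fm_example:.2f}")
```

Under this definition a positive value means the score fell below its earlier best, while a value at or below zero means no forgetting was observed.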

---

## Key Hypotheses:
1. ✅ Model fits well on train (DoS-heavy, R2L/U2R rare)
2. ❌ Model FAILS on test (R2L share of samples grows ~16x, U2R ~7x)
3. 📉 **Catastrophic Forgetting**: high FM for rare classes
4. 🎯 **Need Adaptive Learning**: a static model is not suitable for IDS
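
Hypothesis 2 can be checked with simple arithmetic on the class counts that the preprocessing step reports further down (R2L: 995 of 125,973 train samples vs. 2,885 of 22,544 test samples; U2R: 52 vs. 67). The sketch below computes how much each class's share of the data grows. The helper name `share_ratio` is illustrative only.

```python
TRAIN_TOTAL, TEST_TOTAL = 125_973, 22_544
counts = {"R2L": (995, 2_885), "U2R": (52, 67)}  # (train count, test count)


def share_ratio(train_count, test_count):
    """Ratio of a class's test-set share to its train-set share."""
    return (test_count / TEST_TOTAL) / (train_count / TRAIN_TOTAL)


ratios = {name: share_ratio(*c) for name, c in counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: share grows {ratio:.1f}x from train to test")
```

From the exact counts, the R2L share grows about 16x and the U2R share about 7x. Dividing the rounded percentages (0.30% / 0.04%) instead gives 7.5x for U2R.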

## 1. Setup & Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score, recall_score,
                             confusion_matrix, classification_report)
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
pd.set_option('display.width', 120)

# Plot settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['font.size'] = 10

print("✓ Libraries imported successfully!")
print(f"Random seed: {RANDOM_SEED}")
print(f"NumPy: {np.__version__} | Pandas: {pd.__version__}")

✓ Libraries imported successfully!
Random seed: 42
NumPy: 2.4.2 | Pandas: 3.0.0


## 2. Load NSL-KDD Dataset

In [2]:
# Define column names for NSL-KDD
col_names = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate",
    "dst_host_rerror_rate", "dst_host_srv_rerror_rate",
    "label", "difficulty"
]

# Load datasets
train_path = "data/KDDTrain+.txt"
test_path = "data/KDDTest+.txt"

df_train = pd.read_csv(train_path, names=col_names, header=None)
df_test = pd.read_csv(test_path, names=col_names, header=None)

print(f"✓ Dataset loaded successfully!")
print(f"\n📦 Train Set (Old Data - 2015):")
print(f"   Shape: {df_train.shape} ({df_train.shape[0]:,} samples, {df_train.shape[1]} features)")
print(f"\n📦 Test Set (New Data - 2016):")
print(f"   Shape: {df_test.shape} ({df_test.shape[0]:,} samples, {df_test.shape[1]} features)")
print(f"\nFirst 3 rows:")
df_train.head(3)

✓ Dataset loaded successfully!

📦 Train Set (Old Data - 2015):
   Shape: (125973, 43) (125,973 samples, 43 features)

📦 Test Set (New Data - 2016):
   Shape: (22544, 43) (22,544 samples, 43 features)

First 3 rows:


Unnamed: 0,duration,protocol_type,service,flag,src_bytes,dst_bytes,land,wrong_fragment,urgent,hot,num_failed_logins,logged_in,num_compromised,root_shell,su_attempted,num_root,num_file_creations,num_shells,num_access_files,num_outbound_cmds,is_host_login,is_guest_login,count,srv_count,serror_rate,srv_serror_rate,rerror_rate,srv_rerror_rate,same_srv_rate,diff_srv_rate,srv_diff_host_rate,dst_host_count,dst_host_srv_count,dst_host_same_srv_rate,dst_host_diff_srv_rate,dst_host_same_src_port_rate,dst_host_srv_diff_host_rate,dst_host_serror_rate,dst_host_srv_serror_rate,dst_host_rerror_rate,dst_host_srv_rerror_rate,label,difficulty
0,0,tcp,ftp_data,SF,491,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.0,0.0,0.0,0.0,1.0,0.0,0.0,150,25,0.17,0.03,0.17,0.0,0.0,0.0,0.05,0.0,normal,20
1,0,udp,other,SF,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,1,0.0,0.0,0.0,0.0,0.08,0.15,0.0,255,1,0.0,0.6,0.88,0.0,0.0,0.0,0.0,0.0,normal,15
2,0,tcp,private,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,123,6,1.0,1.0,0.0,0.0,0.05,0.07,0.0,255,26,0.1,0.05,0.0,0.0,1.0,1.0,0.0,0.0,neptune,19


## 3. Label Mapping & Preprocessing

Map the specific attack labels to 5 main categories:
- **Normal**: normal
- **DoS**: apache2, back, land, neptune, mailbomb, pod, processtable, smurf, teardrop, udpstorm, worm
- **Probe**: ipsweep, mscan, nmap, portsweep, saint, satan
- **R2L**: ftp_write, guess_passwd, httptunnel, imap, multihop, named, phf, sendmail, snmpgetattack, snmpguess, spy, warezclient, warezmaster, xlock, xsnoop
- **U2R**: buffer_overflow, loadmodule, perl, ps, rootkit, sqlattack, xterm

In [3]:
def get_attack_category(label: str) -> str:
    """Map attack labels to 5 main categories"""
    label = label.lower().strip()
    
    if 'normal' in label:
        return 'Normal'
    
    # DoS attacks
    dos_attacks = {'neptune', 'smurf', 'back', 'teardrop', 'pod', 'land',
                   'mailbomb', 'processtable', 'udpstorm', 'apache2', 'worm'}
    if label in dos_attacks:
        return 'DoS'
    
    # Probe attacks
    probe_attacks = {'satan', 'ipsweep', 'nmap', 'portsweep', 'mscan', 'saint'}
    if label in probe_attacks:
        return 'Probe'
    
    # R2L attacks
    r2l_attacks = {'guess_passwd', 'ftp_write', 'imap', 'phf', 'multihop',
                   'warezmaster', 'warezclient', 'spy', 'xlock', 'xsnoop',
                   'snmpguess', 'snmpgetattack', 'httptunnel', 'sendmail', 'named'}
    if label in r2l_attacks:
        return 'R2L'
    
    # U2R attacks
    u2r_attacks = {'buffer_overflow', 'loadmodule', 'rootkit', 'perl',
                   'sqlattack', 'xterm', 'ps'}
    if label in u2r_attacks:
        return 'U2R'
    
    return 'DoS'  # Fallback: any unrecognized label is counted as DoS

def category_to_label(category: str) -> int:
    """Convert category to numeric label (0-4)"""
    mapping = {'Normal': 0, 'DoS': 1, 'Probe': 2, 'R2L': 3, 'U2R': 4}
    return mapping.get(category, 1)

def preprocess_nsl_kdd(df: pd.DataFrame) -> pd.DataFrame:
    """
    Preprocess NSL-KDD dataset:
    1. Map labels to 5 categories
    2. Encode categorical features
    3. Drop unnecessary columns
    4. Convert to numeric and handle missing values
    """
    df = df.copy()
    
    # Create category labels
    df["label"] = df["label"].astype(str).str.lower()
    df["category"] = df["label"].apply(get_attack_category)
    df["label_numeric"] = df["category"].apply(category_to_label).astype(int)
    
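    # Encode categorical features
    # NOTE: pd.factorize assigns codes in order of first appearance, so calling
    # this function separately on train and test does not guarantee that the
    # same category (e.g. a given service) gets the same code in both sets.
    categorical_cols = ["protocol_type", "service", "flag"]
    for col in categorical_cols:
        df[col] = pd.factorize(df[col].astype(str))[0]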
    
    # Drop unnecessary columns
    df = df.drop(columns=["difficulty", "label"], errors='ignore')
    
    # Convert all to numeric (except category)
    for col in df.columns:
        if col not in ["category", "label_numeric"]:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    
    df = df.fillna(0.0)
    return df

# Preprocess datasets
print("Processing train set...")
df_train_proc = preprocess_nsl_kdd(df_train)
print("Processing test set...")
df_test_proc = preprocess_nsl_kdd(df_test)

print("\n✓ Preprocessing completed!")
print(f"\n📊 Train Set - Category Distribution:")
train_dist = df_train_proc['category'].value_counts()
for cat, count in train_dist.items():
    pct = count / len(df_train_proc) * 100
    print(f"   {cat}: {count:,} ({pct:.2f}%)")

print(f"\n📊 Test Set - Category Distribution:")
test_dist = df_test_proc['category'].value_counts()
for cat, count in test_dist.items():
    pct = count / len(df_test_proc) * 100
    print(f"   {cat}: {count:,} ({pct:.2f}%)")

Processing train set...
Processing test set...

✓ Preprocessing completed!

📊 Train Set - Category Distribution:
   Normal: 67,343 (53.46%)
   DoS: 45,927 (36.46%)
   Probe: 11,656 (9.25%)
   R2L: 995 (0.79%)
   U2R: 52 (0.04%)

📊 Test Set - Category Distribution:
   Normal: 9,711 (43.08%)
   DoS: 7,460 (33.09%)
   R2L: 2,885 (12.80%)
   Probe: 2,421 (10.74%)
   U2R: 67 (0.30%)
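
One caveat about the preprocessing above: `pd.factorize` numbers categories in order of first appearance, so encoding train and test in separate calls can assign different integers to the same `protocol_type`, `service`, or `flag` value. A minimal sketch of an alternative, assuming the training set defines the code book, is shown below. `encode_with_train_categories` is a hypothetical helper that is not part of the notebook, and the results reported in this notebook were produced with the original per-set encoding.

```python
import pandas as pd


def encode_with_train_categories(train_df, test_df, cols):
    """Encode `cols` with integer codes learned from the training set only.

    Values that appear only in the test set get the code -1.
    """
    train_df, test_df = train_df.copy(), test_df.copy()
    for col in cols:
        # Ordered, de-duplicated list of categories seen in training
        categories = list(dict.fromkeys(train_df[col].astype(str)))
        train_df[col] = pd.Categorical(train_df[col].astype(str), categories=categories).codes
        test_df[col] = pd.Categorical(test_df[col].astype(str), categories=categories).codes
    return train_df, test_df


# Toy frames: the same protocols appear in a different order in each set
toy_train = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp"]})
toy_test = pd.DataFrame({"protocol_type": ["udp", "tcp", "sctp"]})

enc_train, enc_test = encode_with_train_categories(toy_train, toy_test, ["protocol_type"])
separate_test_codes = pd.factorize(toy_test["protocol_type"])[0]

print("shared codes   :", enc_test["protocol_type"].tolist())
print("separate codes :", separate_test_codes.tolist())
```

In the toy data, the shared encoding keeps `tcp` and `udp` on the codes they had in training, whereas factorizing the test frame on its own renumbers them from zero. Whether a shared encoding changes the headline numbers here would need to be measured by rerunning the experiment.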


## 4. Extract Features & Labels

In [4]:
# Extract features and labels
feature_cols = [c for c in df_train_proc.columns if c not in ["category", "label_numeric"]]

# TRAIN SET
X_train = df_train_proc[feature_cols].values.astype(np.float32)
y_train = df_train_proc["label_numeric"].values.astype(int)

# TEST SET
X_test = df_test_proc[feature_cols].values.astype(np.float32)
y_test = df_test_proc["label_numeric"].values.astype(int)

# Class names
class_names = ['Normal', 'DoS', 'Probe', 'R2L', 'U2R']

print("✓ Features extracted successfully!")
print(f"\n📦 TRAIN SET:")
print(f"   X_train shape: {X_train.shape} ({X_train.shape[0]:,} samples, {X_train.shape[1]} features)")
print(f"   y_train distribution: {dict(zip(class_names, np.bincount(y_train)))}")

print(f"\n📦 TEST SET:")
print(f"   X_test shape: {X_test.shape} ({X_test.shape[0]:,} samples, {X_test.shape[1]} features)")
print(f"   y_test distribution: {dict(zip(class_names, np.bincount(y_test)))}")

✓ Features extracted successfully!

📦 TRAIN SET:
   X_train shape: (125973, 41) (125,973 samples, 41 features)
   y_train distribution: {'Normal': np.int64(67343), 'DoS': np.int64(45927), 'Probe': np.int64(11656), 'R2L': np.int64(995), 'U2R': np.int64(52)}

📦 TEST SET:
   X_test shape: (22544, 41) (22,544 samples, 41 features)
   y_test distribution: {'Normal': np.int64(9711), 'DoS': np.int64(7460), 'Probe': np.int64(2421), 'R2L': np.int64(2885), 'U2R': np.int64(67)}


---
# BASIC PART: Overall Evaluation

Train Static Random Forest on **FULL TRAIN SET** → Test on **FULL TEST SET**

---

## 5. Train Static Random Forest Model

In [5]:
print("="*80)
print("TRAINING STATIC RANDOM FOREST ON FULL TRAIN SET")
print("="*80)

# Initialize RandomForestClassifier
static_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=20,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=RANDOM_SEED,
    n_jobs=-1,
    verbose=0
)

print(f"\nModel configuration:")
print(f"  - n_estimators: {static_model.n_estimators}")
print(f"  - max_depth: {static_model.max_depth}")
print(f"  - random_state: {RANDOM_SEED}")

# Train model
import time
print(f"\nTraining on {len(X_train):,} samples...")
start_time = time.time()
static_model.fit(X_train, y_train)
train_time = time.time() - start_time

print(f"✓ Training completed in {train_time:.2f} seconds ({train_time/60:.2f} minutes)")

# Evaluate on train set (sanity check)
y_train_pred = static_model.predict(X_train)
train_acc = accuracy_score(y_train, y_train_pred)
train_f1_macro = f1_score(y_train, y_train_pred, average='macro', zero_division=0)
train_f1_weighted = f1_score(y_train, y_train_pred, average='weighted', zero_division=0)

print(f"\n📊 Performance on TRAIN SET (sanity check):")
print(f"  - Accuracy:        {train_acc:.4f} ({train_acc*100:.2f}%)")
print(f"  - F1-Score (macro):    {train_f1_macro:.4f}")
print(f"  - F1-Score (weighted): {train_f1_weighted:.4f}")
print(f"\n✅ Model fits well on historical data (2015)")

TRAINING STATIC RANDOM FOREST ON FULL TRAIN SET

Model configuration:
  - n_estimators: 100
  - max_depth: 20
  - random_state: 42

Training on 125,973 samples...
✓ Training completed in 1.22 seconds (0.02 minutes)

📊 Performance on TRAIN SET (sanity check):
  - Accuracy:        0.9994 (99.94%)
  - F1-Score (macro):    0.9582
  - F1-Score (weighted): 0.9994

✅ Model fits well on historical data (2015)


## 6. Evaluate on TEST SET (New Data with Drift)

In [6]:
print("="*80)
print("EVALUATING ON TEST SET (New Data - 2016)")
print("="*80)

# Predict on test set
print(f"\nPredicting on {len(X_test):,} test samples...")
y_test_pred = static_model.predict(X_test)

# Calculate metrics
test_acc = accuracy_score(y_test, y_test_pred)
test_f1_macro = f1_score(y_test, y_test_pred, average='macro', zero_division=0)
test_f1_weighted = f1_score(y_test, y_test_pred, average='weighted', zero_division=0)

# Calculate accuracy drop: absolute (acc_drop) and relative to train accuracy (acc_drop_pct)
acc_drop = train_acc - test_acc
acc_drop_pct = (acc_drop / train_acc) * 100

print(f"\n📊 Performance on TEST SET:")
print(f"  - Accuracy:        {test_acc:.4f} ({test_acc*100:.2f}%)")
print(f"  - F1-Score (macro):    {test_f1_macro:.4f}")
print(f"  - F1-Score (weighted): {test_f1_weighted:.4f}")

print(f"\n📉 CATASTROPHIC FORGETTING DETECTED:")
print(f"  - Train Accuracy: {train_acc:.4f} ({train_acc*100:.2f}%)")
print(f"  - Test Accuracy:  {test_acc:.4f} ({test_acc*100:.2f}%)")
print(f"  - Accuracy Drop:  {acc_drop:.4f} ({acc_drop_pct:.2f}%)")
print(f"\n❌ Model FAILS on evolved threats due to distribution shift!")

# Store results
results_summary = pd.DataFrame({
    'Dataset': ['Train (2015)', 'Test (2016)', 'Drop'],
    'Accuracy': [train_acc, test_acc, acc_drop],
    'F1-Macro': [train_f1_macro, test_f1_macro, train_f1_macro - test_f1_macro],
    'F1-Weighted': [train_f1_weighted, test_f1_weighted, train_f1_weighted - test_f1_weighted],
    'Samples': [f'{len(y_train):,}', f'{len(y_test):,}', '-']
})

print(f"\n📋 Summary Table:")
print(results_summary.to_string(index=False))

EVALUATING ON TEST SET (New Data - 2016)

Predicting on 22,544 test samples...

📊 Performance on TEST SET:
  - Accuracy:        0.6949 (69.49%)
  - F1-Score (macro):    0.4372
  - F1-Score (weighted): 0.6466

📉 CATASTROPHIC FORGETTING DETECTED:
  - Train Accuracy: 0.9994 (99.94%)
  - Test Accuracy:  0.6949 (69.49%)
  - Accuracy Drop:  0.3045 (30.47%)

❌ Model FAILS on evolved threats due to distribution shift!

📋 Summary Table:
     Dataset  Accuracy  F1-Macro  F1-Weighted Samples
Train (2015)    0.9994    0.9582       0.9994 125,973
 Test (2016)    0.6949    0.4372       0.6466  22,544
        Drop    0.3045    0.5210       0.3528       -
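
The two numbers on the "Accuracy Drop" line are an absolute and a relative quantity. The short check below reproduces both from the rounded accuracies printed above. Because it starts from rounded values, the last digit could differ slightly from a computation on the unrounded metrics.

```python
train_acc = 0.9994  # rounded train accuracy from the output above
test_acc = 0.6949   # rounded test accuracy from the output above

absolute_drop = train_acc - test_acc                 # accuracy lost, as a fraction
relative_drop_pct = absolute_drop / train_acc * 100  # share of train accuracy lost

print(f"Absolute drop: {absolute_drop:.4f}")
print(f"Relative drop: {relative_drop_pct:.2f}%")
```

In other words, 0.3045 corresponds to 30.45 percentage points of accuracy, while 30.47% expresses that loss as a fraction of the train accuracy.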
