# IoT Device Identification with Adversarial Training

This notebook runs the adversarial training pipeline on Google Colab, using a GPU when the runtime provides one.

**Features:**
- Automatic GPU detection
- Google Drive integration for saving results
- Cloning of the project from its GitHub repository
- Adversarial training experiments

## 1. Setup Environment

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!git clone https://github.com/yacinemkk/pfe.git /content/pfe
%cd /content/pfe

Cloning into '/content/pfe'...
remote: Enumerating objects: 68, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 68 (delta 21), reused 58 (delta 13), pack-reused 0 (from 0)
Receiving objects: 100% (68/68), 2.42 MiB | 18.51 MiB/s, done.
Resolving deltas: 100% (21/21), done.
/content/pfe


In [3]:
!pip install torch scikit-learn pandas numpy tqdm -q

In [4]:
import sys
sys.path.insert(0, '/content/pfe')

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.10.0+cpu
CUDA available: False
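
The output above comes from a CPU-only runtime, so training in this session would run without acceleration. In Colab, a GPU can be requested through Runtime > Change runtime type. The following is a minimal sketch of a device check that warns early; `pick_device` is a hypothetical helper and is not part of the repository.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda' when PyTorch can see a GPU, otherwise 'cpu'."""
    # Guard the import so the helper also works where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
if device == "cpu":
    print("No GPU detected: training will run on the CPU and may be slow.")
print(f"Selected device: {device}")
```

Running a check like this before the training cell makes a missing GPU visible before a long run starts.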


## 2. Configure Paths for Google Drive

Results are saved to your Google Drive under `/content/drive/MyDrive/pfe_results/`, with model outputs in its `models/` subfolder.

In [5]:
from pathlib import Path
import config.config as config

GDRIVE_BASE = Path('/content/drive/MyDrive/pfe_results')
GDRIVE_BASE.mkdir(parents=True, exist_ok=True)

config.RESULTS_DIR = GDRIVE_BASE / 'models'
config.RESULTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"Results will be saved to: {config.RESULTS_DIR}")

Results will be saved to: /content/drive/MyDrive/pfe_results/models
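
If Drive is not mounted, `mkdir` on the Drive path may still succeed and write to the ephemeral Colab disk. As a sketch under that assumption, the helper below picks the Drive folder only when the Drive root actually exists and otherwise falls back to a local folder. `resolve_results_dir` and the fallback location are hypothetical, not part of the repository.

```python
from pathlib import Path


def resolve_results_dir(drive_root: Path, local_root: Path) -> Path:
    """Use the Drive folder when Drive is mounted, else a local folder."""
    # The Drive root only exists after drive.mount() has succeeded.
    base = drive_root if drive_root.exists() else local_root
    results_dir = base / "pfe_results" / "models"
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir


# Example call for Colab (paths are assumptions):
# config.RESULTS_DIR = resolve_results_dir(Path("/content/drive/MyDrive"), Path("/content"))
```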


## 3. Data Setup

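Use the data shipped with the repository, or upload your own to Google Drive:

- **Option A:** the data is already in the repository under the `data/` folder.
- **Option B:** upload the data to Google Drive and set the path in the cell below.

The cell below follows Option B and reads from Google Drive.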

In [6]:
from pathlib import Path

# Data is expected in Google Drive at MyDrive/PFE/IPFIX_ML_Instances
GDRIVE_DATA = Path('/content/drive/MyDrive/PFE')
config.RAW_DATA_DIR = GDRIVE_DATA / 'IPFIX_ML_Instances'
config.PROCESSED_DATA_DIR = GDRIVE_DATA / 'processed'

print(f"Using data from: {config.RAW_DATA_DIR}")
if config.RAW_DATA_DIR.exists():
    # Count the per-home labeled CSV files
    labeled_files = list(config.RAW_DATA_DIR.glob('home*_labeled.csv'))
    print(f"Found {len(labeled_files)} labeled data files")
else:
    print("WARNING: Data directory not found!")

Using data from: /content/drive/MyDrive/PFE/IPFIX_ML_Instances
Found 12 labeled data files
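
File names of the form `home<N>_labeled.csv` sort lexicographically as `home10`, `home11`, `home12`, `home1`, and so on. If a file limit such as `MAX_FILES` takes the first files of a sorted listing (an assumption about the repository's loader), a numeric-aware sort gives a more predictable subset. A minimal sketch, with `natural_key` as a hypothetical helper:

```python
import re
from pathlib import Path


def natural_key(path: Path) -> list:
    """Split a file name into text and integer chunks for numeric-aware sorting."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


# Illustrative file names following the home<N>_labeled.csv pattern.
names = [Path(f"home{i}_labeled.csv") for i in (10, 11, 12, 1, 2, 3)]
ordered = sorted(names, key=natural_key)
print([p.name for p in ordered])
```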


## 4. Run Adversarial Training

Configure and run the training experiment.

In [7]:
from train_adversarial import run_experiment, compare_models

MODEL_TYPE = 'lstm'          # Options: 'lstm', 'transformer', 'cnn_lstm', 'cnn'
SEQ_LENGTH = 10              # Sequence length (try 10, 25, or 50)
ADV_METHOD = 'hybrid'        # Options: 'none', 'feature', 'pgd', 'fgsm', 'hybrid'
ADV_RATIO = 0.2              # Ratio of adversarial samples (0.0 - 1.0)
EPOCHS = 30                  # Number of training epochs
BATCH_SIZE = 64              # Batch size
MAX_FILES = None             # Limit data files (None for all)
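
A typo in one of these settings may only surface after the data has been loaded. The sketch below validates the values up front against the options listed in the comments above. `validate_settings` is a hypothetical helper; the repository may perform its own checks.

```python
VALID_MODELS = {"lstm", "transformer", "cnn_lstm", "cnn"}
VALID_ADV_METHODS = {"none", "feature", "pgd", "fgsm", "hybrid"}


def validate_settings(model_type, seq_length, adv_method, adv_ratio, epochs, batch_size):
    """Raise ValueError on the first setting that is out of range."""
    if model_type not in VALID_MODELS:
        raise ValueError(f"Unknown model_type {model_type!r}; expected one of {sorted(VALID_MODELS)}")
    if adv_method not in VALID_ADV_METHODS:
        raise ValueError(f"Unknown adv_method {adv_method!r}; expected one of {sorted(VALID_ADV_METHODS)}")
    if not 0.0 <= adv_ratio <= 1.0:
        raise ValueError(f"adv_ratio must be within [0.0, 1.0], got {adv_ratio!r}")
    for name, value in (("seq_length", seq_length), ("epochs", epochs), ("batch_size", batch_size)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")


# Same values as the configuration cell above.
validate_settings("lstm", 10, "hybrid", 0.2, 30, 64)
print("Settings look valid")
```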

In [None]:
results = run_experiment(
    model_type=MODEL_TYPE,
    seq_length=SEQ_LENGTH,
    adv_method=ADV_METHOD,
    adv_ratio=ADV_RATIO,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    max_files=MAX_FILES,
    save_results=True
)

print("\n" + "="*60)
print("TRAINING COMPLETE")
print("="*60)
print(f"Test Accuracy (Clean): {results['test_accuracy_clean']:.4f}")
if 'adversarial_results' in results:
    print("\nAdversarial Robustness:")
    for attack, metrics in results['adversarial_results'].items():
        print(f"  {attack}: {metrics['accuracy']:.4f}")


Experiment: LSTM | Seq=10 | Adv=hybrid
Device: cpu

Loading data with sequence length=10, stride=5
  Loading home10_labeled.csv...
  Loading home11_labeled.csv...
  Loading home12_labeled.csv...
  Loading home1_labeled.csv...
  Loading home2_labeled.csv...
  Loading home3_labeled.csv...
  Loading home4_labeled.csv...
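
The log line "sequence length=10, stride=5" suggests overlapping windows: each sequence holds 10 consecutive records and a new one starts every 5 records. The sketch below illustrates that windowing scheme on a plain list. It is an assumption about how the loader works; the repository's code may also group records by device or home before windowing.

```python
def sliding_windows(rows, seq_length, stride):
    """Return overlapping windows of `seq_length` rows, one starting every `stride` rows."""
    last_start = len(rows) - seq_length
    # An input shorter than seq_length yields no windows.
    return [rows[s:s + seq_length] for s in range(0, last_start + 1, stride)]


flows = list(range(30))  # stand-in for 30 consecutive flow records
windows = sliding_windows(flows, seq_length=10, stride=5)
print(len(windows))      # 5, i.e. (30 - 10) // 5 + 1
print(windows[1][0])     # 5, the second window starts one stride later
```

With a stride of half the sequence length, consecutive windows share half of their records.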


## 5. Run Full Comparison (Optional)

Compare several models, sequence lengths, and adversarial methods in a single sweep. The cell is disabled by default; set `RUN_COMPARISON = True` to run it.

In [None]:
RUN_COMPARISON = False  # Set to True to run full comparison

if RUN_COMPARISON:
    comparison_results = compare_models(
        seq_lengths=[10, 25],
        models=['lstm', 'transformer'],
        adv_methods=['none', 'pgd', 'hybrid'],
        epochs=20,
        max_files=None
    )
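
Before enabling the comparison, it helps to estimate its cost. Assuming `compare_models` trains one model per combination of its list arguments (an assumption, not confirmed by the source), the size of the sweep can be counted as follows:

```python
from itertools import product

# Same lists as the comparison cell above.
seq_lengths = [10, 25]
models = ["lstm", "transformer"]
adv_methods = ["none", "pgd", "hybrid"]
epochs = 20

grid = list(product(models, seq_lengths, adv_methods))
print(f"{len(grid)} runs, {len(grid) * epochs} epochs in total")
# 12 runs, 240 epochs in total
```

On a CPU-only runtime a sweep of this size may exceed a Colab session, so trimming the lists or setting `max_files` is worth considering.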

## 6. Save Final Results to Google Drive

In [None]:
import json
from datetime import datetime

now = datetime.now()  # one timestamp shared by the record and the file name

final_results = {
    'timestamp': now.isoformat(),
    'experiment': results
}

results_file = GDRIVE_BASE / f"final_results_{now.strftime('%Y%m%d_%H%M%S')}.json"
with open(results_file, 'w') as f:
    json.dump(final_results, f, indent=2, default=str)

print(f"Results saved to: {results_file}")
print("\nAll done! Check your Google Drive for saved models and results.")
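
The `default=str` argument makes `json.dump` call `str()` on any value it cannot serialize natively, so the file is always written, but those values come back as plain strings on reload. A small sketch with placeholder values (not real experiment output):

```python
import json
from datetime import datetime
from pathlib import Path

# Placeholder record; the keys and values are illustrative only.
sample = {
    "timestamp": datetime(2024, 1, 1, 12, 0, 0),
    "model_path": Path("models/lstm.pt"),
    "epochs": 30,
}

encoded = json.dumps(sample, default=str)
decoded = json.loads(encoded)

print(decoded["timestamp"])                  # 2024-01-01 12:00:00
print(type(decoded["model_path"]).__name__)  # str
print(decoded["epochs"])                     # 30
```

Values that must keep their type, such as arrays or paths, would need explicit conversion before saving and after loading.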