# 🛡️ APT Attack Detection - Complete System Management

**Comprehensive notebook for managing the entire APT Detection system**

## 📋 Table of Contents
1. Environment Setup & Verification
2. Data Preparation (MITRE, CTI, Events)
3. Engine Bootstrap & Dataset Linking
4. CTI Agent Pipeline
5. Hunting Pipeline (Demo & Real-time)
6. Training Pipeline
7. Evaluation & Metrics
8. System Status Dashboard
9. Troubleshooting & Help

---

## 1️⃣ Environment Setup & Verification

### Check Python Version & Install Dependencies

In [None]:
import sys
import subprocess
from pathlib import Path

print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")
print(f"\nCurrent directory: {Path.cwd()}")

# Check if we're in the right directory
if not (Path.cwd() / 'src').exists():
    print("\n⚠️  WARNING: Not in APT-Attack-Detection directory!")
    print("Please navigate to the repo root first.")
else:
    print("\n✅ In correct directory")

In [None]:
# Install core dependencies (%pip installs into the running kernel's environment; !pip may not)
%pip install -q -r requirements/core.txt
print("✅ Core dependencies installed")

In [None]:
# Install agent dependencies (%pip installs into the running kernel's environment)
%pip install -q -r requirements/agent.txt
print("✅ Agent dependencies installed")

In [None]:
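# Install g4f (optional - for free LLM backend).
# `!pip` does not raise on failure, so try/except cannot detect it; check pip's exit code instead.
import subprocess
import sys

proc = subprocess.run([sys.executable, "-m", "pip", "install", "-q", "-r", "requirements/g4f.txt"])
if proc.returncode == 0:
    print("✅ g4f installed (free LLM backend available)")
else:
    print("⚠️  g4f installation failed (optional - you can use OpenAI instead)")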

In [None]:
# Verify key imports
import networkx as nx
import yaml
import feedparser
import requests
from bs4 import BeautifulSoup

try:
    import openai
    print("✅ OpenAI library available")
except ImportError:
    print("⚠️  OpenAI not available (install: pip install openai)")

try:
    import g4f
    print("✅ g4f available (free backend)")
except ImportError:
    print("ℹ️  g4f not available (optional)")

print("\n✅ All essential imports successful")

## 2️⃣ Data Preparation

### 2.1 Download MITRE ATT&CK STIX Data

In [None]:
import requests
from pathlib import Path

mitre_dir = Path("data/mitre")
mitre_dir.mkdir(parents=True, exist_ok=True)

stix_file = mitre_dir / "enterprise-attack.json"

if stix_file.exists():
    print(f"✅ MITRE ATT&CK STIX already exists: {stix_file}")
    print(f"   Size: {stix_file.stat().st_size / 1024 / 1024:.2f} MB")
else:
    print("Downloading MITRE ATT&CK STIX...")
    url = "https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json"
    
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    
    stix_file.write_bytes(response.content)
    print(f"✅ Downloaded: {stix_file}")
    print(f"   Size: {len(response.content) / 1024 / 1024:.2f} MB")

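# Verify the file (explicit UTF-8: the bundle contains non-ASCII text)
import json
stix_data = json.loads(stix_file.read_text(encoding="utf-8"))
# Skip revoked/deprecated entries, which the bundle still ships
techniques = [obj for obj in stix_data['objects']
              if obj.get('type') == 'attack-pattern'
              and not obj.get('revoked') and not obj.get('x_mitre_deprecated')]
print(f"\n✅ Loaded {len(techniques)} active ATT&CK techniques (including sub-techniques)")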
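
Later cells report bare technique IDs (for example in `seeds.json`). A minimal sketch for turning them into readable names, assuming the standard ATT&CK STIX layout in which each `attack-pattern` stores its ID in an `external_references` entry whose `source_name` is `"mitre-attack"`. `build_technique_index` is a hypothetical helper, not part of this repo, and the demo bundle is made up apart from the real T1059 ID/name pair.

```python
def build_technique_index(bundle: dict) -> dict:
    """Map ATT&CK IDs (e.g. 'T1059') to technique names from a STIX 2.x bundle."""
    index = {}
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue
        # Skip objects MITRE has revoked or deprecated
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue
        for ref in obj.get("external_references", []):
            # The ATT&CK ID lives in the reference whose source is 'mitre-attack'
            if ref.get("source_name") == "mitre-attack" and "external_id" in ref:
                index[ref["external_id"]] = obj.get("name", "")
    return index


# Tiny in-memory bundle for illustration (T9999 is a made-up ID)
demo_bundle = {"objects": [
    {"type": "attack-pattern", "name": "Command and Scripting Interpreter",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T1059"}]},
    {"type": "attack-pattern", "name": "Old technique", "revoked": True,
     "external_references": [{"source_name": "mitre-attack", "external_id": "T9999"}]},
    {"type": "malware", "name": "not a technique"},
]}
print(build_technique_index(demo_bundle))
```

With the downloaded bundle, `build_technique_index(stix_data).get("T1059")` would return the name, and unknown IDs return `None`.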

### 2.2 Setup CTI RSS Feeds

In [None]:
from pathlib import Path

cti_dir = Path("data/cti_reports")
cti_dir.mkdir(parents=True, exist_ok=True)

rss_file = cti_dir / "rss_seeds.txt"

# Default RSS feeds for cyber threat intelligence
default_feeds = [
    "# Government & Official Sources",
    "https://www.cisa.gov/cybersecurity-advisories/all.xml",
    "https://www.us-cert.gov/ncas/current-activity.xml",  # legacy US-CERT URL; may redirect or be retired, verify before use
    "",
    "# Security News",
    "https://www.bleepingcomputer.com/feed/",
    "https://thehackernews.com/feeds/posts/default",
    "https://feeds.feedburner.com/TheHackersNews",
    "",
    "# Threat Research",
    "https://www.crowdstrike.com/blog/feed/",
    "https://www.fireeye.com/blog/threat-research.html/feed",  # legacy FireEye URL; may redirect or be retired, verify before use
    "",
    "# Add more feeds as needed",
]

if rss_file.exists():
    print(f"✅ RSS feeds file already exists: {rss_file}")
    feeds = [line.strip() for line in rss_file.read_text().splitlines() 
             if line.strip() and not line.strip().startswith('#')]
    print(f"   Contains {len(feeds)} active feeds")
else:
    rss_file.write_text("\n".join(default_feeds))
    print(f"✅ Created RSS feeds file: {rss_file}")
    print(f"   Added {len([f for f in default_feeds if f and not f.startswith('#')])} default feeds")

print("\n📡 RSS Feeds:")
print(rss_file.read_text())
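
The cell above treats blank lines and lines starting with `#` as non-feeds, and the default list already contains two Hacker News URLs. A minimal sketch of that parsing rule with de-duplication added; `load_feed_urls` is hypothetical, and the CTI agent's own parser may behave differently.

```python
def load_feed_urls(text: str) -> list:
    """Return feed URLs from rss_seeds.txt content, skipping blanks, comments and duplicates."""
    urls, seen = [], set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment/section header
        if line not in seen:  # keep the first occurrence, preserve order
            seen.add(line)
            urls.append(line)
    return urls


sample = """# Security News
https://www.bleepingcomputer.com/feed/

https://www.bleepingcomputer.com/feed/
"""
print(load_feed_urls(sample))
```

Applied to the real file, `load_feed_urls(rss_file.read_text())` gives the list of feeds to poll.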

### 2.3 Create Sample Events (for Testing)

In [None]:
import json
from pathlib import Path
import time

events_dir = Path("runs/events")
events_dir.mkdir(parents=True, exist_ok=True)

events_file = events_dir / "events.jsonl"

# Create sample suspicious events
sample_events = [
    # Suspicious process in /tmp
    {
        "kind": "process_start",
        "ts": time.time(),
        "pid": 1234,
        "ppid": 1000,
        "exe": "/tmp/malicious_binary",
        "comm": "malicious_binary"
    },
    # File write to suspicious location
    {
        "kind": "file_op",
        "ts": time.time() + 1,
        "pid": 1234,
        "exe": "/tmp/malicious_binary",
        "comm": "malicious_binary",
        "path": "/tmp/.hidden_payload",
        "action": "WRITE"
    },
    # Network connection
    {
        "kind": "net_op",
        "ts": time.time() + 2,
        "pid": 1234,
        "exe": "/tmp/malicious_binary",
        "comm": "malicious_binary",
        "saddr": "192.168.1.100:8080"
    },
    # Normal process for comparison
    {
        "kind": "process_start",
        "ts": time.time() + 3,
        "pid": 5678,
        "ppid": 1,
        "exe": "/usr/bin/bash",
        "comm": "bash"
    },
]

with events_file.open('w') as f:
    for event in sample_events:
        f.write(json.dumps(event) + '\n')

print(f"✅ Created sample events: {events_file}")
print(f"   Events count: {len(sample_events)}")
print("\n📋 Sample events preview:")
for i, ev in enumerate(sample_events[:3], 1):
    print(f"{i}. {ev['kind']}: {ev.get('exe', ev.get('path', 'N/A'))}")
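
The hunting cells read `events.jsonl` directly, so a malformed line only surfaces later as a confusing graph error. A minimal pre-flight check; the required-field table is inferred from the four sample events above and is an assumption, not the pipeline's documented schema. `validate_jsonl` and `REQUIRED_FIELDS` are hypothetical names.

```python
import json

# Required fields per event kind, inferred from the sample events (assumption)
REQUIRED_FIELDS = {
    "process_start": {"ts", "pid", "ppid", "exe"},
    "file_op": {"ts", "pid", "exe", "path", "action"},
    "net_op": {"ts", "pid", "exe", "saddr"},
}


def validate_jsonl(lines):
    """Yield (line_number, problem) for every malformed event line."""
    for n, raw in enumerate(lines, 1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            ev = json.loads(raw)
        except json.JSONDecodeError as exc:
            yield n, f"invalid JSON: {exc.msg}"
            continue
        kind = ev.get("kind")
        if kind not in REQUIRED_FIELDS:
            yield n, f"unknown kind: {kind!r}"
            continue
        missing = REQUIRED_FIELDS[kind] - set(ev)
        if missing:
            yield n, f"{kind} missing {sorted(missing)}"


lines = ['{"kind": "net_op", "ts": 1.0, "pid": 1, "exe": "/bin/x"}', "not json"]
for n, problem in validate_jsonl(lines):
    print(n, problem)
```

For the file written above: `list(validate_jsonl(events_file.read_text().splitlines()))` should be empty.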

## 3️⃣ Engine Bootstrap & Dataset Linking

### 3.1 Bootstrap GNN Engine (Manual Step)

In [None]:
from pathlib import Path

engine_dir = Path("src/engine/graph_matcher/engine_repo")

if (engine_dir / "src").exists():
    print(f"✅ Engine repository exists at: {engine_dir}")
    print(f"   Contents: {sorted(p.name for p in engine_dir.iterdir())}")
else:
    print("⚠️  Engine repository NOT found")
    print("\n📝 To bootstrap the engine, run:")
    print("\n  bash scripts/bootstrap_engine.sh <MEGR_APT_GIT_URL>")
    print("\n  Replace <MEGR_APT_GIT_URL> with the actual repository URL")
    print("\n⚠️  Without the engine you cannot run training or the full hunting pipeline,")
    print("   but the CTI Agent pipeline still works.")

### 3.2 Link DARPA TC Datasets (if available)

In [None]:
import subprocess
from pathlib import Path

engine_dataset_dir = Path("src/engine/graph_matcher/engine_repo/dataset")
target_dir = Path("data/datasets")

if engine_dataset_dir.exists():
    print(f"Linking DARPA TC datasets into {target_dir} ...")
    result = subprocess.run(["bash", "scripts/link_tc_datasets.sh"],
                            capture_output=True, text=True)
    print(result.stdout)
    if result.returncode == 0:
        print("✅ Dataset linking completed")
    else:
        print(f"⚠️  Linking failed: {result.stderr}")
else:
    print("⚠️  Engine datasets not available")
    print("   This is OK for CTI Agent testing;")
    print("   datasets are required only for training and full hunting.")

## 4️⃣ CTI Agent Pipeline

### 4.1 Configure LLM Backend

In [None]:
import os

# Option 1: Use OpenAI (requires API key)
# Uncomment and set your API key:
# os.environ['OPENAI_API_KEY'] = 'sk-...'
# os.environ['OPENAI_MODEL'] = 'gpt-4o-mini'  # or gpt-4, gpt-3.5-turbo
# llm_backend = 'openai'

# Option 2: Use g4f (free, no API key)
llm_backend = 'g4f'

print(f"🤖 LLM Backend: {llm_backend}")

if llm_backend == 'openai':
    if os.getenv('OPENAI_API_KEY'):
        print("✅ OpenAI API key configured")
    else:
        print("⚠️  OPENAI_API_KEY not set!")
        print("   Set it above or use g4f instead")
elif llm_backend == 'g4f':
    try:
        import g4f
        print("✅ g4f available")
    except ImportError:
        print("❌ g4f not installed")
        print("   Install: pip install -r requirements/g4f.txt")
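
Instead of commenting lines in and out, the backend choice can be derived from the environment. A sketch assuming the policy "use OpenAI when `OPENAI_API_KEY` is set, otherwise g4f if it is installed"; `pick_llm_backend` is hypothetical. `importlib.util.find_spec` checks whether a package can be found without importing it.

```python
import importlib.util
import os


def pick_llm_backend(env=None):
    """Prefer OpenAI when an API key is set, else g4f if importable, else None."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if importlib.util.find_spec("g4f") is not None:
        return "g4f"
    return None  # neither backend usable


print(pick_llm_backend({"OPENAI_API_KEY": "sk-test"}))
```

In the cell above this could replace the hard-coded assignment, e.g. `llm_backend = pick_llm_backend() or 'g4f'`.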

### 4.2 Run CTI Agent Pipeline

In [None]:
%%time
import subprocess
import sys

cmd = [
    sys.executable, "-m", "src.pipeline.agent.main",  # the kernel's interpreter, not whatever `python` is on PATH
    "--rss-file", "data/cti_reports/rss_seeds.txt",
    "--stix", "data/mitre/enterprise-attack.json",
    "--out-cti", "runs/cti",
    "--out-qg", "data/query_graphs",
    "--out-seeds", "runs/cti/seeds.json",
    "--llm-backend", llm_backend,
    "--per-source-limit", "3",  # Limit to 3 items per feed for demo
]

print(f"🚀 Running CTI Agent with {llm_backend} backend...")
print(f"Command: {' '.join(cmd)}\n")

result = subprocess.run(cmd, capture_output=True, text=True)

print("STDOUT:")
print(result.stdout)

if result.returncode != 0:
    print("\nSTDERR:")
    print(result.stderr)
else:
    print("\n✅ CTI Agent completed successfully!")
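
The pipeline cells in sections 4.2, 5.3 and 6.2 all repeat the same run-then-print pattern and carry on silently when a step fails. A sketch of a shared helper (hypothetical `run_module`) that uses the kernel's interpreter and raises on a non-zero exit, so "Run All" stops at the first failing pipeline.

```python
import subprocess
import sys


def run_module(module, *args, check=True):
    """Run `python -m module args...` with the kernel's interpreter and return stdout."""
    cmd = [sys.executable, "-m", module, *map(str, args)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if check and result.returncode != 0:
        # Put stderr in the exception so the failure cannot be overlooked
        raise RuntimeError(f"{' '.join(cmd)} failed ({result.returncode}):\n{result.stderr}")
    return result.stdout


print(run_module("platform"))  # stdlib module with a CLI, used here only as a smoke test
```

For the CTI agent this would read `run_module("src.pipeline.agent.main", "--rss-file", "data/cti_reports/rss_seeds.txt", ..., "--llm-backend", llm_backend)`, with the remaining flags as in the cell above.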

### 4.3 Inspect CTI Agent Results

In [None]:
import json
from pathlib import Path

# 1. Check seeds.json
seeds_file = Path("runs/cti/seeds.json")
if seeds_file.exists():
    seeds = json.loads(seeds_file.read_text())
    print("🎯 CTI Seeds Summary:")
    print(f"   Techniques: {len(seeds.get('techniques', []))}")
    print(f"   Indicators: {len(seeds.get('indicators', []))}")
    
    print("\n📋 Top 5 Techniques (by confidence):")
    ranked = sorted(seeds.get('techniques', []),
                    key=lambda t: t.get('confidence') or 0.0, reverse=True)
    for i, tech in enumerate(ranked[:5], 1):
        tid = tech.get('technique_id', 'N/A')
        conf = tech.get('confidence') or 0.0  # guard against a null confidence
        print(f"   {i}. {tid} (confidence: {conf:.2f})")
    
    print("\n🔍 First 5 Indicators:")
    for i, ind in enumerate(seeds.get('indicators', [])[:5], 1):
        itype = ind.get('type', 'N/A')
        value = str(ind.get('value', 'N/A'))[:50]  # truncate long values
        print(f"   {i}. [{itype}] {value}")
else:
    print("⚠️  seeds.json not found")

# 2. Check query graphs
qg_dir = Path("data/query_graphs")
if qg_dir.exists():
    qg_files = sorted(qg_dir.glob("*.json"))
    print(f"\n📊 Query Graphs: {len(qg_files)} generated")
    for qg in qg_files[:5]:
        print(f"   - {qg.name}")

# 3. Check CTI items
cti_dir = Path("runs/cti")
if cti_dir.exists():
    cti_files = list(cti_dir.glob("cti_*.json"))
    print(f"\n📰 CTI Items: {len(cti_files)} processed")
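
`seeds.json` may list the same technique ID more than once (Section 7.1 counts occurrences per ID), and low-confidence hits add noise to seeding. A sketch that keeps the highest confidence per ID and drops anything below a threshold; `top_techniques` is hypothetical and the 0.5 default is an arbitrary assumption, not a value the agent uses.

```python
def top_techniques(techniques, min_conf=0.5):
    """Collapse duplicate technique IDs (keeping max confidence) and drop low-confidence hits."""
    best = {}
    for t in techniques:
        tid, conf = t.get("technique_id"), t.get("confidence") or 0.0
        if tid and conf >= min_conf and conf > best.get(tid, -1.0):
            best[tid] = conf
    # Highest confidence first
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


demo = [{"technique_id": "T1059", "confidence": 0.6},
        {"technique_id": "T1059", "confidence": 0.9},
        {"technique_id": "T1105", "confidence": 0.3}]
print(top_techniques(demo))
```

On real output: `top_techniques(seeds.get('techniques', []))`.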

## 5️⃣ Hunting Pipeline

### 5.1 Visualize Provenance Graph (from Sample Events)

In [None]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd()))  # make the repo's `src` package importable

from src.common.io import read_jsonl
from src.pipeline.hunting.provenance import WindowedProvenanceGraph
import networkx as nx
import matplotlib.pyplot as plt

# Build provenance graph from sample events
pg = WindowedProvenanceGraph(window_seconds=300, max_nodes=10000)

events_file = Path("runs/events/events.jsonl")
if events_file.exists():
    for ev in read_jsonl(events_file):
        pg.ingest(ev)
    
    print("🌐 Provenance Graph:")
    print(f"   Nodes: {pg.g.number_of_nodes()}")
    print(f"   Edges: {pg.g.number_of_edges()}")
    
    # Visualize
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(pg.g, k=2, iterations=50)
    
    # Color nodes by type
    colors = []
    for node in pg.g.nodes():
        ntype = pg.g.nodes[node].get('ntype', 'unknown')
        if ntype == 'process':
            colors.append('lightblue')
        elif ntype == 'file':
            colors.append('lightgreen')
        elif ntype == 'socket':
            colors.append('orange')
        else:
            colors.append('gray')
    
    nx.draw(pg.g, pos, node_color=colors, with_labels=True, 
            node_size=1000, font_size=8, arrows=True)
    
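    plt.title("Provenance Graph Visualization")
    # plt.legend(['...']) would attach the strings to whatever artists nx.draw created;
    # explicit patches keep colours and labels in sync
    from matplotlib.patches import Patch
    plt.legend(handles=[Patch(color='lightblue', label='Process'),
                        Patch(color='lightgreen', label='File'),
                        Patch(color='orange', label='Socket'),
                        Patch(color='gray', label='Other')])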
    plt.tight_layout()
    plt.show()
    
    # Print node details
    print("\n📋 Nodes Detail:")
    for node, data in pg.g.nodes(data=True):
        ntype = data.get('ntype', 'unknown')
        if ntype == 'process':
            print(f"   {node} [{ntype}]: {data.get('exe', 'N/A')}")
        elif ntype == 'file':
            print(f"   {node} [{ntype}]: {data.get('path', 'N/A')}")
        elif ntype == 'socket':
            print(f"   {node} [{ntype}]: {data.get('saddr', 'N/A')}")
else:
    print("⚠️  No events file found. Create sample events first (section 2.3).")
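
`WindowedProvenanceGraph` is imported from the repo, so its internals are not visible here. To make the idea concrete, a stdlib-only toy version: processes, files and sockets become nodes, events become edges, and nodes not seen within `window_seconds` are forgotten. The node naming (`proc:<pid>`, `file:<path>`, `sock:<addr>`) and last-seen eviction are assumptions; the repo's class (which uses networkx and also takes `max_nodes`) may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class TinyProvenanceGraph:
    """Toy sliding-window provenance graph (hypothetical; not the repo's class)."""
    window_seconds: float = 300
    nodes: dict = field(default_factory=dict)  # node id -> attributes (incl. last_ts)
    edges: list = field(default_factory=list)  # (src, dst, op, ts)

    def _touch(self, nid, ts, **attrs):
        # Create or update a node and record when it was last seen
        self.nodes.setdefault(nid, {}).update(attrs, last_ts=ts)

    def ingest(self, ev):
        ts, proc = ev["ts"], f"proc:{ev['pid']}"
        self._touch(proc, ts, ntype="process", exe=ev.get("exe"))
        kind = ev["kind"]
        if kind == "process_start" and "ppid" in ev:
            parent = f"proc:{ev['ppid']}"
            self._touch(parent, ts, ntype="process")
            self.edges.append((parent, proc, "fork", ts))
        elif kind == "file_op":
            f = f"file:{ev['path']}"
            self._touch(f, ts, ntype="file", path=ev["path"])
            self.edges.append((proc, f, ev.get("action", "?"), ts))
        elif kind == "net_op":
            s = f"sock:{ev['saddr']}"
            self._touch(s, ts, ntype="socket", saddr=ev["saddr"])
            self.edges.append((proc, s, "connect", ts))
        self._evict(ts)

    def _evict(self, now):
        # Forget nodes not seen within the window, plus every edge touching them
        stale = {n for n, d in self.nodes.items() if now - d["last_ts"] > self.window_seconds}
        for n in stale:
            del self.nodes[n]
        self.edges = [e for e in self.edges if e[0] not in stale and e[1] not in stale]
```

Feeding it the section 2.3 events works the same way as the cell above: `tiny = TinyProvenanceGraph()`, then `for ev in sample_events: tiny.ingest(ev)`.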

### 5.2 Find Seeds (CTI-based + Heuristics)

In [None]:
from src.pipeline.hunting.seeding import find_seeds

if pg.g.number_of_nodes() > 0:
    # Named seed_nodes so it does not overwrite the `seeds` dict loaded in section 4.3
    seed_nodes = find_seeds(
        pg.g,
        query_name="demo",
        cti_seeds_path="runs/cti/seeds.json"
    )
    
    print(f"🎯 Seed Nodes Found: {len(seed_nodes)}")
    for i, seed in enumerate(seed_nodes, 1):
        node_data = pg.g.nodes[seed]
        ntype = node_data.get('ntype', 'unknown')
        if ntype == 'process':
            info = node_data.get('exe', 'N/A')
        elif ntype == 'file':
            info = node_data.get('path', 'N/A')
        else:
            info = node_data.get('saddr', 'N/A')
        print(f"   {i}. {seed} [{ntype}]: {info}")
    
    if seed_nodes:
        print("\n✅ Seeds identified - ready for subgraph extraction")
    else:
        print("\n⚠️  No seeds found (this is OK for benign events)")
else:
    print("⚠️  No graph to search")
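
`find_seeds` combines CTI seeds with heuristics. The heuristic half could look roughly like the sketch below, which mirrors the sample events (a binary run from `/tmp`, a hidden payload written there). The directory list and the rules are illustrative assumptions, not the repo's implementation; `heuristic_seeds` is hypothetical.

```python
# Illustrative heuristics only: assumed, not the rules used by find_seeds
SUSPICIOUS_DIRS = ("/tmp/", "/var/tmp/", "/dev/shm/")


def heuristic_seeds(nodes_with_data):
    """Flag processes launched from world-writable dirs and hidden files written there.

    Accepts any iterable of (node, attrs) pairs, e.g. networkx's g.nodes(data=True).
    """
    seeds = []
    for node, data in nodes_with_data:
        ntype = data.get("ntype")
        if ntype == "process" and (data.get("exe") or "").startswith(SUSPICIOUS_DIRS):
            seeds.append(node)
        elif ntype == "file":
            path = data.get("path") or ""
            if path.startswith(SUSPICIOUS_DIRS) and path.rsplit("/", 1)[-1].startswith("."):
                seeds.append(node)
    return seeds


demo_nodes = [
    ("p1", {"ntype": "process", "exe": "/tmp/malicious_binary"}),
    ("p2", {"ntype": "process", "exe": "/usr/bin/bash"}),
    ("f1", {"ntype": "file", "path": "/tmp/.hidden_payload"}),
    ("f2", {"ntype": "file", "path": "/tmp/report.txt"}),
]
print(heuristic_seeds(demo_nodes))
```

Against the graph built above: `heuristic_seeds(pg.g.nodes(data=True))`.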

### 5.3 Run Full Hunting Pipeline (Requires Engine)

In [None]:
import subprocess
import sys
from pathlib import Path

engine_exists = Path("src/engine/graph_matcher/engine_repo/src").exists()
ckpt_dir = Path("runs/checkpoints")
# sorted() makes the choice deterministic (glob order is arbitrary)
checkpoints = sorted(ckpt_dir.glob("*.pt")) if ckpt_dir.exists() else []
checkpoint_exists = bool(checkpoints)

if engine_exists and checkpoint_exists:
    print("🚀 Running hunting pipeline...")
    
    cmd = [
        sys.executable, "-m", "src.pipeline.hunting.main",
        "--dataset", "cadets",
        "--events", "runs/events/events.jsonl",
        "--checkpoint", str(checkpoints[-1]),
        "--query-name", "demo",
        "--cti-seeds", "runs/cti/seeds.json",
    ]
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    
    if result.returncode != 0:
        print("\nErrors:")
        print(result.stderr)
else:
    print("⚠️  Cannot run full hunting pipeline:")
    if not engine_exists:
        print("   - Engine not installed (see section 3.1)")
    if not checkpoint_exists:
        print("   - No checkpoint found (train first or download a pretrained model)")
    print("\n💡 You can still test provenance graph and seeding (see cells above)")
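
When several training runs accumulate in `runs/checkpoints`, the most recently written checkpoint is often the one wanted. A sketch (hypothetical `latest_checkpoint`) that selects by file modification time; mtime can be misleading after files are copied or restored, so treat this as a heuristic.

```python
from pathlib import Path


def latest_checkpoint(directory="runs/checkpoints"):
    """Return the most recently modified *.pt file in `directory`, or None."""
    d = Path(directory)
    if not d.is_dir():
        return None
    candidates = list(d.glob("*.pt"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


print(latest_checkpoint())
```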

## 6️⃣ Training Pipeline (Requires Engine + Datasets)

### 6.1 Check Training Prerequisites

In [None]:
from pathlib import Path

engine_exists = Path("src/engine/graph_matcher/engine_repo/src").exists()
datasets_exist = Path("data/datasets/darpa_cadets").exists()

print("🔍 Training Prerequisites:")
print(f"   Engine: {'✅' if engine_exists else '❌'}")
print(f"   DARPA Datasets: {'✅' if datasets_exist else '❌'}")

if engine_exists and datasets_exist:
    print("\n✅ Ready to train!")
else:
    print("\n⚠️  Missing prerequisites for training")
    if not engine_exists:
        print("   - Install engine (see section 3.1)")
    if not datasets_exist:
        print("   - Download DARPA TC datasets")
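
Sections 5.3, 6.1 and 8 each hard-code their own path checks, which makes them easy to drift apart. A sketch of keeping each pipeline's requirements in one table; `missing_prereqs` and `TRAINING_PREREQS` are hypothetical names, and the paths are the ones this notebook already uses.

```python
from pathlib import Path


def missing_prereqs(required):
    """Return the labels of required paths that do not exist."""
    return [label for label, path in required.items() if not Path(path).exists()]


TRAINING_PREREQS = {
    "GNN engine": "src/engine/graph_matcher/engine_repo/src",
    "DARPA CADETS dataset": "data/datasets/darpa_cadets",
}
print(missing_prereqs(TRAINING_PREREQS) or "all present")
```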

### 6.2 Run Training (if prerequisites met)

In [None]:
import subprocess
import sys
from pathlib import Path

if engine_exists and datasets_exist:  # set by the prerequisite cell in 6.1
    print("🎓 Starting training...")
    
    Path("runs/checkpoints").mkdir(parents=True, exist_ok=True)
    
    cmd = [
        sys.executable, "-m", "src.pipeline.train.trainer",
        "--dataset", "cadets",
        "--epochs", "10",  # Small number for demo
        "--save", "runs/checkpoints/demo_model.pt",
    ]
    
    print(f"Command: {' '.join(cmd)}\n")
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    
    if result.returncode == 0:
        print("\n✅ Training completed!")
    else:
        print("\nErrors:")
        print(result.stderr)
else:
    print("⚠️  Skipping training (prerequisites not met)")

## 7️⃣ Evaluation & Metrics

### 7.1 Evaluate CTI Agent (Custom Metrics)

In [None]:
import json
from pathlib import Path
from collections import Counter

seeds_file = Path("runs/cti/seeds.json")

if seeds_file.exists():
    seeds = json.loads(seeds_file.read_text())
    techniques = seeds.get('techniques', [])
    indicators = seeds.get('indicators', [])
    
    print("📊 CTI Agent Evaluation Metrics:\n")
    
    # 1. Quantity metrics
    print(f"1. Extraction Counts:")
    print(f"   - Techniques: {len(techniques)}")
    print(f"   - Indicators: {len(indicators)}")
    
    # 2. Technique distribution
    print(f"\n2. Technique Distribution:")
    tech_ids = [t.get('technique_id') for t in techniques]
    tech_counts = Counter(tech_ids)
    for tid, count in tech_counts.most_common(10):
        print(f"   {tid}: {count} occurrences")
    
    # 3. Confidence distribution
    print(f"\n3. Confidence Distribution:")
    confidences = [t.get('confidence', 0) for t in techniques]
    if confidences:
        import numpy as np
        print(f"   Mean: {np.mean(confidences):.3f}")
        print(f"   Median: {np.median(confidences):.3f}")
        print(f"   Std: {np.std(confidences):.3f}")
        print(f"   Min: {np.min(confidences):.3f}")
        print(f"   Max: {np.max(confidences):.3f}")
    
    # 4. Indicator types
    print("\n4. Indicator Types:")
    ind_types = Counter(i.get('type') for i in indicators)
    for itype, count in ind_types.most_common():  # most frequent first
        print(f"   {itype}: {count}")
    
    # 5. Visualization
    if confidences:
        import matplotlib.pyplot as plt
        
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        # Confidence histogram
        axes[0].hist(confidences, bins=20, edgecolor='black')
        axes[0].set_xlabel('Confidence')
        axes[0].set_ylabel('Count')
        axes[0].set_title('Technique Confidence Distribution')
        axes[0].axvline(np.mean(confidences), color='r', 
                       linestyle='--', label=f'Mean: {np.mean(confidences):.2f}')
        axes[0].legend()
        
        # Top techniques
        top_techs = tech_counts.most_common(10)
        if top_techs:
            tids, counts = zip(*top_techs)
            axes[1].barh(range(len(tids)), counts)
            axes[1].set_yticks(range(len(tids)))
            axes[1].set_yticklabels(tids)
            axes[1].set_xlabel('Occurrences')
            axes[1].set_title('Top 10 Techniques')
            axes[1].invert_yaxis()
        
        plt.tight_layout()
        plt.show()
else:
    print("⚠️  No seeds.json found - run CTI Agent first")
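
The metrics above describe what the agent extracted, not whether it is correct. Once ground-truth labels exist (the Next Steps list mentions creating them), set-based precision, recall and F1 over technique IDs is a natural first metric. A sketch; `technique_prf` is hypothetical and the example labels are made up.

```python
def technique_prf(predicted, ground_truth):
    """Set-based precision, recall and F1 of predicted vs. ground-truth technique IDs."""
    pred, gold = set(predicted), set(ground_truth)
    tp = len(pred & gold)  # IDs both extracted and labelled
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for one report
p, r, f = technique_prf(["T1059", "T1105", "T1071"], ["T1059", "T1071", "T1547"])
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Per-report scores can then be averaged (macro) or the counts pooled first (micro); which is more informative depends on how uneven report sizes are.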

## 8️⃣ System Status Dashboard

### Complete System Health Check

In [None]:
from pathlib import Path
import json

def check_file(path, desc):
    p = Path(path)
    if p.exists():
        # Only regular files have a meaningful byte size here
        size = f"{p.stat().st_size} bytes" if p.is_file() else "not a regular file"
        return f"✅ {desc}: {p} ({size})"
    return f"❌ {desc}: {path} NOT FOUND"

def check_dir(path, desc):
    p = Path(path)
    if p.exists() and p.is_dir():
        count = len(list(p.iterdir()))
        return f"✅ {desc}: {p} ({count} items)"
    return f"❌ {desc}: {path} NOT FOUND"

print("="*60)
print(" 🛡️  APT ATTACK DETECTION - SYSTEM STATUS DASHBOARD")
print("="*60)

print("\n📦 Data Files:")
print(check_file("data/mitre/enterprise-attack.json", "MITRE ATT&CK"))
print(check_file("data/cti_reports/rss_seeds.txt", "RSS Feeds"))
print(check_file("runs/events/events.jsonl", "Sample Events"))

print("\n🤖 Engine & Datasets:")
print(check_dir("src/engine/graph_matcher/engine_repo", "GNN Engine"))
print(check_dir("data/datasets", "DARPA Datasets"))

print("\n📊 Outputs:")
print(check_file("runs/cti/seeds.json", "CTI Seeds"))
print(check_dir("data/query_graphs", "Query Graphs"))
print(check_dir("runs/checkpoints", "Model Checkpoints"))

import importlib

print("\n🔧 Python Packages:")
packages = ['networkx', 'yaml', 'feedparser', 'openai', 'g4f', 'torch']
for pkg in packages:
    try:
        importlib.import_module(pkg)
        print(f"   ✅ {pkg}")
    except Exception:  # ImportError, or e.g. OSError from a broken native build
        print(f"   ❌ {pkg}")

print("\n🚦 Pipeline Status:")
engine_ok = Path("src/engine/graph_matcher/engine_repo/src").exists()
ckpt_dir = Path("runs/checkpoints")
# Same requirements as sections 5.3 and 6.1
status = [
    ("CTI Agent",
     Path("data/mitre/enterprise-attack.json").exists() and
     Path("data/cti_reports/rss_seeds.txt").exists()),
    ("Hunting",
     engine_ok and Path("runs/events/events.jsonl").exists() and
     ckpt_dir.exists() and any(ckpt_dir.glob("*.pt"))),
    ("Training",
     engine_ok and Path("data/datasets/darpa_cadets").exists()),
]

for name, ready in status:
    symbol = "✅ READY" if ready else "⚠️  NOT READY"
    print(f"   {symbol}: {name}")

print("\n" + "="*60)

# Summary
ready_count = sum(1 for _, ready in status if ready)
if ready_count == len(status):
    print("🎉 ALL PIPELINES READY!")
elif ready_count >= 1:
    print(f"✅ {ready_count}/{len(status)} pipelines ready")
else:
    print("⚠️  Setup incomplete - see missing items above")
print("="*60)
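
The dashboard imports each package to test for it, which executes the package and can take several seconds for `torch`. `importlib.util.find_spec` only locates a top-level package without running it. The trade-off: an installed but broken package still reports as present. `package_status` is a hypothetical helper.

```python
import importlib.util


def package_status(names):
    """Map each top-level import name to True/False without importing the package."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(package_status(["json", "networkx", "torch"]))
```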

## 9️⃣ Troubleshooting & Help

### Common Issues

In [None]:
print("""
🔧 TROUBLESHOOTING GUIDE
========================

1. "OPENAI_API_KEY not set"
   Solution: Use g4f backend or set API key:
   >>> os.environ['OPENAI_API_KEY'] = 'sk-...'

2. "Engine not found"
   Solution: Bootstrap engine:
   $ bash scripts/bootstrap_engine.sh <MEGR_APT_GIT_URL>

3. "No module named 'torch'"
   Solution: Install hunting requirements:
   $ pip install -r requirements/hunting.txt

4. "g4f not available"
   Solution: Install g4f:
   $ pip install -r requirements/g4f.txt

5. "No CTI results"
   Possible causes:
   - RSS feeds unreachable (check network)
   - LLM rate limiting (wait and retry)
   - Invalid STIX data (re-download MITRE file)

6. "Graph is empty"
   Solution: Create sample events first (see section 2.3)

7. "Cannot run hunting"
   Requirements:
   - Engine installed
   - Checkpoint file (.pt)
   - Events data

📚 For more help, see:
   - README.md
   - ANALYSIS_AND_GAPS.md
   - GitHub issues
""")
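
Item 5 suggests waiting and retrying when the LLM backend rate-limits. A generic exponential-backoff wrapper is one way to automate that; `with_retries` is hypothetical, the attempt count and delays are assumptions, and catching every `Exception` is deliberately broad for a sketch (in practice, narrow it to the backend's rate-limit error).

```python
import random
import time


def with_retries(fn, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**i plus up to 1 s of jitter, then retry."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of retries: propagate the last error
            sleep(base_delay * 2 ** i + random.uniform(0, 1))
```

The `sleep` parameter exists so tests can record delays instead of waiting; in the notebook, call it as `with_retries(lambda: some_llm_call(prompt))`, where `some_llm_call` stands for whatever backend call is being made.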

---

## 📝 Next Steps

1. **For CTI Agent Testing**:
   - Run sections 1-4
   - Experiment with different RSS feeds
   - Compare OpenAI vs g4f results

2. **For Full System**:
   - Bootstrap engine (section 3)
   - Obtain DARPA datasets
   - Train models (section 6)
   - Run hunting pipeline (section 5)

3. **For Evaluation**:
   - Implement `src/eval/agent_eval.py`
   - Implement `src/eval/hunting_eval.py`
   - Create ground truth datasets

---

**Version**: 1.0  
**Last Updated**: 2026-01-04  
**Author**: APT Detection Team
