zbuc/castalia

Castalia - AWS Inspector Vulnerability Triage Assistant

Fast manual triage interface that builds training data for future AI automation.

Overview

Phase 1 (MVP): Smart manual triage tool

  1. Rule-based pre-filtering - Auto-classify obvious cases (60-70%)
  2. AI pre-triage suggestions - AWS Bedrock suggests decisions based on historical patterns
  3. Fast review interface - Keyboard-driven triage with pre-filled suggestions
  4. Training data capture - Structured labels + rationales for ML

Phase 2 (After 500+ labels): Fine-tune DeBERTa classifier

  1. Train custom model on your triage decisions
  2. Auto-classify 80-90% of findings
  3. Manual review only for low-confidence cases

Goal: Make manual triage 10x faster while building an ML training dataset.

Architecture

Phase 1: Manual Triage (Week 1-4)

AWS Inspector Export
        ↓
Rule-Based Pre-Filter (auto-classify 60-70%)
        ↓
    ┌────────────────┐
    │ Needs Review   │ (30-40% of findings)
    └────────────────┘
        ↓
AI Pre-Triage (AWS Bedrock suggests decisions)
        ↓
Fast Triage UI (pre-filled suggestions, keyboard shortcuts)
        ↓
SQLite Database (decisions + rationales + accuracy tracking)
        ↓
Export Training Data (JSON for DeBERTa)

Phase 2: AI-Assisted (After 500+ labels)

AWS Inspector Export
        ↓
Rule-Based Filter (60-70%)
        ↓
Fine-tuned DeBERTa Classifier (20-30%)
        ↓
Manual Review Only (5-10% low-confidence)
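
The Phase 2 flow above amounts to a three-stage router. The following is a minimal sketch, not the project's implementation: `route_finding`, the `apply_rules` and `classify` callables, and the 0.9 threshold are all assumptions made for illustration, since this README only says that low-confidence cases go to manual review.

```python
from typing import Callable, Optional, Tuple

# Hypothetical cut-off; the README does not state the real threshold.
CONFIDENCE_THRESHOLD = 0.9


def route_finding(
    finding: dict,
    apply_rules: Callable[[dict], Optional[str]],
    classify: Callable[[dict], Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, Optional[str]]:
    """Return (stage, decision); decision is None when a human must decide."""
    # Stage 1: deterministic rules win whenever one of them matches.
    rule_decision = apply_rules(finding)
    if rule_decision is not None:
        return "rule", rule_decision

    # Stage 2: accept the classifier label only when it is confident enough.
    label, confidence = classify(finding)
    if confidence >= threshold:
        return "classifier", label

    # Stage 3: everything else is queued for an analyst.
    return "manual_review", None
```

Keeping the rules in front of the classifier means a finding that a rule already explains never depends on model confidence.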

Project Status

Current Phase: Phase 1 - Manual Triage Tool
Timeline: 1 week to working interface
Next Steps:

  • Export sample Inspector findings
  • Implement parser and rule engine
  • Build triage UI
  • Label 500+ findings to build training dataset

Quick Start

Local Development

With Justfile (recommended):

# Install just: https://just.systems
brew install just  # macOS
# or: cargo install just

# See all commands
just

# Start development (production-like with Docker)
just up

# Or start native Python (faster iteration)
just setup
just dev

Manual setup:

# Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Load sample data
python scripts/load_sample_data.py

# Optional: Load with AI-ready triage history (~70% pre-triaged)
python scripts/load_sample_data.py --with-triage-history

# Start triage interface
python web/triage_ui.py

# Open http://localhost:5000
# Use keyboard shortcuts to quickly triage findings

See LOCAL_DEV.md for complete local development guide and SAMPLE_DATA.md for sample data documentation.

Production Deployment (AWS ECS)

# Navigate to Terraform directory
cd deployment/terraform

# Configure variables
cp terraform.tfvars.example terraform.tfvars
vim terraform.tfvars

# Deploy infrastructure
terraform init
terraform plan
terraform apply

# Build and push the Docker image (see the Terraform outputs)
# Deployment takes ~5-10 minutes

See deployment/terraform/README.md for complete deployment guide.

Project Structure

castalia/
├── README.md                       # This file
├── LOCAL_DEV.md                    # ⭐ Local development guide
├── INSPECTOR_DATA_EXPORT.md        # ⭐ Export AWS Inspector data for local dev
├── JUSTFILE_COMMANDS.md            # Quick reference for just commands
├── justfile                        # ⭐ Task automation (just commands)
├── AWS_DEPLOYMENT.md               # AWS deployment guide
├── DEPLOYMENT_SUMMARY.md           # Quick deployment reference
├── MODEL_SELECTION.md              # Analysis of model choices
├── COST_PROJECTIONS.md             # Cost analysis
├── COST_OPTIMIZATION_ANALYSIS.md   # ⭐ Architecture alternatives (Lambda vs Fargate)
├── Dockerfile                      # Production container
├── docker-compose.yml              # Local development
├── requirements.txt
├── .env.example                    # Environment variables template
├── data/
│   └── sample_inspector_findings.json  # ⭐ Sample data for local dev
├── scripts/
│   └── load_sample_data.py         # ⭐ Load sample findings into DB
├── src/
│   ├── inspector_sync.py           # AWS Inspector integration
│   ├── pre_triage_classifier.py    # AI-assisted pre-triage with AWS Bedrock ⭐
│   ├── rule_recommender.py         # Pattern detection & rule generation
│   ├── parser.py                   # Parse Inspector exports
│   ├── rules.py                    # Rule-based pre-filters
│   ├── train_deberta.py            # Phase 2: Train classifier (TODO)
│   └── integrations/               # Ticketing system plugins
│       ├── base.py                 # Plugin interface
│       ├── jira_plugin.py          # Jira integration with OAuth 2.0
│       └── plugin_manager.py       # Plugin factory
├── web/
│   ├── triage_ui.py                # Fast triage interface
│   └── templates/
│       ├── triage.html             # Main triage UI
│       ├── pre_triage_stats.html   # Pre-triage performance dashboard ⭐
│       └── rule_recommendations.html # Rule approval UI
├── output/
│   ├── triage.db                   # SQLite database
│   └── training_data.json          # Export for DeBERTa
├── deployment/
│   ├── terraform/                  # Infrastructure as Code ⭐
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── README.md               # Terraform deployment guide
│   │   └── modules/
│   │       ├── vpc/
│   │       ├── ecs/
│   │       ├── efs/
│   │       ├── iam/
│   │       └── alb/
│   └── ecs/                        # Bash scripts (alternative)
│       ├── deploy.sh
│       └── setup-infrastructure.sh
└── tests/
    ├── test_rules.py
    └── test_import.py

⭐ = New files for production-like local development

Success Metrics

Phase 1 (Manual Triage Tool)

  • 10x faster triage - Keyboard shortcuts, templates, bulk actions
  • 500+ labeled examples - High-quality training dataset in 2-4 weeks
  • Rule coverage - 60%+ of findings auto-classified
  • Consistent labels - Structured decision format

Phase 2 (AI-Assisted)

  • DeBERTa accuracy - 85%+ agreement with analyst decisions
  • 90-95% automation - Only 5-10% of findings need manual review
  • Sub-second inference - <100ms per finding
  • Zero marginal cost - Self-hosted classifier

Technology Stack

Phase 1

  • Python 3.9+: Core processing
  • Flask: Fast triage UI
  • SQLite: Decision storage and analytics
  • Pandas: Data parsing and export

Phase 2 (Future)

  • Hugging Face Transformers: DeBERTa fine-tuning
  • PyTorch: Model training
  • AWS Bedrock (optional): Fallback for edge cases

Key Features

AI Pre-Triage (NEW)

  • 🤖 AWS Bedrock integration - Claude 3.5 Sonnet suggests decisions based on historical patterns
  • 💡 Pre-filled suggestions - Decision and rationale auto-populated for speed
  • 🎯 Context-aware - Uses VPC, IAM, tags, CVSS adjustments for accuracy
  • 💰 Cost-optimized - Groups similar findings (80-90% cost reduction)
  • 📊 Accuracy tracking - Monitors suggestion vs actual decisions
  • 🔒 Human-in-the-loop - All suggestions require analyst review
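
Accuracy tracking can be pictured as comparing each suggestion with the decision the analyst finally recorded. The sketch below is an assumption about how that might look, not the code behind the dashboard: the function name, the record shape, and the ten-point confidence buckets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_by_confidence(
    records: Iterable[Tuple[str, str, float]],
) -> Dict[int, float]:
    """Map a confidence bucket (lower bound in percent) to the agreement rate.

    Each record is (suggested_decision, actual_decision, confidence in [0, 1]).
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for suggested, actual, confidence in records:
        # 0.85 -> bucket 80; a confidence of 1.0 is folded into bucket 90.
        low = min(round(confidence * 100) // 10, 9) * 10
        totals[low] += 1
        hits[low] += int(suggested == actual)
    return {low: hits[low] / totals[low] for low in sorted(totals)}
```

If suggestions are well calibrated, the agreement rate in each bucket should sit close to the confidence range that the bucket covers.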

Triage UI

  • ⌨️ Keyboard shortcuts - Press 1/2/3 for decisions, Enter to submit
  • 📋 Rationale templates - Common patterns pre-filled
  • 🔄 Bulk operations - Apply decision to similar findings (CVE + resource type matching for safety)
  • 📊 Progress tracking - Real-time completion percentage
  • 💾 Auto-save - SQLite database prevents data loss
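
The bulk-operation safety check (same CVE and same resource type) can be sketched as a simple filter. The field names `id`, `cve_id`, and `resource_type` are assumptions for illustration; the actual schema is not shown in this README.

```python
from typing import Iterable, List


def similar_findings(target: dict, findings: Iterable[dict]) -> List[dict]:
    """Findings that share both CVE and resource type with the target."""
    key = (target["cve_id"], target["resource_type"])
    return [
        f
        for f in findings
        # Skip the target itself; require a match on both fields.
        if f["id"] != target["id"] and (f["cve_id"], f["resource_type"]) == key
    ]
```

Matching on both fields keeps a decision made for, say, a Lambda function from being copied onto an EC2 instance that merely shares the CVE.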

Rule Engine

  • Development dependencies - Auto-mark as irrelevant
  • Already patched versions - Version comparison
  • Environment mitigations - Lambda read-only FS, network isolation
  • Extensible - Easy to add new patterns
  • 🤖 Auto-learning - Recommends new rules based on triage patterns
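
One way to picture the rule engine is as an ordered list of small functions, each returning a decision or `None`. This is a sketch under assumptions, not the contents of `src/rules.py`: the field names (`dependency_scope`, `installed_version`, `fixed_version`), the `"irrelevant"` label for patched versions, and the purely numeric version parsing are all hypothetical.

```python
from typing import Callable, List, Optional

Rule = Callable[[dict], Optional[str]]


def parse_version(version: str) -> tuple:
    """Parse a dotted numeric version such as '1.2.10' into (1, 2, 10)."""
    return tuple(int(part) for part in version.split("."))


def dev_dependency_rule(finding: dict) -> Optional[str]:
    """Development-only dependencies are auto-marked as irrelevant."""
    if finding.get("dependency_scope") == "dev":
        return "irrelevant"
    return None


def already_patched_rule(finding: dict) -> Optional[str]:
    """Installed version at or above the fixed version needs no action."""
    installed = finding.get("installed_version")
    fixed = finding.get("fixed_version")
    if installed and fixed and parse_version(installed) >= parse_version(fixed):
        return "irrelevant"
    return None


RULES: List[Rule] = [dev_dependency_rule, already_patched_rule]


def apply_rules(finding: dict, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the first rule decision, or None when the finding needs review."""
    for rule in rules:
        decision = rule(finding)
        if decision is not None:
            return decision
    return None
```

Adding a new pattern then means writing one more function and appending it to `RULES`. Real package versions often carry suffixes or epochs, so a production comparison would likely need a proper version parser.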

AWS Integration

  • 🔄 Inspector API sync - Pull findings directly from AWS Inspector
  • 📝 Suppression rules - Push triage decisions back to Inspector
  • Lambda deployment - Cost-optimized serverless ($2-8/month)
  • 💾 EFS persistence - SQLite database on EFS with automatic backups
  • 🔒 Daily backups - AWS Backup with configurable retention (35 days default)
  • 🛡️ Disaster recovery - Point-in-time restore capability

Rule Recommendations

  • 🧠 Pattern detection - Learns from your triage decisions (4 pattern types)
  • 📊 Confidence scoring - Statistical validation (85%+ agreement, 5+ samples)
  • 🎯 Auto-generated code - Ready-to-use Python rules
  • Human-in-the-loop - Review and approve before activation

Jira Ticketing Integration

  • 🎫 Create Jira tickets - From vulnerability findings with one click
  • 🔗 Audit trail - Links back to AWS Inspector findings
  • 📝 Remediation tracking - Document remediation steps, track who/when
  • 📦 Bulk operations - Create one ticket per finding or a single ticket covering several findings
  • 📊 Real-time logs - Docked panel showing API interactions for debugging
  • 🔌 Plugin architecture - Extensible to other ticketing systems
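
A plugin architecture of this kind usually pairs an abstract interface with a factory. The sketch below illustrates the pattern only; it is not the actual interface in `src/integrations/base.py`, and `TicketingPlugin`, `register`, `get_plugin`, and the in-memory backend are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class TicketingPlugin(ABC):
    """Hypothetical contract that every ticketing backend would implement."""

    @abstractmethod
    def create_ticket(self, finding: dict) -> str:
        """Create a ticket for a finding and return its identifier."""


_REGISTRY: Dict[str, Type[TicketingPlugin]] = {}


def register(name: str):
    """Class decorator that makes a plugin available to the factory."""
    def decorator(cls: Type[TicketingPlugin]) -> Type[TicketingPlugin]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("memory")
class InMemoryPlugin(TicketingPlugin):
    """Stand-in backend for illustration; it makes no network calls."""

    def __init__(self) -> None:
        self.tickets: List[dict] = []

    def create_ticket(self, finding: dict) -> str:
        self.tickets.append(finding)
        return f"MEM-{len(self.tickets)}"


def get_plugin(name: str) -> TicketingPlugin:
    """Factory: look up a registered plugin class and instantiate it."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown ticketing plugin: {name}") from None
```

With this shape, supporting another ticketing system means adding one registered subclass; the calling code keeps using `get_plugin(name).create_ticket(finding)`.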

Training Data Export

  • 📤 JSON format - Compatible with Hugging Face datasets
  • 🎯 Structured labels - decision + rationale + confidence
  • 📈 Analytics - Rule accuracy, time tracking, decision breakdown
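
The export can be thought of as flattening stored decisions into a list of labeled records. The following sketch uses an in-memory SQLite database with a hypothetical `decisions` table; the real schema of `output/triage.db` and the exact key names in `training_data.json` are not given in this README.

```python
import json
import sqlite3
from typing import List


def export_training_data(conn: sqlite3.Connection) -> List[dict]:
    """Turn stored decisions into flat records: text, label, rationale, confidence."""
    rows = conn.execute(
        "SELECT finding_text, decision, rationale, confidence "
        "FROM decisions ORDER BY id"
    ).fetchall()
    return [
        {"text": text, "label": decision, "rationale": rationale, "confidence": conf}
        for text, decision, rationale, conf in rows
    ]


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with one illustrative decision."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE decisions ("
        "id INTEGER PRIMARY KEY, finding_text TEXT, decision TEXT, "
        "rationale TEXT, confidence REAL)"
    )
    conn.execute(
        "INSERT INTO decisions (finding_text, decision, rationale, confidence) "
        "VALUES (?, ?, ?, ?)",
        ("CVE-2023-12345 in a build-time package", "irrelevant",
         "Development dependency only", 0.9),
    )
    return conn


records = export_training_data(build_demo_db())
training_json = json.dumps(records, indent=2)
```

A JSON array of flat records like this is a shape that the Hugging Face `datasets` JSON loader can generally read, which is presumably what "compatible" refers to here.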

Why This Approach?

Manual review is unavoidable for security decisions. Instead of paying for AI suggestions you have to verify anyway, this tool:

  1. Makes manual review 10x faster (keyboard UI, templates, bulk ops)
  2. Captures high-quality training data while you work
  3. Learns from your decisions - Recommends new automation rules
  4. Builds toward fully automated AI (DeBERTa) once you have labels
  5. Costs $0 in API fees for the rule-based and manual parts of Phase 1 (AWS Bedrock pre-triage calls are billed separately)

See MODEL_SELECTION.md for detailed analysis.

How Pre-Triage Works

The system uses AI to suggest triage decisions based on historical patterns:

Day 1: Sync findings from AWS Inspector
  → python src/inspector_sync.py --pull

Day 1: Run pre-triage classification
  → python src/pre_triage_classifier.py --days 7
  → Groups similar findings (package + resource type)
  → Uses AWS Bedrock Claude to analyze with historical context
  → Generates suggestions for all findings in group

Day 1: Analyst reviews in UI
  → 💡 AI Suggestion: IRRELEVANT (85% confidence)
  → Decision and rationale pre-filled
  → Analyst confirms or modifies
  → 3-5x faster than writing from scratch

Ongoing: System learns
  → Tracks accuracy (suggested vs actual)
  → Dashboard shows calibration metrics
  → Improves over time with more decisions

Cost Optimization: Groups similar findings (e.g., 50 findings → 10 API calls = 80% cost savings)
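
The arithmetic behind that figure follows from sending one request per group instead of one per finding. A minimal sketch, assuming findings are dictionaries with hypothetical `package` and `resource_type` keys (the grouping key named in the workflow above):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GroupKey = Tuple[str, str]


def group_findings(findings: Iterable[dict]) -> Dict[GroupKey, List[dict]]:
    """Group findings by (package, resource type) so each group needs one call."""
    groups: Dict[GroupKey, List[dict]] = defaultdict(list)
    for finding in findings:
        groups[(finding["package"], finding["resource_type"])].append(finding)
    return dict(groups)


def call_savings(total_findings: int, total_groups: int) -> float:
    """Fraction of API calls avoided by sending one request per group."""
    return (total_findings - total_groups) / total_findings
```

For the example above, `call_savings(50, 10)` evaluates to `0.8`. The actual saving depends on how many distinct groups a batch contains, and it shrinks toward zero when every finding is unique.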

Safety: All suggestions require human review. No auto-apply.

See Pre-Triage Classification Design for detailed algorithm and architecture.

How Rule Recommendations Work

The system analyzes your triage decisions and suggests automation rules:

Week 1-2: Manual triage builds pattern data
  → System observes: CVE-2023-12345 consistently marked "irrelevant"

Week 3: Visit Rule Recommendations
  → Suggested rule: "CVE-2023-12345 → irrelevant"
  → Confidence: 95% (19/20 decisions)
  → Review generated code → Approve

Week 3+: Future automation
  → New CVE-2023-12345 findings auto-classified
  → No manual review needed

Result: Rules learn from your actual decisions, not assumptions.
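
The CVE example above can be sketched as a frequency count with two gates. The thresholds come from the confidence-scoring bullet (85%+ agreement, 5+ samples); the function name and record shape are assumptions, and only the per-CVE pattern is shown, not the other pattern types.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_AGREEMENT = 0.85  # "85%+ agreement"
MIN_SAMPLES = 5       # "5+ samples"


def recommend_cve_rules(decisions: Iterable[Tuple[str, str]]) -> List[dict]:
    """Suggest 'CVE -> decision' rules from (cve_id, decision) history."""
    by_cve: Dict[str, Counter] = defaultdict(Counter)
    for cve_id, decision in decisions:
        by_cve[cve_id][decision] += 1

    recommendations = []
    for cve_id, counts in sorted(by_cve.items()):
        total = sum(counts.values())
        decision, agreeing = counts.most_common(1)[0]
        confidence = agreeing / total
        # Require both enough samples and a dominant decision.
        if total >= MIN_SAMPLES and confidence >= MIN_AGREEMENT:
            recommendations.append(
                {"cve_id": cve_id, "decision": decision,
                 "confidence": confidence, "samples": total}
            )
    return recommendations
```

With 19 "irrelevant" decisions out of 20 for one CVE, the sketch yields a single recommendation at 0.95 confidence, which would still wait for analyst approval before becoming an active rule.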

See Rule Recommendations Design for detailed algorithm and pattern types.

License

This project is licensed under the MIT License - see the LICENSE file for details.
