Castalia - AWS Inspector Vulnerability Triage Assistant

Fast manual triage interface that builds training data for future AI automation.

Overview

Phase 1 (MVP): Smart manual triage tool

Rule-based pre-filtering - Auto-classify obvious cases (60-70%)
AI pre-triage suggestions - AWS Bedrock suggests decisions based on historical patterns
Fast review interface - Keyboard-driven triage with pre-filled suggestions
Training data capture - Structured labels + rationales for ML

Phase 2 (After 500+ labels): Fine-tune DeBERTa classifier

Train custom model on your triage decisions
Auto-classify 80-90% of findings
Manual review only for low-confidence cases

Goal: Make manual triage 10x faster while building an ML dataset.

Architecture

Phase 1: Manual Triage (Week 1-4)

AWS Inspector Export
        ↓
Rule-Based Pre-Filter (auto-classify 60-70%)
        ↓
    ┌────────────────┐
    │ Needs Review   │ (30-40% of findings)
    └────────────────┘
        ↓
AI Pre-Triage (AWS Bedrock suggests decisions)
        ↓
Fast Triage UI (pre-filled suggestions, keyboard shortcuts)
        ↓
SQLite Database (decisions + rationales + accuracy tracking)
        ↓
Export Training Data (JSON for DeBERTa)

Phase 2: AI-Assisted (After 500+ labels)

AWS Inspector Export
        ↓
Rule-Based Filter (60-70%)
        ↓
Fine-tuned DeBERTa Classifier (20-30%)
        ↓
Manual Review Only (5-10% low-confidence)
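
The Phase 2 cascade above can be sketched as a small routing function. This is only an illustration under stated assumptions, not the project's implementation: `route`, `Routed`, the callable signatures, and the 0.85 threshold are hypothetical, since this README does not define what counts as low confidence.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical cut-off; the README does not state the real value.
CONFIDENCE_THRESHOLD = 0.85


@dataclass
class Routed:
    decision: Optional[str]  # None means "send to manual review"
    source: str              # "rule", "classifier", or "manual"


def route(
    finding: dict,
    rule: Callable[[dict], Optional[str]],
    classifier: Callable[[dict], Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Routed:
    """Try the rule filter first, then the classifier, then fall back to a human."""
    decision = rule(finding)
    if decision is not None:
        return Routed(decision, "rule")

    label, confidence = classifier(finding)
    if confidence >= threshold:
        return Routed(label, "classifier")

    # Low-confidence prediction: leave the decision to an analyst.
    return Routed(None, "manual")
```

Any callable that returns a `(label, confidence)` pair can stand in for the classifier, so the same function could be exercised with a stub until a fine-tuned model exists.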

Project Status

Current Phase: Phase 1 - Manual Triage Tool
Timeline: 1 week to working interface
Next Steps:

Export sample Inspector findings
Implement parser and rule engine
Build triage UI
Label 500+ findings to build training dataset

Quick Start

Local Development

With Justfile (recommended):

# Install just: https://just.systems
brew install just  # macOS
# or: cargo install just

# See all commands
just

# Start development (production-like with Docker)
just up

# Or start native Python (faster iteration)
just setup
just dev

Manual setup:

# Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Load sample data
python scripts/load_sample_data.py

# Optional: Load with AI-ready triage history (~70% pre-triaged)
python scripts/load_sample_data.py --with-triage-history

# Start triage interface
python web/triage_ui.py

# Open http://localhost:5000
# Use keyboard shortcuts to quickly triage findings

See LOCAL_DEV.md for the complete local development guide and SAMPLE_DATA.md for the sample data documentation.

Production Deployment (AWS ECS)

# Navigate to Terraform directory
cd deployment/terraform

# Configure variables
cp terraform.tfvars.example terraform.tfvars
vim terraform.tfvars

# Deploy infrastructure
terraform init
terraform plan
terraform apply

# Build and push Docker image (see outputs)
# Deploy takes ~5-10 minutes

See deployment/terraform/README.md for the complete deployment guide.

Project Structure

castalia/
├── README.md                       # This file
├── LOCAL_DEV.md                    # ⭐ Local development guide
├── INSPECTOR_DATA_EXPORT.md        # ⭐ Export AWS Inspector data for local dev
├── JUSTFILE_COMMANDS.md            # Quick reference for just commands
├── justfile                        # ⭐ Task automation (just commands)
├── AWS_DEPLOYMENT.md               # AWS deployment guide
├── DEPLOYMENT_SUMMARY.md           # Quick deployment reference
├── MODEL_SELECTION.md              # Analysis of model choices
├── COST_PROJECTIONS.md             # Cost analysis
├── COST_OPTIMIZATION_ANALYSIS.md   # ⭐ Architecture alternatives (Lambda vs Fargate)
├── Dockerfile                      # Production container
├── docker-compose.yml              # Local development
├── requirements.txt
├── .env.example                    # Environment variables template
├── data/
│   └── sample_inspector_findings.json  # ⭐ Sample data for local dev
├── scripts/
│   └── load_sample_data.py         # ⭐ Load sample findings into DB
├── src/
│   ├── inspector_sync.py           # AWS Inspector integration
│   ├── pre_triage_classifier.py    # AI-assisted pre-triage with AWS Bedrock ⭐
│   ├── rule_recommender.py         # Pattern detection & rule generation
│   ├── parser.py                   # Parse Inspector exports
│   ├── rules.py                    # Rule-based pre-filters
│   ├── train_deberta.py            # Phase 2: Train classifier (TODO)
│   └── integrations/               # Ticketing system plugins
│       ├── base.py                 # Plugin interface
│       ├── jira_plugin.py          # Jira integration with OAuth 2.0
│       └── plugin_manager.py       # Plugin factory
├── web/
│   ├── triage_ui.py                # Fast triage interface
│   └── templates/
│       ├── triage.html             # Main triage UI
│       ├── pre_triage_stats.html   # Pre-triage performance dashboard ⭐
│       └── rule_recommendations.html # Rule approval UI
├── output/
│   ├── triage.db                   # SQLite database
│   └── training_data.json          # Export for DeBERTa
├── deployment/
│   ├── terraform/                  # Infrastructure as Code ⭐
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── README.md               # Terraform deployment guide
│   │   └── modules/
│   │       ├── vpc/
│   │       ├── ecs/
│   │       ├── efs/
│   │       ├── iam/
│   │       └── alb/
│   └── ecs/                        # Bash scripts (alternative)
│       ├── deploy.sh
│       └── setup-infrastructure.sh
└── tests/
    ├── test_rules.py
    └── test_import.py

⭐ = New files for production-like local development

Success Metrics

Phase 1 (Manual Triage Tool)

10x faster triage - Keyboard shortcuts, templates, bulk actions
500+ labeled examples - High-quality training dataset in 2-4 weeks
Rule coverage - 60%+ of findings auto-classified
Consistent labels - Structured decision format

Phase 2 (AI-Assisted)

DeBERTa accuracy - 85%+ agreement with analyst decisions
90-95% automation - Only 5-10% of findings need manual review
Sub-second inference - <100ms per finding
Zero marginal cost - Self-hosted classifier

Technology Stack

Phase 1

Python 3.9+: Core processing
Flask: Fast triage UI
SQLite: Decision storage and analytics
Pandas: Data parsing and export

Phase 2 (Future)

Hugging Face Transformers: DeBERTa fine-tuning
PyTorch: Model training
AWS Bedrock (optional): Fallback for edge cases

Key Features

AI Pre-Triage (NEW)

🤖 AWS Bedrock integration - Claude 3.5 Sonnet suggests decisions based on historical patterns
💡 Pre-filled suggestions - Decision and rationale auto-populated for speed
🎯 Context-aware - Uses VPC, IAM, tags, CVSS adjustments for accuracy
💰 Cost-optimized - Groups similar findings (80-90% cost reduction)
📊 Accuracy tracking - Monitors suggestion vs actual decisions
🔒 Human-in-the-loop - All suggestions require analyst review
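
Accuracy tracking of this kind can be sketched as a comparison of each suggestion with the analyst's final decision. The record fields (`suggested`, `actual`, `confidence`) and both function names are assumptions made for illustration; the real schema is not shown in this README.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def agreement_rate(records: List[dict]) -> Optional[float]:
    """Fraction of reviewed suggestions the analyst accepted unchanged."""
    reviewed = [r for r in records if r.get("actual") is not None]
    if not reviewed:
        return None  # nothing reviewed yet
    hits = sum(r["suggested"] == r["actual"] for r in reviewed)
    return hits / len(reviewed)


def agreement_by_confidence(records: List[dict]) -> Dict[int, float]:
    """Agreement per confidence decile (key 8 covers 0.80 up to 0.90)."""
    buckets = defaultdict(list)
    for r in records:
        if r.get("actual") is None:
            continue  # skip suggestions that have not been reviewed
        decile = min(int(r["confidence"] * 10), 9)
        buckets[decile].append(r["suggested"] == r["actual"])
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}
```

If suggestions in a high-confidence decile agree with analysts noticeably less often than the stated confidence, the suggestions are overconfident, which is the kind of signal a calibration dashboard would surface.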

Triage UI

⌨️ Keyboard shortcuts - Press 1/2/3 for decisions, Enter to submit
📋 Rationale templates - Common patterns pre-filled
🔄 Bulk operations - Apply decision to similar findings (CVE + resource type matching for safety)
📊 Progress tracking - Real-time completion percentage
💾 Auto-save - SQLite database prevents data loss
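
The CVE + resource type matching used for bulk operations can be sketched as follows. This is a simplified illustration: the field names (`id`, `cve_id`, `resource_type`, `decision`, `rationale`) and both functions are hypothetical, and the actual safety checks are described in the Bulk Apply Design document.

```python
from typing import List


def bulk_targets(source: dict, findings: List[dict]) -> List[dict]:
    """Untriaged findings that share both CVE and resource type with `source`."""
    key = (source["cve_id"], source["resource_type"])
    return [
        f for f in findings
        if f["id"] != source["id"]
        and f.get("decision") is None  # never overwrite an existing decision
        and (f["cve_id"], f["resource_type"]) == key
    ]


def bulk_apply(source: dict, findings: List[dict]) -> int:
    """Copy the source decision and rationale onto matching findings."""
    targets = bulk_targets(source, findings)
    for f in targets:
        f["decision"] = source["decision"]
        f["rationale"] = source["rationale"]
    return len(targets)
```

Matching on both fields rather than the CVE alone means a decision made for, say, a Lambda function is not silently copied onto an EC2 instance where the same CVE may carry a different risk.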

Rule Engine

✅ Development dependencies - Auto-mark as irrelevant
✅ Already patched versions - Version comparison
✅ Environment mitigations - Lambda read-only FS, network isolation
✅ Extensible - Easy to add new patterns
🤖 Auto-learning - Recommends new rules based on triage patterns
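
A rule engine along these lines can be sketched as an ordered list of small functions, each returning a decision or `None`. Everything here is an assumption for illustration: the finding fields (`dependency_scope`, `installed_version`, `fixed_version`), the function names, the choice of "irrelevant" for already-patched findings, and the naive version parser, which handles only dotted numeric versions of equal length.

```python
from typing import Callable, List, Optional, Tuple

Rule = Callable[[dict], Optional[str]]


def parse_version(version: str) -> Tuple[int, ...]:
    """Naive dotted-numeric parser, e.g. '1.2.10' becomes (1, 2, 10)."""
    return tuple(int(part) for part in version.split("."))


def dev_dependency_rule(finding: dict) -> Optional[str]:
    # Development-only dependencies are marked irrelevant.
    if finding.get("dependency_scope") == "dev":
        return "irrelevant"
    return None


def already_patched_rule(finding: dict) -> Optional[str]:
    # Installed version is at or beyond the fixed version.
    installed = finding.get("installed_version")
    fixed = finding.get("fixed_version")
    if installed and fixed and parse_version(installed) >= parse_version(fixed):
        return "irrelevant"  # assumed label; the README does not name one
    return None


RULES: List[Rule] = [dev_dependency_rule, already_patched_rule]


def apply_rules(finding: dict, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the first matching rule's decision, or None if review is needed."""
    for rule in rules:
        decision = rule(finding)
        if decision is not None:
            return decision
    return None
```

Adding a new pattern in this layout means writing one more function and appending it to `RULES`. Real package versions (pre-release tags, epochs, distro suffixes) would need a proper version library rather than this parser.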

AWS Integration

🔄 Inspector API sync - Pull findings directly from AWS Inspector
📝 Suppression rules - Push triage decisions back to Inspector
⚡ Lambda deployment - Cost-optimized serverless ($2-8/month)
💾 EFS persistence - SQLite database on EFS with automatic backups
🔒 Daily backups - AWS Backup with configurable retention (35 days default)
🛡️ Disaster recovery - Point-in-time restore capability

Rule Recommendations

🧠 Pattern detection - Learns from your triage decisions (4 pattern types)
📊 Confidence scoring - Statistical validation (85%+ agreement, 5+ samples)
🎯 Auto-generated code - Ready-to-use Python rules
✅ Human-in-the-loop - Review and approve before activation

Jira Ticketing Integration

🎫 Create Jira tickets - From vulnerability findings with one click
🔗 Audit trail - Links back to AWS Inspector findings
📝 Remediation tracking - Document remediation steps, track who/when
📦 Bulk operations - Create one ticket per finding or a single ticket covering several findings
📊 Real-time logs - Docked panel showing API interactions for debugging
🔌 Plugin architecture - Extensible to other ticketing systems

Training Data Export

📤 JSON format - Compatible with Hugging Face datasets
🎯 Structured labels - decision + rationale + confidence
📈 Analytics - Rule accuracy, time tracking, decision breakdown
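
An export of structured labels from SQLite to JSON can be sketched with the standard library alone. The table name, column names, and output keys below are hypothetical; the real layout of `output/triage.db` and `output/training_data.json` is not shown in this README.

```python
import json
import sqlite3
from typing import List

# Hypothetical schema, used only for this sketch.
SCHEMA = """
CREATE TABLE triage_decisions (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT NOT NULL,
    decision TEXT,
    rationale TEXT,
    confidence REAL
)
"""


def export_training_data(conn: sqlite3.Connection) -> List[dict]:
    """Return one flat record per triaged finding, skipping untriaged rows."""
    rows = conn.execute(
        "SELECT title, description, decision, rationale, confidence "
        "FROM triage_decisions WHERE decision IS NOT NULL ORDER BY id"
    ).fetchall()
    return [
        {
            "text": f"{title}\n{description}",
            "label": decision,
            "rationale": rationale,
            "confidence": confidence,
        }
        for title, description, decision, rationale, confidence in rows
    ]


def write_export(conn: sqlite3.Connection, path: str) -> int:
    """Write the records as a JSON array and return how many were written."""
    records = export_training_data(conn)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)
    return len(records)
```

Keeping each record flat, with one text field and one label field, keeps the file easy to load into a tabular dataset for later fine-tuning.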

Documentation

Getting Started

Local Development - Production-like local dev with Justfile
Sample Data - Sample findings and triage history for testing AI features
AWS Inspector Data Export - Export production findings for local development
Deployment Summary - Quick start for AWS deployment

Deployment

Terraform Guide - Infrastructure as Code (recommended)
AWS Deployment Guide - Detailed ECS setup with bash scripts

Architecture & Design

Model Selection Analysis - Why manual triage first, then DeBERTa
Cost Projections - ROI analysis and pricing scenarios
Cost Optimization Analysis - AWS architecture alternatives for low-traffic deployments (Lambda vs Fargate)

Features & Design

Bulk Apply Design - Resource-type-aware bulk triage operations with safety features
Rule Recommendations Design - ML-driven pattern detection and rule automation
Pre-Triage Quick Start - ⭐ Get AI suggestions in 5 minutes
Pre-Triage Classification Design - AI-assisted triage with AWS Bedrock and historical patterns

Why This Approach?

Manual review is unavoidable for security decisions. Instead of paying for AI suggestions you have to verify anyway, this tool:

Makes manual review 10x faster (keyboard UI, templates, bulk ops)
Captures high-quality training data while you work
Learns from your decisions - Recommends new automation rules
Builds toward fully automated AI (DeBERTa) once you have labels
Costs $0 in API fees for the manual triage workflow in Phase 1 (AI pre-triage through AWS Bedrock incurs usage charges)

See MODEL_SELECTION.md for detailed analysis.

How Pre-Triage Works

The system uses AI to suggest triage decisions based on historical patterns:

Day 1: Sync findings from AWS Inspector
  → python src/inspector_sync.py --pull

Day 1: Run pre-triage classification
  → python src/pre_triage_classifier.py --days 7
  → Groups similar findings (package + resource type)
  → Uses AWS Bedrock Claude to analyze with historical context
  → Generates suggestions for all findings in group

Day 1: Analyst reviews in UI
  → 💡 AI Suggestion: IRRELEVANT (85% confidence)
  → Decision and rationale pre-filled
  → Analyst confirms or modifies
  → 3-5x faster than writing from scratch

Ongoing: System learns
  → Tracks accuracy (suggested vs actual)
  → Dashboard shows calibration metrics
  → Improves over time with more decisions

Cost Optimization: Groups similar findings (e.g., 50 findings → 10 API calls = 80% cost savings)
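
The grouping arithmetic can be sketched as below, assuming one API call per group. The field names `package` and `resource_type` and both function names are illustrative assumptions; the grouping key follows the "package + resource type" description above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_findings(findings: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group findings by (package, resource type); one API call per group."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f["package"], f["resource_type"])].append(f)
    return dict(groups)


def call_reduction(n_findings: int, n_groups: int) -> float:
    """Fraction of API calls avoided compared with one call per finding."""
    if n_findings <= 0:
        raise ValueError("n_findings must be positive")
    return 1 - n_groups / n_findings
```

With 50 findings that fall into 10 groups, `call_reduction(50, 10)` comes out at about 0.8, matching the 80% figure. The actual saving depends on how many findings share a package and resource type.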

Safety: All suggestions require human review. No auto-apply.

See Pre-Triage Classification Design for detailed algorithm and architecture.

How Rule Recommendations Work

The system analyzes your triage decisions and suggests automation rules:

Week 1-2: Manual triage builds pattern data
  → System observes: CVE-2023-12345 consistently marked "irrelevant"

Week 3: Visit Rule Recommendations
  → Suggested rule: "CVE-2023-12345 → irrelevant"
  → Confidence: 95% (19/20 decisions)
  → Review generated code → Approve

Week 3+: Future automation
  → New CVE-2023-12345 findings auto-classified
  → No manual review needed

Result: Rules learn from your actual decisions, not assumptions.
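
The confidence check behind a recommendation can be sketched as a majority vote per CVE, using the thresholds listed under Key Features (85%+ agreement, 5+ samples). This covers only a per-CVE pattern; the function name, input shape, and output keys are assumptions, and the other pattern types are described in the Rule Recommendations Design document.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

MIN_SAMPLES = 5       # "5+ samples"
MIN_AGREEMENT = 0.85  # "85%+ agreement"


def recommend_rules(decisions: Iterable[Tuple[str, str]]) -> List[dict]:
    """Suggest 'CVE -> decision' rules from (cve_id, decision) pairs."""
    by_cve = defaultdict(Counter)
    for cve_id, decision in decisions:
        by_cve[cve_id][decision] += 1

    recommendations = []
    for cve_id, counts in by_cve.items():
        total = sum(counts.values())
        decision, votes = counts.most_common(1)[0]
        agreement = votes / total
        if total >= MIN_SAMPLES and agreement >= MIN_AGREEMENT:
            recommendations.append({
                "cve_id": cve_id,
                "decision": decision,
                "confidence": agreement,
                "samples": total,
            })
    return recommendations
```

For the example above, 19 "irrelevant" decisions out of 20 give a confidence of 0.95 and pass both thresholds, while a CVE seen only three times would not be recommended regardless of agreement.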

See Rule Recommendations Design for detailed algorithm and pattern types.

License

This project is licensed under the MIT License - see the LICENSE file for details.
