Fast manual triage interface that builds training data for future AI automation.
Phase 1 (MVP): Smart manual triage tool
- Rule-based pre-filtering - Auto-classify obvious cases (60-70% of findings)
- AI pre-triage suggestions - AWS Bedrock suggests decisions based on historical patterns
- Fast review interface - Keyboard-driven triage with pre-filled suggestions
- Training data capture - Structured labels + rationales for ML
Phase 2 (After 500+ labels): Fine-tune DeBERTa classifier
- Train custom model on your triage decisions
- Auto-classify 80-90% of findings
- Manual review only for low-confidence cases
Goal: Make manual triage 10x faster while building an ML dataset.
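As a rough illustration of what rule-based pre-filtering can look like, here is a minimal sketch. The `Finding` fields, the `pre_filter` function, and the `already_patched` label are hypothetical names chosen for this example; they are not taken from `src/rules.py`. The version comparison assumes purely numeric dotted versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical, simplified view of one Inspector finding."""
    cve_id: str
    package: str
    installed_version: str
    fixed_version: Optional[str]
    is_dev_dependency: bool


def parse_version(version: str) -> tuple:
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically.

    Assumes numeric dotted versions only (no suffixes such as '-rc1').
    """
    return tuple(int(part) for part in version.split("."))


def pre_filter(finding: Finding) -> Optional[str]:
    """Return a decision for obvious cases, or None to send to manual review."""
    # Development-only dependencies are marked irrelevant.
    if finding.is_dev_dependency:
        return "irrelevant"
    # The installed version is at or past the fixed version.
    if finding.fixed_version is not None and (
        parse_version(finding.installed_version)
        >= parse_version(finding.fixed_version)
    ):
        return "already_patched"  # hypothetical label
    return None
```

Findings for which `pre_filter` returns `None` would continue to the review queue.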
AWS Inspector Export
↓
Rule-Based Pre-Filter (auto-classify 60-70%)
↓
┌────────────────┐
│ Needs Review │ (30-40% of findings)
└────────────────┘
↓
AI Pre-Triage (AWS Bedrock suggests decisions)
↓
Fast Triage UI (pre-filled suggestions, keyboard shortcuts)
↓
SQLite Database (decisions + rationales + accuracy tracking)
↓
Export Training Data (JSON for DeBERTa)
AWS Inspector Export
↓
Rule-Based Filter (60-70%)
↓
Fine-tuned DeBERTa Classifier (20-30%)
↓
Manual Review Only (5-10% low-confidence)
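The Phase 2 flow above can be sketched as a small routing function. This is an assumption-laden sketch: the `route` function, the stage names, and the 0.90 threshold are hypothetical, and the classifier is passed in as a plain callable rather than a real DeBERTa model.

```python
from typing import Callable, Optional, Tuple

# Assumed cut-off for illustration; not a value taken from this project.
CONFIDENCE_THRESHOLD = 0.90


def route(
    finding: dict,
    rule_filter: Callable[[dict], Optional[str]],
    classifier: Callable[[dict], Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, Optional[str]]:
    """Return (stage, decision); a decision of None means manual review."""
    # Stage 1: rules handle the obvious cases.
    decision = rule_filter(finding)
    if decision is not None:
        return ("rule", decision)
    # Stage 2: the classifier decides only when it is confident enough.
    label, confidence = classifier(finding)
    if confidence >= threshold:
        return ("classifier", label)
    # Stage 3: everything else goes to an analyst.
    return ("manual_review", None)
```

Raising the threshold sends more findings to manual review; lowering it automates more at the cost of more classifier mistakes.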
Current Phase: Phase 1 - Manual Triage Tool
Timeline: 1 week to working interface
Next Steps:
- Export sample Inspector findings
- Implement parser and rule engine
- Build triage UI
- Label 500+ findings to build training dataset
With Justfile (recommended):
# Install just: https://just.systems
brew install just # macOS
# or: cargo install just
# See all commands
just
# Start development (production-like with Docker)
just up
# Or start native Python (faster iteration)
just setup
just dev
Manual setup:
# Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Load sample data
python scripts/load_sample_data.py
# Optional: Load with AI-ready triage history (~70% pre-triaged)
python scripts/load_sample_data.py --with-triage-history
# Start triage interface
python web/triage_ui.py
# Open http://localhost:5000
# Use keyboard shortcuts to quickly triage findings
See LOCAL_DEV.md for the complete local development guide and SAMPLE_DATA.md for sample data documentation.
# Navigate to Terraform directory
cd deployment/terraform
# Configure variables
cp terraform.tfvars.example terraform.tfvars
vim terraform.tfvars
# Deploy infrastructure
terraform init
terraform plan
terraform apply
# Build and push Docker image (see outputs)
# Deployment takes ~5-10 minutes
See deployment/terraform/README.md for the complete deployment guide.
castalia/
├── README.md # This file
├── LOCAL_DEV.md # ⭐ Local development guide
├── INSPECTOR_DATA_EXPORT.md # ⭐ Export AWS Inspector data for local dev
├── JUSTFILE_COMMANDS.md # Quick reference for just commands
├── justfile # ⭐ Task automation (just commands)
├── AWS_DEPLOYMENT.md # AWS deployment guide
├── DEPLOYMENT_SUMMARY.md # Quick deployment reference
├── MODEL_SELECTION.md # Analysis of model choices
├── COST_PROJECTIONS.md # Cost analysis
├── COST_OPTIMIZATION_ANALYSIS.md # ⭐ Architecture alternatives (Lambda vs Fargate)
├── Dockerfile # Production container
├── docker-compose.yml # Local development
├── requirements.txt
├── .env.example # Environment variables template
├── data/
│   └── sample_inspector_findings.json # ⭐ Sample data for local dev
├── scripts/
│   └── load_sample_data.py # ⭐ Load sample findings into DB
├── src/
│   ├── inspector_sync.py # AWS Inspector integration
│   ├── pre_triage_classifier.py # AI-assisted pre-triage with AWS Bedrock ⭐
│   ├── rule_recommender.py # Pattern detection & rule generation
│   ├── parser.py # Parse Inspector exports
│   ├── rules.py # Rule-based pre-filters
│   ├── train_deberta.py # Phase 2: Train classifier (TODO)
│   └── integrations/ # Ticketing system plugins
│       ├── base.py # Plugin interface
│       ├── jira_plugin.py # Jira integration with OAuth 2.0
│       └── plugin_manager.py # Plugin factory
├── web/
│   ├── triage_ui.py # Fast triage interface
│   └── templates/
│       ├── triage.html # Main triage UI
│       ├── pre_triage_stats.html # Pre-triage performance dashboard ⭐
│       └── rule_recommendations.html # Rule approval UI
├── output/
│   ├── triage.db # SQLite database
│   └── training_data.json # Export for DeBERTa
├── deployment/
│   ├── terraform/ # Infrastructure as Code ⭐
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── README.md # Terraform deployment guide
│   │   └── modules/
│   │       ├── vpc/
│   │       ├── ecs/
│   │       ├── efs/
│   │       ├── iam/
│   │       └── alb/
│   └── ecs/ # Bash scripts (alternative)
│       ├── deploy.sh
│       └── setup-infrastructure.sh
└── tests/
    ├── test_rules.py
    └── test_import.py
⭐ = New files for production-like local development
- 10x faster triage - Keyboard shortcuts, templates, bulk actions
- 500+ labeled examples - High-quality training dataset in 2-4 weeks
- Rule coverage - 60%+ of findings auto-classified
- Consistent labels - Structured decision format
- DeBERTa accuracy - 85%+ agreement with analyst decisions
- 90-95% automation - Only 5-10% of findings need manual review
- Sub-second inference - <100ms per finding
- Zero marginal cost - Self-hosted classifier
- Python 3.9+: Core processing
- Flask: Fast triage UI
- SQLite: Decision storage and analytics
- Pandas: Data parsing and export
- Hugging Face Transformers: DeBERTa fine-tuning
- PyTorch: Model training
- AWS Bedrock (optional): AI pre-triage suggestions and fallback for edge cases
- 🤖 AWS Bedrock integration - Claude 3.5 Sonnet suggests decisions based on historical patterns
- 💡 Pre-filled suggestions - Decision and rationale auto-populated for speed
- 🎯 Context-aware - Uses VPC, IAM, tags, CVSS adjustments for accuracy
- 💰 Cost-optimized - Groups similar findings (80-90% cost reduction)
- 📊 Accuracy tracking - Monitors suggestion vs actual decisions
- 🔒 Human-in-the-loop - All suggestions require analyst review
- ⌨️ Keyboard shortcuts - Press 1/2/3 for decisions, Enter to submit
- 📋 Rationale templates - Common patterns pre-filled
- 🔄 Bulk operations - Apply decision to similar findings (CVE + resource type matching for safety)
- 📊 Progress tracking - Real-time completion percentage
- 💾 Auto-save - SQLite database prevents data loss
- ✅ Development dependencies - Auto-mark as irrelevant
- ✅ Already patched versions - Version comparison
- ✅ Environment mitigations - Lambda read-only FS, network isolation
- ✅ Extensible - Easy to add new patterns
- 🤖 Auto-learning - Recommends new rules based on triage patterns
- 🔄 Inspector API sync - Pull findings directly from AWS Inspector
- 📝 Suppression rules - Push triage decisions back to Inspector
- ⚡ Lambda deployment - Cost-optimized serverless ($2-8/month)
- 💾 EFS persistence - SQLite database on EFS with automatic backups
- 🔒 Daily backups - AWS Backup with configurable retention (35 days default)
- 🛡️ Disaster recovery - Point-in-time restore capability
- 🧠 Pattern detection - Learns from your triage decisions (4 pattern types)
- 📊 Confidence scoring - Statistical validation (85%+ agreement, 5+ samples)
- 🎯 Auto-generated code - Ready-to-use Python rules
- ✅ Human-in-the-loop - Review and approve before activation
- 🎫 Create Jira tickets - From vulnerability findings with one click
- 🔗 Audit trail - Links back to AWS Inspector findings
- 📝 Remediation tracking - Document remediation steps, track who/when
- 📦 Bulk operations - Create one ticket per finding or a single bulk ticket
- 📊 Real-time logs - Docked panel showing API interactions for debugging
- 🔌 Plugin architecture - Extensible to other ticketing systems
- 📤 JSON format - Compatible with Hugging Face datasets
- 🎯 Structured labels - decision + rationale + confidence
- 📈 Analytics - Rule accuracy, time tracking, decision breakdown
- Local Development - Production-like local dev with Justfile
- Sample Data - Sample findings and triage history for testing AI features
- AWS Inspector Data Export - Export production findings for local development
- Deployment Summary - Quick start for AWS deployment
- Terraform Guide - Infrastructure as Code (recommended)
- AWS Deployment Guide - Detailed ECS setup with bash scripts
- Model Selection Analysis - Why manual triage first, then DeBERTa
- Cost Projections - ROI analysis and pricing scenarios
- Cost Optimization Analysis - AWS architecture alternatives for low-traffic deployments (Lambda vs Fargate)
- Bulk Apply Design - Resource-type-aware bulk triage operations with safety features
- Rule Recommendations Design - ML-driven pattern detection and rule automation
- Pre-Triage Quick Start - ⭐ Get AI suggestions in 5 minutes
- Pre-Triage Classification Design - AI-assisted triage with AWS Bedrock and historical patterns
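The training data export listed among the features above (JSON with decision, rationale, and confidence) could be produced along these lines. This is only a sketch: the `decisions` table, its column names, and the output keys are assumptions made for the example, not the actual schema of `output/triage.db`.

```python
import json
import sqlite3


def export_training_data(conn: sqlite3.Connection) -> list:
    """Read labeled decisions and return them as a list of JSON-ready dicts."""
    rows = conn.execute(
        "SELECT finding_text, decision, rationale, confidence "
        "FROM decisions ORDER BY id"
    ).fetchall()
    return [
        {"text": text, "label": decision, "rationale": rationale, "confidence": confidence}
        for text, decision, rationale, confidence in rows
    ]


# Demo against an in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE decisions ("
    "id INTEGER PRIMARY KEY, finding_text TEXT, decision TEXT, "
    "rationale TEXT, confidence REAL)"
)
conn.execute(
    "INSERT INTO decisions (finding_text, decision, rationale, confidence) "
    "VALUES (?, ?, ?, ?)",
    ("CVE-2023-12345 in dev dependency", "irrelevant", "Not deployed to production", 0.9),
)
records = export_training_data(conn)
training_json = json.dumps(records, indent=2)
```

Writing `training_json` to a file would give one flat list of records, which is a common input shape for dataset loaders.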
Manual review is unavoidable for security decisions. Instead of paying for AI suggestions you have to verify anyway, this tool:
- Makes manual review 10x faster (keyboard UI, templates, bulk ops)
- Captures high-quality training data while you work
- Learns from your decisions - Recommends new automation rules
- Builds toward fully automated AI (DeBERTa) once you have labels
- Costs $0 in API fees for Phase 1 when the optional AWS Bedrock pre-triage is disabled
See MODEL_SELECTION.md for detailed analysis.
The system uses AI to suggest triage decisions based on historical patterns:
Day 1: Sync findings from AWS Inspector
→ python src/inspector_sync.py --pull
Day 1: Run pre-triage classification
→ python src/pre_triage_classifier.py --days 7
→ Groups similar findings (package + resource type)
→ Uses AWS Bedrock Claude to analyze with historical context
→ Generates suggestions for all findings in group
Day 1: Analyst reviews in UI
→ 💡 AI Suggestion: IRRELEVANT (85% confidence)
→ Decision and rationale pre-filled
→ Analyst confirms or modifies
→ 3-5x faster than writing from scratch
Ongoing: System learns
→ Tracks accuracy (suggested vs actual)
→ Dashboard shows calibration metrics
→ Improves over time with more decisions
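The accuracy tracking step (suggested vs actual) can be illustrated with a short sketch. The record layout and both function names are hypothetical; the real dashboard may compute its calibration metrics differently.

```python
from collections import defaultdict


def agreement_rate(records):
    """Share of suggestions the analyst accepted unchanged.

    records: list of (confidence, suggested, actual) tuples.
    Returns None when there are no records.
    """
    if not records:
        return None
    hits = sum(1 for _, suggested, actual in records if suggested == actual)
    return hits / len(records)


def calibration_by_bucket(records):
    """Map each 10-point confidence bucket (80 means 80-89%) to its agreement rate."""
    buckets = defaultdict(list)
    for confidence, suggested, actual in records:
        # Work in whole percentage points to avoid float bucket edges.
        bucket = (int(round(confidence * 100)) // 10) * 10
        buckets[bucket].append((confidence, suggested, actual))
    return {bucket: agreement_rate(rows) for bucket, rows in sorted(buckets.items())}
```

If suggestions are well calibrated, the agreement rate in each bucket should land close to the confidence range that bucket covers.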
Cost Optimization: Groups similar findings (e.g., 50 findings → 10 API calls = 80% cost savings)
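The grouping idea can be sketched as follows, assuming each finding is a dict with `package` and `resource_type` keys (hypothetical field names). One API call per group instead of one per finding is what produces the savings.

```python
from collections import defaultdict


def group_findings(findings):
    """Group findings that share the same package and resource type."""
    groups = defaultdict(list)
    for finding in findings:
        key = (finding["package"], finding["resource_type"])
        groups[key].append(finding)
    return dict(groups)


def call_savings(total_findings, total_groups):
    """Fraction of API calls avoided by making one call per group."""
    if total_findings == 0:
        return 0.0
    return 1 - total_groups / total_findings
```

With 50 findings that collapse into 10 groups, `call_savings(50, 10)` works out to the 80% figure quoted above, assuming each call costs roughly the same.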
Safety: All suggestions require human review. No auto-apply.
See Pre-Triage Classification Design for detailed algorithm and architecture.
The system analyzes your triage decisions and suggests automation rules:
Week 1-2: Manual triage builds pattern data
→ System observes: CVE-2023-12345 consistently marked "irrelevant"
Week 3: Visit Rule Recommendations
→ Suggested rule: "CVE-2023-12345 → irrelevant"
→ Confidence: 95% (19/20 decisions)
→ Review generated code → Approve
Week 3+: Future automation
→ New CVE-2023-12345 findings auto-classified
→ No manual review needed
Result: Rules learn from your actual decisions, not assumptions.
See Rule Recommendations Design for detailed algorithm and pattern types.
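A minimal sketch of the confidence check behind a recommendation, using the 85%+ agreement and 5+ samples criteria listed in the features. The `recommend_rule` function and the shape of its return value are hypothetical, and `relevant` is used only as an example of a second label.

```python
from collections import Counter

MIN_SAMPLES = 5        # "5+ samples" criterion
MIN_AGREEMENT = 0.85   # "85%+ agreement" criterion


def recommend_rule(cve_id, decisions):
    """Suggest a rule for a CVE if past decisions agree strongly enough.

    decisions: list of labels analysts assigned to findings for this CVE.
    Returns a dict describing the suggested rule, or None.
    """
    if len(decisions) < MIN_SAMPLES:
        return None
    label, count = Counter(decisions).most_common(1)[0]
    agreement = count / len(decisions)
    if agreement < MIN_AGREEMENT:
        return None
    return {
        "cve_id": cve_id,
        "decision": label,
        "confidence": agreement,
        "support": f"{count}/{len(decisions)}",
    }
```

For the example above, 19 "irrelevant" decisions out of 20 give a confidence of 0.95 and a support of "19/20", so the rule would be offered for approval.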
This project is licensed under the MIT License - see the LICENSE file for details.