An engineering-first, local-first harness for transparent, repeatable, and version-controlled LLM-powered vulnerability investigation. Built for security teams who refuse to accept black-box AI verdicts.
LLMs are being embedded into detection & response workflows at an accelerating pace — but most teams adopt them as opaque black boxes. You can't inspect the reasoning, reproduce a verdict, version-control the logic, or evaluate whether the agent actually improved your security posture.
ThIOClaw is built on five engineering principles:
| Principle | How ThIOClaw Delivers |
|---|---|
| Transparency | Every LLM tool call, evidence lookup, and verdict is logged in structured JSONL and OpenTelemetry spans. Full reconstruction of any investigation. |
| Repeatability | Tier 1 signal scoring is deterministic pandas math — same input, same output, every time. The LLM reasons on top of reproducible foundations. |
| Configurability | Signal rules, weights, verdict thresholds, and exploit chain descriptions are YAML files. The agent's prompt and tools are Python source code. No hidden dashboards. |
| Version Control | Detection logic, agent behavior, and signal rules all live in Git. Changes produce clean diffs. Review, approve, and roll back with standard engineering workflows. |
| Measurability | Run the same investigation with different models, prompts, or tool configs. Compare Tier 1 baselines against Tier 2 LLM verdicts. Prometheus metrics track agent performance over time. |
This project is for security teams who want engineering control and observability into their journey into LLM-powered SecOps.
git clone https://github.com/tej-nik/ThIOClaw.git
cd ThIOClaw
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtThIOClaw uses LiteLLM for its control plane, meaning you can plug in any LLM provider (Ollama, Anthropic, OpenAI, AWS Bedrock, etc.) without changing code.
For complete privacy, run the LLM locally on your Mac:
ollama serve # Start the server
ollama pull llama3.1:8b # Pull the default modelExport your provider's API key and set the model via THIOCLAW_MODEL:
# Anthropic Claude 3.5 Sonnet
export ANTHROPIC_API_KEY="sk-ant-..."
export THIOCLAW_MODEL="claude-3-5-sonnet-20241022"
# OpenAI GPT-4o
export OPENAI_API_KEY="sk-proj-..."
export THIOCLAW_MODEL="gpt-4o"Auth uses the boto3 default credential chain (env vars, ~/.aws/credentials profile, or IAM role). Region must be set explicitly:
export AWS_REGION_NAME="us-east-1"
export AWS_PROFILE="default" # or rely on env-var credentials
export THIOCLAW_MODEL="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"Works for both Claude on Vertex and Gemini. Requires a service-account key file:
export VERTEXAI_PROJECT="your-gcp-project"
export VERTEXAI_LOCATION="us-central1"
export GOOGLE_APPLICATION_CREDENTIALS="/abs/path/to/service-account.json"
export THIOCLAW_MODEL="vertex_ai/gemini-1.5-pro"
# or: export THIOCLAW_MODEL="vertex_ai/claude-3-5-sonnet@20240620"Provider plumbing lives in
scripts/thioclaw_agent/providers.py. To add a new provider, register aProviderResolutionfactory there. See.env.examplefor the full set of supported environment variables.
ThIOClaw ships two parallel implementations of the Tier 2 agent loop, selectable per run:
# Raw LiteLLM tool-calling loop (default)
export THIOCLAW_FRAMEWORK=litellm-direct
# Strands SDK loop (AgentCore-native, multi-agent primitives, MCP support)
export THIOCLAW_FRAMEWORK=strandsBoth implementations share the same provider routing and verdict contract. The framework axis in openclaw-bench/models.yaml lets you compare them side-by-side per model.
# Single investigation cycle using local sample data
python -m harness.orchestrator --raw-telemetry local --once
# Investigate a specific CVE
python -m harness.orchestrator --cve CVE-2026-31431 --once
# Continuous monitoring loop (default: every 300s)
python -m harness.orchestrator --raw-telemetry localDuring execution, the agent may pause and ask for your approval:
[ThIOClaw Agent] Proposing Query Execution:
Rationale: The existing Q6 staging query missed /var/tmp paths...
Performance Impact: Medium — scanning ~50,000 file events
Query: SELECT * FROM file_events WHERE path LIKE '/var/tmp/%'
Approve execution? (y/N): _
This is the Human-In-The-Loop (HITL) gate — the agent articulates why it needs the query and what the cost is, and you decide whether to approve.
Two orthogonal axes determine where telemetry comes from and how it's shaped:
1. Data location (--raw-telemetry) — where the harness reads events from:
| Flag | Source | Credentials |
|---|---|---|
--raw-telemetry local |
data/events.json |
None |
--raw-telemetry s3 |
S3 bucket via data/s3_manifest.json |
~/.aws/credentials named profile |
2. Collector format (telemetry.event_source in harness.yaml) — what event-stream format the harness ingests:
| Value | Collector | Notes |
|---|---|---|
osquery (default) |
osquery process_events, socket_events, kernel_module_events, file_events, etc. |
Bundled sample data (data/sample_events.json) ships in this format. All six Q1-Q6 reference queries target it. |
auditd |
Linux kernel auditd via auditctl rules + ausearch/auparse |
Mirror coverage via SigmaHQ rules at rules-emerging-threats/2026/Exploits/CVE-2026-31431/. Validation runbook: runbooks/CVE-2026-31431_sigma_validation.md. |
both |
Union of the two | For environments running both collectors. The data plane is source-agnostic — it scores whatever DataFrame is fed in. |
Per-signal source support is declared in signals/<CVE-ID>.yaml via supported_sources: on each rule.
- Edit
data/s3_manifest.jsonwith your bucket, region, and key paths - Configure your
~/.aws/credentialswith the named profile - Set
aws.profile_nameinharness.yamlif using a non-default profile
ThIOClaw/
├── CLAUDE.md # Comprehensive project guide & design rationale
├── README.md # This file
├── harness.yaml # Main harness configuration
├── targets.yaml # CVE investigation targets
├── requirements.txt # Python dependencies
│
├── data_plane/ # DATA PLANE — Modular investigation scripts
│ └── cve_2026_31431.py # Deterministic pandas analysis (Q1–Q6)
│
├── scripts/ # CONTROL PLANE — LLM Agent
│ ├── thioclaw.py # CLI entry point
│ └── thioclaw_agent/
│ ├── agent.py # Agentic loop (Ollama + tool calling + HITL)
│ ├── prompts.py # System prompt
│ └── tools.py # Tool definitions + implementations
│
├── signals/ # Signal rule definitions (YAML)
│ └── CVE-2026-31431.yaml # Rules, weights, verdict logic, LLM context
│
├── queries/ # Reference Detection Queries (e.g., SQL, KQL)
│ └── CVE-2026-31431/ # Example query files
│
├── harness/ # Orchestrator engine
│ ├── orchestrator.py # CLI + run loop + concurrent dispatch
│ ├── config.py # Typed dataclasses from YAML
│ └── ingester.py # CSV → SQLite inventory ingestion
│
├── observability/ # Instrumentation
│ ├── logger.py # Thread-safe structured JSONL logger
│ ├── metrics.py # Prometheus metrics via OpenTelemetry
│ └── traces.py # OTel distributed tracing
│
├── data/ # Sample data (replace with real telemetry)
│ ├── sample_events.json # 9 events simulating a full exploit chain
│ ├── sample_inventory.csv # 8 workloads with mixed statuses
│ └── s3_manifest.json # S3 config template (no secrets)
│
├── findings/ # Output: YAML findings + JSONL log (gitignored)
├── docs/ # Output: per-run Markdown + HTML reports
├── logs/ # Output: Structured JSONL logs (gitignored)
└── tests/ # Unit tests (pytest)
graph TD
A["Inventory Telemetry (e.g. auditd/EDR)"] -->|Ingester| B["inventory.db (SQLite)"]
C["Event Telemetry (Local/S3/API)"] -->|Data Plane| D["data_plane/<cve_id>.py"]
B -->|vulnerable workloads| D
D -->|"Deterministic queries (e.g. pandas)"| E["Tier 1: Deterministic Signal Scoring"]
E -->|tier1.json| F["Tier 2: ThIOClaw LLM Agent"]
F <-->|"HITL Approval Gates"| G(("Analyst Terminal"))
F --> H["findings/*.yaml"]
F --> I["docs/*.md + *.html"]
style E fill:#ff8c00,color:#fff
style F fill:#4488ff,color:#fff
style G fill:#22aa44,color:#fff
- Ingest — Host inventory telemetry is loaded into SQLite. Workloads matching
trigger_assessments(e.g.,vulnerable_or_not_confirmed_fixed) are selected for investigation. - Tier 1 (Data Plane) — A modular Python script runs deterministic queries against raw telemetry events. Each query checks for a specific exploitation indicator. Signals are scored using configurable weights to produce a deterministic verdict:
exploited,suspicious,benign, orinconclusive. - Tier 2 (Control Plane) — The ThIOClaw LLM agent receives the Tier 1 results and the CVE's theoretical exploit chain. It correlates evidence, requests deeper telemetry inspection, and can propose new queries — but must get analyst approval via the terminal before executing them.
- Output — Findings are persisted as YAML, Markdown, and HTML per run. All events are instrumented with OpenTelemetry.
ThIOClaw ships with a complete investigation for CVE-2026-31431 (Linux kernel algif_aead local privilege escalation) including sample telemetry that simulates a full exploit chain.
| Query | Signal | Weight | Tier |
|---|---|---|---|
| Q1 | algif_aead module loaded in inventory |
0.3 | Suspicious |
| Q2 | Unprivileged AF_ALG socket opens |
0.5 | Suspicious |
| Q3 | UID escalation after AF_ALG open (primary) |
1.0 | Exploited |
| Q4 | Root shell from non-root parent | 0.9 | Exploited |
| Q5 | algif_aead module load events |
0.4 | Suspicious |
| Q6 | Exploit staging in /tmp, /dev/shm, memfd |
0.6 | Suspicious |
Verdict logic: exploited if any exploited-tier signal fires AND total weight ≥ 1.0. suspicious if total ≥ 0.5. benign if total = 0. inconclusive otherwise.
Auditd-shaped coverage — the same exploit chain is also covered by three SigmaHQ rules contributed in SigmaHQ/sigma#6052 (AF_ALG socket creation, algif_aead module load, splice on setuid path). Reproduce and validate against live audit telemetry with runbooks/CVE-2026-31431_sigma_validation.md.
| Layer | Implementation | Endpoint / Location |
|---|---|---|
| Structured logs | Thread-safe JSONL writer | logs/agent_runs.jsonl |
| Metrics | OpenTelemetry → Prometheus | http://localhost:9090/metrics |
| Traces | OpenTelemetry spans | stdout (default) or OTLP gRPC endpoint |
ThIOClaw is designed to be generic. Add a new CVE target in three steps:
- Define the target — Add an entry to
targets.yamlwith the CVE-ID, data plane script module path, and signals file. - Configure signals — Create
signals/<CVE-ID>.yamlwith weighted rules and anagent_contextblock describing the exploit chain for the LLM. - Write the data plane — Create
data_plane/<cve_id>.pyimplementing therun_investigation()function with pandas queries specific to the vulnerability's telemetry fingerprint.
See CLAUDE.md for detailed instructions and design rationale.
pytest tests/ -v
pytest tests/ -v --cov=harness --cov=observability --cov-report=term-missingSee LICENSE for details.