ThIOClaw — Vulnerability Investigation Harness

An engineering-first, local-first harness for transparent, repeatable, and version-controlled LLM-powered vulnerability investigation. Built for security teams who refuse to accept black-box AI verdicts.

Why ThIOClaw?

LLMs are being embedded into detection & response workflows at an accelerating pace — but most teams adopt them as opaque black boxes. You can't inspect the reasoning, reproduce a verdict, version-control the logic, or evaluate whether the agent actually improved your security posture.

ThIOClaw is built on five engineering principles:

Principle	How ThIOClaw Delivers
Transparency	Every LLM tool call, evidence lookup, and verdict is logged in structured JSONL and OpenTelemetry spans. Full reconstruction of any investigation.
Repeatability	Tier 1 signal scoring is deterministic pandas math — same input, same output, every time. The LLM reasons on top of reproducible foundations.
Configurability	Signal rules, weights, verdict thresholds, and exploit chain descriptions are YAML files. The agent's prompt and tools are Python source code. No hidden dashboards.
Version Control	Detection logic, agent behavior, and signal rules all live in Git. Changes produce clean diffs. Review, approve, and roll back with standard engineering workflows.
Measurability	Run the same investigation with different models, prompts, or tool configs. Compare Tier 1 baselines against Tier 2 LLM verdicts. Prometheus metrics track agent performance over time.

This project is for security teams who want engineering control over, and observability into, their adoption of LLM-powered SecOps.

macOS Setup & Quick Start

1. Set Up the Environment

git clone https://github.com/tej-nik/ThIOClaw.git
cd ThIOClaw
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Configure the LLM (Local or Cloud)

ThIOClaw uses LiteLLM for its control plane, so you can plug in any LiteLLM-supported provider (Ollama, Anthropic, OpenAI, AWS Bedrock, etc.) without changing code.

Option A: Local (Ollama)

For complete privacy, run the LLM locally on your Mac:

ollama serve                    # Start the server
ollama pull llama3.1:8b         # Pull the default model

Option B: Direct cloud (Anthropic / OpenAI)

Export your provider's API key and set the model via THIOCLAW_MODEL:

# Anthropic Claude 3.5 Sonnet
export ANTHROPIC_API_KEY="sk-ant-..."
export THIOCLAW_MODEL="claude-3-5-sonnet-20241022"

# OpenAI GPT-4o
export OPENAI_API_KEY="sk-proj-..."
export THIOCLAW_MODEL="gpt-4o"

Option C: AWS Bedrock

Auth uses the boto3 default credential chain (env vars, ~/.aws/credentials profile, or IAM role). Region must be set explicitly:

export AWS_REGION_NAME="us-east-1"
export AWS_PROFILE="default"          # or rely on env-var credentials
export THIOCLAW_MODEL="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"

Option D: Google Vertex AI

Works for both Claude on Vertex and Gemini. Requires a service-account key file:

export VERTEXAI_PROJECT="your-gcp-project"
export VERTEXAI_LOCATION="us-central1"
export GOOGLE_APPLICATION_CREDENTIALS="/abs/path/to/service-account.json"
export THIOCLAW_MODEL="vertex_ai/gemini-1.5-pro"
# or: export THIOCLAW_MODEL="vertex_ai/claude-3-5-sonnet@20240620"

Provider plumbing lives in scripts/thioclaw_agent/providers.py. To add a new provider, register a ProviderResolution factory there. See .env.example for the full set of supported environment variables.
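Before a run, it can help to confirm that the variables for your chosen option are actually set. The snippet below is a minimal preflight sketch, not ThIOClaw's providers.py: the `missing_env` helper and its prefix table are hypothetical, built only from the variables listed in Options A through D. The `ollama/` prefix assumes LiteLLM's usual naming for Ollama models (this README does not state the THIOCLAW_MODEL value for Ollama), and Bedrock credentials are left to the boto3 chain, so only the region is checked.

```python
import os

# Hypothetical preflight table: THIOCLAW_MODEL prefix -> env vars this README
# lists for that provider option. Illustrative only, not providers.py logic.
REQUIRED_ENV = {
    "bedrock/": ["AWS_REGION_NAME"],  # credentials come from the boto3 chain
    "vertex_ai/": ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION", "GOOGLE_APPLICATION_CREDENTIALS"],
    "claude-": ["ANTHROPIC_API_KEY"],
    "gpt-": ["OPENAI_API_KEY"],
    "ollama/": [],  # assumed LiteLLM-style prefix; local server needs no key
}


def missing_env(model: str, env=None) -> list[str]:
    """Return the variables still unset (or empty) for the given model string."""
    env = os.environ if env is None else env
    for prefix, names in REQUIRED_ENV.items():
        if model.startswith(prefix):
            return [name for name in names if not env.get(name)]
    raise ValueError(f"no preflight rule for model {model!r}")
```

For example, `missing_env(os.environ["THIOCLAW_MODEL"])` returning a non-empty list means at least one variable for that provider is still unset.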

Selecting the agent framework

ThIOClaw ships two parallel implementations of the Tier 2 agent loop, selectable per run:

# Raw LiteLLM tool-calling loop (default)
export THIOCLAW_FRAMEWORK=litellm-direct

# Strands SDK loop (AgentCore-native, multi-agent primitives, MCP support)
export THIOCLAW_FRAMEWORK=strands

Both implementations share the same provider routing and verdict contract. The framework axis in openclaw-bench/models.yaml lets you compare them side-by-side per model.
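As an illustration of the selection contract, the hypothetical helper below reads THIOCLAW_FRAMEWORK, falls back to the documented default (litellm-direct), and rejects anything other than the two documented values. It is a sketch, not the harness's own dispatch code.

```python
import os

# The two documented values; litellm-direct is the default.
FRAMEWORKS = ("litellm-direct", "strands")


def select_framework(env=None) -> str:
    """Hypothetical helper: resolve THIOCLAW_FRAMEWORK, failing fast on typos."""
    env = os.environ if env is None else env
    choice = env.get("THIOCLAW_FRAMEWORK", "litellm-direct")
    if choice not in FRAMEWORKS:
        raise ValueError(f"unknown THIOCLAW_FRAMEWORK {choice!r}; expected one of {FRAMEWORKS}")
    return choice
```

Failing fast on an unknown value avoids silently benchmarking the default loop when a side-by-side comparison was intended.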

3. Run an Investigation

# Single investigation cycle using local sample data
python -m harness.orchestrator --raw-telemetry local --once

# Investigate a specific CVE
python -m harness.orchestrator --cve CVE-2026-31431 --once

# Continuous monitoring loop (default: every 300s)
python -m harness.orchestrator --raw-telemetry local

4. What to Expect

During execution, the agent may pause and ask for your approval:

[ThIOClaw Agent] Proposing Query Execution:
Rationale: The existing Q6 staging query missed /var/tmp paths...
Performance Impact: Medium — scanning ~50,000 file events
Query: SELECT * FROM file_events WHERE path LIKE '/var/tmp/%'

Approve execution? (y/N): _

This is the Human-In-The-Loop (HITL) gate — the agent articulates why it needs the query and what the cost is, and you decide whether to approve.
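The (y/N) prompt follows the usual convention that pressing Enter means No. A minimal sketch of such a default-deny gate, assuming a hypothetical `approve_query` function (not the actual code in agent.py), with the input function injectable so the gate can be tested without a terminal:

```python
from typing import Callable


def approve_query(rationale: str, impact: str, query: str,
                  ask: Callable[[str], str] = input) -> bool:
    """Default-deny gate: only an explicit 'y' or 'yes' approves execution."""
    print("[ThIOClaw Agent] Proposing Query Execution:")
    print(f"Rationale: {rationale}")
    print(f"Performance Impact: {impact}")
    print(f"Query: {query}")
    answer = ask("Approve execution? (y/N): ").strip().lower()
    # Empty input, 'n', or anything unrecognized counts as a rejection.
    return answer in ("y", "yes")
```

Treating every unrecognized answer as a rejection means a stray keystroke can never run an expensive query.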

Telemetry Sources

Two orthogonal axes determine where telemetry comes from and how it's shaped:

1. Data location (--raw-telemetry) — where the harness reads events from:

Flag	Source	Credentials
`--raw-telemetry local`	`data/events.json`	None
`--raw-telemetry s3`	S3 bucket via `data/s3_manifest.json`	`~/.aws/credentials` named profile

2. Collector format (telemetry.event_source in harness.yaml) — what event-stream format the harness ingests:

Value	Collector	Notes
`osquery` (default)	osquery `process_events`, `socket_events`, `kernel_module_events`, `file_events`, etc.	Bundled sample data (`data/sample_events.json`) ships in this format. All six Q1-Q6 reference queries target it.
`auditd`	Linux kernel auditd via `auditctl` rules + `ausearch`/`auparse`	Mirror coverage via SigmaHQ rules at `rules-emerging-threats/2026/Exploits/CVE-2026-31431/`. Validation runbook: `runbooks/CVE-2026-31431_sigma_validation.md`.
`both`	Union of the two	For environments running both collectors. The data plane is source-agnostic — it scores whatever DataFrame is fed in.

Per-signal source support is declared in signals/<CVE-ID>.yaml via supported_sources: on each rule.
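A rough sketch of how supported_sources might interact with telemetry.event_source, assuming the signals file has already been parsed into a list of dicts. The `rules_for_source` helper, the `id` key, and any example values are hypothetical; `both` is treated as the union described in the table above.

```python
VALID_SOURCES = ("osquery", "auditd", "both")


def rules_for_source(rules: list[dict], event_source: str) -> list[dict]:
    """Keep only the rules that can run against the configured collector format."""
    if event_source not in VALID_SOURCES:
        raise ValueError(f"unknown event_source {event_source!r}")
    wanted = {"osquery", "auditd"} if event_source == "both" else {event_source}
    # A rule qualifies if it declares support for at least one wanted source.
    return [r for r in rules if wanted & set(r.get("supported_sources", []))]
```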

S3 Setup

Edit data/s3_manifest.json with your bucket, region, and key paths
Configure your ~/.aws/credentials with the named profile
Set aws.profile_name in harness.yaml if using a non-default profile

Project Structure

ThIOClaw/
├── CLAUDE.md                          # Comprehensive project guide & design rationale
├── README.md                          # This file
├── harness.yaml                       # Main harness configuration
├── targets.yaml                       # CVE investigation targets
├── requirements.txt                   # Python dependencies
│
├── data_plane/                        # DATA PLANE — Modular investigation scripts
│   └── cve_2026_31431.py              #   Deterministic pandas analysis (Q1–Q6)
│
├── scripts/                           # CONTROL PLANE — LLM Agent
│   ├── thioclaw.py                    #   CLI entry point
│   └── thioclaw_agent/
│       ├── agent.py                   #   Agentic loop (tool calling + HITL)
│       ├── prompts.py                 #   System prompt
│       ├── providers.py               #   Provider routing (ProviderResolution factories)
│       └── tools.py                   #   Tool definitions + implementations
│
├── signals/                           # Signal rule definitions (YAML)
│   └── CVE-2026-31431.yaml            #   Rules, weights, verdict logic, LLM context
│
├── queries/                           # Reference Detection Queries (e.g., SQL, KQL)
│   └── CVE-2026-31431/               #   Example query files
│
├── runbooks/                          # Validation runbooks
│   └── CVE-2026-31431_sigma_validation.md
│
├── harness/                           # Orchestrator engine
│   ├── orchestrator.py                #   CLI + run loop + concurrent dispatch
│   ├── config.py                      #   Typed dataclasses from YAML
│   └── ingester.py                    #   CSV → SQLite inventory ingestion
│
├── observability/                     # Instrumentation
│   ├── logger.py                      #   Thread-safe structured JSONL logger
│   ├── metrics.py                     #   Prometheus metrics via OpenTelemetry
│   └── traces.py                      #   OTel distributed tracing
│
├── data/                              # Sample data (replace with real telemetry)
│   ├── sample_events.json             #   9 events simulating a full exploit chain
│   ├── sample_inventory.csv           #   8 workloads with mixed statuses
│   └── s3_manifest.json               #   S3 config template (no secrets)
│
├── findings/                          # Output: YAML findings + JSONL log (gitignored)
├── docs/                              # Output: per-run Markdown + HTML reports
├── logs/                              # Output: Structured JSONL logs (gitignored)
└── tests/                             # Unit tests (pytest)

Architecture

graph TD
    A["Inventory Telemetry (e.g. auditd/EDR)"] -->|Ingester| B["inventory.db (SQLite)"]
    C["Event Telemetry (Local/S3/API)"] -->|Data Plane| D["data_plane/<cve_id>.py"]
    B -->|vulnerable workloads| D
    D -->|"Deterministic queries (e.g. pandas)"| E["Tier 1: Deterministic Signal Scoring"]
    E -->|tier1.json| F["Tier 2: ThIOClaw LLM Agent"]
    F <-->|"HITL Approval Gates"| G(("Analyst Terminal"))
    F --> H["findings/*.yaml"]
    F --> I["docs/*.md + *.html"]

    style E fill:#ff8c00,color:#fff
    style F fill:#4488ff,color:#fff
    style G fill:#22aa44,color:#fff

How It Works

Ingest — Host inventory telemetry is loaded into SQLite. Workloads matching trigger_assessments (e.g., vulnerable_or_not_confirmed_fixed) are selected for investigation.
Tier 1 (Data Plane) — A modular Python script runs deterministic queries against raw telemetry events. Each query checks for a specific exploitation indicator. Signals are scored using configurable weights to produce a deterministic verdict: exploited, suspicious, benign, or inconclusive.
Tier 2 (Control Plane) — The ThIOClaw LLM agent receives the Tier 1 results and the CVE's theoretical exploit chain. It correlates evidence, requests deeper telemetry inspection, and can propose new queries — but must get analyst approval via the terminal before executing them.
Output — Findings are persisted as YAML, Markdown, and HTML per run. All events are instrumented with OpenTelemetry.
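The Ingest step can be sketched with the standard library alone. This assumes a hypothetical CSV layout with `hostname` and `assessment` columns (the real inventory schema may differ) and shows trigger_assessments acting as a simple `IN` filter over the SQLite table; `ingest_inventory` and `select_workloads` are illustrative names, not the API of harness/ingester.py.

```python
import csv
import io
import sqlite3


def ingest_inventory(csv_text: str, conn: sqlite3.Connection) -> None:
    """Load inventory rows (hypothetical hostname/assessment columns) into SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS inventory (hostname TEXT, assessment TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["hostname"], r["assessment"]) for r in reader]
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", rows)
    conn.commit()


def select_workloads(conn: sqlite3.Connection, trigger_assessments: list[str]) -> list[str]:
    """Return hostnames whose assessment matches any trigger value."""
    if not trigger_assessments:
        return []
    placeholders = ",".join("?" for _ in trigger_assessments)
    cur = conn.execute(
        f"SELECT hostname FROM inventory WHERE assessment IN ({placeholders}) ORDER BY hostname",
        list(trigger_assessments),
    )
    return [row[0] for row in cur]
```

Usage with an in-memory database: `conn = sqlite3.connect(":memory:")`, then `ingest_inventory(open("data/sample_inventory.csv").read(), conn)` (only if your CSV has those columns) and `select_workloads(conn, ["vulnerable_or_not_confirmed_fixed"])`.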

Bundled Example: CVE-2026-31431

ThIOClaw ships with a complete investigation for CVE-2026-31431 (Linux kernel algif_aead local privilege escalation) including sample telemetry that simulates a full exploit chain.

Query	Signal	Weight	Tier
Q1	`algif_aead` module loaded in inventory	0.3	Suspicious
Q2	Unprivileged `AF_ALG` socket opens	0.5	Suspicious
Q3	UID escalation after `AF_ALG` open (primary)	1.0	Exploited
Q4	Root shell from non-root parent	0.9	Exploited
Q5	`algif_aead` module load events	0.4	Suspicious
Q6	Exploit staging in `/tmp`, `/dev/shm`, memfd	0.6	Suspicious

Verdict logic (rules are checked in order; the first match wins): exploited if any exploited-tier signal fires and the total weight is ≥ 1.0; suspicious if the total is ≥ 0.5; benign if the total is 0; inconclusive otherwise.
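The verdict logic can be expressed in a few lines. This sketch hard-codes the weights and tiers from the table above and applies the rules in order; `tier1_verdict` is a hypothetical name, and the harness itself reads these values from signals/CVE-2026-31431.yaml rather than from code. Summing in sorted order and rounding keeps the total stable against floating-point ordering effects.

```python
# (weight, tier) per query, copied from the CVE-2026-31431 signal table.
SIGNALS = {
    "Q1": (0.3, "suspicious"),
    "Q2": (0.5, "suspicious"),
    "Q3": (1.0, "exploited"),
    "Q4": (0.9, "exploited"),
    "Q5": (0.4, "suspicious"),
    "Q6": (0.6, "suspicious"),
}


def tier1_verdict(fired: set[str]) -> tuple[str, float]:
    """Apply the verdict rules in order and return (verdict, total weight)."""
    total = round(sum(SIGNALS[q][0] for q in sorted(fired)), 6)
    if any(SIGNALS[q][1] == "exploited" for q in fired) and total >= 1.0:
        return "exploited", total
    if total >= 0.5:
        return "suspicious", total
    if total == 0:
        return "benign", total
    return "inconclusive", total
```

One consequence worth noticing: Q4 alone totals 0.9, so even though it is an exploited-tier signal it falls through to suspicious, while Q4 together with Q1 (total 1.2) reaches exploited.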

Auditd-shaped coverage — the same exploit chain is also covered by three SigmaHQ rules contributed in SigmaHQ/sigma#6052 (AF_ALG socket creation, algif_aead module load, splice on setuid path). Reproduce and validate against live audit telemetry with runbooks/CVE-2026-31431_sigma_validation.md.

Observability

Layer	Implementation	Endpoint / Location
Structured logs	Thread-safe JSONL writer	`logs/agent_runs.jsonl`
Metrics	OpenTelemetry → Prometheus	`http://localhost:9090/metrics`
Traces	OpenTelemetry spans	stdout (default) or OTLP gRPC endpoint
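The "thread-safe JSONL writer" pattern amounts to serializing each record to a single line and appending it under a lock. A minimal sketch, assuming a hypothetical `JsonlLogger` class and `ts`/`event` field names (observability/logger.py may use a different schema):

```python
import json
import threading
from datetime import datetime, timezone


class JsonlLogger:
    """Append one JSON object per line; a lock serializes writes across threads."""

    def __init__(self, path: str):
        self._path = path
        self._lock = threading.Lock()

    def log(self, event: str, **fields) -> None:
        record = {"ts": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
        line = json.dumps(record, sort_keys=True)
        # Holding the lock for the whole append keeps lines from interleaving.
        with self._lock:
            with open(self._path, "a", encoding="utf-8") as fh:
                fh.write(line + "\n")
```

Because every line is a complete JSON object, a run can be reconstructed later by reading the file line by line with `json.loads`.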

Investigating Any Vulnerability

ThIOClaw is designed to be generic. Add a new CVE target in three steps:

Define the target — Add an entry to targets.yaml with the CVE-ID, data plane script module path, and signals file.
Configure signals — Create signals/<CVE-ID>.yaml with weighted rules and an agent_context block describing the exploit chain for the LLM.
Write the data plane — Create data_plane/<cve_id>.py implementing the run_investigation() function with pandas queries specific to the vulnerability's telemetry fingerprint.
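For the third step, a data plane module might look roughly like the sketch below. The run_investigation() signature (one flat DataFrame in, a dict of fired signals out) and the column names event_type, uid, family, and path are assumptions for illustration; only two signals are shown, and Q6's memfd check is omitted. AF_ALG is address family 38 on Linux.

```python
import pandas as pd

AF_ALG = 38  # Linux address-family number for kernel crypto API sockets


def run_investigation(events: pd.DataFrame) -> dict:
    """Hypothetical data plane: report which of two signals fired.

    Assumes one flat DataFrame with columns event_type, uid, family, path.
    """
    sockets = events[events["event_type"] == "socket"]
    files = events[events["event_type"] == "file"]
    # Q2: an AF_ALG socket opened by a non-root user.
    q2 = ((sockets["family"] == AF_ALG) & (sockets["uid"] != 0)).any()
    # Q6 (partial): files under common staging directories; memfd not checked here.
    q6 = files["path"].str.startswith(("/tmp/", "/dev/shm/"), na=False).any()
    return {"Q2": bool(q2), "Q6": bool(q6)}
```

Keeping each query a plain boolean mask over the DataFrame preserves the Tier 1 property the harness relies on: the same events always yield the same signals.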

See CLAUDE.md for detailed instructions and design rationale.

Running Tests

pytest tests/ -v
pytest tests/ -v --cov=harness --cov=observability --cov-report=term-missing

License

See LICENSE for details.
