
Aegis

Agent Execution Governance with Incremental Scoring

Aegis is an open-source sensitivity-gating framework for Claude Code. It scores every tool action using three independent signals — the agent's own self-assessment, the skill/MCP author's declared risk, and context-aware historical user preferences — and requires confirmation when the combined score exceeds a threshold.

Who is this for?

Skill Builders

If you build Claude Code skills or MCP servers with real-world side effects (sending emails, executing trades, modifying infrastructure), using the Aegis framework gives your users granular, adaptive permission control without them having to choose between "confirm everything" and "approve everything."

Users

If you use Claude Code skills or MCP servers that take actions on your behalf, Aegis learns from your contextual feedback over time — approving routine actions faster while continuing to check in on the ones you find sensitive. You can also take skills you're already using and make them Aegis-compatible with the provided script!

Quick Start

git clone https://github.com/veezbo/aegis.git
cd aegis
./install.sh   # requires uv (https://docs.astral.sh/uv/)

This registers two hooks in ~/.claude/settings.json, installs the /flag feedback skill, and optionally configures the user preference signal if ANTHROPIC_API_KEY is available.
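
For reference, the registered hooks land in `~/.claude/settings.json` in roughly the shape below. This is an illustrative sketch of the standard Claude Code hooks schema, not necessarily the exact matchers or paths `install.sh` writes (`/path/to/aegis` is a placeholder):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "uv run /path/to/aegis/hooks/pre_tool_use.py" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "uv run /path/to/aegis/hooks/post_tool_use.py" }
        ]
      }
    ]
  }
}
```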

To uninstall: ./uninstall.sh

Demo

Try Aegis with a mock email management skill in 2 minutes:

cd demo
./start.sh

Then open Claude Code in the demo/ directory and try things like "any emails from alice?" or "reply to the cfo letting him know ill have the data by eod". Low-risk actions auto-approve; high-risk ones trigger confirmation. As you give feedback, you can watch Aegis start auto-approving more actions or asking for confirmation more often. See demo/README.md for details.

How It Works

Three independent signals are computed for every tool call and summed into a composite score:

| Signal | Source | What it captures |
| --- | --- | --- |
| S1 — Agent Self-Assessment | `[SENS:...]` in Bash `description` | Claude's own risk rating for this specific call |
| S2 — Skill-Declared Sensitivity | `action-sensitivity` in SKILL.md | The skill author's risk rating per action |
| S3 — Historical User Preferences | Embed + KNN history search + Haiku | Personalized, contextual scoring from your past decisions |

Decision logic:

  • Composite ≥ threshold (default 1.0) → require confirmation
  • Otherwise → auto-approve
| Example | S1 | S2 | S3 | Composite | Decision | Why |
| --- | --- | --- | --- | --- | --- | --- |
| Search emails | 0.0 | 0.0 | 0.0 | 0.0 | allow | Read-only, no risk from any signal |
| Read portfolio | 0.1 | 0.1 | 0.0 | 0.2 | allow | Low-risk read; agent and skill agree |
| Reply to email | 0.2 | 0.5 | 0.3 | 1.0 | ask | Skill and agent say this reply is moderate risk; user has previously indicated long replies are somewhat sensitive |
| Reply to email | 0.1 | 0.5 | 0.1 | 0.7 | allow | Skill says emails are always somewhat risky, but user has previously indicated similar quick replies to friends should be auto-approved |
| Transfer funds | 0.1 | 0.6 | 0.1 | 0.8 | allow | Agent thinks it's routine, skill marks transfers as risky, but user has expressed auto-approving transfers to the bills checking account |
| Transfer funds | 0.1 | 0.6 | 0.8 | 1.5 | ask | Agent thinks it's routine, skill marks transfers as risky, and user has expressed requiring confirmation for transfers away from the savings account |

S3 is skipped when no skill is active (plain Read/Edit/Bash calls) or when S1+S2 already resolve the decision. It requires ANTHROPIC_API_KEY; without it, S3 returns 0.0 and scoring relies on S1+S2 only.
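
The decision logic above can be sketched as follows. This is a minimal illustration assuming the signals and threshold described in this section; the function names are not the actual Aegis internals in hooks/pre_tool_use.py:

```python
# Composite gate sketch: S1 and S2 are cheap, S3 (embed + KNN + Haiku) is
# API-backed, so it is passed as a lazy callable and only invoked when it
# could still change the outcome.
ASK_THRESHOLD = 1.0  # [gate].ask_threshold in aegis.toml

def decide(s1: float, s2: float, s3_fn=None) -> tuple[str, float]:
    """Return ("ask" | "allow", composite score).

    s3_fn is called lazily, so the API-backed S3 signal is skipped when
    S1 + S2 already push the score over the threshold (or when no skill
    is active and s3_fn is None).
    """
    composite = s1 + s2
    if composite < ASK_THRESHOLD and s3_fn is not None:
        composite += s3_fn()  # only pay for S3 when needed
    return ("ask" if composite >= ASK_THRESHOLD else "allow"), composite
```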

```mermaid
flowchart TD
    TC["Tool call"] --> PLAN{"Plan mode?"}
    PLAN -->|yes| ALLOW_PLAN["allow"]
    PLAN -->|no| S["Compute S1 + S2 + S3"]
    S --> T{"composite >= threshold?"}
    T -->|yes| ASK["ASK"]
    T -->|no| ALLOW["ALLOW"]
    ASK --> LOG["Log + index"]
    ALLOW --> LOG
    LOG --> POST["PostToolUse: feedback context"]
```

Making Your Skill Aegis-Compatible

Run the converter on any existing SKILL.md:

./aegis_convert.py path/to/SKILL.md

This analyzes your skill, adds an action-sensitivity map to the frontmatter, and appends a standard Aegis section that tells Claude how to tag its calls. See examples/skills/deploy-manager/ for a before/after comparison on an infrastructure management skill.

You can also do it manually:

1. Add action-sensitivity to your frontmatter:

---
name: email-manager
description: Manage emails via API
action-sensitivity:
  search_emails: 0.0
  read_email: 0.1
  reply_email: 0.5
  send_email: 0.6
  delete_email: 0.6
  block_sender: 0.5
default-sensitivity: 0.5
---

These are custom fields — Claude Code ignores them, and Aegis reads them directly.

2. Add the Aegis section to the skill body (copy-paste as-is):

## Aegis

For every Bash tool call, prefix your `description` field with a sensitivity tag:

`[SENS:<score>|<action_name>|<reason>] <normal description>`

- `score`: 0.0-1.0, how risky this specific call is given its actual parameters
- `action_name`: the matching action name from the list above
- `reason`: what makes this instance more or less risky

Example: `[SENS:0.7|send_email|sending to external recipient] Send partnership proposal`

See demo/.claude/skills/email-manager/SKILL.md for a full working example.
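
On the receiving side, the tag format above can be split apart with a single regular expression. This is an illustrative sketch, not necessarily how the Aegis hook itself parses the prefix:

```python
# Parse the "[SENS:<score>|<action_name>|<reason>] <description>" prefix
# described above from a Bash tool call's description field.
import re

SENS_RE = re.compile(r"^\[SENS:([\d.]+)\|([^|\]]+)\|([^\]]*)\]\s*(.*)$")

def parse_sens_tag(description: str):
    """Return (score, action_name, reason, rest) or None if no tag is present."""
    m = SENS_RE.match(description)
    if not m:
        return None  # untagged call: S1 contributes nothing
    score, action, reason, rest = m.groups()
    return float(score), action, reason, rest
```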

Feedback Loop

Aegis learns from two feedback mechanisms:

Borderline auto-approvals (composite 0.7–0.99): The PostToolUse hook injects context so Claude can mention /flag if the user later indicates the decision was wrong.

/flag command: Report wrong decisions directly. Select which recent actions were incorrectly handled and explain why. This feedback is recorded in history and retrieved by S3 for future scoring of similar actions.

> /flag

Configuration

~/.claude/aegis.toml overrides config/default.toml:

[gate]
ask_threshold = 1.0       # composite sum that triggers "ask"

[signal3]
enabled = true            # requires ANTHROPIC_API_KEY
api_key = "sk-ant-..."
model = "claude-haiku-4-5"

[embeddings]
db_path = "~/.claude/aegis-embeddings.db"
retrieval_k = 10

[history]
path = "~/.claude/aegis-history.jsonl"
max_entries = 10000
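
The `retrieval_k` setting controls how many past decisions S3 pulls from the embeddings database. Conceptually it is a cosine-similarity nearest-neighbour lookup, sketched below with illustrative names (the actual storage and lookup live in hooks/lib/):

```python
# Rank stored (entry, embedding) pairs by cosine similarity to a query
# embedding and keep the top k, as the retrieval_k setting suggests.
import math

def top_k(query: list[float],
          history: list[tuple[str, list[float]]],
          k: int = 10) -> list[tuple[str, list[float]]]:
    """Return the k history entries most similar to the query embedding."""
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    ranked = sorted(history, key=lambda item: cosine(query, item[1]), reverse=True)
    return ranked[:k]
```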

Project Structure

aegis/
  hooks/
    pre_tool_use.py         # Signals 1-3, scoring, history logging
    post_tool_use.py        # Feedback context injection
    flag_cli.py             # CLI for /flag skill
    lib/                    # Scoring, skill parsing, embeddings, LLM assessor
  skills/
    flag/SKILL.md           # /flag feedback command
  examples/skills/
    example-trading/        # Demo skill with action-sensitivity
    deploy-manager/         # Before/after aegis_convert.py conversion
  demo/                     # Try Aegis with a mock email skill
  config/default.toml       # Default thresholds
  aegis_convert.py          # Skill migration script
  install.sh / uninstall.sh
  tests/

License

MIT
