Narrow 100+ skills down to the right 5: deterministic triggers, fuzzy matching, semantic search, and rank fusion. Zero core modification.
Hermes Agent loads every installed skill into the system prompt as a flat list. When you have 50+ skills:
- The LLM picks wrong: overlapping descriptions confuse selection
- You burn tokens: 5,000–10,000 tokens per turn just for the skill list
- Rarely used skills become invisible: they are buried at the bottom of a long list
Eagle Eye is a non-invasive plugin that acts as an intelligent pre-filter. Before each API call, it narrows the skill list to the 5 most relevant candidates and injects them as a lightweight hint.
User Query
    │
    ▼
┌─────────────────────────────────────────────┐
│ L1: Hard Triggers                           │
│ Deterministic keyword matching (3-tier)     │
│ Hit  → Inject full SKILL.md instantly       │
│ Miss ↓                                      │
├─────────────────────────────────────────────┤
│ L2: FTS5 BM25 (text similarity)             │
│ L3: Synonym Dict (domain knowledge)         │
│ L4: Dense Embedding (semantic similarity)   │
│ L5: RRF Fusion (rank combination)           │
│                                             │
│ Score ≥ threshold → Inject skill hints      │
│ Score < threshold → Silent (LLM decides)    │
└─────────────────────────────────────────────┘
    │
    ▼
LLM Final Decision
Not every query needs a skill. "What should I eat for dinner?" is best answered by the LLM's general knowledge, not by loading a restaurant-finder skill. Eagle Eye's confidence gate prevents forced matches.
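The confidence gate can be pictured with a minimal sketch like the one below. It is one possible reading, not the plugin's actual code: the function name `apply_confidence_gate` and the threshold value are assumptions, since the real threshold is not stated here.

```python
from typing import List, Tuple

# Hypothetical threshold: the actual value used by Eagle Eye is not documented here.
CONFIDENCE_THRESHOLD = 0.03


def apply_confidence_gate(
    ranked: List[Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
    top_k: int = 5,
) -> List[str]:
    """Return up to top_k skill names whose score meets the threshold.

    An empty list means "stay silent": no hint is injected and the LLM
    answers from its general knowledge.
    """
    confident = [(name, score) for name, score in ranked if score >= threshold]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in confident[:top_k]]


# A weak match is dropped instead of being forced on the LLM.
print(apply_confidence_gate([("restaurant-finder", 0.01)]))
# Stronger candidates pass through, best first.
print(apply_confidence_gate([("debugging", 0.05), ("git-helper", 0.04), ("docker", 0.01)]))
```

Returning an empty list rather than the "least bad" candidate is what keeps a dinner question from loading an unrelated skill.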
L1 (hard triggers) is deterministic: if the user types "debug", the debugging skill loads immediately, with no probabilistic scoring involved. L2–L5 handle the long tail, where fuzzy and semantic matching add value.
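As a rough illustration of deterministic triggering, the sketch below does a plain substring match over an ordered list of (keyword, skill-name) tuples. The entries and the helper `match_hard_trigger` are hypothetical, and the real 3-tier matching is presumably richer than this single pass.

```python
from typing import List, Optional, Tuple

# Illustrative entries only; these skill names are hypothetical.
# More specific keywords come first so they win over general ones.
_HARD_TRIGGERS: List[Tuple[str, str]] = [
    ("debug memory leak", "memory-profiling"),
    ("debug", "debugging"),
]


def match_hard_trigger(
    query: str, triggers: List[Tuple[str, str]] = _HARD_TRIGGERS
) -> Optional[str]:
    """Return the skill for the first keyword found in the query, else None."""
    lowered = query.lower()
    for keyword, skill_name in triggers:
        if keyword in lowered:
            return skill_name  # first hit wins, so list order matters
    return None  # miss: fall through to the fuzzy layers


print(match_hard_trigger("Please debug this crash"))
print(match_hard_trigger("What should I eat for dinner?"))
```

Because the first hit wins, placing "debug memory leak" before "debug" is what lets the narrower skill take priority.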
L2–L5 return candidates, not conclusions. The LLM retains final authority to load a skill, combine multiple skills, or ignore the hint entirely. The retrieval system doesn't override the LLM's judgment.
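To show how several ranked candidate lists can be merged into one, here is a minimal sketch of standard Reciprocal Rank Fusion, where each list contributes 1 / (k + rank) per item. The function name `rrf_fuse`, the example skill names, and the constant k = 60 (a commonly used default) are assumptions; the plugin's own parameters may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, skill in enumerate(ranking, start=1):  # ranks are 1-based
            scores[skill] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-layer rankings for one query.
bm25_ranking = ["git-helper", "debugging", "docker"]
synonym_ranking = ["debugging", "git-helper"]
dense_ranking = ["debugging", "docker", "git-helper"]

for skill, score in rrf_fuse([bm25_ranking, synonym_ranking, dense_ranking]):
    print(f"{skill}: {score:.4f}")
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale.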
If sentence-transformers isn't installed, L4 degrades gracefully: L1+L2+L3 still work. If jieba is missing, L1+L4 still work. The system never crashes; it always falls back to a working subset.
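The fallback behaviour described above could be sketched as follows. This is an assumption about the mechanism, not the plugin's code: the helpers `detect_optional_packages` and `available_layers` are hypothetical, and L5 fusion is left out for brevity.

```python
import importlib.util
from typing import List, Tuple


def detect_optional_packages() -> Tuple[bool, bool]:
    """Check which optional packages are importable, without importing them."""
    has_jieba = importlib.util.find_spec("jieba") is not None
    has_embeddings = importlib.util.find_spec("sentence_transformers") is not None
    return has_jieba, has_embeddings


def available_layers(has_jieba: bool, has_embeddings: bool) -> List[str]:
    """Map installed packages to the retrieval layers that can run."""
    layers = ["L1"]  # hard triggers are assumed to need only the standard library
    if has_jieba:
        layers += ["L2", "L3"]  # text-based layers rely on jieba tokenization
    if has_embeddings:
        layers.append("L4")  # dense embedding needs sentence-transformers
    return layers


print(available_layers(*detect_optional_packages()))
```

Probing with `find_spec` avoids paying the import cost of a large model library just to learn whether it is present.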
# 1. Clone
git clone https://github.com/willingning-coder/eagle-eye.git
cd eagle-eye
# 2. Generate config from your local skill library
python scripts/generate_config.py
# 3. Review and customize
# - Edit _HARD_TRIGGERS in src/skill_retriever.py
# - Edit src/skill_synonyms.yaml
# (See PROMPTS_EN.md or PROMPTS_CN.md for LLM-assisted generation)
# 4. Install
bash scripts/install.sh
# 5. Restart Hermes
hermes gateway restart

Eagle Eye ships with minimal example data. The real power comes from generating your own configuration based on your installed skills.
# Scan your skills and generate template configs
python scripts/generate_config.py
# Or just list what was found
python scripts/generate_config.py --scan-only

| Component | File | What to do |
|---|---|---|
| Hard Triggers | src/skill_retriever.py → _HARD_TRIGGERS | Add (keyword, skill-name) tuples. More specific first. |
| Synonym Dictionary | src/skill_synonyms.yaml | Map natural language terms to skills. 5–15 per skill. |
| Embedding Model | HERMES_EMBEDDING_MODEL env var | Swap to a different sentence-transformers model. |
Use the prompts in PROMPTS_EN.md or PROMPTS_CN.md with any LLM to generate high-quality triggers and synonyms from your skill list.
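To make the role of the synonym dictionary concrete, the sketch below ranks skills by how many of their terms appear in the query. The in-memory `SYNONYMS` mapping, its entries, and the helper `rank_by_synonyms` are all hypothetical; the actual schema of skill_synonyms.yaml may be different.

```python
from typing import Dict, List

# Hypothetical in-memory form of a synonym dictionary: skill name -> terms.
# Real entries would come from skill_synonyms.yaml, whose schema may differ.
SYNONYMS: Dict[str, List[str]] = {
    "debugging": ["stack trace", "crash", "exception", "traceback", "bug"],
    "git-helper": ["commit", "branch", "merge conflict", "rebase", "pull request"],
}


def rank_by_synonyms(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Rank skills by the number of their synonym terms found in the query."""
    lowered = query.lower()
    hits = {
        skill: sum(1 for term in terms if term in lowered)
        for skill, terms in synonyms.items()
    }
    matched = [skill for skill, count in hits.items() if count > 0]
    return sorted(matched, key=lambda skill: hits[skill], reverse=True)


print(rank_by_synonyms("Rebase my branch after the crash"))
```

A ranking like this is only one input; it would normally be fused with the other layers rather than used on its own.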
| Variable | Default | Description |
|---|---|---|
| HERMES_DISABLE_SKILL_RETRIEVAL | (unset) | Set to 1 to disable retrieval entirely |
| HERMES_SKILL_RETRIEVAL_TOP_K | 5 | Number of skills to return |
| HERMES_EMBEDDING_MODEL | shibing624/text2vec-base-chinese-paraphrase | Embedding model for L4 |
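For readers who want to see how these variables might be consumed, here is a minimal sketch that reads them with the documented defaults. The `RetrievalSettings` class and `load_settings` function are hypothetical names and do not describe the plugin's internals.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RetrievalSettings:
    disabled: bool
    top_k: int
    embedding_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> RetrievalSettings:
    """Read the three documented variables, falling back to their defaults."""
    return RetrievalSettings(
        # Only the literal value "1" disables retrieval in this sketch.
        disabled=env.get("HERMES_DISABLE_SKILL_RETRIEVAL") == "1",
        top_k=int(env.get("HERMES_SKILL_RETRIEVAL_TOP_K", "5")),
        embedding_model=env.get(
            "HERMES_EMBEDDING_MODEL",
            "shibing624/text2vec-base-chinese-paraphrase",
        ),
    )


print(load_settings({}))
print(load_settings({"HERMES_SKILL_RETRIEVAL_TOP_K": "3"}))
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.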
| Metric | Value |
|---|---|
| L1 real-world accuracy | ~90% |
| Functional test accuracy | 100% |
| Query latency (cached) | ~20ms |
| First-call latency | ~11s (model loading) |
| Memory footprint | ~403MB (with embedding) |
See ARCHITECTURE.md for a deep technical dive covering:
- Layer-by-layer algorithm analysis with code
- RRF fusion math and why it beats score normalization
- Confidence gate design philosophy
- Failure mode matrix and degradation hierarchy
- Latency and memory profiling
eagle-eye/
├── src/
│   ├── skill_retriever.py        # Core 5-layer retrieval engine
│   ├── skill_synonyms.yaml       # Synonym dictionary (template)
│   ├── plugin.py                 # Hermes plugin (pre_llm_call hook)
│   └── plugin.yaml               # Plugin manifest
├── scripts/
│   ├── generate_config.py        # Auto-generate config from your skills
│   └── install.sh                # One-command installation
├── templates/
│   └── hard_triggers.example.py  # Trigger format reference
├── README.md                     # This file (English)
├── README_CN.md                  # Chinese documentation
├── ARCHITECTURE.md               # Technical deep dive
├── PROMPTS_EN.md                 # LLM prompts for config generation (English)
├── PROMPTS_CN.md                 # LLM prompts for config generation (Chinese)
├── CHANGELOG.md                  # Version history
└── LICENSE                       # MIT
| Package | Required? | Purpose |
|---|---|---|
| jieba | Yes | Chinese tokenization for L2–L3 |
| sentence-transformers | Optional | Dense embedding for L4 (graceful fallback if missing) |
| numpy | Optional | Numerical operations for L4 |
Contributions are welcome! Areas where help is especially valuable:
- Trigger/synonym quality: Share your _HARD_TRIGGERS and skill_synonyms.yaml configurations
- Embedding model benchmarks: Test alternative models and report accuracy
- Multi-language support: Extend triggers and synonyms beyond Chinese/English
- Bug reports: Edge cases in fuzzy matching, false positives/negatives