
# 🧠 HarnessAgent

Run SQL agents, code assistants, or research bots on any LLM. Bring your own framework. Memory, safety, and failure recovery come included.



## What is this?

Think about what actually happens when you run an AI agent in production. The LLM call needs to work. It needs to not cost $500 a day. It needs to not loop forever when the API is slow. It needs to remember context from three messages ago. It needs to not crash your app when one provider goes down.

HarnessAgent handles all of that. You write the task. It handles the rest.

| What you see | What happens under the hood |
| --- | --- |
| AI answers your question | Picks the healthiest LLM, checks the budget, falls back if the provider fails |
| AI runs a SQL query | Validates the input schema, checks safety rules, executes, logs the result |
| AI remembers past context | Short-term in Redis, long-term in a vector DB |
| AI finds relevant info fast | Graph RAG: entity extraction plus BFS traversal, 83% fewer tokens than naive vector search |
| AI gets better after failures | Hermes loop: samples errors, proposes a prompt fix, evaluates it, applies if the score clears 70% |
| One provider goes down | Circuit breaker opens after 5 failures, auto-recovers after 60 seconds |
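The fallback and circuit-breaking behavior described above can be sketched in a few lines. This is an illustrative sketch, not HarnessAgent's actual implementation (the class and function names are hypothetical), but the thresholds mirror the ones in the table: open after 5 consecutive failures, probe again after 60 seconds.

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; half-open after `cooldown` seconds."""
    def __init__(self, threshold=5, cooldown=60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self, now=None):
        if self.opened_at is None:
            return True
        now = time.monotonic() if now is None else now
        # Half-open: after the cooldown, let one probe request through.
        return now - self.opened_at >= self.cooldown

    def record(self, ok, now=None):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic() if now is None else now

def route(providers, breakers, call):
    """Try providers in priority order, skipping any with an open breaker."""
    for name in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = call(name)
            breaker.record(True)
            return name, result
        except Exception:
            breaker.record(False)
    raise RuntimeError("all providers unavailable")
```

A failed provider is skipped without crashing the run; once every breaker is open, the router fails loudly instead of looping forever.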

## Architecture

```mermaid
graph TB
    subgraph CLIENT["🌐 Client Layer"]
        UI[Web / CLI / SDK]
        API[REST API POST /runs]
    end

    subgraph HARNESS["⚙️ HarnessAgent Core"]
        RUNNER[AgentRunner lifecycle manager]

        subgraph AGENTS["🤖 Agent Layer"]
            BASE[BaseAgent run loop]
            SQL[SQLAgent]
            CODE[CodeAgent]
            LG[LangGraph Adapter]
            AG[AutoGen Adapter]
            CR[CrewAI Adapter]
        end

        subgraph MEMORY["🧠 Memory System"]
            STM[Short-Term Redis]
            LTM[Long-Term Qdrant / Chroma / Weaviate]
            GRAPH[Knowledge Graph NetworkX / Neo4j]
            RAG[Graph RAG Engine]
        end

        subgraph LLM["🔮 LLM Router"]
            ROUTER[Health-aware Circuit-broken Router]
            ANT[Claude]
            OAI[GPT-4o / GPT-5]
            LOCAL[vLLM / SGLang / llama.cpp]
        end

        subgraph TOOLS["🔧 Tool System"]
            REG[Tool Registry]
            MCP[MCP Servers]
            SQL2[SQL Tools]
            CODE2[Code Sandbox]
            FILE[File Tools]
        end

        subgraph SAFETY["🛡️ Safety"]
            GUARD[Guardrail Pipeline]
            HITL[Human-in-the-Loop]
            RATE[Rate Limiter]
            CB[Circuit Breaker]
        end
    end

    subgraph OBS["📊 Observability"]
        MLFLOW[MLflow Traces]
        OTEL[OpenTelemetry]
        PROM[Prometheus]
        GRAFANA[Grafana Dashboard]
    end

    subgraph IMPROVE["🔄 Self-Improvement"]
        HERMES[Hermes Loop]
        ERR[Error Collector]
        PATCH[Patch Generator]
        EVAL[Evaluator]
    end

    UI --> API --> RUNNER --> BASE
    BASE --> LLM & MEMORY & TOOLS & SAFETY
    BASE --> OBS
    BASE -.->|failures| IMPROVE
    IMPROVE -.->|better prompts| AGENTS
    ROUTER --> ANT & OAI & LOCAL
    STM & LTM & GRAPH --> RAG
    REG --> MCP & SQL2 & CODE2 & FILE
```

## Features

| Feature | Description |
| --- | --- |
| 🔀 LLM Routing | Claude, GPT-5, o4-mini, vLLM, SGLang, llama.cpp with automatic health-aware fallback |
| 🧠 3-Tier Memory | Redis (hot), then vector DB (warm), then knowledge graph (structured) |
| 📉 Graph RAG | 83% token reduction via multi-hop graph traversal vs naive vector search |
| 🔌 Framework Adapters | LangGraph, AutoGen, CrewAI plug in without rewriting your agents |
| 🛡️ Safety Pipeline | PII redaction, injection detection, tool policy, loop detection, budget enforcement |
| 🔁 Hermes Loop | Analyzes failures, proposes prompt patches, evaluates them, applies if the score is good |
| 👤 Human-in-the-Loop | Agent pauses on risky actions, waits for approval, then continues or stops |
| ⚡ Circuit Breaker | Opens after 5 failures, self-heals after 60 seconds |
| 💰 Cost Tracking | Per-run, per-tenant USD cost with hard monthly caps |
| 🔒 Code Sandbox | Docker-isolated execution for code agents, 256 MB limit, no network |
| 📊 Observability | MLflow agent traces, OTel infra spans, Prometheus metrics, Grafana dashboards |
| 🧩 MCP | Connect any MCP server over stdio or SSE |

## LLM Support

| Provider | Models | Tool Calling | Prompt Caching | Cost per 1M input tokens |
| --- | --- | --- | --- | --- |
| 🟣 Anthropic | Sonnet 4.6, Haiku 4.5, Opus 4.7 | Native | Yes | $0.25 to $15 |
| 🟢 OpenAI | GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini, o1, o3, o4-mini | Native | Auto | $0.15 to $75 |
| 🔵 vLLM | Any HuggingFace model | Native | No | Free (self-hosted) |
| 🟡 SGLang | Any HuggingFace model | Native | No | Free (self-hosted) |
| 🔴 llama.cpp | Any GGUF quantized model | ReAct text injection | No | Free (CPU / Metal) |
| 🟠 Ollama | Any Ollama model | Native | No | Free (local) |

No GPU? llama.cpp runs on any Mac or CPU machine. Tool calling works through ReAct text injection when native function calling is not available.
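ReAct text injection means the model is prompted to emit its tool calls as structured plain text, which the harness then parses back out. A minimal sketch of the parsing side, assuming a conventional `Action:` / `Action Input:` format (the exact format HarnessAgent uses may differ):

```python
import json
import re

# Hypothetical ReAct-style output format; the real prompt template may vary.
ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(\{.*?\})", re.DOTALL)

def parse_react(completion):
    """Return (tool_name, arguments) from a ReAct-formatted completion,
    or None if the model produced a final answer instead of a tool call."""
    match = ACTION_RE.search(completion)
    if not match:
        return None
    name, raw_args = match.groups()
    return name, json.loads(raw_args)
```

On the prompting side, the system prompt lists the available tools and instructs the model to reply in exactly this shape; the loop alternates between parsing an action, running the tool, and feeding the observation back.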


## Quick Start

```bash
# 1. Clone and install
git clone https://github.com/thepradip/HarnessAgent.git
cd HarnessAgent
poetry install

# 2. Configure (set at least one API key, or a local model URL)
cp .env.example .env

# 3. Start infrastructure (Redis, Qdrant, Neo4j, MLflow, Prometheus, Grafana)
docker compose up -d

# 4. Start the API and worker
make api      # terminal 1, FastAPI on port 8000
make worker   # terminal 2, async agent worker

# 5. Run your first agent
curl -X POST http://localhost:8000/runs \
  -H "Content-Type: application/json" \
  -d '{"agent_type": "sql", "task": "How many users signed up this week?"}'

# Watch steps in real time
curl http://localhost:8000/runs/{run_id}/steps
```

No API key? Use llama.cpp locally:

```bash
# Put a GGUF model in ./models/ then:
docker compose --profile local-cpu up -d llamacpp
# Add to .env: LLAMACPP_BASE_URL=http://localhost:8080
```
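The same `POST /runs` call can be made from Python. A stdlib-only client sketch, assuming the endpoint shape shown in the curl example above (the response fields are whatever the API returns, so they are not assumed here):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # adjust if the API runs elsewhere

def build_run_request(agent_type, task):
    """Build the POST /runs request without sending it."""
    payload = json.dumps({"agent_type": agent_type, "task": task}).encode()
    return request.Request(
        f"{BASE_URL}/runs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_run(agent_type, task):
    """Submit a run and return the parsed JSON response (assumes the API is up)."""
    with request.urlopen(build_run_request(agent_type, task)) as resp:
        return json.loads(resp.read())
```

Splitting request construction from sending keeps the payload logic testable without a live server.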

## Use Cases

**SQL Data Agent** — Ask business questions in plain English. The agent reads your schema into a knowledge graph, writes safe SELECT queries, and returns formatted results with PII redacted.

**Code Assistant** — Give it a ticket or a spec. It reads your workspace, writes the code, lints it, runs it in a Docker sandbox, and fixes errors until it passes.

**Research Agent** — Feed it documents or URLs. It ingests them into the vector store and knowledge graph, then answers multi-hop questions with citations.

**Multi-Agent Pipeline** — Chain specialists through the planner: a researcher feeds a coder, which feeds a reviewer. All agents share the same memory pool.

**Existing Framework** — Already using LangGraph, AutoGen, or CrewAI? Drop your graph or crew into the adapter. You get traces, cost tracking, circuit breaking, and safety without changing a line of your agent logic.
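The adapter idea is just a wrapper: your existing graph or crew runs unchanged while the harness records what happens around it. An illustrative shape only, with hypothetical names (the real adapters also wire in cost tracking, safety, and circuit breaking):

```python
class FrameworkAdapter:
    """Run any callable 'graph' under the harness, recording per-step traces.

    LangGraph's compiled graph exposes .invoke(); a plain callable stands in here.
    """
    def __init__(self, graph, tracer=None):
        self.graph = graph
        self.tracer = tracer if tracer is not None else []

    def run(self, task):
        self.tracer.append(("start", task))
        try:
            result = self.graph(task)  # your agent logic, untouched
            self.tracer.append(("finish", result))
            return result
        except Exception as exc:
            self.tracer.append(("error", str(exc)))
            raise
```

Because the wrapped object is only ever called, not inspected, the same shape fits a LangGraph graph, an AutoGen conversation, or a CrewAI crew.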


## Project Structure

```
HarnessAgent/
├── src/harness/
│   ├── agents/          # BaseAgent loop, SQLAgent, CodeAgent
│   ├── adapters/        # LangGraph, AutoGen, CrewAI wrappers
│   ├── api/             # FastAPI routes, JWT auth, SSE streaming
│   ├── core/            # Config, circuit breaker, cost tracker, rate limiter
│   ├── eval/            # Datasets, runners, scorers for Hermes evaluation
│   ├── filesystem/      # Isolated workspaces, Docker sandbox, checkpoints
│   ├── improvement/     # Hermes loop, error collector, patch generator
│   ├── ingestion/       # PDF/HTML/MD loaders, chunker, knowledge graph extraction
│   ├── llm/             # Anthropic, OpenAI, local providers, router, factory
│   ├── memory/          # Redis, vector backends, graph, Graph RAG engine
│   ├── messaging/       # Redis Streams inter-agent bus
│   ├── observability/   # MLflow tracer, OTel spans, Prometheus metrics, audit log
│   ├── orchestrator/    # AgentRunner, HITL manager, planner, scheduler
│   ├── prompts/         # Versioned prompt store, patch application
│   ├── safety/          # Guardrail pipeline factory and per-tenant policies
│   ├── tools/           # Tool registry, MCP client, SQL / code / file tools
│   └── workers/         # RQ agent worker, Hermes background scheduler
├── configs/             # Model capabilities, MCP server definitions
├── docs/                # Architecture diagrams and full reference docs
├── infra/               # Prometheus scrape config, OTel collector, Grafana
├── tests/               # 96 unit tests, 2 integration test suites
├── docker-compose.yml   # Full infrastructure: Redis, Qdrant, Neo4j, MLflow, Grafana
├── Dockerfile           # Multi-stage: api, worker, hermes targets
├── Makefile             # install, test, lint, api, worker, hermes, docker-up/down
└── pyproject.toml       # Poetry dependencies and tooling
```

## Tech Stack

| Layer | Technology | Notes |
| --- | --- | --- |
| API | FastAPI + uvicorn | Async by default, SSE for step streaming |
| LLM | anthropic + openai SDKs | Both support streaming and native tool calling |
| Short-term memory | Redis | Conversation history, pub/sub, task queue |
| Long-term memory | Qdrant / ChromaDB / Weaviate | Chroma for dev (zero infra), Qdrant for prod |
| Knowledge graph | NetworkX / Neo4j | NetworkX in-process for dev, Neo4j for production |
| Agent tracing | MLflow | LLM-native spans, experiment tracking, eval metrics |
| Infra tracing | OpenTelemetry | Vendor-neutral, exports to Jaeger or Tempo |
| Metrics | Prometheus + Grafana | 15 pre-defined metrics, pre-built dashboard |
| Safety | Guardrail | 3-stage pipeline: input, intermediate, output |
| Workers | RQ + Redis | Same Redis connection, no extra broker needed |
| Deployment | Docker Compose | Scale workers independently with replicas |
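The hot/warm split in the memory rows follows the usual tiered-cache pattern: check the fast store first, fall back to the slower one, and promote what you find. A minimal sketch with plain dicts standing in for Redis and the vector DB (class and method names are hypothetical):

```python
class TieredMemory:
    """Hot tier checked first; warm-tier hits are promoted for next time."""
    def __init__(self):
        self.hot = {}    # short-term: recent conversation turns (Redis stand-in)
        self.warm = {}   # long-term: embedded documents (vector-DB stand-in)

    def put(self, key, value, hot=True):
        (self.hot if hot else self.warm)[key] = value

    def get(self, key):
        if key in self.hot:
            return self.hot[key], "hot"
        if key in self.warm:
            self.hot[key] = self.warm[key]  # promote: next lookup is a hot hit
            return self.warm[key], "warm"
        return None, "miss"
```

The real warm tier is a similarity search rather than a key lookup, but the tier ordering and promotion logic are the same.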

## Dashboards

Once `docker compose up -d` is running:

| Dashboard | URL | Credentials |
| --- | --- | --- |
| MLflow Traces | http://localhost:5000 | none |
| Grafana | http://localhost:3000 | admin / harness_admin |
| Prometheus | http://localhost:9090 | none |
| Qdrant UI | http://localhost:6333/dashboard | none |
| Neo4j Browser | http://localhost:7474 | neo4j / harnesspassword |

## Configuration

Everything goes in `.env`. Copy `.env.example` and set what you need.

```bash
# Cloud LLMs
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
OPENAI_MODELS=gpt-4o-mini          # comma-separated, e.g. gpt-4o-mini,gpt-4o

# Local LLMs (no API key needed)
VLLM_BASE_URL=http://localhost:8000
LLAMACPP_BASE_URL=http://localhost:8080

# Memory backends (chroma is default, zero setup)
VECTOR_BACKEND=chroma              # chroma | qdrant | weaviate
GRAPH_BACKEND=networkx             # networkx | neo4j

# Hermes self-improvement
HERMES_AUTO_APPLY=false            # keep this off until you trust it
HERMES_PATCH_SCORE_THRESHOLD=0.7

# Cost and safety
COST_BUDGET_USD_PER_TENANT=100.0
RATE_LIMIT_RPM=60
```

Full reference: docs/guides/CONFIGURATION.md
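The two Hermes settings gate whether an evaluated prompt patch is actually applied. A sketch of that gate, with a hypothetical function name but defaults matching the values above (auto-apply off, threshold 0.7):

```python
import os

def hermes_should_apply(patch_score, env=os.environ):
    """Apply a prompt patch only if auto-apply is on and the evaluation
    score clears the configured threshold."""
    auto = env.get("HERMES_AUTO_APPLY", "false").lower() == "true"
    threshold = float(env.get("HERMES_PATCH_SCORE_THRESHOLD", "0.7"))
    return auto and patch_score >= threshold
```

With auto-apply off, patches are still generated and scored; they just wait for a human to approve them.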


## Testing

```bash
# Run unit tests
PYTHONPATH=src python3 -m pytest tests/unit/

# Run integration tests (needs SQLite, no Docker required)
PYTHONPATH=src python3 -m pytest tests/integration/

# With coverage
PYTHONPATH=src python3 -m pytest tests/ --cov=src/harness --cov-report=term-missing
```

Current: 96 unit tests passing, 0 failures.


## Documentation


## Future Scope

Planned improvements focused on making HarnessAgent more efficient at scale.

| Area | Feature | Expected Impact |
| --- | --- | --- |
| Token Efficiency | Adaptive context compression — summarize stale history with a small model before appending to new prompts | 40–60% token reduction on long sessions |
| Cost Optimization | Semantic response caching — skip LLM calls when a sufficiently similar query was answered recently | Up to 30% cost savings on repetitive workloads |
| Cost Optimization | Batch inference mode — route low-urgency tasks through Anthropic/OpenAI Batch APIs at 50% list price | 50% cost reduction for async pipelines |
| Routing | ML-based predictive model selection — learn per-task-type patterns to auto-select the cheapest sufficient model | Eliminates over-provisioned Opus/GPT-5 usage |
| Memory | Differential re-indexing — re-embed only modified chunks on ingestion, not the full corpus | Faster incremental ingestion at scale |
| Parallelism | Streaming pipeline overlap — start tool execution while the LLM is still generating | Lower end-to-end agent step latency |
| Multi-Agent | Shared tool execution pool — deduplicate identical tool calls across concurrent agents in the same run | Fewer redundant DB and API round-trips |
| Hermes | Cost-aware patch targeting — rank prompt candidates by token spend, optimize the most expensive patterns first | Better ROI from self-improvement cycles |
| Scheduling | Fair-share multi-tenant scheduler — priority queues and resource caps to prevent noisy-neighbor budget spikes | Predictable per-tenant cost and latency |
| Extensibility | Plugin SDK — first-class API for registering custom LLM providers, memory backends, and tool namespaces | Faster integration of new models and datastores |
| Observability | Automated cost anomaly alerts — Prometheus rule + Grafana annotation when a run exceeds per-step cost threshold | Catch runaway agents before they exhaust budgets |
| Safety | Streaming guardrail evaluation — evaluate guardrail rules token-by-token instead of waiting for full output | Interrupt unsafe responses earlier, reduce wasted tokens |
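None of these are implemented yet; to make the semantic-caching row concrete, here is one possible shape. Everything is hypothetical: a token-set Jaccard overlap stands in for embedding cosine similarity, and the 0.8 threshold is arbitrary.

```python
def similarity(a, b):
    """Stand-in for embedding cosine similarity: token-set Jaccard overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class SemanticCache:
    """Return a cached answer when a new query is close enough to a past one,
    skipping the LLM call entirely."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (query, answer) pairs

    def lookup(self, query):
        for past_query, answer in self.entries:
            if similarity(query, past_query) >= self.threshold:
                return answer
        return None  # cache miss: caller makes the real LLM call

    def store(self, query, answer):
        self.entries.append((query, answer))
```

A production version would also need TTLs and invalidation (a cached "how many users signed up this week" answer goes stale), which is part of why this sits in Future Scope.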

## Contributing

Fork, branch off main, write tests for anything new, open a PR.

```bash
git checkout -b feat/your-feature
PYTHONPATH=src python3 -m pytest tests/unit/
ruff check src/ tests/
```

Things that would be useful: new LLM provider adapters, additional vector backends, more tool integrations, Kubernetes Helm chart, and examples for specific use cases.


## License

MIT. See LICENSE.


Architecture | Quick Start | Components | Issues
