
zarathustracode/agenta


Agenta: Human-in-the-Loop ML Agent Framework

⚡ Competitive multi-agent framework (PPO vs. deterministic) where the best-scoring model wins
🏆 Human feedback loops for evaluation and selection
📡 Event streaming and replay via NATS JetStream
🛡️ Guardrails for operational safety


🚀 Architecture

Core Principle: Multiple agent models compete on the same tasks. The model with the highest human ratings handles production traffic, whether it is ML-based or rule-based.
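A minimal sketch of rating-based selection, assuming each agent's human ratings are stored as a list of integers (for example 1 to 5). The `AgentScore` type, the `min_ratings` eligibility threshold, and the tie-breaking rule are hypothetical and are not taken from the Agenta codebase.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AgentScore:
    """Human ratings collected for one competing agent (hypothetical type)."""
    name: str
    kind: str  # "ml" or "rule-based"; selection deliberately ignores it
    ratings: list[int] = field(default_factory=list)


def select_production_agent(agents: list[AgentScore], min_ratings: int = 3) -> AgentScore:
    """Return the agent with the highest mean human rating.

    Agents with fewer than `min_ratings` ratings are not eligible, so a single
    lucky score cannot win. Ties are broken by the larger number of ratings.
    """
    eligible = [a for a in agents if len(a.ratings) >= min_ratings]
    if not eligible:
        raise ValueError("no agent has enough ratings to be selected")
    return max(eligible, key=lambda a: (mean(a.ratings), len(a.ratings)))


arena = [
    AgentScore("ppo-v2", "ml", [4, 5, 3, 4]),             # mean 4.0
    AgentScore("rules-v1", "rule-based", [4, 4, 5, 4, 5]),  # mean 4.4
    AgentScore("ppo-v3", "ml", [5, 5]),                   # too few ratings to qualify
]
print(select_production_agent(arena).name)  # rules-v1
```

Because the `kind` field never enters the ranking key, a rule-based agent can hold production traffic until an ML agent earns better ratings.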

graph TD
    A[Environment: ERP/Market/Drone] --> B[NATS JetStream]
    B --> C[Model Competition Arena]
    
    C --> D1[PPO Agent PyTorch]
    C --> D2[Deterministic Rules]
    C --> D3[Future Models...]
    
    D1 --> E[Best Model Selection]
    D2 --> E
    D3 --> E
    
    E --> F[Risk Manager Validation]
    F --> G[Action Execution FastAPI]
    G --> B
    
    H[Human Dashboard] --> I[Feedback & Ratings]
    I --> B
    
    B --> J[SQLite Metadata]
    J --> K[Model Registry Hyperparams]
    
    B --> L[Parquet Export]
    L --> M[DuckDB Analytics]
    
    K --> N[Training Pipeline]
    N --> D1

Key Components:

  • NATS JetStream: Event streaming, replay, integration for training/monitoring
  • SQLite: Lightweight registry for runs, results, deployments, guardrails
  • FastAPI: Operational API for actions, status, deployments, guardrails
  • Streamlit: Dashboard for runs, models, deployments, and feedback

🧩 Components (MVP)

1. Environment

  • Wrap ERP, trading, or drone data sources in Python API clients.
  • Start with a dummy data generator (an asyncio loop that emits JSON events).
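The dummy generator described above might look like the following sketch. The event fields (`source`, `seq`, `ts`, `payload`) and the `dummy_events` name are assumptions for illustration, not the actual contents of `core/environment.py`.

```python
import asyncio
import json
import random
from datetime import datetime, timezone


async def dummy_events(source: str, n: int, interval: float = 0.0, seed: int = 0):
    """Yield `n` JSON-encoded dummy events, sleeping `interval` seconds between them."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    for i in range(n):
        event = {
            "source": source,
            "seq": i,
            "ts": datetime.now(timezone.utc).isoformat(),
            "payload": {
                "symbol": rng.choice(["AAPL", "MSFT", "SAP"]),
                "price": round(rng.uniform(50, 500), 2),
                "quantity": rng.randint(1, 100),
            },
        }
        yield json.dumps(event)
        await asyncio.sleep(interval)  # yields control so other tasks can run


async def main() -> list[dict]:
    received = []
    async for line in dummy_events("trading", n=3):
        received.append(json.loads(line))
    return received


if __name__ == "__main__":
    for ev in asyncio.run(main()):
        print(ev["seq"], ev["payload"])
```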

2. Perception Layer

  • Input validation + normalization with Pydantic.
  • Transform validated records into Polars/Pandas DataFrames.
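A sketch of the validate-then-normalize step, assuming Pydantic v2 (`model_validate`, `model_dump`) and Pandas. The `MarketEvent` schema and the derived `notional` column are hypothetical; invalid events are collected rather than silently dropped.

```python
from datetime import datetime

import pandas as pd
from pydantic import BaseModel, Field, ValidationError


class MarketEvent(BaseModel):
    """Schema for one raw environment event (hypothetical fields)."""
    symbol: str = Field(min_length=1)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)
    ts: datetime


def normalize(raw_events: list[dict]) -> tuple[pd.DataFrame, list[dict]]:
    """Validate raw events, normalize the valid ones, and return (frame, rejected)."""
    rows, rejected = [], []
    for raw in raw_events:
        try:
            # Lax mode coerces "10.5" -> 10.5 and ISO strings -> datetime.
            event = MarketEvent.model_validate(raw)
        except ValidationError:
            rejected.append(raw)  # keep bad input for debugging
            continue
        row = event.model_dump()
        row["symbol"] = row["symbol"].strip().upper()
        rows.append(row)
    frame = pd.DataFrame(rows, columns=["symbol", "price", "quantity", "ts"])
    frame["notional"] = frame["price"] * frame["quantity"]
    return frame, rejected
```

Swapping in Polars would only change the last two lines (for example `pl.DataFrame(rows)` plus a `with_columns` expression).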

3. Reasoning Agent

  • First version: a simple rule-based agent (if/else strategies).
  • Upgrade later: PyTorch (RL or LLM-based reasoning).
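An if/else agent of the kind described for the first version could be as small as this sketch. The mean-reversion rule, the `Observation` fields, and the `band`/`max_position` parameters are illustrative assumptions, not the strategy in `core/reasoning.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Normalized market snapshot fed to the agent (hypothetical fields)."""
    price: float
    moving_avg: float
    position: int  # current holding, in units


def rule_based_policy(obs: Observation, band: float = 0.02, max_position: int = 100) -> str:
    """Return "buy", "sell", or "hold" using simple if/else thresholds.

    Buys when the price is more than `band` below its moving average,
    sells when it is more than `band` above, and otherwise holds.
    """
    if obs.moving_avg <= 0:
        return "hold"  # no meaningful reference price yet
    deviation = (obs.price - obs.moving_avg) / obs.moving_avg
    if deviation < -band and obs.position < max_position:
        return "buy"
    if deviation > band and obs.position > 0:
        return "sell"
    return "hold"
```

Keeping the policy a pure function of an observation makes it easy to run side by side with a PPO agent on the same inputs.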

4. Tactical Execution

  • Planning with OR-Tools or NetworkX.
  • Defines how a chosen strategy is executed.
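For routing-style plans, a weighted shortest path in NetworkX is one simple option. This sketch assumes a warehouse graph with made-up node names and costs; schedule or vehicle-routing problems would more likely use OR-Tools, which is not shown here.

```python
import networkx as nx


def plan_route(edges: list[tuple[str, str, float]], start: str, goal: str) -> tuple[list[str], float]:
    """Return the cheapest path from start to goal and its total cost.

    Each edge is (from_node, to_node, cost). Costs must be non-negative
    because Dijkstra's algorithm is used.
    """
    graph = nx.DiGraph()
    graph.add_weighted_edges_from(edges)  # stores each cost under the "weight" attribute
    cost, path = nx.single_source_dijkstra(graph, start, target=goal, weight="weight")
    return path, cost


warehouse = [
    ("dock", "aisle_a", 4.0),
    ("dock", "aisle_b", 1.0),
    ("aisle_b", "aisle_a", 1.5),
    ("aisle_a", "packing", 2.0),
    ("aisle_b", "packing", 6.0),
]
print(plan_route(warehouse, "dock", "packing"))
# (['dock', 'aisle_b', 'aisle_a', 'packing'], 4.5)
```

If the goal is unreachable, NetworkX raises `NetworkXNoPath`, which a caller could treat as "no executable plan".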

5. Risk Manager

  • Configurable guardrails in YAML.
  • Pure Python checks (limits, thresholds).
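A sketch of YAML-configured guardrails checked in plain Python, assuming PyYAML is installed. The key names (`max_order_quantity`, `max_notional`, `allowed_symbols`) are hypothetical and do not reflect the actual contents of `core/config.yaml`.

```python
import yaml

GUARDRAILS_YAML = """
max_order_quantity: 500
max_notional: 50000.0
allowed_symbols: [AAPL, MSFT, SAP]
"""


def load_guardrails(text: str) -> dict:
    """Parse guardrail limits from YAML (safe_load refuses arbitrary Python tags)."""
    return yaml.safe_load(text)


def check_action(action: dict, limits: dict) -> list[str]:
    """Return every violated guardrail; an empty list means the action may run."""
    violations = []
    if action["symbol"] not in limits["allowed_symbols"]:
        violations.append(f"symbol {action['symbol']} not allowed")
    if action["quantity"] > limits["max_order_quantity"]:
        violations.append("quantity exceeds max_order_quantity")
    if action["quantity"] * action["price"] > limits["max_notional"]:
        violations.append("notional exceeds max_notional")
    return violations


limits = load_guardrails(GUARDRAILS_YAML)
print(check_action({"symbol": "AAPL", "quantity": 400, "price": 200.0}, limits))
# ['notional exceeds max_notional']
```

Returning all violations, rather than stopping at the first, gives the dashboard a complete reason when an action is blocked.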

6. Operational Layer

  • FastAPI endpoints for:
    • POST /act → execute actions in the ERP/trading API
    • GET /status → query results

7. Message Bus

  • NATS JetStream for event-driven communication and replay.
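A sketch of publishing an event to JetStream with the `nats-py` client. The `ENVIRONMENT_EVENTS` stream name comes from the smoke-test section of this README; the `env.>` subject pattern and the `{"type", "payload"}` envelope are assumptions. Publishing requires a running NATS server with JetStream enabled, so `nats` is imported lazily and the encoder can be used on its own.

```python
import json


def encode_event(event_type: str, payload: dict) -> bytes:
    """Serialize an event as compact, key-sorted UTF-8 JSON for the message bus."""
    envelope = {"type": event_type, "payload": payload}
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True).encode("utf-8")


async def publish_event(subject: str, event_type: str, payload: dict,
                        url: str = "nats://localhost:4222"):
    """Publish one event to JetStream and return the publish acknowledgement."""
    import nats  # lazy import: only needed when actually talking to a server

    nc = await nats.connect(url)
    try:
        js = nc.jetstream()
        # Create the stream for the (assumed) env.> subject hierarchy.
        await js.add_stream(name="ENVIRONMENT_EVENTS", subjects=["env.>"])
        ack = await js.publish(subject, encode_event(event_type, payload))
        return ack  # ack.stream and ack.seq identify the stored message for replay
    finally:
        await nc.drain()
```

Usage, with a local server: `asyncio.run(publish_event("env.trading.tick", "tick", {"symbol": "AAPL"}))`. Sorting keys makes identical events produce identical bytes, which simplifies deduplication during replay.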

8. Memory & Logging

  • Parquet (PyArrow) for structured logs.
  • JSON lines for raw debug traces.

9. Agent Registry

  • SQLite as lightweight registry (deployments, arena runs/results, guardrails).
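A sketch of the arena-runs part of such a registry using the standard `sqlite3` module. The table and column names are hypothetical; the actual schema in `memory/registry.py` is not shown in this README.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS arena_runs (
    run_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    task       TEXT NOT NULL,
    started_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS arena_results (
    run_id INTEGER NOT NULL REFERENCES arena_runs(run_id),
    agent  TEXT NOT NULL,
    score  REAL NOT NULL,
    PRIMARY KEY (run_id, agent)
);
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def record_run(con: sqlite3.Connection, task: str, scores: dict[str, float]) -> int:
    """Insert one arena run with a score per agent; return the new run_id."""
    with con:  # commits on success, rolls back on error
        cur = con.execute("INSERT INTO arena_runs (task) VALUES (?)", (task,))
        run_id = cur.lastrowid
        con.executemany(
            "INSERT INTO arena_results (run_id, agent, score) VALUES (?, ?, ?)",
            [(run_id, agent, score) for agent, score in scores.items()],
        )
    return run_id


def leaderboard(con: sqlite3.Connection) -> list[tuple[str, float]]:
    """Average score per agent across all runs, best first."""
    return con.execute(
        "SELECT agent, AVG(score) AS avg_score FROM arena_results "
        "GROUP BY agent ORDER BY avg_score DESC"
    ).fetchall()
```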

10. Human-in-the-loop

  • Streamlit app with simplified views:
    • Runs (Training, Benchmark)
    • Models (browse/evaluate)
    • Deployments & Guardrails
    • Monitor & Feedback (actions + ratings)
  • Feedback is currently held in Streamlit UI state; SQLite persistence is planned next.

11. Learning Engine

  • Simple retraining loop in Python:
    • Load feedback from DuckDB
    • Fine-tune or retrain models
    • Register new version

🛠️ Tech Stack (Python)

  • Core Services: FastAPI, asyncio
  • Data: Pandas/Polars, PyArrow (Parquet), DuckDB, SQLite
  • Message Bus: NATS JetStream
  • ML: PyTorch + Gymnasium; Stable-Baselines3 optional
  • Dashboard: Streamlit

📦 Getting Started

# Create virtual environment (Windows)
python -m venv .venv

# Install dependencies into the venv using uv (recommended)
uv pip install -r requirements.txt -p .\.venv\Scripts\python.exe

Run API

uv run -p .\.venv\Scripts\python.exe -m uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

Run Dashboard

uv run -p .\.venv\Scripts\python.exe -m streamlit run ui\dashboard_simple.py

🛤️ Roadmap

  • v1 (Current): Rule-based prototype + human feedback collection + benchmark registry (SQLite)
  • v2: PPO agents (PyTorch) trained on feedback, with a deterministic critic for reward shaping
  • v3: NATS JetStream integration for full event streaming and replay
  • v4: Persist decisions/feedback in SQLite; improve dashboard analytics
  • v5: React dashboard (replace Streamlit)

🔎 Alpha status & direction

The project is in alpha, with a clear direction:

  • Maintain deterministic and ML agents in parallel (baseline, critic, fallback)
  • Human-in-the-loop feedback drives training and selection
  • Prefer sparse methods for efficiency where applicable (PPO/SAC variants)

See also: docs/focus_breakthrough.md.

Directories & Files

api/
└── main.py                 # FastAPI entrypoint with /act and /status endpoints

core/
├── environment.py          # Dummy ERP/trading/drone data generator (asyncio)
├── perception.py           # Pydantic validation + Pandas/Polars normalization
├── reasoning.py            # Rule-based agent (if/else strategies)
├── tactical.py             # OR-Tools / NetworkX planning
├── risk_manager.py         # Guardrails from YAML configs
└── config.yaml             # Risk thresholds, limits

bus/
└── nats_client.py          # NATS JetStream client

memory/
└── registry.py             # SQLite registry (arena runs/results, deployments, guardrails)

ui/
└── dashboard_simple.py     # Streamlit app (simplified dashboard)

learning/
└── (agents/trainers)       # PPO/SAC trainers and evaluation utilities

tests/
├── test_api.py
├── test_reasoning.py
├── test_risk_manager.py
└── test_end_to_end.py

docs/
├── architecture.mmd
└── trade_prompt.txt

requirements.txt            # Dependencies
README.md                   # This file


🌍 Use Cases

  • ERP optimization (packing, routing, scheduling)
  • Trading decision support (simulate, validate, human override)
  • Robotics/drones (path planning with human corrections)

📜 License

MIT (you own it, hack it, ship it).


🧪 IBKR Login Smoke Test

Run the Interactive Brokers (IBKR) login simulation to verify credentials, JetStream wiring, and SQLite persistence end to end:

uv run -p .\.venv\Scripts\python.exe scripts\ibkr_login_cli.py --prefix IBKR --latency 0.1
  • Provide credentials via the IBKR_USERNAME, IBKR_PASSWORD, and optional IBKR_ACCOUNT environment variables, or pass a different --prefix to read variables with another prefix.
  • Override the database path with --db-path or AGENTA_DB_PATH.
  • Point at an existing NATS cluster using --nats nats://localhost:4222.
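The prefix-to-variable mapping implied by `--prefix IBKR` could be resolved as in this hypothetical sketch; the real lookup in `scripts/ibkr_login_cli.py` is not shown here. The `Credentials` type and `credentials_from_env` helper are illustrative names, and the password is excluded from the repr so it never lands in logs.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str = field(repr=False)  # keep secrets out of printed output
    account: Optional[str] = None


def credentials_from_env(prefix: str = "IBKR", env: Mapping[str, str] = os.environ) -> Credentials:
    """Read <PREFIX>_USERNAME, <PREFIX>_PASSWORD, and optional <PREFIX>_ACCOUNT."""
    missing = [f"{prefix}_{key}" for key in ("USERNAME", "PASSWORD") if not env.get(f"{prefix}_{key}")]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return Credentials(
        username=env[f"{prefix}_USERNAME"],
        password=env[f"{prefix}_PASSWORD"],
        account=env.get(f"{prefix}_ACCOUNT"),
    )
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.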

The CLI prints the login session, JetStream acknowledgement, and the JSON payload emitted to the ENVIRONMENT_EVENTS stream.

About

Multi-agent ML framework: PPO vs deterministic competition, NATS JetStream, human feedback loops.
