ywc668/agentroll

🎯 AgentRoll

Kubernetes-native progressive delivery for AI agents in production

The missing layer between agent development frameworks and reliable production operations.


The Problem

AI agent frameworks (LangGraph, CrewAI, OpenAI Agents SDK) help you build agents. Cloud platforms help you run them. But nothing helps you safely ship changes to agents already in production.

Today, most teams deploy agents the same way they deploy microservices: docker push, then pray. But agents are fundamentally different:

  • 4 layers change simultaneously: prompt, model version, tool configurations, and memory. A 2-word prompt change can break production
  • Non-deterministic behavior: the same input can trigger different tool calls and reasoning paths every time
  • No meaningful unit tests: traditional pass/fail assertions don't work when outputs vary per run
  • Unpredictable costs: one agent task can consume 10x-100x more tokens than another
  • Rollback is structurally harder: stateful agents modify external systems (databases, APIs, emails) that can't be simply reverted
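To make the testing point concrete, here is a minimal sketch of judging an agent by aggregate rates over many runs instead of a single pass/fail assertion. The `RunResult` record shape, the function names, and the default thresholds are all hypothetical and chosen for illustration; they are not part of AgentRoll.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one agent run (hypothetical record shape)."""
    tool_calls: int
    tool_failures: int
    tokens: int


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Aggregate many runs into rates, since any single run may vary."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    calls = sum(r.tool_calls for r in runs)
    failures = sum(r.tool_failures for r in runs)
    return {
        "tool_success_rate": 1.0 if calls == 0 else 1 - failures / calls,
        "mean_tokens": sum(r.tokens for r in runs) / len(runs),
    }


def passes(summary: dict[str, float],
           min_success_rate: float = 0.95,      # illustrative threshold
           max_mean_tokens: float = 5000.0) -> bool:  # illustrative budget
    """Compare aggregate metrics against thresholds."""
    return (summary["tool_success_rate"] >= min_success_rate
            and summary["mean_tokens"] <= max_mean_tokens)
```

A larger sample tightens these estimates, which is one reason to evaluate a canary over real traffic rather than a handful of fixed test cases.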

The result? 70% of regulated enterprises rebuild their agent stack every 3 months. Teams manually eyeball evaluation results. Nobody knows if the new version is actually better until users complain.

The Solution

AgentRoll brings evaluation-gated progressive delivery to AI agent deployments on Kubernetes. Think of it as Argo Rollouts meets agent-aware intelligence.

                    ┌─────────────┐
                    │  New Agent  │
                    │  Version    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  5% Canary  │──── Eval: hallucination rate, tool success,
                    │             │     cost-per-task, latency
                    └──────┬──────┘
                           │ ✅ Pass
                    ┌──────▼──────┐
                    │ 20% Canary  │──── Eval: same metrics, larger sample
                    │             │
                    └──────┬──────┘
                           │ ✅ Pass
                    ┌──────▼──────┐
                    │ 50% Canary  │──── Eval: cost comparison vs baseline
                    │             │
                    └──────┬──────┘
                           │ ✅ Pass
                    ┌──────▼──────┐
                    │ 100% Stable │
                    │             │
                    └─────────────┘

            ❌ Any step fails → automatic rollback
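The flow above can be sketched as a small loop: walk the canary weights in order, evaluate a gate at each step, and roll back on the first failure. This is an illustrative sketch only. The `Metrics` fields mirror the metrics named in the diagram, while the tolerances, the `observe` callback, and the function names are assumptions, not AgentRoll's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Metrics:
    hallucination_rate: float  # fraction of responses flagged, 0..1
    tool_success_rate: float   # fraction of tool calls that succeeded, 0..1
    cost_per_task: float       # e.g. USD per completed task
    p95_latency_s: float       # 95th percentile latency in seconds


def gate(canary: Metrics, baseline: Metrics) -> bool:
    """Pass only if the canary is not meaningfully worse than the baseline.

    Every tolerance here is an illustrative assumption.
    """
    return (
        canary.hallucination_rate <= baseline.hallucination_rate + 0.01
        and canary.tool_success_rate >= baseline.tool_success_rate - 0.02
        and canary.cost_per_task <= baseline.cost_per_task * 1.2
        and canary.p95_latency_s <= baseline.p95_latency_s * 1.5
    )


def run_rollout(weights: list[int],
                baseline: Metrics,
                observe: Callable[[int], Metrics]) -> tuple[str, list[int]]:
    """Walk the canary weights in order; stop at the first failed gate."""
    history: list[int] = []
    for weight in weights:
        history.append(weight)
        if weight >= 100:
            return "promoted", history      # full traffic: new version is stable
        if not gate(observe(weight), baseline):
            return "rolled_back", history   # any failed step triggers rollback
    return "paused", history                # weights never reached 100
```

Here `observe(weight)` stands in for whatever collects canary metrics at a given traffic weight, for example a query against a tracing backend.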

Key Features

⚠️ AgentRoll is in early alpha. We're building in public. The features below represent our roadmap; check the Status column for current availability.

| Feature | Description | Status |
| --- | --- | --- |
| AgentDeployment CRD | Declare your agent's complete deployable config as a Kubernetes custom resource | 🔨 Building |
| Evaluation-Gated Canary | Progressive rollout with agent-quality gates (hallucination rate, tool success rate, cost-per-task) | 🔨 Building |
| Argo Rollouts Integration | Built on top of Argo Rollouts rather than reinventing the wheel | 🔨 Building |
| Agent AnalysisTemplates | Pre-built quality metric templates for common agent patterns | 📋 Planned |
| Langfuse Integration | Out-of-the-box agent trace data as canary analysis source | 📋 Planned |
| OTel Observability | Auto-injected OpenTelemetry sidecar for agent tracing | 📋 Planned |
| Grafana Dashboards | Pre-built dashboards for agent-specific metrics | 📋 Planned |
| Composite Versioning | Track prompt + model + tools + memory as a single versioned entity | 📋 Planned |
| Cost-Aware Scaling | KEDA-based autoscaling with queue-depth metrics and token budgets | 🗓️ Future |
| MCP Tool Lifecycle | Manage MCP tool server versions alongside agents | 🗓️ Future |
| Multi-Agent Coordination | Coordinated canary deployments across dependent agents | 🗓️ Future |
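As an illustration of the Composite Versioning idea, one way to treat prompt + model + tools + memory as a single versioned entity is to hash a canonical serialization of all four layers. The sketch below is an assumption about how this could work, not the planned implementation; the function name, field names, and the 12-character truncation are hypothetical.

```python
import hashlib
import json


def composite_version(prompt: str,
                      model: str,
                      tools: dict[str, str],
                      memory_schema: str) -> str:
    """Derive one short ID from the four layers that define an agent.

    Any change to any layer yields a different ID, so a two-word prompt
    edit is as visible as a model upgrade.
    """
    payload = json.dumps(
        {"prompt": prompt, "model": model, "tools": tools, "memory": memory_schema},
        sort_keys=True,          # canonical key order, so dict ordering does not matter
        separators=(",", ":"),   # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


version_id = composite_version(
    prompt="You are a helpful support agent.",
    model="claude-sonnet-4-20250514",
    tools={"crm-mcp-server": "1.2.0"},
    memory_schema="v1",
)
```

Truncating the digest keeps the ID readable in labels and dashboards, at the cost of a slightly higher collision chance than the full hash.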

Architecture

┌────────────────────────────────────────────────────────────┐
│                      User Interface                        │
│           kubectl  /  Helm  /  ArgoCD  /  CI/CD            │
└──────────────────────────┬─────────────────────────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                     AgentRoll Operator                     │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │    CRD       │  │   Rollout    │  │    Analysis      │  │
│  │  Controller  │  │   Manager    │  │    Engine        │  │
│  └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘  │
│         │                 │                   │            │
│         ▼                 ▼                   ▼            │
│  AgentDeployment    Argo Rollouts       Langfuse / OTel    │
│  CRD               (rollout engine)     (data sources)     │
└──────────────────────────┬─────────────────────────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                     Kubernetes Cluster                     │
│                                                            │
│  ┌────────────┐  ┌────────────┐  ┌──────────────────────┐  │
│  │ Agent Pod  │  │ Agent Pod  │  │  OTel Sidecar        │  │
│  │ v1 (stable)│  │ v2 (canary)│  │  (per pod)           │  │
│  └────────────┘  └────────────┘  └──────────────────────┘  │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Prometheus  /  Grafana  /  Langfuse                 │  │
│  │  (agent metrics collection & visualization)          │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
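To illustrate the hand-off between the CRD Controller and the Rollout Manager, here is a hedged sketch of translating AgentDeployment-style rollout steps into the canary step list that Argo Rollouts uses (`setWeight`, `pause`, and inline `analysis` with `templates[].templateName`). The input shape follows the CRD preview in this README. The mapping itself, including the step ordering and the `to_argo_steps` name, is an assumption and not the operator's confirmed behavior.

```python
def to_argo_steps(agent_steps: list[dict]) -> list[dict]:
    """Flatten AgentDeployment-style steps into Argo Rollouts canary steps.

    Each input step may carry setWeight, pause, and analysis together;
    Argo Rollouts expects one action per step, so each is emitted separately.
    """
    argo_steps: list[dict] = []
    for step in agent_steps:
        argo_steps.append({"setWeight": step["setWeight"]})
        if "pause" in step:
            argo_steps.append({"pause": dict(step["pause"])})
        if "analysis" in step:
            template_name = step["analysis"]["templateRef"]
            argo_steps.append(
                {"analysis": {"templates": [{"templateName": template_name}]}}
            )
    return argo_steps


# Steps copied from the AgentDeployment CRD preview.
preview_steps = [
    {"setWeight": 5, "pause": {"duration": "5m"},
     "analysis": {"templateRef": "agent-quality-check"}},
    {"setWeight": 20, "pause": {"duration": "10m"},
     "analysis": {"templateRef": "agent-quality-check"}},
    {"setWeight": 100},
]
argo_canary_steps = to_argo_steps(preview_steps)
```

Keeping the translation a pure function over plain dictionaries makes it easy to unit test without a cluster.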

Quick Start

🚧 Coming soon. AgentRoll is currently in active development.

# Install AgentRoll operator (coming soon)
helm repo add agentroll https://ywc668.github.io/agentroll
helm install agentroll agentroll/agentroll-operator

# Deploy your first agent with progressive delivery
kubectl apply -f examples/basic-agent-deployment.yaml

AgentDeployment CRD (Preview)

apiVersion: agentroll.dev/v1alpha1
kind: AgentDeployment
metadata:
  name: customer-support-agent
spec:
  container:
    image: myregistry/support-agent:v2.1.0
    env:
      - name: LLM_PROVIDER
        value: anthropic
      - name: LLM_MODEL
        value: claude-sonnet-4-20250514

  agentMeta:
    promptVersion: "abc123"
    modelVersion: "claude-sonnet-4-20250514"
    toolDependencies:
      - name: crm-mcp-server
        version: ">=1.2.0"

  rollout:
    strategy: canary
    steps:
      - setWeight: 5
        pause: { duration: 5m }
        analysis:
          templateRef: agent-quality-check
      - setWeight: 20
        pause: { duration: 10m }
        analysis:
          templateRef: agent-quality-check
      - setWeight: 100

  rollback:
    onFailedAnalysis: true
    onCostSpike:
      threshold: "200%"

  observability:
    langfuse:
      endpoint: "https://langfuse.internal"
    opentelemetry:
      enabled: true

  scaling:
    minReplicas: 2
    maxReplicas: 10
    metric: queue-depth
    targetValue: 5
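One plausible reading of the `scaling` block is a queue-depth target per replica, clamped between `minReplicas` and `maxReplicas`, similar to how an average-value target works in the Kubernetes Horizontal Pod Autoscaler. The sketch below assumes that interpretation; the function and parameter names are hypothetical, and AgentRoll's eventual KEDA-based behavior may differ.

```python
import math


def desired_replicas(queue_depth: int,
                     target_per_replica: int,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Replicas needed so each handles about target_per_replica queued tasks."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Values from the CRD preview: minReplicas 2, maxReplicas 10, targetValue 5.
idle = desired_replicas(queue_depth=0, target_per_replica=5, min_replicas=2, max_replicas=10)
busy = desired_replicas(queue_depth=23, target_per_replica=5, min_replicas=2, max_replicas=10)
flooded = desired_replicas(queue_depth=500, target_per_replica=5, min_replicas=2, max_replicas=10)
```

With these inputs the idle case stays at the floor of 2, 23 queued tasks ask for 5 replicas, and a flood is capped at 10.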

Why Not Just Use...?

| Tool | What it does well | What it doesn't do |
| --- | --- | --- |
| Argo Rollouts | Progressive delivery for any K8s workload | Doesn't understand agent health metrics (hallucination rate, tool success, cost-per-task) |
| LangSmith Deploy | Deep LangGraph integration | Commercial license required; LangGraph only; no progressive delivery |
| Kagent | K8s-native agent CRDs | Focused on SRE/DevOps agents, not the general agent deployment lifecycle |
| AWS AgentCore | Fully managed agent runtime | Vendor lock-in; no progressive delivery; not open source |
| Plain K8s Deployment | Simple, well-understood | No canary, no eval gates, no agent-aware rollback |

AgentRoll = Argo Rollouts' progressive delivery + agent-aware quality signals + framework-agnostic design.

Roadmap

  • Phase 0 (Current): Project setup, CRD design, community foundation
  • Phase 1: MVP with AgentDeployment CRD + Argo Rollouts integration + Langfuse analysis
  • Phase 2: Production hardening with multi-framework validation, Terraform modules, security
  • Phase 3: Ecosystem work covering MCP tool lifecycle, A2A coordination, KEDA scaling, multi-agent deployment

See our detailed roadmap for more information.

Contributing

We welcome contributions! AgentRoll is in its earliest stages, so now is the best time to get involved and shape the project's direction.

Background & Motivation

This project was born from real-world experience managing release orchestration for cloud infrastructure at scale, combined with deep research into the AI agent deployment landscape. Key observations:

  • 57% of organizations now have agents in production, but most deploy them like traditional microservices
  • 70% of regulated enterprises rebuild their agent stack every 3 months
  • Agent frameworks assume you'll solve deployment yourself, because it's genuinely hard
  • The CNCF ecosystem is actively embracing agent infrastructure (Kagent, Agent Sandbox, KubeCon Agentics Day 2026)
  • No open-source tool treats agents as first-class deployable units with evaluation-gated progressive delivery

For a deep dive into the landscape research, see our Architecture Decision Records.

License

MIT


Built with ☕ and conviction that AI agents deserve the same deployment rigor as microservices.
