Kubernetes-native progressive delivery for AI agents in production
The missing layer between agent development frameworks and reliable production operations.
AI agent frameworks (LangGraph, CrewAI, OpenAI Agents SDK) help you build agents. Cloud platforms help you run them. But nothing helps you safely ship changes to agents already in production.
Today, most teams deploy agents the same way they deploy microservices: docker push, then pray. But agents are fundamentally different:
- 4 layers change simultaneously: prompt, model version, tool configurations, and memory. Even a 2-word prompt change can break production
- Non-deterministic behavior: the same input can trigger different tool calls and reasoning paths every time
- No meaningful unit tests: traditional pass/fail assertions don't work when outputs vary per run
- Unpredictable costs: one agent task can consume 10x-100x more tokens than another
- Rollback is structurally harder: stateful agents modify external systems (databases, APIs, emails) that can't be simply reverted
The result? 70% of regulated enterprises rebuild their agent stack every 3 months. Teams manually eyeball evaluation results. Nobody knows if the new version is actually better until users complain.
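One way to make the point about pass/fail assertions concrete is to gate on an aggregate rate over repeated runs instead of a single outcome. The sketch below is illustrative only: `pass_rate`, `meets_threshold`, `flaky_agent_task`, and the 90% figure are hypothetical names and values, not part of AgentRoll.

```python
import random
from typing import Callable


def pass_rate(run_once: Callable[[], bool], trials: int) -> float:
    """Run a non-deterministic check `trials` times; return the fraction that passed."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(1 for _ in range(trials) if run_once())
    return passed / trials


def meets_threshold(run_once: Callable[[], bool], trials: int, min_rate: float) -> bool:
    """Gate on an aggregate rate rather than on one pass/fail assertion."""
    return pass_rate(run_once, trials) >= min_rate


# Hypothetical stand-in for an agent task that succeeds roughly 90% of the time.
_rng = random.Random(0)


def flaky_agent_task() -> bool:
    return _rng.random() < 0.9


if __name__ == "__main__":
    # A single run may fail even when the agent is healthy; the rate is the signal.
    print("observed pass rate:", pass_rate(flaky_agent_task, 200))
    print("meets 80% threshold:", meets_threshold(flaky_agent_task, 200, 0.8))
```

In practice the boolean check would be replaced by whatever evaluator a team trusts (an LLM judge, a tool-call validator, a human label), and the sample size would be chosen to match the confidence the team needs.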
AgentRoll brings evaluation-gated progressive delivery to AI agent deployments on Kubernetes. Think of it as Argo Rollouts meets agent-aware intelligence.
┌─────────────┐
│  New Agent  │
│   Version   │
└──────┬──────┘
       │
┌──────▼──────┐
│  5% Canary  │◄─── Eval: hallucination rate, tool success,
│             │     cost-per-task, latency
└──────┬──────┘
       │ ✓ Pass
┌──────▼──────┐
│ 20% Canary  │◄─── Eval: same metrics, larger sample
│             │
└──────┬──────┘
       │ ✓ Pass
┌──────▼──────┐
│ 50% Canary  │◄─── Eval: cost comparison vs baseline
│             │
└──────┬──────┘
       │ ✓ Pass
┌──────▼──────┐
│ 100% Stable │
│             │
└─────────────┘
✗ Any step fails → automatic rollback
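The gated flow above can be sketched as a short loop: evaluate the canary at each traffic weight, promote on a pass, and roll back on the first failure. This is a minimal sketch under stated assumptions, not AgentRoll's implementation (the project delegates rollout mechanics to Argo Rollouts). `EvalResult`, `Gate`, `run_rollout`, and every threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class EvalResult:
    hallucination_rate: float  # fraction of sampled responses flagged as hallucinated
    tool_success_rate: float   # fraction of tool calls that succeeded
    cost_per_task: float       # canary cost per task, in any consistent unit


@dataclass(frozen=True)
class Gate:
    # Hypothetical thresholds, chosen only for illustration.
    max_hallucination_rate: float = 0.02
    min_tool_success_rate: float = 0.95
    max_cost_ratio: float = 2.0  # canary cost may not exceed 2x the baseline

    def passes(self, result: EvalResult, baseline_cost: float) -> bool:
        return (
            result.hallucination_rate <= self.max_hallucination_rate
            and result.tool_success_rate >= self.min_tool_success_rate
            and result.cost_per_task <= self.max_cost_ratio * baseline_cost
        )


def run_rollout(
    canary_weights: Sequence[int],
    evaluate: Callable[[int], EvalResult],
    gate: Gate,
    baseline_cost: float,
) -> Tuple[str, int]:
    """Return ("stable", 100) if every gate passes, else ("rolled_back", failed_weight)."""
    for weight in canary_weights:
        result = evaluate(weight)  # collect metrics while `weight`% of traffic hits the canary
        if not gate.passes(result, baseline_cost):
            return ("rolled_back", weight)
    return ("stable", 100)


if __name__ == "__main__":
    healthy = lambda weight: EvalResult(0.01, 0.99, 1.1)
    print(run_rollout((5, 20, 50), healthy, Gate(), baseline_cost=1.0))
```

Keeping the gate separate from the loop means the same thresholds can be reused at every step, while the `evaluate` callback is free to draw on a larger sample as the weight grows.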
⚠️ AgentRoll is in early alpha. We're building in public. The features below represent our roadmap; check the Status column for current availability.
| Feature | Description | Status |
|---|---|---|
| AgentDeployment CRD | Declare your agent's complete deployable config as a Kubernetes custom resource | 🔨 Building |
| Evaluation-Gated Canary | Progressive rollout with agent-quality gates (hallucination rate, tool success rate, cost-per-task) | 🔨 Building |
| Argo Rollouts Integration | Built on top of Argo Rollouts rather than reinventing the wheel | 🔨 Building |
| Agent AnalysisTemplates | Pre-built quality metric templates for common agent patterns | 📋 Planned |
| Langfuse Integration | Out-of-the-box agent trace data as canary analysis source | 📋 Planned |
| OTel Observability | Auto-injected OpenTelemetry sidecar for agent tracing | 📋 Planned |
| Grafana Dashboards | Pre-built dashboards for agent-specific metrics | 📋 Planned |
| Composite Versioning | Track prompt + model + tools + memory as a single versioned entity | 📋 Planned |
| Cost-Aware Scaling | KEDA-based autoscaling with queue-depth metrics and token budgets | 🗓️ Future |
| MCP Tool Lifecycle | Manage MCP tool server versions alongside agents | 🗓️ Future |
| Multi-Agent Coordination | Coordinated canary deployments across dependent agents | 🗓️ Future |
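The Composite Versioning row describes treating prompt, model, tools, and memory as one versioned entity. One possible way to do that is to hash a canonical serialization of all four layers into a single ID. The sketch below assumes that approach; `composite_version` and its field names are hypothetical and do not reflect AgentRoll's planned design.

```python
import hashlib
import json
from typing import Mapping


def composite_version(
    prompt: str,
    model: str,
    tools: Mapping[str, str],
    memory_schema: str,
) -> str:
    """Derive one short, stable ID from the four layers of an agent release."""
    payload = {
        # Hash the prompt so the serialized payload does not embed prompt text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "tools": dict(tools),  # tool name -> pinned version
        "memory_schema": memory_schema,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    version = composite_version(
        prompt="You are a customer support agent.",
        model="claude-sonnet-4-20250514",
        tools={"crm-mcp-server": "1.2.0"},
        memory_schema="v1",
    )
    print("composite version:", version)
```

Because any change to any layer produces a different ID, a two-word prompt edit becomes a distinct, trackable release, not a silent change.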
┌────────────────────────────────────────────────────────────┐
│                       User Interface                       │
│              kubectl / Helm / ArgoCD / CI/CD               │
└──────────────────────────┬─────────────────────────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                     AgentRoll Operator                     │
│                                                            │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐    │
│  │     CRD      │ │   Rollout    │ │     Analysis     │    │
│  │  Controller  │ │   Manager    │ │      Engine      │    │
│  └──────┬───────┘ └──────┬───────┘ └────────┬─────────┘    │
│         │                │                  │              │
│         ▼                ▼                  ▼              │
│  AgentDeployment   Argo Rollouts     Langfuse / OTel       │
│        CRD       (rollout engine)    (data sources)        │
└──────────────────────────┬─────────────────────────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                     Kubernetes Cluster                     │
│                                                            │
│  ┌────────────┐ ┌────────────┐ ┌──────────────────────┐    │
│  │ Agent Pod  │ │ Agent Pod  │ │     OTel Sidecar     │    │
│  │ v1 (stable)│ │ v2 (canary)│ │      (per pod)       │    │
│  └────────────┘ └────────────┘ └──────────────────────┘    │
│                                                            │
│  ┌────────────────────────────────────────────────────┐    │
│  │          Prometheus / Grafana / Langfuse           │    │
│  │     (agent metrics collection & visualization)     │    │
│  └────────────────────────────────────────────────────┘    │
└────────────────────────────────────────────────────────────┘
🚧 Coming soon. AgentRoll is currently in active development.
# Install AgentRoll operator (coming soon)
helm repo add agentroll https://ywc668.github.io/agentroll
helm install agentroll agentroll/agentroll-operator
# Deploy your first agent with progressive delivery
kubectl apply -f examples/basic-agent-deployment.yaml

apiVersion: agentroll.dev/v1alpha1
kind: AgentDeployment
metadata:
  name: customer-support-agent
spec:
  container:
    image: myregistry/support-agent:v2.1.0
    env:
      - name: LLM_PROVIDER
        value: anthropic
      - name: LLM_MODEL
        value: claude-sonnet-4-20250514
  agentMeta:
    promptVersion: "abc123"
    modelVersion: "claude-sonnet-4-20250514"
    toolDependencies:
      - name: crm-mcp-server
        version: ">=1.2.0"
  rollout:
    strategy: canary
    steps:
      - setWeight: 5
        pause: { duration: 5m }
        analysis:
          templateRef: agent-quality-check
      - setWeight: 20
        pause: { duration: 10m }
        analysis:
          templateRef: agent-quality-check
      - setWeight: 100
  rollback:
    onFailedAnalysis: true
    onCostSpike:
      threshold: 200%
  observability:
    langfuse:
      endpoint: "https://langfuse.internal"
    opentelemetry:
      enabled: true
  scaling:
    minReplicas: 2
    maxReplicas: 10
    metric: queue-depth
    targetValue: 5

| Tool | What it does well | What it doesn't do |
|---|---|---|
| Argo Rollouts | Progressive delivery for any K8s workload | Doesn't understand agent health metrics (hallucination rate, tool success, cost-per-task) |
| LangSmith Deploy | Deep LangGraph integration | Commercial license required; LangGraph only; no progressive delivery |
| Kagent | K8s-native agent CRDs | Focused on SRE/DevOps agents, not general agent deployment lifecycle |
| AWS AgentCore | Fully managed agent runtime | Vendor lock-in; no progressive delivery; not open source |
| Plain K8s Deployment | Simple, well-understood | No canary, no eval gates, no agent-aware rollback |
AgentRoll = Argo Rollouts' progressive delivery + agent-aware quality signals + framework-agnostic design.
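Returning to the `scaling` block in the example manifest: one plausible reading of `metric: queue-depth` with `targetValue: 5` is "keep roughly five queued tasks per replica", which is how average-value targets behave in the Kubernetes Horizontal Pod Autoscaler that KEDA drives. The sketch below assumes that reading; `desired_replicas` is a hypothetical helper, and AgentRoll's actual scaling semantics are not yet defined.

```python
import math


def desired_replicas(
    queue_depth: int,
    target_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed to hold about `target_per_replica` queued tasks each, clamped to bounds."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Values taken from the example manifest: minReplicas 2, maxReplicas 10, targetValue 5.
    for depth in (0, 23, 500):
        print(depth, "->", desired_replicas(depth, 5, 2, 10))
```

A token budget, as mentioned in the feature table, would add a second ceiling on top of `max_replicas`; that part is left out here because its semantics are not described.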
- Phase 0 (Current): Project setup, CRD design, community foundation
- Phase 1: MVP with AgentDeployment CRD + Argo Rollouts integration + Langfuse analysis
- Phase 2: Production hardening, including multi-framework validation, Terraform modules, security
- Phase 3: Ecosystem, including MCP tool lifecycle, A2A coordination, KEDA scaling, multi-agent deployment
See our detailed roadmap for more information.
We welcome contributions! AgentRoll is in its earliest stages, so now is the best time to get involved and shape the project's direction.
- 🐛 Report bugs
- 💡 Request features
- 💬 Join discussions
- 📖 Read contributing guide
This project was born from real-world experience managing release orchestration for cloud infrastructure at scale, combined with deep research into the AI agent deployment landscape. Key observations:
- 57% of organizations now have agents in production, but most deploy them like traditional microservices
- 70% of regulated enterprises rebuild their agent stack every 3 months
- Agent frameworks assume you'll solve deployment yourself β because it's genuinely hard
- The CNCF ecosystem is actively embracing agent infrastructure (Kagent, Agent Sandbox, KubeCon Agentics Day 2026)
- No open-source tool treats agents as first-class deployable units with evaluation-gated progressive delivery
For a deep dive into the landscape research, see our Architecture Decision Records.
Built with ☕ and conviction that AI agents deserve the same deployment rigor as microservices.