A configurable, declarative-first AI Agent framework. Build single agents or multi-agent pipelines from YAML or fluent Python — with built-in memory, skills, sandboxed code execution, and an HTTP server out of the box.
- Why AgentX?
- Quick Start
- Two Ways to Build an Agent
- Multi-Agent Orchestration
- The 60-Second Tutorial: Real LLM, Real Tools, Real Memory
- Architecture
- Modules
- Configuration Reference
- Running as an HTTP Server
- Memory
- Skills
- Sandbox
- API Cheatsheet
- FAQ
- Roadmap
- Contributing
Most agent frameworks force you to pick one style: either pure Python code (verbose, hard to share) or pure YAML/JSON (rigid, hard to extend). AgentX treats declarative YAML and fluent Python as equally first-class. You can:
- Describe an agent in YAML and ship it as a config file.
- Override any field programmatically when you need flexibility.
- Compose agents into pipelines with
ChainAgent/ParallelAgent/CycleAgent— powered by LangGraph when available, with a pure-Python fallback otherwise. - Plug in any OpenAI-compatible model (the official API, vLLM, Ollama, LM Studio, your own gateway).
- Add layered memory (working / session / long-term) without writing storage glue code.
- Run untrusted code in a sandbox (subprocess or Docker) with resource limits and a security policy.
If you want a clean, batteries-included framework that doesn't lock you into any specific cloud or model vendor — AgentX is for you.
# Core only — single agents, tools, FastAPI server, memory
pip install -e .
# With LangGraph-powered multi-agent orchestration (recommended)
pip install -e ".[langgraph]"
# Everything (orchestration + dev tooling)
pip install -e ".[dev]"import asyncio
from agentx import AgentBuilder, OpenAICompatProvider, Runner, InMemorySessionService
def get_weather(city: str) -> str:
"""Return the weather for ``city``."""
return f"{city}: sunny, 25°C"
async def main():
agent = (
AgentBuilder("weather_assistant")
.set_description("A helpful weather assistant.")
.set_model(OpenAICompatProvider(
api_key="sk-...", # any OpenAI-compatible key
base_url="https://api.openai.com/v1", # or any compatible endpoint
default_model="gpt-4o-mini",
))
.set_instruction("You answer weather questions concisely.")
.add_function_tool(get_weather)
.build()
)
runner = Runner(agent=agent, session_service=InMemorySessionService())
response = await runner.run(
"What's the weather in Berlin?",
session_id="s1",
user_id="u1",
)
print(response.final_text)
asyncio.run(main())That's it. No global setup, no servers to start.
agents/researcher.yaml:
id: researcher
name: "Researcher"
description: "Answer research-style questions concisely with sources."
agent_type: llm
model: gpt-4o-mini
system_prompt: |
You are a careful researcher. Always cite the sources you use.
Keep answers under 200 words.
# Optional external prompt file (relative to the YAML file's location)
# system_prompt_file: prompts/researcher.md
max_iterations: 6
parallel_tool_calls: true
max_history_messages: 20
tools:
- web_search
- read_url
metadata:
topics: ["research", "academic"]from agentx import (
AgentConfigLoader,
OpenAICompatProvider,
Runner,
InMemorySessionService,
create_function_tool,
)
# 1. Load YAML → AgentConfig
cfg = AgentConfigLoader.from_yaml("agents/researcher.yaml")
# 2. Define how tool ids resolve into tool objects
TOOLS = {
"web_search": create_function_tool(my_web_search),
"read_url": create_function_tool(my_read_url),
}
# 3. Build the agent — inject a model + tool resolver
loader = AgentConfigLoader(base_dir="agents") # for relative system_prompt_file
agent = loader.build(
cfg,
model=OpenAICompatProvider(default_model="gpt-4o-mini", api_key="sk-..."),
tool_resolver=lambda name: TOOLS.get(name),
)
# 4. Run
runner = Runner(agent=agent, session_service=InMemorySessionService())
response = await runner.run("Why is Python's GIL controversial?", session_id="s1", user_id="u1")Loading multiple agents at once:
registry = AgentConfigLoader.load_many("agents/") # {id: AgentConfig}
agents = {cfg.id: loader.build(cfg, model=model, tool_resolver=...) for cfg in registry.values()}from agentx import AgentBuilder, OpenAICompatProvider
agent = (
AgentBuilder("my_assistant")
.set_description("A general-purpose assistant.")
.set_model(OpenAICompatProvider(
api_key="sk-...",
default_model="gpt-4o-mini",
))
.set_instruction("You are helpful, concise and honest.")
.add_function_tool(lookup_user)
.add_function_tool(send_email)
.set_max_history(20)
.set_parallel_tool_calls(True)
.build()
)The builder and the YAML loader produce equivalent LlmAgent instances —
pick whichever fits your workflow.
Three primitives cover most multi-agent patterns. All three:
- Implement the same
BaseAgentinterface, so they nest freely. - Use LangGraph if
agentx[langgraph]is installed. - Fall back to a pure-Python implementation otherwise — no breakage.
from agentx import ChainAgent, AgentBuilder
planner = AgentBuilder("planner").set_instruction("Outline the article.").set_model(model).build()
writer = AgentBuilder("writer").set_instruction("Expand the outline.").set_model(model).build()
editor = AgentBuilder("editor").set_instruction("Polish for clarity.").set_model(model).build()
article_pipeline = ChainAgent(
name="article_pipeline",
sub_agents=[planner, writer, editor],
)The output of each sub-agent is fed as the user message to the next.
from agentx import ParallelAgent
panel = ParallelAgent(
name="review_panel",
sub_agents=[code_reviewer, perf_reviewer, style_reviewer],
)
# Optional aggregator — combines all branches' final texts
panel = ParallelAgent(
name="review_panel",
sub_agents=[...],
aggregator=lambda outputs: my_merge(outputs),
)All sub-agents run concurrently on the same input. Events from every branch are streamed in real time; a final aggregated text is emitted at the end.
from agentx import CycleAgent
def good_enough(iteration, events):
return iteration >= 1 and any("LGTM" in (ev.text or "") for ev in events[-5:])
reflector = CycleAgent(
name="self_reflect",
sub_agent=critic_then_revise,
max_iterations=4,
stop_condition=good_enough,
)The same orchestration is just YAML:
id: article_pipeline
name: "Article pipeline"
agent_type: chain
sub_agents: [planner, writer, editor]id: review_panel
agent_type: parallel
sub_agents: [code_reviewer, perf_reviewer, style_reviewer]id: reflector
agent_type: cycle
sub_agents: [critic_then_revise]
max_cycles: 4When loading, pass an agents_registry so sub-agent ids can resolve:
loader = AgentConfigLoader(base_dir="agents")
configs = AgentConfigLoader.load_many("agents/")
# Build leaf agents first, then orchestrators
agents = {}
for cfg in configs.values():
if cfg.agent_type == "llm":
agents[cfg.id] = loader.build(cfg, model=model, tool_resolver=resolve_tool)
for cfg in configs.values():
if cfg.agent_type in ("chain", "parallel", "cycle"):
agents[cfg.id] = loader.build(cfg, agents_registry=agents)A complete example showing real LLM + tools + memory + streaming + server.
import asyncio
import os
from agentx import (
AgentBuilder,
ApplicationBuilder,
OpenAICompatProvider,
Runner,
InMemorySessionService,
Settings,
)
# -- 1. A real tool ---------------------------------------------------------
async def search_arxiv(query: str, top_k: int = 3) -> str:
"""Search arXiv for the given query and return summaries."""
import httpx
async with httpx.AsyncClient() as c:
r = await c.get(
"http://export.arxiv.org/api/query",
params={"search_query": f"all:{query}", "max_results": top_k},
timeout=15,
)
return r.text # simplified — parse XML in production
# -- 2. Model ---------------------------------------------------------------
model = OpenAICompatProvider(
api_key=os.environ["OPENAI_API_KEY"],
base_url="https://api.openai.com/v1",
default_model="gpt-4o-mini",
)
# -- 3. Agent ---------------------------------------------------------------
agent = (
AgentBuilder("paper_finder")
.set_description("Finds and summarizes academic papers.")
.set_model(model)
.set_instruction(
"You help users find academic papers. Use the search_arxiv tool when "
"needed. Reply in markdown with a bullet list of the most relevant papers."
)
.add_function_tool(search_arxiv)
.set_max_history(20)
.build()
)
# -- 4. Runner with session memory ------------------------------------------
runner = Runner(agent=agent, session_service=InMemorySessionService())
# -- 5. Run with streaming events --------------------------------------------
async def main():
async for ev in runner.run_stream(
"Find recent papers on speculative decoding for LLM inference.",
session_id="alice-1",
user_id="alice",
):
if ev.type.value == "tool_call":
print(f" → calling {ev.tool_name}({ev.tool_args})")
elif ev.type.value == "text" and ev.content:
print(ev.content.text, end="", flush=True)
elif ev.type.value == "done":
print("\n[done]")
asyncio.run(main())app = (
ApplicationBuilder()
.set_config(Settings(app_name="paper-finder"))
.set_agent(agent)
.set_runner(runner)
.build()
)
app.run() # http://0.0.0.0:8000
# POST /api/chat/stream → SSE
# POST /api/chat → OpenAI-compatible JSON┌────────────────────────────────────────────────────────────┐
│ Application │
│ (ApplicationBuilder + Bootstrap glue) │
├────────────────────────────────────────────────────────────┤
│ Config Layer │
│ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ Settings │ │ Bootstrap │ │ Plugins │ │
│ └──────────┘ └───────────┘ └──────────┘ │
├────────────────────────────────────────────────────────────┤
│ Core capabilities │
│ ┌───────┐ ┌────────┐ ┌─────────┐ ┌───────┐ ┌─────────┐ │
│ │ Agent │ │ Runner │ │ Session │ │ Model │ │ Tool │ │
│ └───────┘ └────────┘ └─────────┘ └───────┘ └─────────┘ │
│ ┌──────┐ ┌───────────┐ ┌──────────┐ ┌────────┐ │
│ │ CLI │ │ Callbacks │ │ Server │ │ Memory │ │
│ └──────┘ └───────────┘ └──────────┘ └────────┘ │
├────────────────────────────────────────────────────────────┤
│ Advanced capabilities │
│ ┌─────────┐ ┌───────┐ ┌──────────┐ │
│ │ Sandbox │ │ Skill │ │ MetaFlow │ │
│ └─────────┘ └───────┘ └──────────┘ │
├────────────────────────────────────────────────────────────┤
│ Multi-agent orchestration: LangGraph (optional) │
└────────────────────────────────────────────────────────────┘
User message
│
▼
Runner.run_stream(message, session_id, user_id)
│
▼ (loads history from SessionService)
Agent.run_stream(messages)
│
▼
LLM call ──► tool_call? ──yes──► execute_tool() ──► tool result fed back
│ │ │
│ └──no──► final TEXT event ◄─────────────┘
│
▼
DONE event
│
▼ (persists exchange to SessionService)
yields events back to caller
| Module | Path | Description |
|---|---|---|
| Config | agentx/config/ |
Pydantic-based settings, YAML + env override |
| Bootstrap | agentx/bootstrap.py |
Wires components from Settings into a ComponentRegistry |
| Application | agentx/application.py |
Top-level Application + ApplicationBuilder facade |
| Agent | agentx/agent/ |
BaseAgent, LlmAgent, AgentBuilder, AgentConfig, orchestration |
| Runner | agentx/runner/ |
Drives a single agent run end-to-end with a SessionService |
| Session | agentx/session/ |
In-memory / JSONL / event-log session services |
| Model | agentx/model/ |
LLMProvider, OpenAICompatProvider, factory & registry |
| Tool | agentx/tool/ |
FunctionTool, HttpTool, global ToolRegistry |
| Memory | agentx/memory/ |
Working / session / long-term memory with pluggable backends |
| Server | agentx/server/ |
FastAPI server with /health, /api/chat, /api/chat/stream (SSE) |
| CLI | agentx/cli/ |
CLI tool registry with safety checks |
| Callbacks | agentx/callbacks/ |
Lifecycle hooks (before/after agent / model / tool) |
| Plugins | agentx/plugins/ |
Plugin system for extending the framework |
| Sandbox | agentx/sandbox/ |
Subprocess + Docker code execution with security policy |
| Skill | agentx/skill/ |
Reusable skills loaded from filesystem / Git / artifact repos |
| MetaFlow | agentx/metaflow/ |
Self-iterating engine: reflection + strategy optimization |
| Filter | agentx/filter/ |
Filter chain (request / response interceptors) |
| Context | agentx/context/ |
Per-request context (headers, trace ids, user ids) with header policy |
| Security | agentx/security/ |
CredentialVault for secret resolution |
A typical config.yaml:
app_name: my-agent-service
env: prod
debug: false
log_level: info
llm:
default_model: gpt-4o-mini
providers:
openai:
api_key: ${OPENAI_API_KEY}
base_url: https://api.openai.com/v1
local:
api_key: not-needed
base_url: http://localhost:11434/v1 # Ollama / LM Studio / vLLM …
model_registry:
gpt-4o-mini: {provider: openai, model_id: gpt-4o-mini}
llama3: {provider: local, model_id: llama3}
server:
host: 0.0.0.0
port: 8000
features:
enable_cli: true
enable_skills: true
enable_sandbox: false
enable_metaflow: false
skill:
skills_dir: ./skills
sandbox:
default_executor: local # local | docker
work_dir: /tmp/agentx-sandbox
security:
max_execution_time: 30
max_memory_mb: 512
network_access: falseEnvironment variables override file values: AGENTX_LLM__DEFAULT_MODEL=gpt-4o,
AGENTX_SERVER__PORT=9000, etc.
from agentx import ApplicationBuilder
app = (
ApplicationBuilder()
.set_config_path("config.yaml")
.set_agent(my_agent)
.set_runner(my_runner)
.build()
)
app.run() # blocking — uvicorn, host/port from configFor testing or custom hosting:
fastapi_app = app.build_app() # FastAPI instance
# mount under your own ASGI server, gunicorn workers, etc.| Method | Path | Description |
|---|---|---|
| GET | /health |
health check (returns framework + app metadata) |
| POST | /api/chat |
non-streaming chat (OpenAI-compatible JSON) |
| POST | /api/chat/stream |
SSE streaming chat with one JSON payload per event |
{ "type": "tool_call", "agent": "paper_finder", "tool": "search_arxiv", "args": {"query": "..."} }
{ "type": "tool_result", "agent": "paper_finder", "tool": "search_arxiv", "result": "..." }
{ "type": "text", "agent": "paper_finder", "text": "Here are some papers..." }
{ "type": "done", "agent": "paper_finder" }If you want full control over request → response (e.g. add auth, rate limit, custom routing), pass a handler:
async def my_handler(body: dict, request) -> str:
user_id = request.headers.get("x-user-id", "")
msg = body["messages"][-1]["content"]
# … custom logic …
return await runner.run(msg, user_id=user_id, session_id=...).then_text()
app = ApplicationBuilder().set_chat_handler(my_handler).build()The memory module is layered:
| Layer | Lifetime | Typical backend(s) | What it stores |
|---|---|---|---|
| Working | Within a single task | filesystem (canvas + evidence) | task plan, intermediate results, tool outputs |
| Session | Within a session | in-memory / Mem0 / custom HTTP | chat history, page views, real-time signals |
| Long-term | Across sessions | Mem0 / custom HTTP | user preferences, profile tags |
Working memory ships with token-pressure-aware injection — when the prompt budget gets tight the canvas is automatically pruned (full → trimmed → metadata-only).
from agentx.memory import MemoryConfig, MemoryService
from agentx.memory.config import (
SessionMemoryConfig,
UserMemoryConfig,
WorkingMemoryConfig,
)
cfg = MemoryConfig(
enabled=True,
working=WorkingMemoryConfig(storage_root="/data/agentx/working"),
session=SessionMemoryConfig(enabled=True, backend="inmemory"),
user=UserMemoryConfig(enabled=False), # plug in Mem0 or your own backend
)
service = MemoryService(cfg)A skill is a reusable, declarative unit of work — a script + a
SKILL.md. Skills can be loaded from:
- Local filesystem — drop a directory next to your project
- Git repositories — auto clone/pull, TS skills compiled via esbuild
- Generic artifact repositories — versioned bundles via HTTP
from agentx.skill import SkillRepository, SkillManager, register_skill_tools
repo = SkillRepository()
repo.discover("./skills") # scans for SKILL.md files
mgr = SkillManager(repository=repo)
# Auto-register four standard tools (skill_load / run / list_docs / select_docs)
register_skill_tools(mgr)Once registered, the agent can:
- List available skill metadata.
- Load a skill on demand.
- Run it inside a sandbox.
Run untrusted code safely:
from agentx.sandbox import SandboxManager, SecurityPolicy
manager = SandboxManager(
security=SecurityPolicy(max_memory_mb=256, network_access=False),
default_executor="docker", # local | docker
)
result = await manager.execute(
"print(sum(range(10)))",
language="python",
timeout=10,
)
print(result.output) # "45\n"
print(result.executor_type) # "docker" or "local"Built-in backends:
| Backend | Class | Use case |
|---|---|---|
local |
LocalExecutorBackend |
dev / test (subprocess; not a real sandbox) |
docker |
DockerExecutorBackend |
Docker CLI with memory / CPU / network limits |
Plug your own (firecracker, e2b, wasm, …) via manager.register_backend().
from agentx import (
BaseAgent, # abstract base — implement run_async / run_stream
LlmAgent, # standard single LLM agent
AgentBuilder, # fluent builder for LlmAgent
AgentConfig, # declarative schema (pydantic)
AgentConfigLoader, # load YAML / dict → AgentConfig → Agent
ChainAgent, # sequential pipeline
ParallelAgent, # concurrent fan-out
CycleAgent, # iterative loop
)from agentx import (
Runner, # drives a single agent run
InMemorySessionService, # default in-memory session
JsonlSessionService, # JSONL-backed sessions
SessionEventLog, SessionEvent, SessionEventType,
)from agentx import (
LLMProvider, # base provider protocol
OpenAICompatProvider, # default OpenAI-compatible client
LLMResponse, LLMStreamChunk,
ProviderRegistry, # global provider registry
ModelRegistry, ModelEntry, # named model entries
ModelFactory, # build providers from Settings
)from agentx import (
FunctionTool, HttpTool, # built-in tool types
ToolRegistry, # global named registry
create_function_tool, # convenience: function → FunctionTool
register_tool, get_tool, # registry helpers
)from agentx import (
Settings,
ApplicationBuilder, Application,
bootstrap, ComponentRegistry,
)
from agentx.server import LocalServer, create_local_app| Type | When | Payload fields |
|---|---|---|
text |
LLM produced text | content (Content), agent_name |
tool_call |
LLM requested a tool | tool_name, tool_args |
tool_result |
Tool execution finished | tool_name, tool_result |
agent_step |
Sub-agent switch in orchestration | agent_name, metadata |
error |
Recoverable runtime error | content, agent_name |
done |
Run finished | agent_name |
cancelled |
Caller cancelled (e.g. SSE disconnect) | — |
status / resource / thinking / recommend / metadata / heartbeat |
Business event extensions (optional) | varies |
Q: Do I need LangGraph?
No. ChainAgent / ParallelAgent / CycleAgent ship with a pure-Python
fallback. Install agentx[langgraph] only if you want the LangGraph runtime
(graph compilation, conditional edges, checkpoints, etc.).
Q: Which LLM providers are supported?
Anything that speaks the OpenAI Chat Completions API: official OpenAI,
Anthropic via gateway, vLLM, Ollama, LM Studio, Together, Groq, your own
proxy, etc. Bring your own client by implementing
async def chat(messages, tools=None, **kw) -> {"content", "tool_calls"}.
Q: How do I plug in a different memory backend?
Implement the MemoryBackend protocol and pass it into MemoryConfig.
Built-ins: InMemoryBackend, Mem0Backend. There is also an HttpBackendClient
helper for talking to remote services over plain HTTP.
Q: Is the sandbox really safe?
The local backend uses an unsandboxed subprocess and is for development
only. The docker backend gives you containerization with memory / CPU /
network limits — a real isolation boundary, but still not a substitute for
a hardened runtime if you're running adversarial code. For high-isolation
needs implement BaseExecutorBackend over firecracker, gVisor, or wasm.
Q: How do I add lifecycle hooks?
Use CallbackManager from agentx.callbacks. Hooks: before_agent,
after_agent, before_model, after_model, before_tool, after_tool.
Q: Can I use this for production?
The framework is functional and tested (153 tests, all passing). It is at
0.4.x — APIs may still evolve before 1.0. Follow the CHANGELOG
and pin a version.
- Anthropic / Gemini native providers (in addition to OpenAI-compat)
- Built-in MCP server / client
- Persistent memory backends (Postgres, Redis, SQLite)
- Trace exports (OpenTelemetry + Phoenix / Langfuse compatible)
- More orchestration patterns (graph routing, debate / vote)
- Web UI for inspecting agent runs
git clone https://github.com/willhaosky/AgentX.git
cd AgentX
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Lint + format
ruff check agentx/ tests/ examples/
ruff format agentx/ tests/ examples/
# Run the tests
pytest
pytest --cov=agentx -qThe demos use a tiny in-process mock model so they need no API keys:
python examples/programmatic_demo.py
python examples/declarative_demo.py
python examples/pipeline_demo.pyAgentX/
├── agentx/
│ ├── __init__.py # 99 public exports
│ ├── application.py # Application + ApplicationBuilder
│ ├── bootstrap.py # Wires Settings → ComponentRegistry
│ ├── agent/
│ │ ├── base.py # BaseAgent
│ │ ├── llm_agent.py # LlmAgent
│ │ ├── builder.py # AgentBuilder
│ │ ├── config.py # AgentConfig + AgentConfigLoader
│ │ ├── chain_agent.py # ChainAgent (LangGraph + fallback)
│ │ ├── parallel_agent.py # ParallelAgent (LangGraph + fallback)
│ │ ├── cycle_agent.py # CycleAgent (LangGraph + fallback)
│ │ ├── langgraph_state.py # shared LangGraph state schema
│ │ └── types.py # AgentEvent / Content / Part / RunResponse
│ ├── runner/ # Runner — drives single agent runs
│ ├── session/ # SessionService implementations
│ ├── model/ # Provider / Factory / Registry
│ ├── tool/ # FunctionTool / HttpTool / ToolRegistry
│ ├── memory/ # Working / Session / Long-term memory
│ ├── server/ # FastAPI server (LocalServer)
│ ├── config/ # Pydantic Settings
│ ├── cli/ # CLI tool manager + safety checks
│ ├── callbacks/ # Lifecycle callbacks
│ ├── plugins/ # Plugin system
│ ├── sandbox/ # SandboxManager + executor backends
│ ├── skill/ # SkillRepository / SkillManager / loaders
│ ├── metaflow/ # Self-iterating engine
│ ├── filter/ # Filter chain
│ ├── context/ # Request context + header policy
│ └── security/ # CredentialVault
├── examples/
│ ├── agents/ # YAML agent definitions
│ ├── declarative_demo.py # YAML → Agent
│ ├── programmatic_demo.py # AgentBuilder → Agent
│ ├── pipeline_demo.py # Chain / Parallel / Cycle
│ └── _mock_model.py # offline mock LLM used by demos
├── tests/ # 153 tests
├── docs/ # Output format / request context guides
├── pyproject.toml
├── LICENSE
└── README.md
PRs are welcome — please read CONTRIBUTING.md and the Code of Conduct before submitting.
A good first PR could be:
- More provider implementations (Anthropic, Gemini, Cohere)
- A persistent
SessionService(Postgres / Redis / SQLite) - More worked examples in
examples/ - Documentation improvements
MIT © willhaosky