Decompose complex tasks. Dispatch parallel workers. Evolve better strategies from every run.
Web2BigTable is a bi-level multi-agent framework for web-to-table search — given a natural-language query and a target schema, it autonomously searches the open web and returns a structured table whose rows are entities, whose columns are the requested attributes, and whose cells are independently verified against web sources. It handles both wide search (broad-coverage tasks that assemble many consistent rows across heterogeneous sources) and deep search (single complex queries resolved by chaining indirect clues across many hops).
An upper-level orchestrator decomposes the task and dispatches sub-problems to lower-level worker agents that solve them in parallel and coordinate through a shared workspace to reduce redundant exploration and reconcile conflicting evidence. The system is self-evolving through a closed-loop run-verify-reflect process that jointly refines how tasks are decomposed and how sub-tasks are solved — adaptation is mediated through persistent, human-readable external memory, leaving the underlying LLMs frozen throughout.
Benchmark Results · Install · Quick Start · Key Features · Why It Matters · Ecosystem · Citation
We evaluate Web2BigTable on two challenging benchmarks:
- WideSearch — a benchmark for complex, multi-step information retrieval tasks requiring parallel search, data extraction, and structured output across diverse domains.
- XBench-DeepSearch — a benchmark for evaluating deep research capabilities on real-world questions requiring multi-hop reasoning and comprehensive web search.
**WideSearch.** Performance landscape on WideSearch (Avg@4). Position encodes Row F1 (x) and Item F1 (y); label encodes Success Rate. Dashed lines show frontier single-agent Item F1. Web2BigTable dominates all three metrics simultaneously.
**XBench-DeepSearch.** Accuracy on XBench-DeepSearch. Web2BigTable (73.0%) surpasses all open-source agentic models and rivals frontier proprietary systems.
**System Architecture.** Three-stage architecture of Web2BigTable. Stage 1 (Orchestrate): an Orchestrator LLM reads decomposition strategies from Strategy Memory So through a Skill Router and partitions the user query into N subtasks (each with instruction and output schema). Stage 2 (Execute): parallel worker agents resolve execution skills from a Shared Skill Bank Sw (dynamic retrieval + self-repair) and coordinate asynchronously through a Shared Workboard me — file-locked, tag-partitioned — to avoid duplicated work and fill coverage gaps; partial outputs are aggregated into the structured BigTable. Stage 3 (Evolve, training only — red arrows): a Run-Verify-Reflect loop contrasts system output against a gold reference, clusters error patterns, and refines/modularises both decomposition skills (written back to So) and execution skills (written back to Sw). At inference time (black arrows), Stages 1–2 run with frozen So and Sw and no reflection.
**Training (Self-Evolving) Flow.** Training flow of Web2BigTable over one episode k. For each training query q_k, Stage 1 reads the long-term orchestrator skills So and decomposes q_k into subtasks τ. Stage 2 dispatches the subtasks to N parallel workers, which read execution skills from Sw and read/write the short-term workboard me until convergence. Stage 3 verifies the aggregated output X_k against the gold reference, produces the structured reflection r_o^(k+1), and consolidates it into both So (via Mo) and Sw (via Mw). Episodes are processed sequentially: the bottom black loop moves from episode k to k+1 without replanning within an episode. After K episodes, the two skill banks (So*, Sw*) are frozen and returned as the training output, then used unchanged during inference.
**Inference Flow.** Inference flow of Web2BigTable on an unseen user query q. Using the trained skill banks So* and Sw* as frozen read-only inputs, Stage 1 decomposes q into subtasks τ. Stage 2 runs N parallel workers that resolve execution skills from Sw* and coordinate through the shared workboard me (per-query, short-term); their partial outputs {xi} are aggregated into the structured big table X. No verification, reflection, or memory update is performed: the system runs a single forward pass and returns X.
Core question. Web2BigTable is not about building yet another chatbot wrapper. It is about how to decompose hard tasks into parallel subtasks, coordinate workers effectively, and evolve better decomposition strategies from every run.
- **Decompose intelligently.** Evolved orchestrator skills route tasks to the best decomposition strategy — split by entity, time, category, rank, or dependency.
- **Execute in parallel.** Up to 10 Memento-S workers run concurrently, coordinating through a shared workboard to avoid redundant work and merge partial results.
- **Evolve from experience.** Decomposition strategies are evolved from past task executions — the system clusters task patterns and generates specialised orchestrator skills automatically.
| Feature | Why it matters |
|---|---|
| Multi-agent orchestration via MCP | An orchestrator agent decomposes tasks and dispatches subtasks to parallel workers through a FastMCP server, enabling true concurrent execution rather than sequential tool calls. |
| Learned decomposition strategies | Decomposition strategies (task-router + 11 decompose-* patterns) are learned from task experience, so the system continuously improves how it breaks down different types of tasks. |
| Shared workboard coordination | Workers read and edit a shared markdown workboard for inter-agent communication — claim sections, post partial results, and avoid duplicate work without central locking. |
| Semantic skill routing | BM25 + sentence-transformer embeddings + LLM selection ensure each worker picks the best skill for its subtask, even as the skill library grows. |
| Ops-based execution engine | Workers use a JSON ops architecture (not function calling) with filesystem, terminal, web, workboard, and meta operations, enabling fine-grained multi-round execution within each skill. |
| Textual TUI | A rich terminal interface for submitting tasks, inspecting per-worker execution steps, viewing live workboard state, and reading the final synthesised output. |
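The shared-workboard coordination described above can be sketched as a tag-partitioned section store guarded by a lock. This is a minimal illustrative model, not the repository's implementation — the real system persists the board as a file-locked markdown document, and all names below (`Workboard`, `claim`, `post`, `render`) are assumptions:

```python
import threading


class Workboard:
    """Minimal sketch of a tag-partitioned shared workboard.

    Workers atomically claim a tagged section, post partial findings into
    it, and read other sections to avoid duplicate work. A single lock
    stands in for the file lock used by the real system.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sections: dict[str, dict] = {}  # tag -> {"owner": ..., "body": ...}

    def claim(self, tag: str, worker: str) -> bool:
        """Atomically claim a section; returns False if already claimed."""
        with self._lock:
            if tag in self._sections:
                return False
            self._sections[tag] = {"owner": worker, "body": ""}
            return True

    def post(self, tag: str, worker: str, text: str) -> None:
        """Append a partial finding to a section the worker owns."""
        with self._lock:
            section = self._sections[tag]
            assert section["owner"] == worker, "section owned by another worker"
            section["body"] += text + "\n"

    def render(self) -> str:
        """Render the board as markdown, one tagged section per claim."""
        with self._lock:
            return "\n".join(
                f"## [{tag}] (owner: {s['owner']})\n{s['body']}"
                for tag, s in self._sections.items()
            )


board = Workboard()
board.claim("brand:acme", "worker-1")
board.claim("brand:acme", "worker-2")  # second claim fails: no duplicate work
board.post("brand:acme", "worker-1", "revenue 2023: found in annual report")
print(board.render())
```

The claim-before-work pattern is what removes the need for central scheduling: whichever worker claims a tag first owns it, and everyone else moves on.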
Web2BigTable is organised as a three-stage Orchestrate → Execute → Evolve pipeline. The first two stages run on every query; the third runs only during training and writes its lessons back into persistent external memory.
| Stage | What it means |
|---|---|
| Orchestrate | An Orchestrator LLM reads decomposition strategies from Strategy Memory So via a Skill Router, picks the best-matching pattern for the incoming query, and partitions it into N self-contained subtasks (instruction + output schema). |
| Execute | Parallel worker agents resolve execution skills from a Shared Skill Bank Sw with dynamic retrieval and self-repair, and coordinate asynchronously through a file-locked, tag-partitioned Shared Workboard me — claiming sections, posting partial findings, and reconciling conflicts as the global state evolves. Outputs are aggregated into a structured BigTable. |
| Evolve (training only) | A Run-Verify-Reflect loop contrasts system output against a gold reference, clusters error patterns, and refines/modularises both decomposition skills (written back to So) and execution skills (written back to Sw). At inference time the two memories are frozen and read-only. |
The key difference from prior multi-agent systems is bi-level co-evolution: most frameworks adapt either how to plan (decomposition) or how to act (execution skills), but not both. Web2BigTable jointly refines them through the same closed-loop reflection, mediated entirely by persistent, human-readable external memory — the underlying LLMs are never fine-tuned.
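The bi-level run-verify-reflect loop can be sketched as follows. All function names and bodies are illustrative stand-ins for the Stage 1–3 components described above, not the actual codebase API:

```python
# Toy sketch of the bi-level run-verify-reflect training loop (Stage 3).
# Every function body is an illustrative stand-in, not the real system.

def train(episodes, So, Sw):
    """Each episode: decompose -> execute -> verify -> reflect -> consolidate."""
    for query, gold in episodes:
        subtasks = decompose(query, So)   # Stage 1 reads strategy memory So
        output = execute(subtasks, Sw)    # Stage 2: parallel workers read Sw
        errors = verify(output, gold)     # contrast with the gold reference
        if errors:
            lesson = reflect(errors)      # cluster error patterns into a lesson
            So.append(lesson["plan"])     # refine decomposition skills
            Sw.append(lesson["act"])      # refine execution skills
    return So, Sw                         # frozen and reused at inference


# Minimal stand-ins so the loop runs end to end:
def decompose(q, So):
    return [f"{q}:{i}" for i in range(2)]

def execute(tasks, Sw):
    return {t: t.upper() for t in tasks}

def verify(output, gold):
    return [] if output == gold else ["mismatch"]

def reflect(errors):
    return {"plan": "split finer", "act": "cross-check sources"}


# One matching episode (no lesson) and one failing episode (both banks updated):
So, Sw = train([("q1", {"q1:0": "Q1:0", "q1:1": "Q1:1"}), ("q2", {})], [], [])
```

The point the sketch makes concrete: a single reflection step writes back to *both* memories, which is what "bi-level co-evolution" means here.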
```bash
curl -sSL https://raw.githubusercontent.com/Web2BigTable/Web2BigTable/main/install.sh | bash
```

One command to install, one command to launch. The installer sets up dependencies, downloads router assets, configures API keys, and creates the `web2bigtable` command.

The installer will:

- Install `uv` (if not present)
- Clone the repository
- Install all dependencies (Memento-S + orchestrator)
- Download router assets (skill catalog + optional embeddings)
- Configure `.env` interactively (API keys)
- Create the `web2bigtable` command
```bash
git clone https://github.com/Web2BigTable/Web2BigTable.git
cd Web2BigTable

# Install Memento-S worker dependencies
cd Memento-S && uv sync --python 3.12 && cd ..

# Install orchestrator dependencies
uv sync --python 3.12
```

Create a `.env` file in the project root:

```bash
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_MODEL=anthropic/claude-sonnet-4-5
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
SERPER_API_KEY=...
```

Then launch:

```bash
web2bigtable
```

Configuration
All configuration is centralised in environment variables. Key settings:
| Variable | Default | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | — | API key for LLM calls (required) |
| `OPENROUTER_MODEL` | `anthropic/claude-sonnet-4-5` | Model for Memento-S workers |
| `OPENROUTER_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API base URL |
| `SERPER_API_KEY` | — | API key for web search skill (serper.dev) |
| `MAX_WORKERS` | `10` | Max parallel workers per task |
| `SEMANTIC_ROUTER_ENABLED` | `true` | Enable semantic skill pre-filtering |
| `SEMANTIC_ROUTER_TOP_K` | `4` | Number of candidate skills for LLM routing |
| `SKILL_DYNAMIC_FETCH_ENABLED` | `true` | Auto-fetch missing skills from catalog |
| `DEBUG` | `false` | Enable debug logging |
| `WORKSPACE_DIR` | `Memento-S/workspace` | Workboard location shown in TUI |
| Skill | Description |
|---|---|
| `filesystem` | Read, write, edit, search, and manage files and directories |
| `terminal` | Execute shell commands with safety checks |
| `web-search` | Google search via Serper + URL fetching |
| `uv-pip-install` | Python package management via uv/pip |
| `skill-creator` | Dynamically create new skills at runtime |
Workers automatically select the best skill for each subtask via semantic routing (BM25 + embeddings + LLM). If no existing skill matches, the system can dynamically fetch or create new skills on demand.
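A simplified sketch of that routing step: the real router combines BM25 over jieba tokens, bge-m3 embeddings, and an LLM pick, but here both retrieval signals are approximated by a single bag-of-words cosine to keep the sketch dependency-free. The skill descriptions are paraphrased from the table above; `route` is a hypothetical name:

```python
# Dependency-free stand-in for hybrid skill routing (BM25 + embeddings + LLM).
from collections import Counter
import math

SKILLS = {
    "web-search": "web search google via serper and url fetching",
    "filesystem": "read write edit search and manage files and directories",
    "terminal": "execute shell commands with safety checks",
}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(subtask: str, top_k: int = 2) -> list[str]:
    """Score every skill against the subtask and return the top-k shortlist;
    the real system then asks an LLM to pick one skill from this shortlist."""
    query = Counter(subtask.lower().split())
    ranked = sorted(
        SKILLS,
        key=lambda s: cosine(query, Counter(SKILLS[s].split())),
        reverse=True,
    )
    return ranked[:top_k]

print(route("search the web for company revenue"))  # 'web-search' ranks first
```

The two-phase shape — cheap retrieval to a small candidate set, then an LLM choice — is what keeps routing tractable as the skill library grows.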
```bash
web2bigtable
```

- Submit tasks directly from the interface (`Ctrl+Enter` or Run Task)
- Session-scoped worker list with per-worker status (`live`/`finished`)
- Click any worker row to inspect execution steps and events
- Live workboard view showing real-time inter-worker coordination
- Final orchestrator output panel
| Shortcut | Action |
|---|---|
| `Ctrl+Enter` | Run task |
| `r` | Refresh worker list |
| `c` | Copy final output to clipboard |
| `q` | Quit |
Project structure
```text
Web2BigTable/
├── tui_app.py                          # Textual TUI — primary interface
├── main.py                             # Standalone entry point (non-TUI)
├── install.sh                          # One-click installer
├── pyproject.toml                      # Root project (orchestrator deps + entry point)
├── orchestrator/
│   ├── orchestrator_agent.py           # LangChain orchestrator agent
│   └── mcp_server.py                   # FastMCP server (execute_subtasks + workboard)
├── orchestrator_skills/                # Auto-generated decomposition strategies
│   ├── task-router/                    # Routes queries to decompose strategies
│   ├── workboard/                      # Shared workboard coordination
│   ├── decompose-split-by-entity/      # Split by entity/brand
│   ├── decompose-split-by-time-period/ # Split by chronological range
│   ├── decompose-split-by-category/    # Split by categorical dimension
│   ├── decompose-split-by-rank-segment/           # Split by rank ranges
│   ├── decompose-annual-rank-stats/               # Annual ranking statistics
│   ├── decompose-comparative-data-extraction/     # Comparative data extraction
│   ├── decompose-constrained-set-search/          # Constrained set search
│   ├── decompose-entity-benchmarking/             # Entity benchmarking
│   ├── decompose-geographic-registries/           # Geographic registry lookup
│   ├── decompose-linear-multi-hop-dependency/     # Linear multi-hop dependency
│   ├── decompose-multimedia-source-verification/  # Multimedia source verification
│   └── decompose-temporal-event-logs/             # Temporal event log extraction
├── Memento-S/                          # Worker agent (submodule)
│   ├── core/
│   │   ├── agent/memento_s_agent.py    # Worker agent class
│   │   ├── config.py                   # Configuration & constants
│   │   ├── router.py                   # Skill routing (BM25 + embeddings + LLM)
│   │   ├── llm.py                      # LLM wrapper (OpenRouter)
│   │   ├── skill_engine/               # Planning, execution, bridge ops
│   │   └── tools/                      # Tool implementations
│   └── skills/                         # Built-in skill definitions
├── figures/                            # README figures
├── docs/                               # Documentation
└── logs/                               # Worker trajectory logs (*.jsonl)
```
Tech stack
| Layer | Technology |
|---|---|
| Interface | Textual (TUI) |
| Orchestration | LangChain + MCP (Model Context Protocol) |
| Worker framework | Memento-S (ops-based skill execution) |
| LLM access | OpenRouter (multi-provider) |
| Skill routing | BM25 (jieba) + sentence-transformers (BAAI/bge-m3) + LLM selection |
| MCP transport | FastMCP (stdio) |
| Coordination | Shared workboard (thread-safe markdown read/write/edit) |
| Execution | uv sandbox + subprocess isolation |
| Async runtime | asyncio |
| Build and packaging | uv + hatchling |
| Problem | Solution |
|---|---|
| Skills not found | Check that Memento-S/skills/ exists and skill catalog is downloaded. |
| API timeout | Increase the model timeout or switch to a faster model in .env. |
| Import errors | Make sure both virtual environments are active: Memento-S and root. |
| Web search fails | Check whether SERPER_API_KEY is configured in .env. |
| Workers stuck | Check logs/worker-*.jsonl for error details. Increase MAX_WORKERS if tasks queue. |
| Workboard conflicts | Workers use tagged sections — check .workboard.md for malformed edits. |
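For the "workers stuck" case, a small helper can surface the last event from each trajectory log. This is an illustrative snippet that assumes one JSON record per line; `last_events` is a hypothetical name and the actual log schema is not specified here:

```python
# Illustrative helper: read the final JSON record from each worker log.
import glob
import json

def last_events(pattern: str = "logs/worker-*.jsonl") -> dict[str, dict]:
    """Return {log path: last JSON record} for every matching trajectory log.

    Skips blank lines; files with no records are omitted from the result.
    """
    out = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            records = [json.loads(line) for line in f if line.strip()]
        if records:
            out[path] = records[-1]
    return out

# Example: print the last event of every worker to spot where each one stopped.
for path, event in last_events().items():
    print(path, "->", event)
```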
Web2BigTable is part of the broader Memento project family.
| Resource | Link | Description |
|---|---|---|
| Memento Homepage | memento.run | The hub for all Memento series projects and research |
| Memento-Skills | GitHub | Single-agent self-evolving skill framework |
| Web2BigTable | GitHub | Multi-agent orchestration with self-improving decomposition (this repo) |
| Discord Community | Join Discord | Discussion, Q&A, feature requests, and collaboration |
If you find Web2BigTable useful in your research, please cite:
```bibtex
@misc{huang2026web2bigtablebilevelmultiagentllm,
      title={Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction},
      author={Yuxuan Huang and Yihang Chen and Zhiyuan He and Yuxiang Chen and Ka Yiu Lee and Huichi Zhou and Weilin Luo and Meng Fang and Jun Wang},
      year={2026},
      eprint={2604.27221},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.27221},
}
```
Web2BigTable is a multi-agent collaboration system. Its core idea is to decompose complex tasks into subtasks that can run in parallel, have multiple Memento-S worker agents process them concurrently, and coordinate the workers through a shared workboard.

The system is built around an online route → decompose → execute → synthesise pipeline. The Orchestrator agent identifies the task type via the task-router, matches the best decompose-* strategy, and splits the task into independent subtasks; worker agents select the best skill for each subtask via semantic routing and execute in parallel, coordinating through the shared workboard; finally, the Orchestrator aggregates the results into the final response.

On the WideSearch benchmark, Web2BigTable surpasses frontier baselines such as o3-high, Gemini 2.5 Pro, and Claude Sonnet 4 across all three metrics: Row F1 (63.5), Item F1 (80.1), and Success Rate (38.5). On XBench-DeepSearch it reaches 73.0% accuracy, exceeding all open-source agentic models and approaching frontier proprietary systems.
MIT





