An autonomous multi-agent system that discovers causal questions from the clinical literature and generates target trial emulation protocols -- complete with analysis code, independent review loops, and optional execution against a configured database. The coordinator agent drives the entire pipeline using its own judgment, not hardcoded logic.
```mermaid
flowchart TB
classDef coord fill:#d4e6f1,stroke:#2874a6,stroke-width:2px,color:#000
classDef agent fill:#fdebd0,stroke:#b9770e,color:#000
classDef mcp fill:#d5f5e3,stroke:#239b56,color:#000
classDef files fill:#f4f6f6,stroke:#7b7d7d,stroke-dasharray: 3 3,color:#000
COORD["<b>Coordinator Agent</b><br/><i>long-lived Claude Code session</i><br/><br/>Reads COORDINATOR.md<br/>Launches sub-agents<br/>Reads output files<br/>Decides: advance / revise / backtrack"]:::coord
WORKER["<b>Worker Agents</b> · <i>claude -p</i><br/><br/>WORKER.md<br/>Search PubMed · Query DB<br/>Write protocols · Generate R<br/>Execute R (online mode)"]:::agent
REVIEWER["<b>Reviewer Agents</b> · <i>claude -p</i><br/><br/>REVIEW.md<br/>Verify PMIDs · Check methods<br/>Check SQL/R · Conventions<br/>Write critiques"]:::agent
REPORTER["<b>Report Writers</b> · <i>claude -p</i><br/><br/>REPORT_WRITER.md<br/>Read results JSON<br/>Write per-protocol reports"]:::agent
MCP["<b>MCP Servers</b><br/><br/>PubMed · Datasource Registry<br/>RxNorm · Clinical Codes (LOINC / HCPCS / ICD-10)<br/>R Executor <i>(online mode only)</i>"]:::mcp
FILES[("<b>Shared output files</b><br/>literature_scan · evidence_gaps<br/>feasibility · protocols/*<br/>coordinator_log · agent_state")]:::files
COORD -->|spawns| WORKER
COORD -->|spawns| REVIEWER
COORD -->|spawns| REPORTER
WORKER -->|writes| FILES
REVIEWER -->|writes| FILES
REPORTER -->|writes| FILES
FILES -->|reads| COORD
WORKER -.->|tool calls| MCP
REVIEWER -.->|tool calls| MCP
```
No hardcoded state machine. The coordinator agent decides when work is good enough to advance, when it needs revision, and when to backtrack. It evaluates sub-agent output against objective acceptance criteria defined in COORDINATOR.md, but the judgment and routing are the agent's own.
Independent review. Every reviewer runs in a fresh Claude Code session with no access to the worker's reasoning -- only the output files. This prevents anchoring and enables genuine error detection.
The coordinator runs as a long-lived Claude Code session. It launches sub-agents (workers, reviewers, and report writers) by calling `claude -p` in bash, reads their output files, and decides what to do next.
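In Python terms, the spawn step might look like the sketch below. This is a hypothetical helper, not the project's actual launch code: the exact prompt wording and the use of the CLI's `--max-turns` flag are assumptions about how `run.sh` wires things together.

```python
import subprocess

def subagent_cmd(prompt: str, instructions_file: str, max_turns: int = 50) -> list:
    """Build the argv for a one-shot sub-agent invocation via `claude -p`.

    The instructions file (WORKER.md, REVIEW.md, ...) defines the agent's role;
    the prompt carries the task for this specific run.
    """
    return [
        "claude", "-p",
        f"Read {instructions_file} first, then do the following task:\n{prompt}",
        "--max-turns", str(max_turns),
    ]

def run_subagent(prompt: str, instructions_file: str, max_turns: int = 50) -> str:
    """Launch the sub-agent and return its transcript. The coordinator still
    judges the work by reading the shared output files, not this stdout."""
    result = subprocess.run(
        subagent_cmd(prompt, instructions_file, max_turns),
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"sub-agent failed: {result.stderr[:500]}")
    return result.stdout
```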
Phase 0 -- Data Source Onboarding (if a database is configured)
Auto-generates schema dump and data profile if they are missing. In online
mode the coordinator uses dump_schema() and run_profiler() from the R
executor MCP server. In offline mode the schema dump and profile must
already exist on disk.
Phase 1 -- Literature Discovery
Worker searches PubMed using a three-pass strategy (broad landscape, targeted per-question, citation chaining), extracts PICO questions, and identifies evidence gaps. Reviewer verifies PMIDs, runs independent searches, and stress-tests "no prior studies" claims.
Phase 2 -- Feasibility Assessment
Worker checks whether the configured database (or public datasets) can support each question, using the data profile for realistic sample size estimates and variable availability. In online mode, workers can query the live database to validate feasibility assumptions.
Phase 3 -- Protocol Generation & Review
Worker writes full target trial emulation protocols with runnable R analysis scripts. Reviewer checks methods, code correctness, database conventions compliance, and statistical pitfalls.
Phase 4 -- Execution & Reporting
In online mode, the coordinator runs R analysis scripts against the live
database via the R executor and collects structured results
(protocol_NN_results.json). In offline mode, it writes a NEXT_STEPS.md
file so the user can execute scripts on a secure machine and return results.
A report-writing agent then produces a per-protocol analysis report
(protocol_NN_report.md).
Phase 5 -- Executive Summary
The coordinator synthesizes all protocol reports into a final summary.md
covering key findings across the therapeutic area.
The coordinator logs every decision to coordinator_log.md and tracks
state in agent_state.json for transparency and debugging.
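The logging pattern can be sketched as follows. The helper name and the exact shape of `agent_state.json` are assumptions for illustration; the real schema is whatever the coordinator chooses to write.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_decision(results_dir: str, phase: int, decision: str, reason: str) -> dict:
    """Append one decision to coordinator_log.md and mirror it in agent_state.json."""
    stamp = datetime.now(timezone.utc).isoformat()
    root = Path(results_dir)
    root.mkdir(parents=True, exist_ok=True)
    # Human-readable audit trail
    with open(root / "coordinator_log.md", "a") as log:
        log.write(f"- {stamp} · phase {phase} · {decision} -- {reason}\n")
    # Machine-readable state for resuming and debugging
    state_path = root / "agent_state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {"decisions": []}
    state["phase"] = phase
    state["decisions"].append({"time": stamp, "decision": decision, "reason": reason})
    state_path.write_text(json.dumps(state, indent=2))
    return state
```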
```bash
# Claude Code CLI
npm install -g @anthropic-ai/claude-code

# Python dependencies for MCP servers
pip install mcp httpx lxml pyyaml

# API key (if using the Anthropic API directly)
export ANTHROPIC_API_KEY="sk-ant-..."
```

Using Claude Code with a subscription? If you have a Claude Pro or Max subscription, Claude Code works without an API key -- just run `claude` and authenticate with your Anthropic account. No `ANTHROPIC_API_KEY` needed. Sub-agents launched via `claude -p` inherit the same subscription billing.
For online database mode (agents query a live database):
R (≥ 4.1) must be installed and available on your PATH. Then open an R session and install the required packages:

```r
# Core packages
install.packages(c("DBI", "jsonlite"))

# Plus the driver for your database engine:
install.packages("duckdb")   # DuckDB
install.packages("odbc")     # MS SQL Server, PostgreSQL via ODBC
```

For the synthetic test database (used for development and testing), run in R:

```r
devtools::install_github("tjohnson250/PCORnet-CDM-Synthetic-DB")
```

The simplest mode -- generates protocols targeting public datasets like MIMIC-IV and NHANES. No database connection required.

```bash
./run.sh "atrial fibrillation"
```

Results appear in `results/atrial_fibrillation/`.
To run against NHANES (National Health and Nutrition Examination Survey)
data via the nhanesA R package:
```bash
# Install R packages (one-time)
Rscript -e 'install.packages(c("nhanesA", "duckdb", "survey", "DBI"))'

# Run
./run.sh "type 2 diabetes" --db-config databases/nhanes.yaml
```

In online mode, agents lazy-load NHANES tables from the CDC website into an in-memory DuckDB database and can query them via SQL. Results appear in `results/type_2_diabetes/`.
NHANES is cross-sectional, so the strongest target trial emulation designs
use the linked mortality file for prospective follow-up. See
databases/conventions/nhanes_conventions.md for full design constraints.
To run against MIMIC-IV (Medical Information Mart for Intensive Care) data from PhysioNet:
```bash
# Prerequisites: PhysioNet credentialed access (CITI training required)
# Download MIMIC-IV v3.1 CSV files from https://physionet.org/content/mimiciv/3.1/

# Install R packages (one-time)
Rscript -e 'install.packages(c("duckdb", "DBI"))'

# Load MIMIC-IV CSVs into DuckDB (one-time)
Rscript -e '
library(DBI); library(duckdb)
con <- dbConnect(duckdb::duckdb(), "databases/data/mimic_iv.duckdb")
for (dir in c("path/to/mimiciv/3.1/hosp", "path/to/mimiciv/3.1/icu")) {
  for (f in list.files(dir, pattern="\\.csv\\.gz$", full.names=TRUE)) {
    # strip .csv.gz to get the table name
    tbl <- tools::file_path_sans_ext(tools::file_path_sans_ext(basename(f)))
    # duckdb_read_csv wraps DuckDB CSV ingestion and avoids hand-built SQL quoting
    duckdb::duckdb_read_csv(con, tbl, f)
  }
}
dbDisconnect(con)
'

# Run
./run.sh "sepsis" --db-config databases/mimic_iv.yaml
```

MIMIC-IV provides timestamped treatments, labs, vitals, and outcomes for
~300,000 hospital admissions and ~65,000 ICU stays at Beth Israel
Deaconess Medical Center (2008-2022). It is ideal for ICU-focused target
trial emulations such as vasopressor timing, ventilation strategies, and
antibiotic initiation. See databases/conventions/mimic_iv_conventions.md
for date shifting rules, join key hierarchy, and recommended TTE designs.
To run against the bundled synthetic PCORnet database (requires R and the
PCORnetCDMSyntheticDB package from prerequisites above):
```bash
./run.sh "atrial fibrillation" --db-config databases/synthetic_pcornet.yaml
```

This runs in online mode -- agents query the synthetic DuckDB database, validate cohort sizes, and execute analysis scripts end-to-end. Results appear in `results/atrial_fibrillation/`.
To target a specific database, create a YAML config file in databases/
and pass it with --db-config.
```yaml
id: "my_pcornet_cdw"
name: "My Institution's PCORnet CDW"
cdm: "pcornet"
cdm_version: "6.1"
engine: "mssql"
online: true
connection:
  r_code: |
    con <- DBI::dbConnect(odbc::odbc(), "MY_DSN")
schema_prefix: "CDW.dbo"
schema_dump: "databases/schemas/my_pcornet_cdw_schema.txt"
data_profile: "databases/profiles/my_pcornet_cdw_profile.md"
conventions: "databases/conventions/my_pcornet_cdw_conventions.md"
```

| Field | Description |
|---|---|
| `id` | Unique identifier, used as a key by the datasource MCP server |
| `name` | Human-readable name shown in banners and logs |
| `cdm` | Common data model type (e.g., `pcornet`, `omop`) |
| `engine` | Database engine (`duckdb`, `mssql`, `postgres`, etc.) |
| `online` | Default connectivity mode; can be overridden with `--db-mode` |
| `connection.r_code` | R code that creates a DBI connection object named `con` |
| `schema_prefix` | Table qualifier (e.g., `CDW.dbo`, `main`) |
| `schema_dump` | Path to the schema dump file (auto-generated in online mode) |
| `data_profile` | Path to the data profile file (auto-generated in online mode) |
| `conventions` | Path to the database conventions markdown file |
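A sketch of the checks Phase 0 might run against a parsed config. The function is hypothetical (field names come from the table above) and would receive the output of `yaml.safe_load`:

```python
# Keys the agents rely on, per the config field table above.
REQUIRED_KEYS = {"id", "name", "cdm", "engine", "online", "connection",
                 "schema_dump", "data_profile", "conventions"}

def config_problems(cfg: dict) -> list:
    """Return human-readable problems with a parsed database config dict."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - cfg.keys())]
    if not isinstance(cfg.get("online"), bool):
        problems.append("`online` must be true or false")
    if "r_code" not in cfg.get("connection", {}):
        problems.append("connection.r_code must create a DBI connection named `con`")
    return problems
```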
```bash
# Online mode — agents query the live database
./run.sh "atrial fibrillation" --db-config databases/synthetic_pcornet.yaml

# Offline mode — agents work from schema dump + data profile
./run.sh "atrial fibrillation" --db-config databases/secure_pcornet_cdw.yaml --db-mode offline
```

| | Online | Offline |
|---|---|---|
| When to use | Database is reachable from the machine running Claude Code | Database is behind a firewall or requires a secure environment |
| Agent capabilities | Query live DB, validate cohort sizes, run analysis scripts | Work from schema dump and data profile only |
| MCP tools | All tools + R executor (`execute_r`, `query_db`, `list_tables`, etc.) | All tools except R executor |
| Schema/profile | Auto-generated during Phase 0 if missing | Must already exist on disk |
| Analysis execution | Coordinator runs R scripts via R executor, collects results JSON | User runs scripts on the secure machine and brings back results JSON |
Set `online: true` or `online: false` in the YAML config to choose the default. Override at runtime with `--db-mode online` or `--db-mode offline`.
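The precedence rule (flag over config default) is small enough to state as code. A minimal sketch, assuming a hypothetical helper inside the launch logic:

```python
def effective_db_mode(config_online, db_mode_flag=None):
    """--db-mode wins when given; otherwise fall back to the YAML `online:` default."""
    if db_mode_flag is not None:
        if db_mode_flag not in ("online", "offline"):
            raise ValueError(f"invalid --db-mode: {db_mode_flag}")
        return db_mode_flag
    return "online" if config_online else "offline"
```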
In online mode, the pipeline is fully automated:
- Phase 3 generates R analysis scripts (`protocol_NN_analysis.R`).
- Phase 4 executes them via the R executor MCP server.
- Each script saves structured results to `protocol_NN_results.json`.
- A report-writing agent reads the protocol spec + results JSON and produces `protocol_NN_report.md`.
When the database is not directly accessible:
- Phases 1--3 run normally, producing protocols and R analysis scripts.
- The coordinator writes `NEXT_STEPS.md` with instructions for the user.
- User action: copy the R scripts to the secure machine, execute them, and copy the `protocol_NN_results.json` files back.
- Re-run with `--resume-reports` to generate reports from the results:

```bash
./run.sh "atrial fibrillation" \
  --db-config databases/secure_pcornet_cdw.yaml \
  --resume-reports
```

This skips Phases 0--3 and goes straight to report generation and the executive summary.
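Both modes hinge on pairing each analysis script with its results file. A sketch of that check (the helper is hypothetical; the filename patterns are the ones used throughout this README):

```python
import re
from pathlib import Path

def pending_protocols(protocol_dir: str) -> list:
    """List protocol numbers that have an analysis script but no results JSON yet.

    In online mode the coordinator would execute these via the R executor;
    in offline mode they belong in NEXT_STEPS.md for the user to run.
    """
    root = Path(protocol_dir)
    pending = []
    for script in sorted(root.glob("protocol_*_analysis.R")):
        nn = re.match(r"protocol_(\d+)_analysis\.R", script.name).group(1)
        if not (root / f"protocol_{nn}_results.json").exists():
            pending.append(nn)
    return pending
```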
1. Create a config file at `databases/my_database.yaml` following the format above. Set `online: true` if the database is reachable.
2. Write conventions at `databases/conventions/my_database_conventions.md`. Document engine-specific SQL dialect quirks, table naming patterns, known data quality issues, legacy data caveats, and any other requirements that agents must follow when writing queries.
3. For offline mode, generate the schema dump and data profile manually and place them at the paths declared in the config. For PCORnet CDWs, `CDW_DB_Profiler.qmd` can generate these on a secure machine.
4. For online mode, the coordinator auto-generates the schema dump and data profile during Phase 0 if the files do not yet exist.
5. Run it:

   ```bash
   ./run.sh "therapeutic area" --db-config databases/my_database.yaml
   ```

```text
./run.sh <therapeutic_area> [flags] [max_turns]

Arguments:
  therapeutic_area   Required. Clinical topic, e.g., "atrial fibrillation",
                     "type 2 diabetes", "sepsis".

Flags:
  --db-config PATH   Path to a database YAML config file.
  --db-mode MODE     Override connectivity mode: "online" or "offline".
  --resume-reports   Skip Phases 0-3. Generate reports from existing
                     protocol_NN_results.json files and produce the
                     executive summary.

Options:
  max_turns          Integer. Max turns per sub-agent (default 50).
                     Higher values allow more thorough but longer runs.

Examples:
  ./run.sh "atrial fibrillation"
  ./run.sh "atrial fibrillation" --db-config databases/synthetic_pcornet.yaml
  ./run.sh "atrial fibrillation" --db-config databases/secure_pcornet_cdw.yaml --db-mode offline
  ./run.sh "atrial fibrillation" --db-config databases/secure_pcornet_cdw.yaml --resume-reports
  ./run.sh "type 2 diabetes" --db-config databases/my_db.yaml 75
  ./run.sh "type 2 diabetes" --db-config databases/nhanes.yaml
  ./run.sh "sepsis" --db-config databases/mimic_iv.yaml
```
```text
AutoTTE/
├── CLAUDE.md                      # Router — points agents to their instructions
├── COORDINATOR.md                 # Coordinator instructions + acceptance criteria
├── WORKER.md                      # Worker instructions + domain expertise
├── REVIEW.md                      # Reviewer instructions + verification protocol
├── REPORT_WRITER.md               # Report-writing instructions
├── .mcp.json                      # MCP server configuration
├── run.sh                         # Launch script
├── CDW_DB_Profiler.qmd            # Manual CDW profiler (for secure machines)
├── analysis_plan_template.R       # R template for public datasets
├── analysis_plan_template_cdw.R   # R template for DB-targeted protocols
├── databases/
│   ├── mimic_iv.yaml              # Config: MIMIC-IV via DuckDB
│   ├── nhanes.yaml                # Config: NHANES via nhanesA + DuckDB
│   ├── secure_pcornet_cdw.yaml    # Config: institutional PCORnet CDW
│   ├── synthetic_pcornet.yaml     # Config: synthetic DuckDB for testing
│   ├── schemas/                   # Schema dumps (auto-generated or manual)
│   ├── profiles/                  # Data profiles (auto-generated or manual)
│   ├── conventions/               # Per-database conventions (markdown)
│   └── data/                      # Database files (gitignored)
├── tools/
│   ├── pubmed_server.py           # MCP: PubMed search + abstract retrieval
│   ├── datasource_server.py       # MCP: unified datasource registry
│   ├── r_executor_server.py       # MCP: persistent R session + DB connection
│   ├── rxnorm_server.py           # MCP: RxNorm drug code lookup
│   ├── clinical_codes_server.py   # MCP: LOINC + HCPCS code lookup
│   └── stream_viewer.py           # Streaming output formatter
├── tests/
│   ├── conftest.py                # MCP module mock for testing
│   ├── test_datasource_server.py  # Datasource registry tests
│   └── test_r_executor.py         # R executor tests
└── results/                       # Agent outputs (per therapeutic area)
    └── <therapeutic_area>/
        ├── agent_state.json       # Coordinator state
        ├── coordinator_log.md     # Decision log
        ├── 01_literature_scan.md  # Phase 1 output
        ├── 02_evidence_gaps.md    # Phase 1 output (ranked PICO questions)
        ├── discovery_review.md    # Phase 1 review
        ├── 03_feasibility.md      # Phase 2 output
        ├── feasibility_review.md  # Phase 2 review
        ├── summary.md             # Executive summary
        ├── NEXT_STEPS.md          # Offline mode: user instructions
        └── protocols/
            ├── protocol_01.md             # Protocol specification
            ├── protocol_01_analysis.R     # R analysis script
            ├── protocol_01_results.json   # Structured execution results
            ├── protocol_01_report.md      # Per-protocol analysis report
            ├── protocol_review.md         # Protocol reviews
            └── ...
```
The system's behavior is controlled by four markdown files that serve as instructions for each agent role. Modify these to change how the system works:
- COORDINATOR.md -- Acceptance criteria for each phase, red flags that trigger revision, and guardrails (max revisions, max backtracks). This is where you tune quality thresholds.
- WORKER.md -- Domain expertise, literature search protocol (three-pass strategy), SQL dialect awareness, database conventions compliance, and known pitfalls. This is where you encode lessons learned.
- REVIEW.md -- Verification protocol, PMID checking procedure, search completeness verification, code review checklist, and conventions-based review. This is where you encode quality checks.
- REPORT_WRITER.md -- Report structure, accuracy rules, and citation handling. Controls how execution results are translated into publication-quality reports.
- Add databases: Create a new YAML config in `databases/` with conventions in `databases/conventions/`
- Add public datasets: Edit the datasource registry in `tools/datasource_server.py`
- Add MCP tools: New `@mcp.tool()` functions in an existing or new server (register in `.mcp.json`)
- Adjust acceptance criteria: Edit rubrics in `COORDINATOR.md`
- Adjust review rigor: Edit standards in `REVIEW.md`
- Add new therapeutic areas: Just run with a new area name; the system creates a new results subdirectory automatically
- Agent-driven orchestration. The coordinator is an LLM, not a script. It can adapt to unexpected situations, make nuanced quality judgments, and route work based on content -- not just exit codes.
- Independent review. Reviewers get fresh context. They cannot be anchored by the worker's reasoning or self-assessment.
- Objective criteria with subjective judgment. COORDINATOR.md defines acceptance checklists, but the coordinator applies them with judgment -- the same way a PI reviews a postdoc's work.
- Transparency. Every decision is logged. The coordinator_log.md and agent_state.json create a full audit trail of the run.
- Graceful degradation. Guardrails (max revisions, max backtracks) prevent infinite loops, but they are guidelines for the coordinator's judgment, not hardcoded limits.
- Database conventions as first-class concept. Every database carries a conventions file documenting its quirks -- legacy data caveats, SQL dialect differences, known data quality issues. Agents must read and apply conventions before writing any query or analysis code.
- Lessons encoded, not just learned. When a run reveals a bug or pitfall, the fix is propagated to agent instruction files so future runs do not repeat it.