# Module 10.3: Multi-Agent Orchestration

**Level 3 - Building on ReAct pattern (M10.1) and Tool Calling (M10.2)**

## 1. Introduction: The Problem

Single agents struggle with complex multi-step tasks that require all of the following at once:
- **Research** (data gathering)
- **Strategy** (planning and analysis)
- **Validation** (quality control)

**Example scenario:** Competitor analysis requires:
1. Researching 3+ companies simultaneously
2. Identifying strategic patterns across findings
3. Validating completeness and accuracy

A single agent tries to do everything, leading to:
- Incomplete analysis (missed aspects)
- No independent quality control
- Difficulty tracking which sub-task failed

**Solution:** Multi-agent orchestration with specialized roles.

In [None]:
# Setup: Import core functionality
import sys
import json
from config import Config, logger
from l2_m10_multi_agent_orchestration import (
    run_multi_agent_query,
    should_use_multi_agent,
    AgentState
)

# Check configuration
print(f"Configured: {Config.is_configured()}")
print(f"Model: {Config.OPENAI_MODEL}")
print(f"Max iterations: {Config.MAX_ITERATIONS}")

# Expected: Configuration status and settings displayed

## 2. Prerequisites

**Required foundational knowledge:**
- Module 10.1: ReAct pattern (thought-action-observation loops)
- Module 10.2: Tool calling and structured outputs
- LangGraph library for orchestration

**Key concepts to understand:**
- **State management:** Shared state across agents
- **Message passing:** How agents communicate
- **Conditional routing:** Dynamic workflow paths based on outcomes

In [None]:
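# Verify LangGraph installation
from importlib.metadata import PackageNotFoundError, version

try:
    from langgraph.graph import StateGraph, END
    # Read the installed version from package metadata; the langgraph
    # module itself may not define __version__
    print(f"✓ LangGraph version: {version('langgraph')}")
except (ImportError, PackageNotFoundError):
    print("⚠️ LangGraph not installed. Run: pip install langgraph")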

# Load example queries
with open('example_data.json', 'r') as f:
    example_data = json.load(f)

print(f"\n✓ Loaded {len(example_data['queries'])} example queries")
print(f"✓ {len(example_data['common_failures'])} common failure scenarios documented")

# Expected: LangGraph version and example data summary

## 3. Theory: Role-Based Agent Teams

### The Surgical Team Analogy

Like a surgical team where each member has a specific role:
- **Surgeon** (Executor): Performs the operation
- **Surgical Planner** (Planner): Plans the procedure steps
- **Quality Control** (Validator): Monitors and validates outcomes

### Three-Agent Architecture

**1. Planner Agent**
- Role: Strategy and task decomposition
- Input: Complex query
- Output: Structured plan with 3-5 sub-tasks
- Does NOT execute tasks

**2. Executor Agent**
- Role: Task completion
- Input: Individual sub-tasks from plan
- Output: Completed results for each task
- Does NOT plan or validate

**3. Validator Agent**
- Role: Quality control and completeness checking
- Input: Executor results
- Output: Approval/rejection with actionable feedback
- Does NOT execute or plan

### Message Passing Protocol

```
User Query → Planner → Executor → Validator → [Approved: END | Rejected: Planner]
```

**Key insight:** Clear role separation prevents role confusion, where an agent steps outside its designated responsibilities.
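The protocol above can be sketched as a plain Python loop. This is a minimal illustration with stub agents, not the module's LangGraph implementation: the function names (`plan`, `execute`, `validate`, `run_pipeline`), the stub logic, and the `max_iterations` default are all assumptions made for the example.

```python
def plan(state: dict) -> dict:
    # Stub planner: on a second pass, add a task derived from validator feedback.
    tasks = ["research", "analyze"]
    if state["validation_feedback"]:
        tasks.append("fill gap: " + state["validation_feedback"])
    return {**state, "plan": tasks}


def execute(state: dict) -> dict:
    # Stub executor: produce one result per planned task.
    return {**state, "results": [f"done: {task}" for task in state["plan"]]}


def validate(state: dict) -> dict:
    # Stub validator: approve only when at least three results exist.
    if len(state["results"]) >= 3:
        return {**state, "validation_status": "approved"}
    return {
        **state,
        "validation_status": "rejected",
        "validation_feedback": "missing pricing data",
    }


def run_pipeline(query: str, max_iterations: int = 3) -> dict:
    state = {
        "query": query,
        "plan": [],
        "results": [],
        "validation_status": "pending",
        "validation_feedback": "",
        "iterations": 0,
    }
    while state["iterations"] < max_iterations:
        state["iterations"] += 1
        state = validate(execute(plan(state)))
        if state["validation_status"] == "approved":
            break  # Approved: END
        # Rejected: loop back to the planner, carrying the feedback along
    return state


final = run_pipeline("Analyze top 3 competitors")
print(final["validation_status"], final["iterations"])
```

The first pass is rejected because the stub plan has only two tasks; the feedback adds a third task on the second pass, which is then approved.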

In [None]:
# Visualize the agent workflow
print("Three-Agent Workflow:")
print("=" * 50)
print("┌─────────┐")
print("│  Query  │")
print("└────┬────┘")
print("     │")
print("     ▼")
print("┌─────────┐     ┌──────────┐     ┌───────────┐")
print("│ Planner ├────→│ Executor ├────→│ Validator │")
print("└─────────┘     └──────────┘     └─────┬─────┘")
print("     ▲                                  │")
print("     │                                  │")
print("     │        Rejected (feedback)       │")
print("     └──────────────────────────────────┘")
print("                                        │")
print("                   Approved             │")
print("                                        ▼")
print("                                   ┌────────┐")
print("                                   │  END   │")
print("                                   └────────┘")
print("\n# Expected: ASCII workflow diagram displayed")

## 4. Hands-On Implementation

### Step 1: State Schema with TypedDict

The centralized state schema documents the data flow between agents and enables static type checking. Note that a `TypedDict` is not enforced at runtime.

**Key fields:**
- `query`: Original user query
- `plan`: List of sub-tasks from Planner
- `results`: Cumulative results from Executor
- `validation_status`: Validator's decision (approved/rejected/pending)
- `validation_feedback`: Specific actionable feedback
- `iterations`: Loop counter to prevent infinite loops
- `total_cost`: Running cost estimate
- `messages`: Conversation log for debugging
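A schema with these fields might look like the sketch below. The field types are assumptions inferred from the descriptions above, and the actual `AgentState` in `l2_m10_multi_agent_orchestration` may differ. The class is named `AgentStateSketch` so that it does not shadow the imported one.

```python
from typing import Any, Dict, List, Literal, TypedDict


class AgentStateSketch(TypedDict):
    query: str                          # original user query
    plan: List[Dict[str, Any]]          # sub-tasks from the Planner
    results: List[Dict[str, Any]]       # cumulative Executor results
    validation_status: Literal["pending", "approved", "rejected"]
    validation_feedback: str            # actionable feedback from the Validator
    iterations: int                     # loop counter
    total_cost: float                   # running cost estimate
    messages: List[str]                 # conversation log for debugging


def new_state(query: str) -> AgentStateSketch:
    """Return a fresh state with every field initialized."""
    return {
        "query": query,
        "plan": [],
        "results": [],
        "validation_status": "pending",
        "validation_feedback": "",
        "iterations": 0,
        "total_cost": 0.0,
        "messages": [],
    }


state = new_state("Example query")
print(sorted(AgentStateSketch.__annotations__))
```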

In [None]:
# Inspect the AgentState schema
from typing import get_type_hints

print("AgentState Schema:")
print("=" * 50)

# Show state structure
sample_state: AgentState = {
    'query': 'Example query',
    'plan': [],
    'results': [],
    'validation_status': 'pending',
    'validation_feedback': '',
    'iterations': 0,
    'current_step': 0,
    'total_cost': 0.0,
    'start_time': 0.0,
    'messages': []
}

for key, value in sample_state.items():
    print(f"  {key:20s}: {type(value).__name__}")

print("\n# Expected: State schema fields and types listed")

### Step 2: Planner Agent - Task Decomposition

The Planner agent breaks down complex queries into actionable sub-tasks.

**Explicit role constraints prevent role confusion:**
- DO: Break queries into 3-5 concrete sub-tasks
- DON'T: Execute tasks or provide answers

**Output format:** Structured JSON with numbered steps
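As an illustration of the output format, the sketch below parses a planner response and enforces the 3-5 sub-task constraint. The exact JSON shape (a top-level `steps` list whose items carry `step` and `task` keys) and the helper name `parse_plan` are assumptions; the source only specifies structured JSON with numbered steps.

```python
import json


def parse_plan(raw: str, min_steps: int = 3, max_steps: int = 5) -> list:
    """Parse planner output and check step count and numbering."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("planner output must be a JSON object")
    steps = data.get("steps", [])
    if not (min_steps <= len(steps) <= max_steps):
        raise ValueError(
            f"expected {min_steps}-{max_steps} steps, got {len(steps)}"
        )
    for expected_number, step in enumerate(steps, start=1):
        if step.get("step") != expected_number or not step.get("task"):
            raise ValueError(f"malformed step at position {expected_number}")
    return steps


raw_output = json.dumps({"steps": [
    {"step": 1, "task": "List the top 3 competitors"},
    {"step": 2, "task": "Collect pricing and feature data"},
    {"step": 3, "task": "Summarize strategic patterns"},
]})
plan = parse_plan(raw_output)
print(len(plan), "steps parsed")
```

Rejecting a malformed plan at this boundary keeps a bad Planner response from reaching the Executor.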

In [None]:
# Demonstrate adaptive routing (before running multi-agent)
test_queries = [
    "What is our return policy?",  # Simple - should NOT use multi-agent
    "Analyze top 3 competitors and create strategy report"  # Complex - should use multi-agent
]

print("Routing Recommendations:")
print("=" * 70)

for query in test_queries:
    routing = should_use_multi_agent(query)
    preview = query if len(query) <= 50 else query[:50] + "..."
    print(f"\nQuery: {preview}")
    print(f"  Recommendation: {routing['recommendation']}")
    print(f"  Reason: {routing['reason']}")
    if 'warning' in routing:
        print(f"  ⚠️  {routing['warning']}")

print("\n# Expected: Routing shows simple query → single-agent, complex → multi-agent")

### Step 3: Complete Multi-Agent Workflow

Execute a complex query through the full Planner → Executor → Validator pipeline.

**Note:** If API keys are not configured, this will skip gracefully.

In [None]:
# Run a complex query (if API keys configured)
if Config.is_configured():
    query = "Analyze emerging trends in AI and identify top 3 opportunities"
    
    print(f"Running query: {query}")
    print("=" * 70)
    
    result = run_multi_agent_query(query)
    
    if result['success']:
        print(f"\n✅ Status: {result['validation_status']}")
        print(f"📝 Plan steps: {len(result['plan'])}")
        print(f"📊 Results: {len(result['results'])} items")
        print(f"⏱️  Time: {result['metadata']['total_time_seconds']}s")
        print(f"💰 Cost: ${result['metadata']['estimated_cost_usd']}")
        print(f"🔄 Iterations: {result['metadata']['iterations']}")
        
        # Show plan (truncated)
        print(f"\nPlan preview:")
        for i, step in enumerate(result['plan'][:2], 1):
            print(f"  {i}. {step.get('task', 'N/A')[:60]}...")
    else:
        print(f"❌ Error: {result.get('error')}")
else:
    print("⚠️ Skipping multi-agent execution (no API keys configured)")
    print("   To run: Set OPENAI_API_KEY in .env file")

# Expected: Query executed or gracefully skipped with cost/time metrics

## 5. Reality Check: Trade-Offs & Limitations

### Accepted Trade-Offs

**Latency multiplier:** Three sequential LLM calls take at least 9-15 seconds, vs 3-5 seconds for a single-agent call.
- Measured: 47 seconds (multi-agent) vs 8 seconds (single-agent) for competitor analysis, roughly 6x in that run
- **2-5x slower** performance is the typical range and cannot be avoided with a sequential design

**Cost multiplication:** 3x API costs due to three agent calls
- Multi-agent: ~$0.045 per query
- Single-agent: ~$0.015 per query
- At 1,000 requests/hour (720 hours/month): **$32,400/month** for multi-agent vs $10,800 for single-agent

**Code complexity:** 400+ lines of orchestration code vs 50 lines for single-agent

### Critical Limitations

1. **Quality doesn't automatically improve** - Weak agents compound errors
2. **No built-in disagreement resolution** - Validator rejection without specific guidance loops endlessly
3. **Debugging is substantially harder** - Multi-agent state transitions obscure failure points

### When This Breaks

- **Communication deadlock:** Missing conditional edges cause system hangs
- **Role confusion:** Executor starts planning when prompts lack explicit constraints
- **Validation loops:** Non-specific feedback prevents improvement

In [None]:
# Compare costs: Single-agent vs Multi-agent
print("Cost Comparison at Scale:")
print("=" * 70)

volumes = [100, 1000, 10000]  # requests/hour
single_cost = 0.015
multi_cost = 0.045

for vol in volumes:
    monthly_hours = 720  # 30 days x 24 hours
    single_monthly = vol * monthly_hours * single_cost
    multi_monthly = vol * monthly_hours * multi_cost
    
    print(f"\n{vol:,} requests/hour:")
    print(f"  Single-agent: ${single_monthly:,.0f}/month")
    print(f"  Multi-agent:  ${multi_monthly:,.0f}/month")
    print(f"  Difference:   ${multi_monthly - single_monthly:,.0f} (multi-agent is 3x single-agent)")

# Expected: Cost comparison showing 3x multiplication

## 6. Common Production Failures & Fixes

### Five Critical Failure Scenarios

#### 1. Communication Deadlock
- **Symptom:** System hangs indefinitely
- **Cause:** Missing conditional edges in routing logic
- **Fix:** Validate all routing values and define every edge explicitly
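One way to apply this fix is a routing function that maps every possible status to a known node and falls back to a terminal node on anything unexpected. The node names, `MAX_ITERATIONS`, and `route_after_validation` below are hypothetical. In LangGraph, a function of this kind is what `add_conditional_edges` takes as its routing callable; the sketch stays in plain Python so it runs without the library.

```python
END_NODE = "end"          # hypothetical node names
PLANNER_NODE = "planner"
MAX_ITERATIONS = 3        # assumed iteration budget


def route_after_validation(state: dict) -> str:
    """Return the next node; never return a value without a matching edge."""
    status = str(state.get("validation_status", "")).strip().lower()
    iterations = state.get("iterations", 0)

    if status == "approved":
        return END_NODE
    if status == "rejected" and iterations < MAX_ITERATIONS:
        return PLANNER_NODE
    # Unknown status, missing value, or exhausted budget: stop instead of hanging.
    return END_NODE


print(route_after_validation({"validation_status": "rejected", "iterations": 1}))
print(route_after_validation({"validation_status": "???", "iterations": 1}))
```

Because the function can only return `END_NODE` or `PLANNER_NODE`, two explicit edges are enough to cover every outcome.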

#### 2. Role Confusion
- **Symptom:** Executor begins planning sub-tasks instead of executing
- **Cause:** Prompts lack explicit role constraints
- **Fix:** Use DO/DON'T lists and specific role statement prefixes
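A small helper can make the DO/DON'T structure uniform across agents. The sketch below is an assumption about how such prompts might be assembled; `build_role_prompt` and the example wording are hypothetical and do not reproduce the module's actual prompts.

```python
def build_role_prompt(role: str, do: list, dont: list) -> str:
    """Assemble a system prompt with a role prefix and DO/DON'T lists."""
    lines = [f"You are the {role} agent. Act only in this role."]
    lines.append("DO:")
    lines += [f"- {item}" for item in do]
    lines.append("DON'T:")
    lines += [f"- {item}" for item in dont]
    return "\n".join(lines)


executor_prompt = build_role_prompt(
    "Executor",
    do=["Complete the single sub-task you are given"],
    dont=["Create or modify the plan", "Judge the quality of your own output"],
)
print(executor_prompt)
```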

#### 3. Coordination Overhead
- **Symptom:** 12x latency compared to single-agent
- **Cause:** Sequential execution creates bottlenecks
- **Fixes:**
  - Cache planner outputs for similar queries
  - Parallelize independent executor tasks
  - Adaptive routing (simple queries bypass multi-agent)

#### 4. Validation Loops
- **Symptom:** Endless iterations without improvement
- **Cause:** Non-specific feedback like "not enough information"
- **Fix:** Require actionable feedback with specific gaps and suggested additional tasks
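To enforce this, the orchestrator can check validator feedback before looping back. The feedback shape below (`gaps` and `suggested_tasks` lists) and the list of vague phrases are assumptions derived from the fix described above, not the module's actual schema.

```python
VAGUE_PHRASES = {"not enough information", "needs more detail", "incomplete"}


def is_actionable(feedback: dict) -> bool:
    """Feedback is actionable if it names specific gaps and suggests tasks."""
    gaps = feedback.get("gaps") or []
    suggested = feedback.get("suggested_tasks") or []
    if not gaps or not suggested:
        return False
    # Reject feedback whose gaps are only generic complaints.
    return not any(gap.strip().lower() in VAGUE_PHRASES for gap in gaps)


vague = {"gaps": ["Not enough information"], "suggested_tasks": []}
specific = {
    "gaps": ["No pricing data for competitor B"],
    "suggested_tasks": ["Collect competitor B pricing tiers"],
}
print(is_actionable(vague), is_actionable(specific))
```

When `is_actionable` returns `False`, one option is to end the run with the current results rather than spend another iteration on feedback the Planner cannot act on.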

#### 5. Multi-Agent Overkill
- **Symptom:** System applies multi-agent to simple queries
- **Cause:** No complexity routing
- **Fix:** Adaptive routing based on complexity heuristics
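A complexity heuristic can be as simple as counting signals in the query text. The sketch below is a hypothetical stand-in, not the implementation of `should_use_multi_agent`; the verb list, the length threshold, and the score cutoff are illustrative assumptions that would need tuning against real traffic.

```python
import re

ANALYTICAL_VERBS = {
    "analyze", "compare", "evaluate", "identify", "research", "recommend",
}


def route_query(query: str) -> dict:
    """Score a query on simple signals and recommend a pipeline."""
    lowered = query.lower()
    words = re.findall(r"[a-z']+", lowered)
    verb_hits = sorted(ANALYTICAL_VERBS.intersection(words))

    score = len(verb_hits)
    score += 1 if len(words) > 12 else 0      # long queries tend to be multi-part
    score += 1 if " and " in lowered else 0   # conjunction suggests several goals

    use_multi = score >= 2
    return {
        "recommendation": "multi-agent" if use_multi else "single-agent",
        "score": score,
        "signals": verb_hits,
    }


simple = route_query("What is our return policy?")
complex_ = route_query("Analyze top 3 competitors and create strategy report")
print(simple["recommendation"], complex_["recommendation"])
```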

In [None]:
# Display common failures from example data
print("Common Production Failures:")
print("=" * 70)

for failure in example_data['common_failures']:
    print(f"\n❌ {failure['failure'].replace('_', ' ').title()}")
    print(f"   Trigger: {failure['trigger']}")
    print(f"   Symptom: {failure['symptom']}")

# Expected: List of 4-5 common failure patterns with descriptions

## 7. Decision Card: When to Use Multi-Agent Orchestration

### ✅ Use When:

- **Multi-step analytical queries** where quality justifies 3x cost
- **>30 second latency tolerance** (anything faster requires single-agent)
- **>$10K/month AI budget** (multi-agent is expensive)
- **>20% A/B-tested quality improvement** vs single-agent confirmed
- **Independent validation required** (regulatory, high-stakes decisions)

### ❌ Avoid When:

#### 1. Simple Queries
Factual lookups ("What's our return policy?") don't benefit from planning-execution separation; the coordination only wastes resources.

#### 2. Real-Time Applications
Requirements under 5 seconds are impossible to meet with a sequential multi-agent design, which needs a minimum of 9-15 seconds because of inherent coordination latency.

#### 3. Low-Budget Projects
Budgets under $500/month are quickly consumed by 3x LLM costs and orchestration overhead. A single agent allocates those resources better.

#### 4. Deterministic Workflows
Fixed-step processes (extract → classify → tag → store) should use scripted pipelines; agents only add unnecessary complexity.

#### 5. High-Compliance Environments
Healthcare, finance, and government require explainable decisions. Emergent multi-agent behavior makes audit-trail requirements hard to meet.

### Costs & Benefits

**Benefits:**
- 15-30% quality improvement for complex analytical queries
- Task decomposition enhances reasoning transparency
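- Specialized roles help attribute a failure to a specific agent, although overall debugging remains harder than with a single agent (see Section 5)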

**Costs:**
- Implementation: 2-3 days
- Monthly at 1,000 queries/hour: $32,600 (LLM + infrastructure)
- Maintenance: 30% increased engineering overhead
- 2-5x latency (30-60s vs 8-12s single-agent)

In [None]:
# Decision matrix visualization
print("Decision Matrix: Single-Agent vs Multi-Agent")
print("=" * 70)
print(f"\n{'Criteria':<30} {'Single-Agent':<20} {'Multi-Agent':<20}")
print("-" * 70)
print(f"{'Latency':<30} {'3-5 seconds':<20} {'30-60 seconds':<20}")
print(f"{'Cost per query':<30} {'$0.015':<20} {'$0.045':<20}")
print(f"{'Code complexity':<30} {'~50 lines':<20} {'400+ lines':<20}")
print(f"{'Quality (complex queries)':<30} {'Baseline':<20} {'+15-30%':<20}")
print(f"{'Debugging difficulty':<30} {'Easy':<20} {'Hard':<20}")
print(f"{'Best for':<30} {'90% of queries':<20} {'10% complex':<20}")

print("\n📊 Recommendation: Use adaptive routing - single-agent by default,")
print("   multi-agent only for complex analytical tasks.")

# Expected: Decision matrix comparing both approaches

## 8. Alternative Solutions

### Four Distinct Approaches:

**1. Single-Agent with Structured Output**
- 3x faster, 3x cheaper, 90% simpler to debug
- Best for: 90% of use cases
- Limitation: Lacks independent validation

**2. LangChain PlanAndExecute**
- Pre-built orchestration
- Less control, ~$0.030 per query cost
- Best for: Quick prototypes

**3. Human-in-the-Loop Validation**
- Highest quality
- Unscalable (human bottleneck)
- Minutes/hours latency
- Best for: High-stakes, low-volume

**4. Parallel Multi-Agent**
- 8-15 second latency (vs 30-60 sequential)
- Requires independent sub-tasks
- Best for: Complex queries with parallelizable work
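The parallel variant can be sketched with `asyncio.gather`, which runs independent sub-tasks concurrently and returns results in input order. The executor call is simulated with `asyncio.sleep`; the function names and the delay are assumptions for illustration only.

```python
import asyncio
import time


async def run_task(task: str, delay: float = 0.2) -> str:
    await asyncio.sleep(delay)  # stands in for an LLM or tool call
    return f"done: {task}"


async def run_parallel(tasks: list) -> list:
    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run_task(task) for task in tasks))


tasks = ["research company A", "research company B", "research company C"]
start = time.perf_counter()
results = asyncio.run(run_parallel(tasks))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Three simulated 0.2-second calls finish in roughly the time of one, rather than the 0.6 seconds a sequential loop would take. Inside a Jupyter cell an event loop is already running, so `await run_parallel(tasks)` replaces `asyncio.run(...)` there. This only helps when sub-tasks do not depend on each other's output.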

In [None]:
# Alternative approaches comparison
alternatives = [
    {"approach": "Single-Agent", "latency": "3-5s", "cost": "$0.015", "quality": "Baseline", "use_case": "90% of queries"},
    {"approach": "Sequential Multi-Agent", "latency": "30-60s", "cost": "$0.045", "quality": "+15-30%", "use_case": "Complex analytical"},
    {"approach": "Parallel Multi-Agent", "latency": "8-15s", "cost": "$0.045", "quality": "+15-30%", "use_case": "Independent sub-tasks"},
    {"approach": "Human-in-Loop", "latency": "min-hours", "cost": "High", "quality": "Highest", "use_case": "High-stakes decisions"}
]

print("Alternative Approaches:")
print("=" * 90)
print(f"{'Approach':<25} {'Latency':<12} {'Cost':<10} {'Quality':<12} {'Best For':<30}")
print("-" * 90)

for alt in alternatives:
    print(f"{alt['approach']:<25} {alt['latency']:<12} {alt['cost']:<10} {alt['quality']:<12} {alt['use_case']:<30}")

# Expected: Comparison table of alternative approaches

## 9. Key Takeaways & Next Steps

### What You've Learned

✅ Built a three-agent orchestration system (Planner, Executor, Validator)
✅ Implemented structured message passing with LangGraph
✅ Understood conditional routing and state management
✅ Identified when multi-agent complexity is justified (rarely!)
✅ Learned common failure patterns and prevention strategies

### Critical Insights

**Multi-agent is expensive:** 3x cost, 2-5x latency - only justified when quality improvement measured >20% via A/B testing

**Adaptive routing is essential:** 90% of queries better served by single-agent. Use complexity heuristics to route appropriately.

**Role constraints prevent confusion:** Explicit DO/DON'T lists in prompts keep agents focused on their designated tasks.

**Validation needs specificity:** Generic feedback creates loops. Require actionable, specific gap identification.

### Production Checklist

Before deploying:
- [ ] Implement adaptive routing (simple → single-agent)
- [ ] Set up monitoring (P95 latency per agent, validation rejection rate)
- [ ] Configure cost alerts (>$0.10 per query)
- [ ] Add iteration limits (max 3) to prevent runaway loops
- [ ] Test all routing edge cases (missing values, parsing failures)
- [ ] Cache planner outputs for similar queries
- [ ] A/B test vs single-agent to confirm quality improvement >20%
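The monitoring item can be prototyped before a metrics stack is in place. The sketch below computes a nearest-rank P95 latency per agent and a validation rejection rate from in-memory samples; the sample numbers are made up for illustration, and the function names are hypothetical.

```python
def p95(values: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer math
    return ordered[rank - 1]


def rejection_rate(statuses: list) -> float:
    """Share of validator decisions that were rejections."""
    if not statuses:
        return 0.0
    return statuses.count("rejected") / len(statuses)


# Illustrative samples only (seconds per call, per agent)
latencies = {
    "planner": [2.1, 2.4, 3.0, 2.2],
    "executor": [6.5, 7.1, 9.8, 6.9],
    "validator": [1.9, 2.0, 2.6, 2.1],
}
for agent, samples in latencies.items():
    print(f"{agent:10s} P95: {p95(samples):.1f}s")

decisions = ["approved", "rejected", "approved", "rejected"]
print(f"rejection rate: {rejection_rate(decisions):.0%}")
```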

### Next Module

**Module 11:** Advanced RAG patterns for production scale

---

**Remember:** Multi-agent orchestration is a powerful tool, but like any power tool, it's easy to misuse. Start with single-agent, measure quality gaps, and only introduce multi-agent complexity when A/B testing proves it's worth the 3x cost and 2-5x latency trade-off.