---
## Decision Card: ReAct Pattern (Quick Reference)

**Save this for future reference when deciding between agent complexity and static pipeline simplicity.**

### ✅ BENEFIT
Handles multi-step queries that need 2-5 tools, such as "Compare our metrics to industry, calculate differences, and suggest strategies", which a static pipeline cannot answer. The agent selects tools autonomously, so no manual orchestration is needed.

### ❌ LIMITATION
Adds 3-10s P95 latency, compared to roughly 300ms for a static pipeline. Agent reasoning is probabilistic: expect 10-15% tool selection errors even with GPT-4. Infinite loops and state corruption need guard rails and monitoring to keep them out of production.

### 💰 COST
- **Time to implement:** 40-60 hours (including testing, monitoring, debugging)
- **Monthly cost at scale:** $1,500-15,000 for 100-1,000 req/hr (5-10x vs static)
- **Complexity:** +500 lines code, LangChain dependency, state management infrastructure required
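
The monthly figure can be sanity-checked with simple arithmetic. The sketch below assumes a 30-day month of constant traffic and an average LLM cost of about $0.0194 per query; that number is back-derived from this module's cost table, not stated in the script, and `monthly_llm_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, constant traffic


def monthly_llm_cost(requests_per_hour: float, cost_per_query: float) -> float:
    """Estimate monthly LLM spend for a given hourly request rate."""
    return requests_per_hour * HOURS_PER_MONTH * cost_per_query


for rate in (100, 1_000):
    # 0.0194 is an assumed average inside the $0.01-0.03 per-query range
    print(f"{rate:>5} req/hr -> ${monthly_llm_cost(rate, 0.0194):,.0f}/month")
```

Under these assumptions the result lands close to the $1,400 and $14,000 LLM line items, which dominate the monthly total.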

### 🤔 USE WHEN
- Complex queries needing 2-5 tools make up <10% of traffic
- Query volume is <1,000/hr
- You can tolerate 3-10s latency
- Margin of >$0.10/query supports the agent cost
- Queries genuinely need reasoning, not just retrieval
- Tools are reliable (>95% success rate)

### 🚫 AVOID WHEN
- 90%+ of queries are simple retrieval (use static pipeline)
- You need <1s P95 latency (use workflows)
- Margin is <$0.05/query or budget is tight (use simpler alternatives)
- You are building your first production system (use LangGraph managed framework)
- Tools are unreliable or have variable latency
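
The checklist can be turned into a small pre-flight check. This is a minimal sketch: the thresholds come from the card, but the `Workload` fields, the `agent_blockers` name, and the treatment of borderline values (for example, a latency budget between 1s and 3s) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    needs_multi_tool_reasoning: bool  # do the target queries need 2+ tools?
    queries_per_hour: float
    p95_latency_budget_s: float
    margin_per_query_usd: float
    tool_success_rate: float          # fraction between 0.0 and 1.0


def agent_blockers(w: Workload) -> List[str]:
    """Return reasons not to use a ReAct agent; an empty list means it fits."""
    blockers = []
    if not w.needs_multi_tool_reasoning:
        blockers.append("queries are simple retrieval: use the static pipeline")
    if w.queries_per_hour >= 1_000:
        blockers.append("query volume is 1,000/hr or more")
    if w.p95_latency_budget_s < 3:
        blockers.append("latency budget is below the agent's 3-10s range")
    if w.margin_per_query_usd <= 0.10:
        blockers.append("margin does not exceed $0.10/query")
    if w.tool_success_rate <= 0.95:
        blockers.append("tool success rate does not exceed 95%")
    return blockers


print(agent_blockers(Workload(True, 200, 10, 0.25, 0.99)))
```

An empty list means none of the card's conditions rule the agent out; each string in a non-empty list names one condition that failed.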

---

## Next Steps

1. **Test the implementation** - Run the notebook cells with your API key
2. **Try the FastAPI server** - `python app.py` and test via `/query` endpoint
3. **Run smoke tests** - `python tests_smoke.py` to verify everything works
4. **Review common failures** - Study the 5 failure modes in the script
5. **Practice** - Try the PractaThon challenges (Easy, Medium, Hard)

### Resources
- **Module script:** `augmented_M10_VideoM10_1_ReAct_Pat.md`
- **Next module:** M10.2 - Building Custom Agent Tools & Integrations
- **Discord:** #practathon channel for questions and feedback

In [None]:
# Section 5: Decision Framework - When to Use Agents vs Static Pipeline

print("=== DECISION FRAMEWORK ===\n")

# Load failure scenarios from example data
with open('example_data.json', 'r') as f:
    data = json.load(f)

print("Failure Scenarios to Watch For:\n")
for scenario_name, scenario_data in data.get('failure_scenarios', {}).items():
    print(f"• {scenario_name.replace('_', ' ').title()}")
    print(f"  Query: {scenario_data.get('query', 'N/A')}")
    if 'expected_behavior' in scenario_data:
        print(f"  Expected: {scenario_data['expected_behavior']}")
    print()

print("\n=== COST ANALYSIS (from script) ===\n")

scenarios = [
    ("Simple retrieval (90% traffic)", "$0.002", "Static pipeline", "300ms"),
    ("Multi-step reasoning (<10% traffic)", "$0.01-0.03", "ReAct agent", "3-10s"),
]

for scenario, cost, approach, latency in scenarios:
    print(f"{scenario}")
    print(f"  Approach: {approach}")
    print(f"  Cost per query: {cost}")
    print(f"  Latency: {latency}")
    print()

print("\n=== MONTHLY COST AT SCALE ===\n")
print("| Scale | Compute | LLM Calls | Tool Costs | Total |")
print("|-------|---------|-----------|------------|-------|")
print("| Small (100/hr)  | $50   | $1,400  | $40    | **$1,490**   |")
print("| Medium (1K/hr)  | $200  | $14,000 | $400   | **$14,600**  |")
print("| Large (10K/hr)  | $800  | $140,000| $4,000 | **$144,800** |")

print("\n\n=== KEY TAKEAWAY ===")
print("Use ReAct agents for <10% of complex queries.")
print("Keep static pipeline for 90%+ simple queries.")
print("This gives you the best of both: flexibility + efficiency.")

# Expected:
# - Decision framework clearly shows when to use agents vs static
# - Cost breakdown helps with business case
# - Failure scenarios warn about common pitfalls

---
## Section 5: Common Failures & Decision Framework

### The 5 Common Agent Failures (from Script)

| Failure | Symptom | Root Cause | Fix |
|---------|---------|------------|-----|
| **#1 Infinite Loop** | Agent repeats same action 3+ times | Tool returns unhelpful observation | Loop detection + better error messages |
| **#2 Wrong Tool** | Uses RAG_Search when it should use Calculator | Unclear tool descriptions or weak reasoning | Query classification + GPT-4 |
| **#3 State Corruption** | Forgets previous conversation turns | No conversation memory | Session-based history |
| **#4 Parsing Failure** | OutputParserException | Tool returns structured data (dict), not text | Standardize all tool outputs to strings |
| **#5 No Stop Condition** | Keeps searching unnecessarily | Prompt doesn't emphasize efficiency | Add stopping criteria to prompt |
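
The "session-based history" fix for Failure #3 can be sketched as a small in-memory store. This is an illustration under assumptions, not the module's `StatefulReActAgent` implementation: `SessionHistory`, its method names, and the five-turn default are all hypothetical.

```python
from collections import defaultdict, deque


class SessionHistory:
    """Keep the last `max_turns` (question, answer) pairs per session."""

    def __init__(self, max_turns: int = 5):
        # deque(maxlen=...) silently drops the oldest turn when full
        self._turns = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, session_id: str, question: str, answer: str) -> None:
        self._turns[session_id].append((question, answer))

    def as_prompt_context(self, session_id: str) -> str:
        """Render stored turns as text that can be prepended to the prompt."""
        turns = self._turns.get(session_id, ())
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in turns)


history = SessionHistory(max_turns=2)
history.add("s1", "What was Q3 revenue?", "Q3 revenue was $4.2M.")
history.add("s1", "And Q2?", "Q2 revenue was $3.9M.")
print(history.as_prompt_context("s1"))
```

The revenue figures are made-up sample data. Capping the history keeps prompt size bounded, at the cost of forgetting older turns.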

### Real Failure Example from Script

**Infinite Loop:**
```
Query: "What is the population of cities in California?"

Step 1 - Action: RAG_Search("California cities")
Step 1 - Observation: [No relevant documents found]

Step 2 - Action: RAG_Search("California cities")  # ← Same action!
Step 2 - Observation: [No relevant documents found]

Step 3 - Action: RAG_Search("California cities")  # ← Same action again!
... repeats until max_iterations reached
```

**Why this happens:** The LLM doesn't understand that "No documents found" means "this tool won't help, try a different approach."
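
One way to implement the "loop detection" fix is to count identical (tool, input) pairs and intervene on the third repeat. The sketch below is a standalone illustration; `LoopDetector` and its `max_repeats` parameter are hypothetical names, not part of the module's code.

```python
from collections import Counter


class LoopDetector:
    """Flag an action once it has been repeated more than `max_repeats` times."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self._seen = Counter()

    def record(self, tool_name: str, tool_input: str) -> bool:
        """Record an action; return True if it is now a suspected loop."""
        key = (tool_name, tool_input.strip().lower())
        self._seen[key] += 1
        return self._seen[key] > self.max_repeats


detector = LoopDetector()
for step in range(1, 4):
    looping = detector.record("RAG_Search", "California cities")
    print(f"Step {step}: looping={looping}")
```

When `record` returns `True`, the executor could replace the next observation with an explicit hint such as "This tool has already been tried with this input; try a different tool or answer with what you have", which addresses the root cause described above.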

### When NOT to Use ReAct Agents (Critical!)

❌ **Don't use when:**
- 90%+ queries are simple retrieval → **use static pipeline**
- Need <1s latency → **use workflows or caching**
- Margin <$0.05 per query → **use cheaper alternatives**
- Document corpus <100 docs → **use static RAG**
- Tools are unreliable (>10% failure rate) → **use workflows with error handling**

✅ **Do use when:**
- Complex queries are <10% of traffic but high-value (worth the cost)
- Queries genuinely require 2+ tools
- Can tolerate 3-10s latency
- Margin >$0.10 per query supports $0.02 agent cost
- Tools are reliable (>95% success rate)
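
In practice these rules imply a router in front of both paths: most traffic goes to the static pipeline and only multi-step queries reach the agent. The sketch below uses a deliberately naive keyword heuristic to show the shape of such a router; the keyword list, the two-hit threshold, and the `route` function are assumptions, and a production system would more likely use a trained classifier or an LLM call.

```python
import re

# Assumed hint words suggesting a query needs more than one tool
MULTI_STEP_HINTS = re.compile(
    r"\b(compare|calculate|difference|versus|vs|benchmarks?)\b",
    re.IGNORECASE,
)


def route(query: str) -> str:
    """Return 'agent' for queries that look multi-step, else 'static'."""
    hits = {match.group(1).lower() for match in MULTI_STEP_HINTS.finditer(query)}
    return "agent" if len(hits) >= 2 else "static"


print(route("What is our refund policy?"))
print(route("Compare Q3 revenue to industry benchmarks and calculate the difference"))
```

Requiring two distinct hints keeps single-keyword lookups on the cheaper static path.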

In [None]:
# Section 4: Test Queries (if agent available)

import time

# Load example queries
with open('example_data.json', 'r') as f:
    examples = json.load(f)

test_queries = [
    ("Simple RAG", "What is our refund policy?"),
    ("Calculator", "What is 125000 * 1.15?"),
    ("Industry Data", "What is the SaaS industry growth rate?"),
]

print("=== Query Tests ===\n")

if agent is not None:
    for i, (query_type, query) in enumerate(test_queries, 1):
        print(f"[Query {i}] {query_type}")
        print(f"Question: {query}")
        
        start = time.time()
        result = agent.query(query)
        duration = time.time() - start
        
        print(f"Duration: {duration:.2f}s")
        print(f"Steps: {result['num_steps']}")
        print(f"Answer: {result['output'][:100]}...")
        if result.get('error'):  # tolerate results that omit the 'error' key
            print(f"Error: {result['error']}")
        print()
else:
    print("⚠️  Skipping query tests (agent not initialized)")
    print("   This is expected if OPENAI_API_KEY is not set")
    print()
    print("   Example expected output:")
    print("   - Simple RAG: 1-2 steps, 2-4s")
    print("   - Calculator: 1-2 steps, 2-3s")
    print("   - Industry Data: 1-2 steps, 2-3s")

# Expected:
# - If agent available: Each query completes in 2-5s with 1-2 reasoning steps
# - If no agent: Graceful skip with example output description

---
## Section 4: Running Queries - Simple vs Complex

### Query Types and Expected Tool Usage

| Query Type | Example | Expected Tools | Steps |
|------------|---------|----------------|-------|
| **Simple RAG** | "What is our refund policy?" | RAG_Search | 1-2 |
| **Calculation** | "What is 125000 * 1.15?" | Calculator | 1-2 |
| **Benchmark** | "What is SaaS industry growth rate?" | Industry_Data | 1-2 |
| **Multi-step** | "Compare Q3 revenue to industry benchmark" | RAG_Search → Industry_Data → Calculator | 4-5 |

### What to Observe

When you run a query, watch for:

1. **Reasoning steps** - How many Thought → Action → Observation cycles?
2. **Tool selection** - Does it pick the right tool first?
3. **Stopping behavior** - Does it stop when it has enough info?
4. **Fallback handling** - If agent fails, does it gracefully fall back?

### Performance Expectations

- **Simple queries:** 2-4 seconds, 1-2 steps
- **Complex queries:** 5-10 seconds, 4-5 steps
- **Agent failures:** <10% with fallback to static pipeline
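
The fallback behaviour can be sketched as a thin wrapper around both paths. Here `agent_fn` and `static_fn` are hypothetical callables standing in for the agent and the Level 1 pipeline; the `output` and `error` keys mirror the result fields used in the test cell, while `source` is an added assumption.

```python
from typing import Callable, Dict, Optional


def answer_with_fallback(
    query: str,
    agent_fn: Callable[[str], str],
    static_fn: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Try the agent first; on any exception, answer with the static pipeline."""
    try:
        return {"output": agent_fn(query), "source": "agent", "error": None}
    except Exception as exc:  # broad on purpose: any agent failure triggers fallback
        return {
            "output": static_fn(query),
            "source": "static_fallback",
            "error": str(exc),
        }


def flaky_agent(query: str) -> str:
    raise TimeoutError("agent exceeded time budget")


print(answer_with_fallback("What is our refund policy?", flaky_agent, lambda q: "See policy doc."))
```

Recording which path answered makes the "<10% agent failures" target measurable from logs.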

⚠️ **If API key is missing:** Queries will skip gracefully with a warning message.

In [None]:
# Section 3: Initialize ReAct Agent (if API key available)

import os

# Check if we can initialize the agent
has_api_key = bool(Config.OPENAI_API_KEY)

print("=== Agent Initialization ===\n")
print(f"OpenAI API Key configured: {has_api_key}")
print(f"Agent enabled in config: {Config.ENABLE_AGENT}")
print(f"Model: {Config.AGENT_MODEL}")
print(f"Max iterations: {Config.AGENT_MAX_ITERATIONS}")
print(f"Timeout: {Config.AGENT_TIMEOUT_SECONDS}s")
print()

if has_api_key:
    print("✓ Initializing agent...")
    try:
        agent = StatefulReActAgent(
            model_name=Config.AGENT_MODEL,
            temperature=Config.AGENT_TEMPERATURE,
            max_iterations=Config.AGENT_MAX_ITERATIONS,
            timeout_seconds=Config.AGENT_TIMEOUT_SECONDS
        )
        print("✓ Agent initialized successfully!")
        print(f"  Tools available: {len(agent.tools)}")
    except Exception as e:
        print(f"✗ Agent initialization failed: {e}")
        agent = None
else:
    print("⚠️  Skipping agent initialization (no API key)")
    print("   Set OPENAI_API_KEY in .env to enable agent")
    agent = None

# Expected:
# - If API key present: Agent initializes with 3 tools
# - If no API key: Graceful skip with warning message

---
## Section 3: The ReAct Loop - How Agents Think

### The Core Pattern

```
┌─────────────┐
│ User Query  │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│  THOUGHT: "What do I need to do?"   │ ← Reasoning
│  Generated by LLM                   │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│  ACTION: Select and execute tool    │ ← Acting
│  (e.g., search_docs, calculate)     │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│  OBSERVATION: Tool result           │ ← Learning
│  (e.g., "Found 3 docs about Q3")    │
└──────┬──────────────────────────────┘
       │
       ▼
    Repeat until Answer or Max Steps
```

### Key Components

1. **Thought Generation:** LLM decides what to do next based on query + observation history
2. **Action Selection:** Agent picks a tool from registry
3. **Tool Execution:** Run the selected tool with extracted parameters
4. **Observation Capture:** Store tool output for next reasoning step
5. **Stopping Condition:** Agent decides "I have enough info" or hits max iterations
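
The five components map onto a short loop. The sketch below replaces the LLM with a scripted `decide` function and the calculator with a canned stub so it runs without an API key; `run_react`, the decision tuple format, and the stub are assumptions made for illustration, not the module's or LangChain's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A decision is ("act", tool_name, tool_input) or ("finish", answer, "")
Decision = Tuple[str, str, str]
Step = Tuple[str, str, str]  # (tool_name, tool_input, observation)


def run_react(
    query: str,
    decide: Callable[[str, List[Step]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> dict:
    history: List[Step] = []
    for _ in range(max_iterations):
        kind, first, second = decide(query, history)       # 1. thought generation
        if kind == "finish":                                # 5. stopping condition
            return {"output": first, "num_steps": len(history)}
        tool = tools.get(first)                             # 2. action selection
        observation = tool(second) if tool else f"Unknown tool: {first}"  # 3. execution
        history.append((first, second, observation))        # 4. observation capture
    return {"output": "Stopped: max iterations reached", "num_steps": len(history)}


def scripted_decide(query: str, history: List[Step]) -> Decision:
    """Stand-in for the LLM: call the calculator once, then answer."""
    if not history:
        return ("act", "Calculator", "125000 * 1.15")
    return ("finish", f"The result is {history[-1][2]}", "")


tools = {"Calculator": lambda expr: "143,750.00"}  # canned stub, not a real calculator
result = run_react("What is 125000 * 1.15?", scripted_decide, tools)
print(result)
```

The `max_iterations` bound guarantees termination even when `decide` never returns `"finish"`.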

### Why This Matters for Production

✅ **Handles 10x more query complexity** - Multi-step reasoning queries that were impossible before  
✅ **Reduces manual orchestration** - No need to code specific workflows for each query type  
✅ **Provides reasoning transparency** - You can see the agent's thought process

### The Critical Trade-Off

**Common misconception:** "Agents are always better than static pipelines."

**Reality check:** 
- Agents add **3-10s latency** (vs 300ms static)
- Cost roughly **5-15x more** ($0.01-0.03 vs $0.002 per query)
- Harder to debug (probabilistic reasoning)

**Use agents only when queries genuinely require multi-step reasoning or tool use.**

In [None]:
# Section 2: Demonstrate Tool Registry

# Get the tool registry
from l2_m10_react_pattern_implementation import get_tools, calculator_tool, industry_data_tool

tools = get_tools()

print("=== Tool Registry ===")
print(f"Number of tools: {len(tools)}\n")

for tool in tools:
    print(f"Tool: {tool.name}")
    print(f"Description: {tool.description[:80]}...")
    print()

# Test each tool independently
print("\n=== Tool Tests ===\n")

# Test 1: Calculator
print("[Test 1] Calculator Tool")
result = calculator_tool("125000 * 1.15")
print("Input: 125000 * 1.15")
print(f"Output: {result}")
print(f"Output type: {type(result).__name__}")  # Must be 'str'
print()

# Test 2: Industry Data
print("[Test 2] Industry Data Tool")
result = industry_data_tool("SaaS,growth_rate")
print("Input: SaaS,growth_rate")
print(f"Output: {result}")
print()

# Expected:
# - 3 tools registered (RAG_Search, Calculator, Industry_Data)
# - Calculator returns: "Calculation: 125000 * 1.15 = 143,750.00"
# - Industry_Data returns: "Industry benchmark for SaaS - growth_rate: 25-35% YoY"
# - All outputs are strings (not dicts/JSON)

---
## Section 2: Tool Registry - Building the Agent's Capabilities

### What Are Tools?

Tools are **functions** that the agent can call to accomplish tasks. Each tool:
- Has a **clear name** (e.g., `RAG_Search`, `Calculator`)
- Has a **description** explaining when to use it
- Takes **input** and returns **plain text output** (not JSON!)
- Must be **reliable** (>95% success rate in production)
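
These properties can be captured in a small data structure. The sketch below is framework-free and hypothetical: `Tool`, `build_registry`, and `describe_tools` are illustrative names, not LangChain classes and not the module's `get_tools()`.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str                   # clear, unique name the agent refers to
    description: str            # tells the agent when to use the tool
    func: Callable[[str], str]  # text in, plain text out


def build_registry(*tools: Tool) -> Dict[str, Tool]:
    """Index tools by name, rejecting duplicates."""
    registry: Dict[str, Tool] = {}
    for tool in tools:
        if tool.name in registry:
            raise ValueError(f"Duplicate tool name: {tool.name}")
        registry[tool.name] = tool
    return registry


def describe_tools(registry: Dict[str, Tool]) -> str:
    """Render the tool list as text for inclusion in the agent prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in registry.values())


registry = build_registry(
    Tool("Echo", "Repeats the input; useful only for testing.", lambda text: text),
)
print(describe_tools(registry))
```

The description is the only signal the agent has for choosing between tools, so it does real work in preventing wrong-tool errors.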

### The Three Core Tools

1. **RAG_Search** - Wraps your Level 1 semantic search pipeline
2. **Calculator** - Safely evaluates mathematical expressions
3. **Industry_Data** - Fetches external benchmark data
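
"Safely evaluates" usually means avoiding `eval` on raw input. The sketch below shows one common approach: walking the parsed expression with Python's `ast` module and allowing only basic arithmetic. It is not the module's `calculator_tool`, whose implementation is not shown here; `safe_calculate` and the allowed-operator set are assumptions.

```python
import ast
import operator

# Only these operators are permitted; exponentiation is left out on purpose
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _evaluate(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
        return _ALLOWED_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
        return _ALLOWED_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> str:
    """Evaluate basic arithmetic and always return a plain-text result."""
    try:
        value = _evaluate(ast.parse(expression, mode="eval").body)
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Calculation error: could not evaluate '{expression}' ({exc})"
    return f"Calculation: {expression} = {value:,.2f}"


print(safe_calculate("125000 * 1.15"))
print(safe_calculate("__import__('os').getcwd()"))
```

Function calls, names, and attribute access are rejected because they never match an allowed node type, and errors come back as text so the agent can read them as an observation.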

### Critical Design Decision

Tools **must return plain text**, not structured data (dict/JSON). Why?
- The LLM needs to **read** the result as natural language
- Structured data causes **parsing failures** (Failure #4 in script)
- Plain text with interpretation is more reliable

### Tool Output Examples

❌ **BAD** (structured):
```python
{"result": 143750.0, "formatted": "$143,750.00"}
```

✅ **GOOD** (plain text):
```python
"Calculation: 125000 * 1.15 = 143,750.00"
```
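
When a tool cannot be rewritten to return text directly, a thin normalizer at the registry boundary is one option. The sketch below is a hypothetical helper (`to_observation` is not part of the module) that flattens common Python structures into a readable string.

```python
def to_observation(result) -> str:
    """Convert any tool result into a plain-text observation string."""
    if isinstance(result, str):
        return result
    if isinstance(result, dict):
        return "; ".join(f"{key}: {value}" for key, value in result.items())
    if isinstance(result, (list, tuple)):
        return ", ".join(str(item) for item in result)
    return str(result)


print(to_observation({"result": 143750.0, "formatted": "$143,750.00"}))
```

A generic flattener like this avoids the parsing failure but loses the interpretation a hand-written sentence provides, so purpose-built text output remains preferable where possible.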

# Module 10.1: ReAct Pattern Implementation
## Agentic RAG with Thought → Action → Observation Reasoning Loop

**Based on:** augmented_M10_VideoM10_1_ReAct_Pat.md  
**Duration:** 42 minutes  
**Level:** 3 (requires Level 1 M1.4 and Level 2 completion)

---

## What You'll Learn

- Execute **Thought → Action → Observation cycles** for complex queries
- Build a **tool registry** with RAG search, calculation, and API capabilities
- Create **agent executors** that autonomously select and run appropriate tools
- Debug **5 common agent failures**: infinite loops, wrong tool selection, state corruption, parsing failures, missed stop conditions
- Recognize **when NOT to use** agentic patterns versus static pipelines

**Important:** We'll be brutally honest about when agentic RAG is overkill, because 90% of queries don't need it.

---
## Section 1: Introduction & Problem Statement

### The Challenge

Your Level 1 static RAG pipeline works beautifully for straightforward questions like "What is our refund policy?" But it **fails on complex queries** that require:

1. **Multiple information sources** (internal docs + external benchmarks)
2. **Calculations** (percentage differences, comparisons)
3. **Multi-step reasoning** (gather → calculate → synthesize)

Example query that breaks static pipelines:
> *"Compare our Q3 revenue to industry benchmarks, calculate the percentage difference, and suggest three growth strategies based on our current market position."*

### The ReAct Solution

The **ReAct pattern** (Reasoning and Acting) gives your RAG system the ability to:
- **Think** about what tools it needs
- **Act** by executing those tools in sequence
- **Observe** results and decide on next steps
- **Repeat** until the question is answered

### Real-World Analogy

Think of a detective solving a case:
1. **Thought:** "I need to check the suspect's alibi"
2. **Action:** Interview witnesses
3. **Observation:** "The alibi checks out, but there's a timeline gap"
4. **Thought:** "I should examine phone records for that time period"
5. **Action:** Request phone records
6. **Observation:** "Multiple calls to an unknown number"
...and so on until conclusion.

In [None]:
# Setup: Import required modules
import sys
import json
from pathlib import Path

# Add current directory to path
sys.path.insert(0, str(Path.cwd()))

# Import our implementation
from config import Config
from l2_m10_react_pattern_implementation import (
    get_tools,
    StatefulReActAgent
)

print("✓ Imports successful")
print("\nConfiguration:")
for key, val in Config.get_info().items():
    print(f"  {key}: {val}")

# Expected: Module imports work, configuration displays