# Adaptive RAG with Microsoft Agent Framework

----

Adaptive RAG predicts the **complexity of the input question** using a SLM/LLM and selects an appropriate processing workflow accordingly.

- **Very simple question (No Retrieval)**: Generates answers without RAG.
- **Simple question (Single-shot RAG)**: Efficiently generates answers through a single-step search and generation.
- **Complex question (Iterative RAG)**: Provides accurate answers to complex questions through repeated multi-step search and generation.

This notebook demonstrates how to implement Adaptive RAG using **Microsoft Agent Framework** with:
- **Workflow-based orchestration** for multi-step processing
- **Reflection pattern** for quality evaluation and retry logic
- **Azure AI Search** for document retrieval
- **Azure Evaluation SDK** for groundedness and relevance checking

**Reference**

- [Adaptive-RAG paper](https://arxiv.org/abs/2403.14403)
- [Microsoft Agent Framework Documentation](https://learn.microsoft.com/en-us/agent-framework/)

## Setup
---

Install required packages and import libraries.

In [None]:
# Install Microsoft Agent Framework and required packages
# %pip install agent-framework azure-search-documents azure-ai-evaluation azure-identity python-dotenv pydantic

In [None]:
import asyncio
import json
import os
import sys
from dataclasses import dataclass
from enum import Enum
from typing import Annotated, Any
from uuid import uuid4

from dotenv import load_dotenv
from pydantic import BaseModel, Field

# Microsoft Agent Framework
from agent_framework import (
    AgentRunResponseUpdate,
    AgentRunUpdateEvent,
    ChatMessage,
    Executor,
    Role,
    WorkflowBuilder,
    WorkflowContext,
    handler,
)
from agent_framework.azure import AzureOpenAIChatClient

# Azure services
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Azure AI Evaluation
from azure.ai.evaluation import (
    GroundednessEvaluator,
    RelevanceEvaluator,
    RetrievalEvaluator,
)

# Add parent directory to path for utils
parent_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.getcwd())))
if parent_dir not in sys.path:
    sys.path.append(parent_dir)

from utils.search_utils import web_search

load_dotenv(override=True)
print("✓ Libraries imported successfully")

✓ Libraries imported successfully


### Configure Environment Variables

In [2]:
# Azure OpenAI configuration
azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_openai_chat_deployment = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME")
azure_openai_api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-08-01-preview")
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-ada-002")

# Azure AI Search configuration
azure_ai_search_endpoint = os.getenv("AZURE_AI_SEARCH_ENDPOINT")
azure_search_admin_key = os.getenv("AZURE_AI_SEARCH_API_KEY", "")
index_name = os.getenv("AZURE_SEARCH_INDEX_NAME", "hotels-sample-index")

# Model configuration for evaluators
model_config = {
    "azure_endpoint": azure_openai_endpoint,
    "api_key": azure_openai_key,
    "azure_deployment": azure_openai_chat_deployment,
    "api_version": azure_openai_api_version,
    "type": "azure_openai",
}

print("✓ Environment variables configured")
print(f"  - Azure OpenAI Endpoint: {azure_openai_endpoint}")
print(f"  - Chat Deployment: {azure_openai_chat_deployment}")
print(f"  - Search Index: {index_name}")

✓ Environment variables configured
  - Azure OpenAI Endpoint: https://hyo-ai-foundry-pjt1-resource.openai.azure.com/
  - Chat Deployment: gpt-4.1-mini
  - Search Index: hotels-sample-index


## 🧪 Step 1: Test and Construct Each Module
---

Before building the complete workflow, we'll test each component individually:

1. **Intent Router**: Classifies query complexity (LLM, RAG, or WebSearch)
2. **Retrieval**: Searches Azure AI Search for relevant documents
3. **Retrieval Grader**: Evaluates relevance of retrieved documents
4. **Question Rewriter**: Optimizes query for better retrieval
5. **Answer Generator**: Generates response based on context
6. **Groundedness Evaluator**: Checks for hallucinations
7. **Relevance Evaluator**: Validates answer relevance
8. **Web Search**: Fetches external information when needed

### 1.1 Define Data Models

In [3]:
# Intent types for routing
class IntentType(str, Enum):
    LLM = "LLM"  # Simple conversation, no retrieval needed
    RAG = "RAG"  # Requires document retrieval
    WEBSEARCH = "websearch"  # Requires web search

class IntentResponse(BaseModel):
    intent_type: IntentType = Field(..., description="Detected intent type")
    reasoning: str = Field(..., description="Brief explanation of the classification")

# Request/Response structures for workflow
@dataclass
class ProcessingRequest:
    """Request passed through the workflow."""
    request_id: str
    query: str
    intent: str = ""
    context: str = ""
    response: str = ""
    retrieval_score: float = 0.0
    groundedness_score: float = 0.0
    relevance_score: float = 0.0

@dataclass
class ReviewResponse:
    """Review result from evaluator."""
    request_id: str
    approved: bool
    feedback: str
    groundedness_score: float = 0.0
    relevance_score: float = 0.0

print("✓ Data models defined")

✓ Data models defined


### 1.2 Test Intent Router

The Intent Router analyzes the query and determines the processing path.

In [4]:
# Create chat client for testing
chat_client = AzureOpenAIChatClient(
    model_id=azure_openai_chat_deployment,
    endpoint=azure_openai_endpoint,
    api_key=azure_openai_key,
    api_version=azure_openai_api_version
)

async def test_intent_router(query: str) -> IntentResponse:
    """Test intent classification."""
    
    prompt = f"""You are an intelligent intent classifier. Classify the query into one of:
    
- LLM: Casual conversation, greetings, general knowledge (no current info needed)
- RAG: Hotel-related questions (data available until Aug 2024)
- websearch: Recent events, news, or topics after Aug 2024

Query: {query}

Respond in JSON format with 'intent_type' and 'reasoning'."""
    
    messages = [ChatMessage(role=Role.USER, text=prompt)]
    
    response = await chat_client.get_response(
        messages=messages,
        response_format=IntentResponse
    )
    
    result = IntentResponse.model_validate_json(response.messages[-1].text)
    return result

# Test with different queries
test_queries = [
    "Hello, how are you?",
    "Can you recommend hotels with complimentary breakfast?",
    "What are the latest hotel openings in NYC in 2025?"
]

print("Testing Intent Router:")
print("=" * 80)
for query in test_queries:
    result = await test_intent_router(query)
    print(f"\nQuery: {query}")
    print(f"Intent: {result.intent_type}")
    print(f"Reasoning: {result.reasoning}")
    print("-" * 80)

Testing Intent Router:

Query: Hello, how are you?
Intent: IntentType.LLM
Reasoning: The query is a casual greeting, which fits the LLM category for casual conversation and greetings.
--------------------------------------------------------------------------------

Query: Hello, how are you?
Intent: IntentType.LLM
Reasoning: The query is a casual greeting, which fits the LLM category for casual conversation and greetings.
--------------------------------------------------------------------------------

Query: Can you recommend hotels with complimentary breakfast?
Intent: IntentType.RAG
Reasoning: The query asks for hotel recommendations with a specific amenity (complimentary breakfast), which relates to hotel information available until August 2024, fitting the RAG category.
--------------------------------------------------------------------------------

Query: Can you recommend hotels with complimentary breakfast?
Intent: IntentType.RAG
Reasoning: The query asks for hotel recommendat

### 1.3 Test Retrieval from Azure AI Search

In [5]:
# Create search client
search_client = SearchClient(
    endpoint=azure_ai_search_endpoint,
    index_name=index_name,
    credential=AzureKeyCredential(azure_search_admin_key),
)

async def test_retrieval(query: str, top_k: int = 3) -> str:
    """Test document retrieval."""
    
    # Vector search query
    vector_query = VectorizableTextQuery(
        text=query,
        k_nearest_neighbors=top_k,
        fields="descriptionVector",
        exhaustive=True
    )
    
    # Search
    search_results = search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        select="Description,HotelName,Tags",
        top=top_k,
    )
    
    # Format results
    sources_formatted = "\n".join([
        f'{document["HotelName"]}: {document["Description"]} (Tags: {", ".join(document["Tags"])})'
        for document in search_results
    ])
    
    return sources_formatted

# Test retrieval
test_query = "Can you recommend hotels with complimentary breakfast?"
print(f"Testing Retrieval for: '{test_query}'")
print("=" * 80)
retrieved_context = await test_retrieval(test_query)
print("\nRetrieved Documents:")
print(retrieved_context)

Testing Retrieval for: 'Can you recommend hotels with complimentary breakfast?'

Retrieved Documents:
Friendly Motor Inn: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi. (Tags: 24-hour front desk service, continental breakfast, free wifi)
Thunderbird Motel: Book Now & Save. Clean, Comfortable rooms at the lowest price. Enjoy complimentary coffee and tea in common areas. (Tags: coffee in lobby, free parking, free wifi)
Lion's Den Inn: Full breakfast buffet for 2 for only $1. Excited to show off our room upgrades, faster high speed WiFi, updated corridors & meeting space. Come relax and enjoy your stay. (Tags: laundry service, free wifi, restaurant)

Retrieved Documents:
Friendly Motor Inn: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi. (Tags: 24-hour front desk service, continental breakfast, free wifi)
Thunderbird Motel: Book Now

### 1.4 Test Retrieval Grader

Evaluates if retrieved documents are relevant to the query.

In [6]:
# Create retrieval evaluator
retrieval_eval = RetrievalEvaluator(model_config)

# Test retrieval quality
query_response = {
    "query": test_query,
    "context": retrieved_context
}

retrieval_score = retrieval_eval(**query_response)
print("\nRetrieval Evaluation:")
print(f"Score: {retrieval_score['retrieval']}")
print(f"Threshold: 3.0 (scores >= 3.0 are acceptable)")
print(f"Result: {'✓ PASS' if retrieval_score['retrieval'] >= 3.0 else '✗ FAIL - needs rewrite'}")


Retrieval Evaluation:
Score: 5.0
Threshold: 3.0 (scores >= 3.0 are acceptable)
Result: ✓ PASS


### 1.5 Test Question Rewriter

In [7]:
async def test_question_rewriter(query: str) -> str:
    """Test query rewriting for better retrieval."""
    
    prompt = f"""You are a question re-writer that optimizes queries for vector search.
Look at the input and reason about the underlying semantic intent based on hotel domain.
Make the query more specific and searchable.

Original Query: {query}

Rewritten Query:"""
    
    messages = [ChatMessage(role=Role.USER, text=prompt)]
    response = await chat_client.get_response(messages=messages)
    
    return response.messages[-1].text.strip()

# Test with a poorly worded query
poor_query = "Can you recommend a few factories with complimentary breakfast?"
print(f"Original Query: {poor_query}")
rewritten = await test_question_rewriter(poor_query)
print(f"Rewritten Query: {rewritten}")

Original Query: Can you recommend a few factories with complimentary breakfast?
Rewritten Query: Can you recommend hotels that offer complimentary breakfast?
Rewritten Query: Can you recommend hotels that offer complimentary breakfast?


### 1.6 Test Answer Generator

In [8]:
async def test_answer_generator(query: str, context: str, intent: str = "RAG") -> str:
    """Test answer generation."""
    
    if intent == "LLM":
        prompt = f"""You are a kind and helpful assistant. Answer in a friendly, warm manner.
Use emojis appropriately (1-3 per response). Be clear and concise.

Query: {query}

Answer:"""
    else:  # RAG or websearch
        prompt = f"""Answer using ONLY the context below. Be friendly and concise.
If there isn't enough information, say you don't know.

Query: {query}

Context:
{context}

Answer:"""
    
    messages = [ChatMessage(role=Role.USER, text=prompt)]
    response = await chat_client.get_response(messages=messages)
    
    return response.messages[-1].text.strip()

# Test answer generation
answer = await test_answer_generator(test_query, retrieved_context, "RAG")
print(f"\nGenerated Answer:")
print(answer)


Generated Answer:
I recommend Friendly Motor Inn—they offer free breakfast along with free WiFi and a shuttle service. Thunderbird Motel has complimentary coffee and tea but not a full breakfast. Lion's Den Inn has a breakfast buffet for $1, so not completely free.


### 1.7 Test Groundedness and Relevance Evaluators

In [9]:
# Create evaluators
groundedness_eval = GroundednessEvaluator(model_config)
relevance_eval = RelevanceEvaluator(model_config)

# Test groundedness (checks for hallucinations)
groundedness_input = {
    "query": test_query,
    "context": retrieved_context,
    "response": answer
}
groundedness_score = groundedness_eval(**groundedness_input)

print("\nGroundedness Evaluation (Hallucination Check):")
print(f"Score: {groundedness_score['groundedness']}")
print(f"Threshold: 3.0")
print(f"Result: {'✓ No hallucinations detected' if groundedness_score['groundedness'] >= 3.0 else '✗ Possible hallucination'}")

# Test relevance
relevance_input = {
    "query": test_query,
    "response": answer
}
relevance_score = relevance_eval(**relevance_input)

print("\nRelevance Evaluation:")
print(f"Score: {relevance_score['relevance']}")
print(f"Threshold: 3.0")
print(f"Result: {'✓ Answer is relevant' if relevance_score['relevance'] >= 3.0 else '✗ Answer not relevant'}")


Groundedness Evaluation (Hallucination Check):
Score: 5.0
Threshold: 3.0
Result: ✓ No hallucinations detected

Relevance Evaluation:
Score: 4.0
Threshold: 3.0
Result: ✓ Answer is relevant

Relevance Evaluation:
Score: 4.0
Threshold: 3.0
Result: ✓ Answer is relevant


### 1.8 Test Web Search

In [10]:
# Test web search
web_query = "Latest hotel openings in NYC 2025"
print(f"Testing Web Search for: '{web_query}'")
print("=" * 80)

web_results = await web_search(
    query=web_query,
    max_result=3,
    web_search_mode="bing"
)

print("\nWeb Search Results:")
if isinstance(web_results, list):
    for i, result in enumerate(web_results, 1):
        print(f"\n{i}. {result}")
else:
    print(web_results)

Testing Web Search for: 'Latest hotel openings in NYC 2025'

Web Search Results:

1. Here are some of the latest hotel openings in New York City for 2025, along with new entries that have recently opened:

### Newly Opened Hotels in 2025

1. **The Wall Street Hotel by Suiteness**
   - **Opened:** June 2025
   - **Location:** Near the National September 11 Memorial & Museum
   - **Features:** Luxury accommodations with a terrace, restaurant, bar, and free Wi-Fi. Notable nearby landmarks include One World Observatory and Brooklyn Bridge.
   - **Rating:** 5 stars【3:0†source】.

2. **Faena New York**
   - **Expected Opening:** Spring 2025
   - **Location:** One High Line’s East Tower
   - **Features:** This hotel will feature 120 rooms, a spa, a restaurant by a celebrity chef, and a live entertainment venue, offering views of the High Line and Hudson River【3:1†source】.

3. **Kimpton Midtown NYC**
   - **Expected Opening:** Late 2025
   - **Location:** Rockefeller Center
   - **Features:** T

## 🧪 Step 2: Build the Adaptive RAG Workflow
---

Now we'll build the complete workflow using Microsoft Agent Framework's Workflow API.
The workflow implements a **reflection pattern** where:

1. **Worker** generates responses
2. **Reviewer** evaluates quality
3. If not approved, **Worker** regenerates with feedback
4. Only approved responses are returned to the user

### Workflow Architecture

```
User Query
    |
    v
Intent Router
    |
    +---> LLM Path -------> Generate Answer ---+
    |                                           |
    +---> RAG Path -------> Retrieve Docs      |
    |                           |               |
    |                       Grade Quality       |
    |                           |               |
    |                       [Good/Bad]          |
    |                           |               |
    |                       Good: Generate      |
    |                       Bad: Rewrite Query  |
    |                                           |
    +---> WebSearch -----> Fetch Web Data ---> |
                                                |
                                                v
                                        Evaluate Quality
                                                |
                                        [Pass/Fail]
                                                |
                                        Pass: Return Answer
                                        Fail: Retry with Feedback
```

### 2.1 Define Workflow Executors

We'll create two main executors:
- **AdaptiveRAGWorker**: Handles query processing and response generation
- **ResponseReviewer**: Evaluates response quality and provides feedback

In [15]:
class ResponseReviewer(Executor):
    """Executor that reviews generated responses for quality."""
    
    def __init__(self, id: str, model_config: dict) -> None:
        super().__init__(id=id)
        self.groundedness_eval = GroundednessEvaluator(model_config)
        self.relevance_eval = RelevanceEvaluator(model_config)
    
    @handler
    async def review(self, request: ProcessingRequest, ctx: WorkflowContext[ReviewResponse]) -> None:
        print(f"\n{'='*80}")
        print(f"Reviewer: Evaluating response for request {request.request_id[:8]}...")
        print(f"{'='*80}")
        
        # Skip evaluation for LLM-only responses (casual chat)
        if request.intent == "LLM":
            print("Reviewer: LLM-only response, auto-approving...")
            await ctx.send_message(
                ReviewResponse(
                    request_id=request.request_id,
                    approved=True,
                    feedback="LLM response - no evaluation needed",
                    groundedness_score=5.0,
                    relevance_score=5.0
                )
            )
            return
        
        # Evaluate groundedness (check for hallucinations)
        groundedness_input = {
            "query": request.query,
            "context": request.context,
            "response": request.response
        }
        groundedness_result = self.groundedness_eval(**groundedness_input)
        groundedness_score = groundedness_result["groundedness"]
        
        # Evaluate relevance
        relevance_input = {
            "query": request.query,
            "response": request.response
        }
        relevance_result = self.relevance_eval(**relevance_input)
        relevance_score = relevance_result["relevance"]
        
        print(f"Reviewer: Groundedness Score: {groundedness_score:.2f}/5.0")
        print(f"Reviewer: Relevance Score: {relevance_score:.2f}/5.0")
        
        # Determine if response is approved
        threshold = 3.0
        approved = groundedness_score >= threshold and relevance_score >= threshold
        
        if approved:
            feedback = "Response is grounded and relevant. Approved."
            print(f"Reviewer: ✓ APPROVED - {feedback}")
        else:
            feedback_parts = []
            if groundedness_score < threshold:
                feedback_parts.append(
                    f"Response may contain hallucinations (groundedness: {groundedness_score:.2f}). "
                    "Ensure all facts come from the provided context."
                )
            if relevance_score < threshold:
                feedback_parts.append(
                    f"Response not sufficiently relevant (relevance: {relevance_score:.2f}). "
                    "Focus more directly on answering the user's question."
                )
            feedback = " ".join(feedback_parts)
            print(f"Reviewer: ✗ REJECTED - {feedback}")
        
        # Send review result
        await ctx.send_message(
            ReviewResponse(
                request_id=request.request_id,
                approved=approved,
                feedback=feedback,
                groundedness_score=groundedness_score,
                relevance_score=relevance_score
            )
        )

print("✓ ResponseReviewer defined")

✓ ResponseReviewer defined


In [19]:
class AdaptiveRAGWorker(Executor):
    """Worker executor that processes queries through adaptive RAG pipeline."""
    
    def __init__(
        self,
        id: str,
        chat_client: AzureOpenAIChatClient,
        search_client: SearchClient,
        model_config: dict
    ) -> None:
        super().__init__(id=id)
        self.chat_client = chat_client
        self.search_client = search_client
        self.retrieval_eval = RetrievalEvaluator(model_config)
        self._pending_requests: dict[str, ProcessingRequest] = {}
    
    @handler
    async def process_query(self, user_messages: list[ChatMessage], ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Main entry point for query processing."""
        # Extract query text from the last user message
        query = user_messages[-1].text if user_messages else ""
        
        print(f"\n{'='*80}")
        print(f"Worker: Processing new query")
        print(f"Query: {query}")
        print(f"{'='*80}")
        
        request = ProcessingRequest(
            request_id=str(uuid4()),
            query=query
        )
        
        # Step 1: Classify intent
        intent = await self._classify_intent(query)
        request.intent = intent.intent_type
        print(f"\nWorker: Intent Classification")
        print(f"  - Intent: {intent.intent_type}")
        print(f"  - Reasoning: {intent.reasoning}")
        
        # Step 2: Process based on intent
        if intent.intent_type == "LLM":
            # Direct generation without retrieval
            await self._generate_llm_response(request, ctx)
        elif intent.intent_type == "RAG":
            # RAG pipeline with quality checks
            await self._process_rag_pipeline(request, ctx)
        else:  # websearch
            # Web search pipeline
            await self._process_websearch_pipeline(request, ctx)
    
    @handler
    async def handle_review(self, review: ReviewResponse, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Handle review feedback and retry if needed."""
        print(f"\n{'='*80}")
        print(f"Worker: Received review for request {review.request_id[:8]}")
        print(f"{'='*80}")
        
        if review.request_id not in self._pending_requests:
            print(f"Worker: Warning - Unknown request ID {review.request_id[:8]}")
            return
        
        request = self._pending_requests.pop(review.request_id)
        
        if review.approved:
            print("Worker: ✓ Response approved, emitting to user...")
            # Emit approved response to external consumer
            await ctx.add_event(
                AgentRunUpdateEvent(
                    self.id,
                    data=AgentRunResponseUpdate(
                        contents=[ChatMessage(role=Role.ASSISTANT, text=request.response)],
                        role=Role.ASSISTANT
                    )
                )
            )
            print(f"\nFinal Response Emitted:")
            print(f"{'-'*80}")
            print(request.response)
            print(f"{'-'*80}")
        else:
            print(f"Worker: ✗ Response rejected, regenerating with feedback...")
            print(f"Feedback: {review.feedback}")
            # Regenerate with feedback
            await self._regenerate_with_feedback(request, review.feedback, ctx)
    
    async def _classify_intent(self, query: str) -> IntentResponse:
        """Classify query intent."""
        prompt = f"""You are an intelligent intent classifier. Classify the query into one of:

- LLM: Casual conversation, greetings, general knowledge (no current info needed)
- RAG: Hotel-related questions (data available until Aug 2024)
- websearch: Recent events, news, or topics after Aug 2024

Query: {query}

Respond in JSON format with 'intent_type' and 'reasoning'."""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(
            messages=messages,
            response_format=IntentResponse
        )
        
        return IntentResponse.model_validate_json(response.messages[-1].text)
    
    async def _generate_llm_response(self, request: ProcessingRequest, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Generate response without retrieval."""
        print(f"\nWorker: Generating LLM response (no retrieval)...")
        
        prompt = f"""You are a kind and helpful assistant. Answer in a friendly, warm manner.
Use emojis appropriately (1-3 per response). Be clear and concise.

Query: {request.query}

Answer:"""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(messages=messages)
        
        request.response = response.messages[-1].text.strip()
        print(f"Worker: Response generated: {request.response[:100]}...")
        
        # Store and send for review
        self._pending_requests[request.request_id] = request
        await ctx.send_message(request)
    
    async def _process_rag_pipeline(self, request: ProcessingRequest, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Process RAG pipeline with retrieval and quality checks."""
        print(f"\nWorker: Starting RAG pipeline...")
        
        # Retrieve documents
        context = await self._retrieve_documents(request.query)
        request.context = context
        print(f"Worker: Retrieved {len(context.split(chr(10)))} documents")
        
        # Grade retrieval quality
        retrieval_score = self.retrieval_eval(
            query=request.query,
            context=context
        )["retrieval"]
        request.retrieval_score = retrieval_score
        print(f"Worker: Retrieval quality score: {retrieval_score:.2f}/5.0")
        
        # If retrieval quality is low, rewrite query and retry
        if retrieval_score < 3.0:
            print("Worker: Low retrieval quality, rewriting query...")
            rewritten_query = await self._rewrite_query(request.query)
            print(f"Worker: Rewritten query: {rewritten_query}")
            
            # Retry retrieval with rewritten query
            context = await self._retrieve_documents(rewritten_query)
            request.context = context
            print(f"Worker: Retrieved {len(context.split(chr(10)))} documents (retry)")
        
        # Generate answer
        await self._generate_rag_response(request, ctx)
    
    async def _process_websearch_pipeline(self, request: ProcessingRequest, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Process web search pipeline."""
        print(f"\nWorker: Starting web search pipeline...")
        
        # Rewrite query for web search
        search_query = await self._rewrite_for_websearch(request.query)
        print(f"Worker: Web search query: {search_query}")
        
        # Perform web search
        print("Worker: Performing web search...")
        web_results = await web_search(
            query=search_query,
            max_result=3,
            web_search_mode="bing"
        )
        
        # Format web results as context
        if isinstance(web_results, list):
            request.context = "\n\n".join([str(result) for result in web_results])
        else:
            request.context = str(web_results)
        
        print(f"Worker: Retrieved web results")
        
        # Generate answer with web context
        await self._generate_rag_response(request, ctx)
    
    async def _retrieve_documents(self, query: str) -> str:
        """Retrieve documents from Azure AI Search."""
        vector_query = VectorizableTextQuery(
            text=query,
            k_nearest_neighbors=3,
            fields="descriptionVector",
            exhaustive=True
        )
        
        search_results = self.search_client.search(
            search_text=query,
            vector_queries=[vector_query],
            select="Description,HotelName,Tags",
            top=3,
        )
        
        sources_formatted = "\n".join([
            f'{document["HotelName"]}: {document["Description"]} (Tags: {", ".join(document["Tags"])})'
            for document in search_results
        ])
        
        return sources_formatted
    
    async def _rewrite_query(self, query: str) -> str:
        """Rewrite query for better retrieval."""
        prompt = f"""You are a question re-writer that optimizes queries for vector search.
Look at the input and reason about the underlying semantic intent based on hotel domain.
Make the query more specific and searchable.

Original Query: {query}

Rewritten Query:"""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(messages=messages)
        
        return response.messages[-1].text.strip()
    
    async def _rewrite_for_websearch(self, query: str) -> str:
        """Rewrite query as web search keywords."""
        prompt = f"""You are a keyword re-writer that converts queries to web search keywords.
Generate search keywords that are specific and detailed for accurate web search results.
Don't include extra context like location or date.

Query: {query}

Search Keywords:"""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(messages=messages)
        
        return response.messages[-1].text.strip()
    
    async def _generate_rag_response(self, request: ProcessingRequest, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Generate response with retrieval context."""
        print(f"\nWorker: Generating response with context...")
        
        prompt = f"""Answer using ONLY the context below. Be friendly and concise.
If there isn't enough information, say you don't know.

Query: {request.query}

Context:
{request.context}

Answer:"""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(messages=messages)
        
        request.response = response.messages[-1].text.strip()
        print(f"Worker: Response generated: {request.response[:100]}...")
        
        # Store and send for review
        self._pending_requests[request.request_id] = request
        await ctx.send_message(request)
    
    async def _regenerate_with_feedback(self, request: ProcessingRequest, feedback: str, ctx: WorkflowContext[ProcessingRequest]) -> None:
        """Regenerate response incorporating feedback."""
        print(f"\nWorker: Regenerating response with feedback...")
        
        prompt = f"""Previous response had issues. Please improve based on feedback.

Query: {request.query}

Context:
{request.context}

Previous Response:
{request.response}

Feedback:
{feedback}

Improved Answer:"""
        
        messages = [ChatMessage(role=Role.USER, text=prompt)]
        response = await self.chat_client.get_response(messages=messages)
        
        request.response = response.messages[-1].text.strip()
        print(f"Worker: New response generated: {request.response[:100]}...")
        
        # Store and send for re-review
        self._pending_requests[request.request_id] = request
        await ctx.send_message(request)

print("✓ AdaptiveRAGWorker defined")

✓ AdaptiveRAGWorker defined


### 2.2 Build and Run the Workflow

In [20]:
async def run_adaptive_rag_workflow(query: str):
    """Run the adaptive RAG workflow with reflection pattern."""
    
    print(f"\n{'#'*80}")
    print(f"# Starting Adaptive RAG Workflow")
    print(f"# Query: {query}")
    print(f"{'#'*80}")
    
    # Create chat client
    chat_client = AzureOpenAIChatClient(
        model_id=azure_openai_chat_deployment,
        endpoint=azure_openai_endpoint,
        api_key=azure_openai_key,
        api_version=azure_openai_api_version
    )
    
    # Create search client
    search_client = SearchClient(
        endpoint=azure_ai_search_endpoint,
        index_name=index_name,
        credential=AzureKeyCredential(azure_search_admin_key),
    )
    
    # Create executors
    worker = AdaptiveRAGWorker(
        id="adaptive_rag_worker",
        chat_client=chat_client,
        search_client=search_client,
        model_config=model_config
    )
    
    reviewer = ResponseReviewer(
        id="response_reviewer",
        model_config=model_config
    )
    
    # Build workflow with reflection pattern
    print("\nBuilding workflow with Worker ↔ Reviewer cycle...")
    agent = (
        WorkflowBuilder()
        .add_edge(worker, reviewer)  # Worker sends responses to Reviewer
        .add_edge(reviewer, worker)  # Reviewer sends feedback to Worker
        .set_start_executor(worker)
        .build()
        .as_agent()  # Wrap workflow as an agent
    )
    
    print("Workflow built successfully!")
    print("\nRunning workflow...\n")
    
    # Run the workflow
    async for event in agent.run_stream(query):
        # Only final approved responses will be emitted
        if event.text:
            print(f"\n{'='*80}")
            print(f"FINAL APPROVED RESPONSE:")
            print(f"{'='*80}")
            print(event.text)
            print(f"{'='*80}\n")
    
    print("\n✓ Workflow completed successfully!")

print("✓ Workflow runner defined")

✓ Workflow runner defined


## 🧪 Step 3: Test the Complete Workflow
---

Let's test the workflow with different types of queries to see adaptive routing in action.

### Test 1: Simple LLM Query (No Retrieval)

In [21]:
# Test with a simple conversational query
await run_adaptive_rag_workflow("Hello! How are you today?")


################################################################################
# Starting Adaptive RAG Workflow
# Query: Hello! How are you today?
################################################################################

Building workflow with Worker ↔ Reviewer cycle...
Workflow built successfully!

Running workflow...


Worker: Processing new query
Query: Hello! How are you today?



Worker: Intent Classification
  - Intent: IntentType.LLM
  - Reasoning: The query 'Hello! How are you today?' is a casual greeting unrelated to hotels or recent events, thus best classified under LLM.

Worker: Generating LLM response (no retrieval)...
Worker: Response generated: Hello! I’m doing great, thank you for asking 😊 How about you? Hope you’re having a wonderful day! 🌟...

Reviewer: Evaluating response for request dfd4ae53...
Reviewer: LLM-only response, auto-approving...

Worker: Received review for request dfd4ae53
Worker: ✓ Response approved, emitting to user...

Final Response Emitted:
--------------------------------------------------------------------------------
Hello! I’m doing great, thank you for asking 😊 How about you? Hope you’re having a wonderful day! 🌟
--------------------------------------------------------------------------------

✓ Workflow completed successfully!
Worker: Response generated: Hello! I’m doing great, thank you for asking 😊 How about you? Hope y

### Test 2: RAG Query (Hotel Information)

In [22]:
# Test with a hotel-related query
await run_adaptive_rag_workflow("Can you recommend hotels with complimentary breakfast in downtown area?")


################################################################################
# Starting Adaptive RAG Workflow
# Query: Can you recommend hotels with complimentary breakfast in downtown area?
################################################################################

Building workflow with Worker ↔ Reviewer cycle...
Workflow built successfully!

Running workflow...


Worker: Processing new query
Query: Can you recommend hotels with complimentary breakfast in downtown area?

Worker: Intent Classification
  - Intent: IntentType.RAG
  - Reasoning: The query is about hotel recommendations with complimentary breakfast in a specific area, which is hotel-related and suitable for Retrieval-Augmented Generation using data available until Aug 2024.

Worker: Starting RAG pipeline...

Worker: Intent Classification
  - Intent: IntentType.RAG
  - Reasoning: The query is about hotel recommendations with complimentary breakfast in a specific area, which is hotel-related and suitable for Retr

### Test 3: Web Search Query (Recent Information)

In [23]:
# Test with a query requiring web search
await run_adaptive_rag_workflow("What are the latest hotel openings in New York City in 2025?")


################################################################################
# Starting Adaptive RAG Workflow
# Query: What are the latest hotel openings in New York City in 2025?
################################################################################

Building workflow with Worker ↔ Reviewer cycle...
Workflow built successfully!

Running workflow...


Worker: Processing new query
Query: What are the latest hotel openings in New York City in 2025?

Worker: Intent Classification
  - Intent: IntentType.WEBSEARCH
  - Reasoning: The query asks for the latest hotel openings in New York City in 2025, which is a future event beyond the available data until August 2024, so current or future web search is needed.

Worker: Starting web search pipeline...

Worker: Intent Classification
  - Intent: IntentType.WEBSEARCH
  - Reasoning: The query asks for the latest hotel openings in New York City in 2025, which is a future event beyond the available data until August 2024, so current o

### Test 4: Query with Poor Retrieval (Tests Rewriting)

In [24]:
# Test with a poorly worded query to trigger query rewriting
await run_adaptive_rag_workflow("Can you recommend some factories with free breakfast?")


################################################################################
# Starting Adaptive RAG Workflow
# Query: Can you recommend some factories with free breakfast?
################################################################################

Building workflow with Worker ↔ Reviewer cycle...
Workflow built successfully!

Running workflow...


Worker: Processing new query
Query: Can you recommend some factories with free breakfast?

Worker: Intent Classification
  - Intent: IntentType.RAG
  - Reasoning: The query is related to requesting recommendations for factories offering free breakfast, which is similar to hotel-related questions involving amenities and services. Such queries fit the RAG category with data availability until August 2024.

Worker: Starting RAG pipeline...

Worker: Intent Classification
  - Intent: IntentType.RAG
  - Reasoning: The query is related to requesting recommendations for factories offering free breakfast, which is similar to hotel-related 

## 📊 Summary
---

In this notebook, we implemented **Adaptive RAG** using Microsoft Agent Framework with the following features:

### ✅ Key Components

1. **Intent Classification**: Automatically routes queries to LLM, RAG, or Web Search
2. **Retrieval with Quality Check**: Grades retrieved documents and rewrites queries if needed
3. **Web Search Integration**: Fetches recent information when documents are outdated
4. **Reflection Pattern**: Evaluates responses for quality and regenerates if needed
5. **Quality Evaluators**: Uses Azure Evaluation SDK for:
   - Retrieval quality assessment
   - Groundedness checking (hallucination detection)
   - Relevance validation

### 🔄 Workflow Architecture

- **Worker Executor**: Handles query processing, retrieval, and generation
- **Reviewer Executor**: Evaluates response quality and provides feedback
- **Cyclic Flow**: Worker ↔ Reviewer loop ensures high-quality outputs
- **Event-Driven**: Only approved responses are emitted to users

### 🎯 Benefits

- **Adaptive**: Automatically selects the right processing strategy
- **Robust**: Self-correcting through reflection and retry
- **Transparent**: Detailed logging of decision-making process
- **Scalable**: Workflow-based architecture easy to extend

### 🔗 Learn More

- [Microsoft Agent Framework Documentation](https://learn.microsoft.com/en-us/agent-framework/)
- [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/)
- [Azure AI Evaluation SDK](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-sdk)
- [Adaptive RAG Paper](https://arxiv.org/abs/2403.14403)