# Agentic RAG with AutoGen

This notebook demonstrates Retrieval-Augmented Generation (RAG) with AutoGen agents and evaluates each response with simple metrics: response length, source citations, response time, and context relevance.

In [1]:
# Make sure to run this cell before running the rest of the notebook
!pip install chromadb






In [2]:
from typing import List, Dict
import time
import os
from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import UserMessage
from autogen_core import CancellationToken
from autogen_agentchat.messages import TextMessage
from azure.core.credentials import AzureKeyCredential
from autogen_ext.models.azure import AzureAIChatCompletionClient
import chromadb
import asyncio

## Create the Client 

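First, we initialize the Azure AI Chat Completion Client. This client sends requests to the model inference endpoint configured below, authenticating with the token stored in the `GITHUB_TOKEN` environment variable, and is used to generate responses to user queries.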

In [3]:
client = AzureAIChatCompletionClient(
    model="gpt-4o-mini",
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
    model_info={
        "json_output": True,
        "function_calling": True,
        "vision": True,
        "family": "unknown",
    },
)
result = await client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)

finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=7) cached=False logprobs=None thought=None


## Initialize Assistant Agent

Next, we create an instance of the `AssistantAgent`. This agent will use the Azure AI Chat Completion Client to generate responses to user queries.

In [4]:
assistant = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a helpful assistant.",
)

## Vector Database Initialization

We initialize ChromaDB with persistent storage and add enhanced sample documents. ChromaDB will be used to store and retrieve documents that provide context for generating accurate responses.

In [5]:
# Initialize ChromaDB with persistent storage
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.create_collection(
    name="travel_documents",
    metadata={"description": "travel_service"},
    get_or_create=True
)

# Enhanced sample documents
documents = [
    "Contoso Travel offers luxury vacation packages to exotic destinations worldwide.",
    "Our premium travel services include personalized itinerary planning and 24/7 concierge support.",
    "Contoso's travel insurance covers medical emergencies, trip cancellations, and lost baggage.",
    "Popular destinations include the Maldives, Swiss Alps, and African safaris.",
    "Contoso Travel provides exclusive access to boutique hotels and private guided tours."
]

# Add documents with metadata
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"source": "training", "type": "explanation"} for _ in documents]
)
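
Once the documents are stored, they can be retrieved with `collection.query(query_texts=[...], n_results=...)`, which returns parallel lists of lists with one inner list per query text. The sketch below shows one way to unpack that shape. The helper name `flatten_query_result` is hypothetical, and the sample dictionary is a hand-written stand-in, not real query output.

```python
from typing import Dict, List, Tuple


def flatten_query_result(result: Dict[str, list]) -> List[Tuple[str, str, float]]:
    """Return (id, document, distance) tuples for the first query in a Chroma-style result."""
    ids = result["ids"][0]
    docs = result["documents"][0]
    distances = result["distances"][0]
    return list(zip(ids, docs, distances))


# In the notebook this dict would come from something like:
#   result = collection.query(query_texts=["travel insurance"], n_results=2)
# The ids and distances below are illustrative values, not measured results.
sample_result = {
    "ids": [["doc_2", "doc_1"]],
    "documents": [[
        "Contoso's travel insurance covers medical emergencies, trip cancellations, and lost baggage.",
        "Our premium travel services include personalized itinerary planning and 24/7 concierge support.",
    ]],
    "distances": [[0.42, 0.87]],
}

for doc_id, text, distance in flatten_query_result(sample_result):
    print(f"{doc_id} (distance={distance:.2f}): {text}")
```

Smaller distances indicate closer matches, so the first tuple is the best candidate to pass to the assistant as context.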

## Agent Configuration

We configure the retrieval and assistant agents. The retrieval agent is specialized in finding relevant information using semantic search, while the assistant generates detailed responses based on the retrieved information.

In [6]:
# Create agents with enhanced capabilities
retrieval_agent = AssistantAgent(
    name="retrieval_agent",
    model_client=client,
    system_message="""I am a retrieval agent specialized in finding relevant information.
    I use semantic search to find the most pertinent context for queries.""",
)

assistant = AssistantAgent(
    name="assistant",
    system_message="""I am an AI assistant that generates detailed responses based on retrieved information.
    I cite sources and explain my reasoning process.""",
    model_client=client,
)

## RAGEvaluator Class

We define the `RAGEvaluator` class to evaluate the response based on various metrics like response length, source citations, response time, and context relevance.

In [7]:
class RAGEvaluator:
    def __init__(self):
        self.responses = []
        self.metrics = {}

    def evaluate_response(self, query: str, response: str, context: List[str],
                          response_time: float = 0.0) -> Dict:
        # response_time should be measured by the caller around the model call;
        # timing inside this method would only capture the evaluation itself.
        metrics = {
            'response_length': len(response),
            # Counts context documents quoted verbatim in the response
            'source_citations': sum(1 for doc in context if doc in response),
            'response_time': response_time,
            'context_relevance': self._calculate_relevance(query, context)
        }

        self.responses.append({
            'query': query,
            'response': response,
            'metrics': metrics
        })

        return metrics

    def _calculate_relevance(self, query: str, context: List[str]) -> float:
        # Simple relevance scoring: fraction of context documents that contain
        # the full query as a case-insensitive substring
        if not context:
            return 0.0
        return sum(1 for c in context if query.lower() in c.lower()) / len(context)
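
A substring check on the full query rarely matches when the query is a whole sentence, so the relevance score tends to be zero. A minimal sketch of a token-overlap alternative follows, assuming a plain lowercase alphanumeric tokenizer. The names `tokenize` and `token_overlap_relevance` are hypothetical and not part of the notebook.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_overlap_relevance(query: str, context: List[str]) -> float:
    """Mean fraction of query tokens that appear in each context document."""
    query_tokens = tokenize(query)
    if not query_tokens or not context:
        return 0.0
    scores = [
        len(query_tokens & tokenize(doc)) / len(query_tokens)
        for doc in context
    ]
    return sum(scores) / len(scores)


score = token_overlap_relevance(
    "travel insurance",
    ["Travel insurance covers baggage.", "Swiss Alps"],
)
print(score)
```

This still ignores word meaning and stop words; it only illustrates a score that degrades gradually instead of requiring an exact match.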

## Query Processing with RAG

We define the `ask_rag` function to send the query to the assistant, process the response, and evaluate it. This function handles the interaction with the assistant and uses the evaluator to measure the quality of the response.

In [8]:
async def ask_rag(query: str, evaluator: RAGEvaluator):
    try:
        # Time the full round trip to the assistant
        start_time = time.time()
        response = await assistant.on_messages(
            [TextMessage(content=query, source="user")],
            cancellation_token=CancellationToken(),
        )
        processing_time = time.time() - start_time
        answer = response.chat_message.content

        # Evaluate the response against the sample documents
        metrics = evaluator.evaluate_response(
            query=query,
            response=answer,
            context=documents
        )
        # Record the latency measured around the model call
        metrics['response_time'] = processing_time

        return {
            'response': answer,
            'metrics': metrics,
            'processing_time': processing_time,
        }
    except Exception as e:
        print(f"Error processing query: {e}")
        return None
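
For the assistant to ground its answer in the stored documents, the retrieved text has to reach the model as part of the message. One possible way to do that is to prepend numbered context passages to the question, as sketched below. The function name `build_augmented_prompt` and the prompt wording are assumptions for illustration, not something the notebook defines.

```python
from typing import List


def build_augmented_prompt(query: str, retrieved_docs: List[str]) -> str:
    """Combine retrieved passages and the user question into a single prompt."""
    context_block = "\n".join(
        f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


# In the notebook, retrieved_docs could come from:
#   collection.query(query_texts=[query], n_results=2)["documents"][0]
prompt = build_augmented_prompt(
    "Can you explain Contoso's travel insurance coverage?",
    ["Contoso's travel insurance covers medical emergencies, trip cancellations, and lost baggage."],
)
print(prompt)
```

The resulting string would replace `query` as the `content` of the `TextMessage` sent to the assistant.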

## Example Usage

We initialize the evaluator and define the queries that we want to process and evaluate.

In [9]:
evaluator = RAGEvaluator()
queries = [
    "What luxury vacation packages does Contoso Travel offer?",
    "Can you explain Contoso's travel insurance coverage?",
    "What destinations and experiences are available through Contoso Travel?"
]

In [10]:
async def main():
    for query in queries:
        print(f"\nProcessing Query: {query}")
        result = await ask_rag(query, evaluator)
        if result:
            print(f"Response: {result['response']}")
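
Once the queries have been processed, `evaluator.responses` holds one record per query with its metrics. A small sketch of averaging those metrics across queries is shown below. The helper `summarize_metrics` is hypothetical, and the sample records use made-up numbers rather than results from this notebook.

```python
from statistics import mean
from typing import Dict, List


def summarize_metrics(responses: List[Dict]) -> Dict[str, float]:
    """Average each metric across all recorded responses."""
    if not responses:
        return {}
    metric_names = responses[0]["metrics"].keys()
    return {
        name: mean(record["metrics"][name] for record in responses)
        for name in metric_names
    }


# Hand-written records in the same shape as evaluator.responses (illustrative values only)
sample_responses = [
    {"query": "q1", "response": "r1",
     "metrics": {"response_length": 100, "source_citations": 1}},
    {"query": "q2", "response": "r2",
     "metrics": {"response_length": 200, "source_citations": 0}},
]

summary = summarize_metrics(sample_responses)
print(summary)
```

In the notebook, the call would be `summarize_metrics(evaluator.responses)` after `main()` has finished.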

## Run the Script

A notebook already has a running event loop, so we await `main()` directly. In a standalone script, call `asyncio.run(main())` instead, because top-level `await` is a syntax error outside an interactive environment.

In [11]:
# Notebook cell: the event loop is already running, so await directly.
# In a standalone script, replace this line with: asyncio.run(main())
await main()


Processing Query: What luxury vacation packages does Contoso Travel offer?
Response: As of my last knowledge update in October 2023, Contoso Travel is a fictional company often used as a placeholder in various business scenarios, particularly in Microsoft documentation and examples. Because it doesn't exist in reality, there are no actual luxury vacation packages offered by Contoso Travel.

However, if we consider a real travel company, typical luxury vacation packages might include:

1. **Luxury Cruises:** All-inclusive packages featuring gourmet dining, spacious suites, and excursions to exotic destinations.
   
2. **Private Villas or Estates:** Exclusive stays at luxurious properties, often with personal chefs and concierge service.

3. **Custom Itineraries:** Tailored travel experiences that include private tours, fine dining, and unique cultural experiences, such as a guided trip through Italy or a safari in Africa.

4. **Wellness Retreats:** Packages focused on relaxation and se