# Agentic RAG with Autogen

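This notebook demonstrates Retrieval-Augmented Generation (RAG) with Autogen agents, backed by a ChromaDB document store and a simple evaluator that scores each response on length, source citations, response time, and context relevance.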

In [None]:
# Make sure to run this cell before running the rest of the notebook
!pip install chromadb

In [1]:
from typing import List, Dict
import time
import os
from autogen_agentchat.agents import AssistantAgent
from autogen_core.models import UserMessage
from autogen_core import CancellationToken
from autogen_agentchat.messages import TextMessage
from azure.core.credentials import AzureKeyCredential
from autogen_ext.models.azure import AzureAIChatCompletionClient
import chromadb
import asyncio

## Create the Client 

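First, we initialize the Azure AI Chat Completion Client. Here it points at the GitHub Models inference endpoint and authenticates with the `GITHUB_TOKEN` environment variable; the agents use this client to generate responses to user queries. The last two lines of the cell send a test prompt to confirm that the client works.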

In [2]:
client = AzureAIChatCompletionClient(
    model="gpt-4o-mini",
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
    model_info={
        "json_output": True,
        "function_calling": True,
        "vision": True,
        "family": "unknown",
    },
)
result = await client.create([UserMessage(content="What is the capital of France?", source="user")])
print(result)

finish_reason='stop' content='The capital of France is Paris.' usage=RequestUsage(prompt_tokens=14, completion_tokens=7) cached=False logprobs=None thought=None


## Initialize Assistant Agent

Next, we create an instance of the `AssistantAgent`. This agent will use the Azure AI Chat Completion Client to generate responses to user queries.

In [3]:
assistant = AssistantAgent(
    name="assistant",
    model_client=client,
    system_message="You are a helpful assistant.",
)

## Vector Database Initialization

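We initialize ChromaDB with persistent storage and add a small set of sample documents. ChromaDB stores these documents and retrieves them as context for generating grounded responses. Because the store persists on disk under `./chroma_db` and the document IDs are fixed, re-running this cell tries to add the same IDs again, which ChromaDB reports with the "Add of existing embedding ID" messages shown in the output.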

In [4]:
# Initialize ChromaDB with persistent storage
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.create_collection(
    name="documents",
    metadata={"description": "RAG documentation"},
    get_or_create=True
)

# Enhanced sample documents
documents = [
    "RAG combines retrieval with generative AI for accurate responses.",
    "Key features of RAG include document indexing and contextual generation.",
    "RAG helps reduce hallucinations by grounding responses in source documents.",
    "RAG systems use vector embeddings to find relevant context.",
    "The retrieval component ensures factual accuracy in responses."
]

# Add documents with metadata
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
    metadatas=[{"source": "training", "type": "explanation"} for _ in documents]
)

Add of existing embedding ID: doc_0
Add of existing embedding ID: doc_1
Add of existing embedding ID: doc_2
Add of existing embedding ID: doc_3
Add of existing embedding ID: doc_4
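To make the retrieval step concrete, here is a minimal pure-Python sketch of similarity search over the sample documents. It is not how ChromaDB works internally: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and `embed`, `cosine`, and `top_k` are hypothetical names introduced only for illustration. With the real collection, the equivalent step would be a call along the lines of `collection.query(query_texts=[...], n_results=...)`.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


sample_docs = [
    "RAG combines retrieval with generative AI for accurate responses.",
    "Key features of RAG include document indexing and contextual generation.",
    "RAG systems use vector embeddings to find relevant context.",
]
for score, doc in top_k("key features of RAG", sample_docs):
    print(f"{score:.2f}  {doc}")
```

The document that shares the most terms with the query ranks first. A real embedding model would also match paraphrases that share no words with the query, which this toy version cannot do.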

## Agent Configuration

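We configure the retrieval and assistant agents. The retrieval agent's system message describes it as specialized in finding relevant information through semantic search, while the assistant generates detailed responses based on the retrieved information. This cell redefines the `assistant` created earlier with a more specific system message. In the cells that follow, only `assistant` is called; `retrieval_agent` is defined but not wired into `ask_rag`.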

In [5]:
# Create agents with enhanced capabilities
retrieval_agent = AssistantAgent(
    name="retrieval_agent",
    model_client=client,
    system_message="""I am a retrieval agent specialized in finding relevant information.
    I use semantic search to find the most pertinent context for queries.""",
)

assistant = AssistantAgent(
    name="assistant",
    system_message="""I am an AI assistant that generates detailed responses based on retrieved information.
    I cite sources and explain my reasoning process.""",
    model_client=client,
)

## RAGEvaluator Class

We define the `RAGEvaluator` class to evaluate the response based on various metrics like response length, source citations, response time, and context relevance.

In [6]:
class RAGEvaluator:
    def __init__(self):
        self.responses = []
        self.metrics = {}

    def evaluate_response(self, query: str, response: str, context: List[str],
                          response_time: float = 0.0) -> Dict:
        # response_time should be measured by the caller around the model call;
        # timing inside this method would only measure the metric computation.
        metrics = {
            'response_length': len(response),
            # Number of context documents quoted verbatim in the response
            'source_citations': sum(1 for doc in context if doc in response),
            'response_time': response_time,
            'context_relevance': self._calculate_relevance(query, context)
        }

        self.responses.append({
            'query': query,
            'response': response,
            'metrics': metrics
        })

        return metrics

    def _calculate_relevance(self, query: str, context: List[str]) -> float:
        # Simple relevance scoring: fraction of context documents that
        # contain the full query string (case-insensitive)
        if not context:
            return 0.0  # avoid division by zero on an empty context
        return sum(1 for c in context if query.lower() in c.lower()) / len(context)
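The substring check in `_calculate_relevance` only credits a document that contains the entire query verbatim, so for natural-language questions it will typically return 0. One possible alternative, sketched below, scores token overlap instead. The names `tokenize` and `overlap_relevance` are hypothetical and not part of the notebook; this is an illustration rather than a recommended metric.

```python
import re
from typing import List


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_relevance(query: str, context: List[str]) -> float:
    """Mean fraction of query tokens that appear in each context document."""
    query_tokens = tokenize(query)
    if not query_tokens or not context:
        return 0.0
    per_doc = [len(query_tokens & tokenize(doc)) / len(query_tokens) for doc in context]
    return sum(per_doc) / len(per_doc)


print(overlap_relevance("key features", ["Key features of RAG", "nothing here"]))
```

Token overlap still ignores synonyms and word order; comparing embeddings of the query and each document would be a closer match to how the vector store itself ranks results.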

## Query Processing with RAG

We define the `ask_rag` function to send the query to the assistant, process the response, and evaluate it. This function handles the interaction with the assistant and uses the evaluator to measure the quality of the response.

In [7]:
async def ask_rag(query: str, evaluator: RAGEvaluator):
    try:
        # Time the model call itself
        start_time = time.time()
        response = await assistant.on_messages(
            [TextMessage(content=query, source="user")],
            cancellation_token=CancellationToken(),
        )
        processing_time = time.time() - start_time

        # Evaluate the response against the sample documents
        metrics = evaluator.evaluate_response(
            query=query,
            response=response.chat_message.content,
            context=documents
        )
        # Record the measured latency of the model call
        metrics['response_time'] = processing_time

        return {
            'response': response.chat_message.content,
            'metrics': metrics,
        }
    except Exception as e:
        print(f"Error processing query: {e}")
        return None
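As written, `ask_rag` sends the bare query to the assistant, so the stored documents never reach the model. A minimal sketch of how retrieved passages could be folded into the prompt follows. `build_rag_prompt` is a hypothetical helper, not part of the notebook, and the passages are assumed to come from a ChromaDB lookup such as `collection.query(query_texts=[query], n_results=3)["documents"][0]`.

```python
from typing import List


def build_rag_prompt(query: str, retrieved_docs: List[str]) -> str:
    """Combine retrieved passages and the user question into a single prompt."""
    if not retrieved_docs:
        # Nothing retrieved: fall back to the plain question
        return query
    # Number each passage so the model can cite it
    numbered = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below. "
        "Cite passages by their bracketed number.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}"
    )


print(build_rag_prompt(
    "How does RAG improve response accuracy?",
    ["RAG helps reduce hallucinations by grounding responses in source documents."],
))
```

The resulting string could then be passed as the `content` of the `TextMessage` in place of the raw query.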

## Example Usage

We initialize the evaluator and define the queries that we want to process and evaluate.

In [8]:
evaluator = RAGEvaluator()
queries = [
    "What are the key features of RAG?",
    "How does RAG improve response accuracy?",
    "Explain the retrieval process in RAG"
]

In [9]:
async def main():
    for query in queries:
        print(f"\nProcessing Query: {query}")
        result = await ask_rag(query, evaluator)
        if result:
            print(f"Response: {result['response']}")

## Run the Script

A Jupyter kernel already runs an event loop and supports top-level `await`, so in the notebook we await `main()` directly. Top-level `await` is a syntax error in a standalone Python script, so a single cell cannot branch between the two environments; in a script, call `asyncio.run(main())` instead.

In [10]:
# In a notebook the kernel's event loop is already running, so await directly.
# In a standalone .py script, replace this line with: asyncio.run(main())
await main()


Processing Query: What are the key features of RAG?
Response: RAG, or Retrieval-Augmented Generation, is a model architecture that combines the strengths of information retrieval and generative language models. Here are some key features of RAG:

1. **Dual Components**: RAG integrates two main components: a retriever and a generator. The retriever fetches relevant documents or pieces of information from a knowledge base, while the generator produces human-like text based on the retrieved information.

2. **Retrieval-Augmented Generation**: The generator uses the documents retrieved by the retriever to inform its responses, effectively enhancing the model's ability to provide accurate and contextually relevant answers.

3. **End-to-End Training**: RAG models allow for joint training, where both the retriever and generator can be tuned together. This approach helps optimize the interaction between the two components for improved performance.

4. **Flexible Knowledge Sources**: RAG can u