# **Multi-Agent RAG for Document-Based Question Answering**  

## **Introduction**  
This notebook demonstrates a **Multi-Agent Retrieval-Augmented Generation (RAG) system** using **CrewAI** to efficiently query and analyze documents. It employs a structured workflow where multiple agents collaborate to refine queries, retrieve relevant information, generate responses, and verify their accuracy.  

### **Key Features:**  
1. **Multi-format Document Processing** – Supports PDFs, TXT, DOCX, and HTML, extracting and chunking content for efficient retrieval.  
2. **Vector Search with FAISS** – Uses `HuggingFaceEmbeddings` to embed document chunks and retrieve relevant excerpts.  
3. **Agent-Based Workflow** – Dedicated agents for query refinement, document retrieval, response generation, and verification ensure structured processing.  
4. **CrewAI Orchestration** – Implements a sequential process to enhance retrieval accuracy and response reliability.  
5. **Local LLM Execution with Ollama** – Utilizes `ollama/deepseek-r1:1.5b` for all LLM-driven tasks, enabling offline execution without cloud dependencies.  
6. **Customizable and Extensible** – Easily adaptable to different LLMs, retrieval techniques, and document types.  

### **Workflow Overview:**  
- Processes and embeds documents, indexing them in FAISS for efficient search.  
- Refines the user query for clearer, more specific retrieval.  
- Retrieves the top 5 most relevant excerpts based on the refined query.  
- Generates structured responses strictly based on retrieved content.  
- Verifies responses to ensure they are fully grounded in the provided excerpts.  
- Uses CrewAI to manage agent interactions and context flow.  
- Outputs structured results after stripping the model's `<think>` reasoning from agent responses.  

## **1. Import Required Libraries**  

This section imports the libraries needed to build an **Agentic RAG system** that processes documents and answers user queries.  

- **CrewAI** (`Agent, Task, Crew, Process, LLM`) – Manages multi-agent workflows, orchestrating query refinement, retrieval, and response validation.  
- **LangChain** (`RecursiveCharacterTextSplitter, FAISS, HuggingFaceEmbeddings`) – Splits documents into chunks, embeds them using HuggingFace models, and enables vector-based retrieval with FAISS.  
- **LangChain Document Loaders** (`DirectoryLoader, TextLoader, PyPDFLoader`) – Loads documents from various formats (PDF, TXT, DOCX, HTML) for processing.  
- **JSON** (`json`) – Serializes and parses the structured output passed between agents.  
- **Regular expressions** (`re`) – Strips `<think>` reasoning blocks from model responses.  



In [None]:
# Standard library
import json
import re

# CrewAI: agents, tasks, orchestration, and the LLM wrapper
from crewai import Agent, Task, Crew, Process, LLM

# LangChain: text splitting, embeddings, vector store, and document loaders
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader

## **2. Documents Processing and Vector Store Creation**  

This section defines a function that loads documents of several types, splits them into smaller chunks, and builds a vector store for similarity-based search.  
The `process_all_files(directory)` function:  
1. Loads **PDF, TXT, DOCX, and HTML** files from a directory.  
2. Splits text into chunks using `RecursiveCharacterTextSplitter`.  
3. Embeds chunks with `HuggingFaceEmbeddings`.  
4. Stores embeddings in a **FAISS vector database**.  
5. Returns the FAISS vector store, which is converted into a retriever in the next step.  


In [3]:


def process_all_files(directory):
    """Load supported files from `directory`, chunk and embed them, and return a FAISS vector store."""
    # One loader per file type; HTML and DOCX rely on DirectoryLoader's default loader class.
    loaders = [
        DirectoryLoader(directory, glob="**/*.html", show_progress=True),
        DirectoryLoader(directory, glob="**/*.pdf", show_progress=True, loader_cls=PyPDFLoader),
        DirectoryLoader(directory, glob="**/*.txt", show_progress=True, loader_cls=TextLoader),
        DirectoryLoader(directory, glob="**/*.docx", show_progress=True),
    ]
    documents = []
    for loader in loaders:
        documents.extend(loader.load())

    # Split into overlapping chunks so context is preserved across chunk boundaries.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=150)
    split_docs = text_splitter.split_documents(documents)

    # Embed the chunks and index them in FAISS.
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_documents(split_docs, embedding_model)
    return vectorstore
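
To make the `chunk_size=512, chunk_overlap=150` settings concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a deliberately simplified illustration: `RecursiveCharacterTextSplitter` also prefers to split on separators such as paragraph and line breaks, so its real chunk boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; not the algorithm used by
    RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
chunks = sliding_chunks(sample)
print(len(chunks))                          # number of chunks produced
print(chunks[0][-150:] == chunks[1][:150])  # consecutive chunks share 150 characters
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.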



### 2.1 **Loading and Processing Documents**

- **Directory Setup**: The directory containing documents is specified as `"data/"`.  
- **Document Processing**: The `process_all_files(directory)` method is called to load, split, embed, and store the document chunks in a **FAISS vector store**.  
- **Retriever Initialization**: The vector store is then converted into a retriever with the top 5 relevant document excerpts being retrieved using `search_kwargs={"k": 5}`.


In [None]:

# Load and process every supported document in the directory
directory = "data/"
vectorstore = process_all_files(directory)
# Return the 5 most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
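
Conceptually, a retriever configured with `search_kwargs={"k": 5}` ranks the stored chunk vectors by how close they are to the query vector and returns the best `k`. The sketch below illustrates that idea with plain Python lists and cosine similarity. The toy vectors and the names `cosine_similarity` and `top_k` are hypothetical; FAISS uses optimized index structures and, depending on configuration, may use a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=5):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
best = top_k([1.0, 0.1], chunk_vectors, k=2)
print([index for index, _ in best])
```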


## **3. Creating Agents for the Workflow**

In this section, we will define the agents that will handle different tasks within the retrieval-augmented generation (RAG) workflow. Each agent will be responsible for a specific step, such as refining user queries, retrieving relevant document excerpts, generating responses, and verifying the accuracy of those responses. These agents work together in a sequential flow to ensure that the entire process from query refinement to response verification is seamless and efficient. Let's start by creating the **QueryRefinementAgent**.

### **3.1 [QueryRefinementAgent](#)**

The **QueryRefinementAgent** is responsible for improving the clarity and specificity of user queries to enhance document retrieval accuracy. By refining the user’s original query, this agent ensures that the most relevant document excerpts are retrieved.

#### **Key Features**:
- **Role**: Refines user queries to ensure clarity without changing their original intent.
- **Backstory**: Aids in improving search accuracy by clarifying the user's query.
- **Goal**: To refine queries for more relevant document retrieval.

#### **Workflow**:
1. The agent receives the user’s query.
2. It uses an LLM to rephrase the query and make it more specific.
3. The refined query is then passed on for document retrieval.

This agent is the first step in the sequential workflow, setting the stage for improved document retrieval in later stages.


In [28]:
class QueryRefinementAgent(Agent):
    def __init__(self, llm):
        super().__init__(
            role="Query Refiner",
            backstory="I enhance user queries by clarifying intent and adding relevant context to improve search accuracy.",
            goal="Refine the user's query to retrieve the most relevant document excerpts."
        )
        self.llm = llm

    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # The raw user query is carried in the task description.
        query = task.description

        # Instruct the model to return only the rewritten query.
        system_prompt = (
            "You are an AI assistant that refines user queries for document retrieval. "
            "Rephrase the query to be clearer and more specific while keeping it concise. "
            "Return only the refined query without any explanations or additional text.\n\n"
            f"Original Query: {query}\n\n"
            "Refined Query:"
        )

        refined_query = self.llm.call([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ])

        return refined_query.strip()
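
The prompt pattern can be exercised without a running model by swapping in a stand-in object. The sketch below assumes only that the LLM exposes a `call(messages)` method returning a string, which is what the agent above relies on. `EchoLLM` and `refine_query` are hypothetical names, and the stub's "refinement" is a placeholder rather than real model behavior.

```python
class EchoLLM:
    """Hypothetical stand-in for an LLM client; records messages and returns a canned reply."""

    def __init__(self):
        self.last_messages = None

    def call(self, messages):
        self.last_messages = messages
        user_text = messages[-1]["content"]
        # Pretend the model returned the query padded with whitespace.
        return f"  {user_text.strip().capitalize()}?  "


def refine_query(llm, query: str) -> str:
    """Build the refinement messages and return the cleaned model reply."""
    system_prompt = (
        "You are an AI assistant that refines user queries for document retrieval. "
        "Return only the refined query without any explanations."
    )
    reply = llm.call([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ])
    return reply.strip()


stub = EchoLLM()
print(refine_query(stub, "what are the prerequisites to run the project"))
```

A stub like this makes it possible to check message construction and post-processing separately from model quality.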


### **3.2 [RetrievalAgent](#)**

The **RetrievalAgent** is responsible for retrieving relevant document excerpts from the stored vector database based on a user’s query. It utilizes a retriever to search the indexed documents and fetch the most pertinent content for further processing.

#### **Key Features**:
- **Role**: Retrieves relevant document excerpts using a pre-defined retriever.
- **Backstory**: Specializes in fetching the most relevant information from research papers or documents, particularly stored in PDF format.
- **Goal**: To fetch the top document excerpts based on the user query for accurate response generation.

#### **Workflow**:
1. The agent receives the refined query from the **QueryRefinementAgent**.
2. It queries the vectorstore (FAISS) using the retriever to fetch relevant document excerpts.
3. The relevant document content is returned for analysis and response generation.

This agent ensures that the generated responses are directly grounded in the most relevant content, making it a key part of the retrieval process.


In [29]:

# Define the Retrieval Agent
class RetrievalAgent(Agent):
    def __init__(self):
        super().__init__(
            role="PDF Retriever",
            backstory="I retrieve relevant information from research papers stored as PDFs.",
            goal="Fetch the most relevant document excerpts based on a user query."
        )
        self._retriever = retriever  # Store the module-level retriever as a private attribute

    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # Prefer the refined query passed in as context (minus any <think> block);
        # fall back to the task description if no context is available.
        refined = re.sub(r"<think>.*?</think>", "", context or "", flags=re.DOTALL).strip()
        query = refined or task.description
        docs = self._retriever.invoke(query)
        # Join the text of the retrieved chunks into a single string.
        retrieved_text = "\n".join(doc.page_content for doc in docs)
        return retrieved_text

# Instantiate the agent
retrieval_agent = RetrievalAgent()


### **3.3 [LLMAgent](#)**

The **LLMAgent** is responsible for analyzing the retrieved document excerpts and generating a response based on them. Using a large language model (LLM), this agent interprets the retrieved content and provides a clear, structured answer to the user's query.

#### **Key Features**:
- **Role**: Processes retrieved document excerpts to generate relevant responses.
- **Backstory**: Refines and interprets the retrieved content to summarize and generate meaningful insights.
- **Goal**: To produce clear, accurate responses by analyzing the provided document excerpts.

#### **Workflow**:
1. The agent receives the retrieved document excerpts from the **RetrievalAgent**.
2. It uses the LLM to generate a response based on the provided context, ensuring the response is strictly based on the content in the document excerpts.
3. The generated response is returned along with the retrieved content for verification.

This agent ensures that the response generated is relevant to the user’s query and grounded in the provided document content, making it an essential part of the response generation process.

In [30]:
class LLMAgent(Agent):
    def __init__(self, llm):
        super().__init__(
            role="LLM Processor",
            backstory="I analyze and refine retrieved excerpts to generate meaningful insights.",
            goal="Summarize and interpret retrieved text to provide a clear response."
        )
        self.llm = llm

    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # Without retrieved excerpts there is nothing to ground an answer in.
        # Return the same JSON shape as the normal path so downstream parsing still works.
        if not context or not isinstance(context, str) or len(context.strip()) == 0:
            return json.dumps({"generated_response": "No relevant information found.", "retrieved_text": ""})

        system_prompt = (
            "You are an AI assistant that answers ONLY based on the provided document excerpts. "
            "Do not use external knowledge. If the answer is not found, reply with 'Not found in the document.'\n\n"
            "DOCUMENT EXCERPTS:\n" + context
        )

        generated_response = self.llm.call([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task.description}
        ])

        # Bundle the answer with its supporting excerpts for the verification step.
        return json.dumps({"generated_response": generated_response.strip(), "retrieved_text": context})
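
In this notebook the context handed from one agent to the next is treated as a string, so the `LLMAgent` serializes its result to JSON and the following agent parses it back. The sketch below shows that round trip in isolation. `pack_result` and `unpack_result` are hypothetical helper names, and the fallback for input that is not valid JSON is an assumption about how malformed context might be handled.

```python
import json


def pack_result(generated_response: str, retrieved_text: str) -> str:
    """Serialize the answer and its supporting excerpts into one JSON string."""
    return json.dumps({
        "generated_response": generated_response.strip(),
        "retrieved_text": retrieved_text,
    })


def unpack_result(context) -> dict:
    """Parse the JSON string; fall back to empty fields if it is not a JSON object."""
    try:
        data = json.loads(context)
    except (TypeError, json.JSONDecodeError):
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "generated_response": data.get("generated_response", ""),
        "retrieved_text": data.get("retrieved_text", ""),
    }


packed = pack_result("  Python 3.10 is required.  ", "Prerequisites: Python 3.10")
print(unpack_result(packed)["generated_response"])
print(unpack_result("not json"))
```

Keeping both fields in one payload lets the verifier compare the answer against exactly the excerpts that produced it.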


### **3.4 [ResponseVerificationAgent](#)**

The **ResponseVerificationAgent** ensures that the generated response is fully supported by the retrieved document excerpts. This agent checks the alignment between the response and the content it was generated from, verifying that no external or unsupported information is included.

#### **Key Features**:
- **Role**: Verifies that the generated response is grounded in the retrieved document excerpts.
- **Backstory**: Ensures that the final response aligns with the provided text, highlighting any unsupported claims.
- **Goal**: To validate whether the response is properly supported by the document excerpts or flag any unsupported information.

#### **Workflow**:
1. The agent receives the generated response and the retrieved document excerpts from the **LLMAgent**.
2. It analyzes the response to check if all information is supported by the document excerpts.
3. The agent returns a verification result:
   - If the response is fully supported, it returns "Verified ✅".
   - If the response contains information not found in the excerpts, it returns "Unable to Verify".

This agent plays a crucial role in ensuring the integrity and accuracy of the response, confirming that it is based solely on the retrieved content.

In [73]:
class ResponseVerificationAgent(Agent):
    def __init__(self, llm):
        super().__init__(
            role="Response Verifier",
            backstory="I ensure that the generated response is strictly based on retrieved document excerpts.",
            goal="Validate whether the response is properly grounded in the provided excerpts."
        )
        self.llm = llm

    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # The context is expected to be the JSON string produced by LLMAgent.
        try:
            refined_context = json.loads(context)
        except (TypeError, json.JSONDecodeError):
            return "Unable to Verify"
        retrieved_text = refined_context.get("retrieved_text", "")
        generated_response = refined_context.get("generated_response", "")

        system_prompt = (
            "You are an AI assistant that verifies whether a generated response is properly supported "
            "by the given document excerpts. Your task is to analyze the response and check if it is "
            "grounded in the provided text.\n\n"
            "DOCUMENT EXCERPTS:\n"
            f"{retrieved_text}\n\n"
            "GENERATED RESPONSE:\n"
            f"{generated_response}\n\n"
            "Verification Output:\n"
            "- If the response is fully supported by the DOCUMENT EXCERPTS, reply with: 'Verified ✅'\n"
            "- If the response includes information not found in the DOCUMENT EXCERPTS, reply with: 'Unable to Verify'."
        )

        verification_result = self.llm.call([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": generated_response}
        ])

        return verification_result.strip()
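
Downstream code often needs a yes/no signal rather than free text. A minimal sketch of mapping the verifier's reply to a boolean follows. It assumes the model follows the prompt and answers with either `Verified ✅` or `Unable to Verify`, possibly preceded by a `<think>...</think>` block of the kind reasoning models tend to emit. The helper name `is_verified` is hypothetical.

```python
import re


def is_verified(verification_result: str) -> bool:
    """Return True only if the cleaned verifier reply starts with 'Verified'."""
    # Drop any <think>...</think> reasoning block before inspecting the verdict.
    cleaned = re.sub(r"<think>.*?</think>", "", verification_result, flags=re.DOTALL)
    return cleaned.strip().lower().startswith("verified")


print(is_verified("<think>The excerpt mentions Python.</think>\nVerified ✅"))
print(is_verified("Unable to Verify"))
```

Small models do not always follow the output format exactly, so any reply that does not clearly start with the expected verdict is treated as unverified here.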


## **4. Sequential Agent Workflow for Query Refinement, Retrieval, Response Generation, and Verification**
This section outlines the process of initializing and executing a multi-agent workflow for querying documents, refining user input, retrieving relevant content, generating responses, and verifying the accuracy of the generated answers.



#### **4.1. Initialize Local LLM (Ollama)**

The **LLM** is initialized using a locally hosted model `ollama/deepseek-r1:1.5b`, which will be used by the various agents for query refinement, response generation, and verification.


In [74]:
llm = LLM(model="ollama/deepseek-r1:1.5b", base_url="http://localhost:11434")
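
Before running the agents, it can help to confirm that the Ollama server is reachable. The sketch below uses only the standard library and queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper name `ollama_is_running` and the timeout value are illustrative assumptions.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


if not ollama_is_running():
    print("Ollama does not appear to be running on http://localhost:11434")
```

If the check fails, start the Ollama server and make sure the model has been pulled, for example with `ollama pull deepseek-r1:1.5b`.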

#### 4.2 Query Refinement Agent Initialization
The `QueryRefinementAgent` is initialized with the local LLM model. The agent's purpose is to refine user queries to enhance the accuracy of document retrieval. A task is created to process a sample query related to project prerequisites.

In [75]:
query = "what are the Prerequisites to Run the Project"
query_refinement_agent = QueryRefinementAgent(llm)
query_refinement_task = Task(
    description=query,
    expected_output="A well-structured and precise query for document retrieval.",
    agent=query_refinement_agent
)


#### 4.3. Retrieval Agent Task
A retrieval task is defined where the agent fetches the most relevant excerpts from the uploaded PDFs based on the refined query. This task ensures that only the most pertinent information is retrieved from the documents.

In [76]:
retrieval_task = Task(
    description="Retrieve the most relevant excerpts from the uploaded PDF based on the refined query.",
    expected_output="A list of the most relevant document excerpts.",
    agent=retrieval_agent
)


#### 4.4 LLM Agent Task for Response Generation
The `LLMAgent` takes the retrieved excerpts and generates a well-structured response based solely on the document content. This agent ensures that the response is meaningful and interprets the retrieved information effectively.

In [77]:
llm_agent = LLMAgent(llm)
llm_task = Task(
    description="Analyze and summarize the retrieved text.",
    expected_output="A well-structured response based on the retrieved excerpts.",
    agent=llm_agent
)


#### 4.5. Response Verification Agent Task
The `ResponseVerificationAgent` checks whether the generated response is entirely supported by the retrieved excerpts. It validates the integrity of the response and ensures no unsupported information is included.

In [78]:
response_verification_agent = ResponseVerificationAgent(llm)
response_verification_task = Task(
    description="Verify if the generated response is based solely on the retrieved document excerpts.",
    expected_output="Verification result indicating whether the response is supported by the excerpts.",
    agent=response_verification_agent, context=[llm_task]
)


#### 4.6. Define and Execute the Agent Workflow
The `Crew` class orchestrates the agent workflow. With `Process.sequential`, tasks run in the order listed and the output of each task serves as context for the next: the refined query feeds retrieval, the retrieved excerpts feed response generation, and the generated response feeds verification. Explicit dependencies are declared on the task itself through its `context` argument, as done for `response_verification_task` above.

In [None]:
crew = Crew(
    agents=[query_refinement_agent, retrieval_agent, llm_agent, response_verification_agent],
    tasks=[query_refinement_task, retrieval_task, llm_task, response_verification_task],
    verbose=True,
    process=Process.sequential,  # run tasks in the order listed
)

result = crew.kickoff()
print(result.raw)
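
The idea behind a sequential process, where each step's output becomes the next step's context, can be illustrated without CrewAI. In this sketch the four stages are plain placeholder functions and `run_sequential` is a hypothetical helper. It mirrors the order of the agents above but is not how CrewAI is implemented internally.

```python
from typing import Callable


def run_sequential(steps: list[Callable[[str], str]], initial_input: str) -> list[str]:
    """Run each step on the previous step's output and collect every output."""
    outputs = []
    context = initial_input
    for step in steps:
        context = step(context)
        outputs.append(context)
    return outputs


# Placeholder stages standing in for the four agents.
def refine(query: str) -> str:
    return query.strip().rstrip("?") + "?"


def retrieve(query: str) -> str:
    return f"[excerpts for: {query}]"


def generate(excerpts: str) -> str:
    return f"Answer based on {excerpts}"


def verify(answer: str) -> str:
    return "Verified" if "[excerpts" in answer else "Unable to Verify"


results = run_sequential([refine, retrieve, generate, verify], " what is needed ")
print(results[-1])
```

Collecting every intermediate output, rather than only the last one, matches how the notebook later inspects `llm_task.output` in addition to the final result.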


### **4.7 Output of the LLMAgent Task: Displaying Task Details and Outputs**

After executing the LLMAgent task, the following output is retrieved and displayed. This includes the task's description, summary, raw output, and structured JSON output. The results are printed for review, providing a detailed breakdown of the generated content. If available, the JSON dictionary and Pydantic output are also printed for further inspection.


In [None]:
task_output = llm_task.output
print(f"Task Description: {task_output.description}")
print(f"Task Summary: {task_output.summary}")
print(f"Raw Output: {task_output.raw}")
print(f"Parsed JSON Output: {json.loads(task_output.raw)}")
if task_output.json_dict:
    print(f"JSON Dict Output: {json.dumps(task_output.json_dict, indent=2)}")
if task_output.pydantic:
    print(f"Pydantic Output: {task_output.pydantic}")

### **4.8. Extract and Remove `<think>` Content for cleaner output**

In this section, two functions are implemented to handle `<think>` tags within text:

1. **`extract_think_content`**: This function extracts the content between the `<think>` and `</think>` tags. If such tags are found, it returns the content inside; otherwise, it returns an empty string.

2. **`remove_think_content`**: This function removes all content enclosed within the `<think>` and `</think>` tags, ensuring that the text is cleaned of any such sections.



In [82]:

# Function to extract the content of the first <think>...</think> block
def extract_think_content(text):
    match = re.search(r'<think>(.*?)</think>', text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

# Function to remove everything between <think> and </think> tags
def remove_think_content(text):
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
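
As a quick check of both helpers, the sketch below applies them to a made-up model reply (the sample text is invented for illustration). The two functions are repeated so the snippet runs on its own.

```python
import re


def extract_think_content(text):
    match = re.search(r'<think>(.*?)</think>', text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def remove_think_content(text):
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)


sample_reply = "<think>\nThe excerpt lists Python and Ollama.\n</think>\nYou need Python and Ollama."
print(extract_think_content(sample_reply))         # reasoning only
print(remove_think_content(sample_reply).strip())  # answer only
```

Note that `extract_think_content` returns only the first `<think>` block, whereas `remove_think_content` strips all of them and leaves the surrounding whitespace in place, hence the extra `.strip()`.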


## 5. Final Output 

#### 5.1 Final response to the question

In [None]:
final_response=remove_think_content(json.loads(task_output.raw)['generated_response'])
print(final_response)

#### 5.2 Reasoning for final response

In [84]:
reasoning_final_response=extract_think_content(json.loads(task_output.raw)['generated_response'])
print(reasoning_final_response)


#### 5.3 Document Excerpts

In [None]:

source_documents = json.loads(task_output.raw)['retrieved_text']
print(source_documents)

#### 5.4 Verify Response against Document Excerpts

In [None]:
verify_answer=remove_think_content(response_verification_task.output.raw)
print(verify_answer)

## **Summary**  

This notebook implements a **Multi-Agent RAG (Retrieval-Augmented Generation) workflow** using **CrewAI** to process documents, retrieve relevant excerpts, generate responses, and verify their accuracy.  

- **Document Processing & Retrieval**: Documents (PDF, TXT, DOCX, HTML) are chunked, embedded with `HuggingFaceEmbeddings`, and indexed in **FAISS** for vector search.  
- **Agent Workflow**: Four agents handle different tasks:  
  1. **QueryRefinementAgent**: Enhances user queries for better search accuracy.  
  2. **RetrievalAgent**: Fetches the most relevant document excerpts.  
  3. **LLMAgent**: Generates a response strictly based on retrieved excerpts.  
  4. **ResponseVerificationAgent**: Ensures the response is grounded in the retrieved content.  
- **Execution Flow**: The agents are orchestrated sequentially via **CrewAI**, maintaining context across tasks.  
- **Local LLM Execution**: All LLM calls run locally through **Ollama**, enabling document-based Q&A with full control over inference.  
- **Final Output Processing**: The generated response is cleaned and split into the final answer, the model's reasoning, and the source excerpts.  
