# **Agentic RAG for PDF-based Question Answering**

## **Introduction**
This notebook implements an **Agentic RAG (Retrieval-Augmented Generation) system** for querying PDFs using **CrewAI**. It employs three autonomous agents:  

- **QueryRefinementAgent**: Rephrases the user's query so that it is clearer and more specific before retrieval.  
- **PDFRetrievalAgent**: Retrieves the most relevant document excerpts using FAISS and sentence-transformer embeddings.  
- **LLMAgent**: Processes retrieved content and generates responses while ensuring answers are strictly grounded in the document.  

The system follows a **multi-step workflow** in which each agent passes its output to the next. Because the final answer is restricted to the retrieved excerpts, the risk of hallucination is reduced. This approach is useful for **research papers, legal documents, reports, or any structured knowledge retrieval task**.  

---


## **1. Import Required Libraries**  

This section imports the essential libraries for building an **Agentic RAG system** to process PDFs and answer user queries.  

### **Libraries Overview:**  
- **CrewAI Components (`Agent`, `Task`, `Crew`, `Process`)**  
  - Define and manage autonomous agents for retrieval and response generation.  

- **LangChain for Document Processing**  
  - `PyPDFLoader`: Loads and extracts text from PDFs.  
  - `RecursiveCharacterTextSplitter`: Splits documents into smaller chunks for better retrieval.  

- **Vector Database for Efficient Search**  
  - `FAISS`: A fast vector search library for storing and retrieving document embeddings.  
  - `HuggingFaceEmbeddings`: Generates embeddings using a **pre-trained transformer model** for similarity search.  

- **LLM (Language Model Processing)**  
  - `LLM` (from CrewAI): Wraps the language model that generates responses strictly based on retrieved document content.  

This setup ensures an efficient **retrieval-augmented workflow** where documents are processed, stored, and queried seamlessly.


In [1]:
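from crewai import Agent, Task, Crew, Process, LLM

# The loaders, vector stores, and embeddings now live in langchain_community;
# the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings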






## **2. PDF Processing and Vector Store Creation**  

This function processes a PDF document, splits it into smaller chunks, and creates a vector store for efficient similarity-based search.  

### **Steps:**
1. **Load the PDF**:  
   - The PDF document is loaded from the specified file path using a PDF loader.

2. **Split the Document**:  
   - The document is split into smaller chunks using a text splitter. In this notebook each chunk holds up to 512 characters, and consecutive chunks share an overlap of up to 220 characters so that context spanning a chunk boundary is preserved.

3. **Generate Embeddings**:  
   - Each document chunk is converted into numerical representations (embeddings) using a pre-trained HuggingFace model. These embeddings capture the semantic meaning of the content.

4. **Create FAISS Vector Store**:  
   - The generated embeddings are stored in a FAISS vector store, which enables fast similarity search for document retrieval.

5. **Return the Vector Store**:  
   - The vector store is returned, ready for use in the query process to retrieve relevant document chunks based on similarity.

This process enables efficient document retrieval by transforming PDF content into embeddings, making it ready for similarity search and query handling.
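The effect of the chunk size and overlap described in step 2 can be illustrated with a minimal sketch. The function `sliding_window_chunks` below is a hypothetical helper written for illustration; it uses plain fixed-width character windows and is a simplification, not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators such as paragraph and line breaks.

```python
def sliding_window_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 220) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 1000-character sample: windows start at offsets 0, 292, and 584
sample = "".join(str(i % 10) for i in range(1000))
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])  # → 3 [512, 512, 416]
```

With these settings the last 220 characters of each chunk reappear at the start of the next one, which is why a sentence that straddles a boundary is still retrievable as a whole.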


In [2]:


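def process_pdf(pdf_path):
    """Load a PDF, split it into chunks, and index the chunks in a FAISS vector store."""
    # Load the PDF; each page becomes one document
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()
    # Split pages into overlapping chunks of up to 512 characters
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=220)
    split_docs = text_splitter.split_documents(docs)
    # Embed each chunk and store the vectors in FAISS
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_documents(split_docs, embedding_model)
    return vectorstore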


## **3. Load and Process the PDF**

In this section, we load and process the PDF document to create a vector store for document retrieval.  

### **Steps:**
1. **Load and Process the PDF**:  
   - The PDF document is processed by calling the `process_pdf` function, which loads the document, splits it into chunks, generates embeddings, and stores them in a FAISS vector store.

2. **Create the Retriever**:  
   - The vector store is converted into a retriever with the specified number of top document results (`k=5`). The retriever will fetch the most relevant document chunks based on similarity to a given query.

This step prepares the system for efficient document retrieval by transforming the PDF into a searchable vector store.
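To make the role of `k` concrete, here is a minimal sketch of top-k similarity search over toy two-dimensional vectors. It uses cosine similarity for readability; that choice is an assumption made for illustration, since the FAISS index may rank results with a different metric such as Euclidean distance. The helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings": chunks 0 and 1 point in nearly the same direction as the query
toy_chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_chunks, k=2))  # → [0, 1]
```

A larger `k` gives the language model more context at the cost of a longer prompt and a higher chance of including loosely related excerpts.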


In [4]:

# Load and process the PDF
pdf_path = "Revolutionizing YouTube Video Summaries.pdf"
vectorstore = process_pdf(pdf_path)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})




### **4.1 Query Refinement Agent**

The **Query Refinement Agent** is designed to enhance user queries by improving clarity, specificity, and context. This ensures that document retrieval is more accurate, leading to better results in the downstream retrieval and response generation processes.


In [None]:
class QueryRefinementAgent(Agent):
    def __init__(self, llm):
        super().__init__(
            role="Query Refiner",
            backstory="I enhance user queries by clarifying intent and adding relevant context to improve search accuracy.",
            goal="Refine the user's query to retrieve the most relevant document excerpts."
        )
        self.llm = llm

    def execute_task(self, task: Task, context: dict = None, tools: list = None):
        query = task.description

        system_prompt = (
            "You are an AI assistant that refines user queries for document retrieval. "
            "Rephrase the query to make it clearer and more specific without altering its intent.\n\n"
            f"Original Query: {query}\n\n"
            "Refined Query:"
        )

        refined_query = self.llm.call([
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ])

        return refined_query.strip()


### **4.2 Retrieval Agent**

In this section, we define and initialize the **PDFRetrievalAgent**, which is responsible for retrieving relevant excerpts from the PDF based on a user query.

### **Steps:**
1. **Define the Retrieval Agent Class**:  
   - The `PDFRetrievalAgent` class inherits from the `Agent` class and is configured with a specific role, backstory, and goal. The agent's purpose is to fetch the most relevant document excerpts based on a user query.
   
2. **Initialize the Retriever**:  
   - The agent stores the `retriever` (created in the previous step) as a private attribute, enabling the agent to perform efficient document retrieval.

3. **Execute the Retrieval Task**:  
   - The `execute_task` method takes a user query, retrieves the top documents from the vector store, and concatenates the text of the relevant excerpts for further processing.

4. **Initialize the Retrieval Agent**:  
   - The `pdf_retrieval_agent` is initialized and ready to retrieve document excerpts based on user input.

This step defines the agent responsible for the retrieval process, ensuring that relevant document excerpts are fetched based on similarity to a user’s query.


In [5]:

# Initialize the Retrieval Agent
class PDFRetrievalAgent(Agent):
    def __init__(self):
        super().__init__(
            role="PDF Retriever",
            backstory="I retrieve relevant information from research papers stored as PDFs.",
            goal="Fetch the most relevant document excerpts based on a user query."
        )
        self._retriever = retriever  # Store the retriever as a private attribute

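    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # Prefer the upstream output (e.g. the refined query) when the crew passes it
        # in as context; otherwise fall back to the task description.
        has_context = isinstance(context, str) and context.strip()
        query = context.strip() if has_context else task.description
        docs = self._retriever.invoke(query)
        # Concatenate the text of the retrieved chunks into one string
        retrieved_text = "\n".join(doc.page_content for doc in docs)
        return retrieved_text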

# Initialize the Agent
pdf_retrieval_agent = PDFRetrievalAgent()


### **4.3 LLM Agent**

In this section, we define and initialize the **LLMAgent**, which processes the retrieved document excerpts and generates meaningful insights based on them.

### **Steps:**
1. **Define the LLM Agent Class**:  
   - The `LLMAgent` class inherits from the `Agent` class and is configured with a specific role, backstory, and goal. The agent's purpose is to analyze and summarize the document excerpts retrieved by the **PDFRetrievalAgent** and generate a clear response.

2. **Initialize the LLM Model**:  
   - The agent is initialized with an LLM model, which will be used to process the retrieved text and generate answers.

3. **Execute the Task**:  
   - The `execute_task` method takes the retrieved text (context) and the user query (task description). It first ensures that the context is valid. If no valid context is provided, the agent returns a default response: "No relevant information found in the document."
   - The **system prompt** is constructed to instruct the LLM to answer strictly based on the provided document excerpts, avoiding any external knowledge.
   - The agent then invokes the LLM model to generate a response based on the provided query and context.

4. **Generate the Response**:  
   - The LLM processes the system prompt and the user query, generating a response grounded only in the retrieved document content.

This step defines the agent responsible for summarizing and interpreting the retrieved text, ensuring that the responses are strictly based on the document content and not external information.
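The guard and prompt construction described above can be exercised without a running model. The sketch below is a minimal illustration: `StubLLM` is a hypothetical stand-in that only records the messages it receives, and `grounded_answer` is a hypothetical function mirroring the logic of `execute_task`. Neither is part of CrewAI.

```python
FALLBACK = "No relevant information found in the document."


class StubLLM:
    """Hypothetical stand-in for the real model; records the messages it receives."""

    def __init__(self):
        self.last_messages = None

    def call(self, messages):
        self.last_messages = messages
        return "stub answer"


def grounded_answer(llm, question, context):
    """Answer a question using only the supplied excerpts, or return the fallback."""
    if not isinstance(context, str) or not context.strip():
        return FALLBACK  # nothing to ground the answer in, so skip the model call
    system_prompt = (
        "You are an AI assistant that answers ONLY based on the provided document excerpts. "
        "Do not use external knowledge. If the answer is not found, reply with "
        "'Not found in the document.'\n\nDOCUMENT EXCERPTS:\n" + context
    )
    return llm.call([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ])


stub = StubLLM()
print(grounded_answer(stub, "What is FAISS?", ""))
print(grounded_answer(stub, "What is FAISS?", "FAISS is a vector search library."))
```

The first call returns the fallback string without touching the model; the second forwards a system message that embeds the excerpts, followed by the user question.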


In [6]:
class LLMAgent(Agent):
    def __init__(self, llm):
        super().__init__(
            role="LLM Processor",
            backstory="I analyze and refine retrieved excerpts to generate meaningful insights.",
            goal="Summarize and interpret retrieved text to provide a clear response."
        )
        self.llm = llm
    
    def execute_task(self, task: Task, context: str = None, tools: list = None):
        # Without usable context there is nothing to ground the answer in
        if not context or not isinstance(context, str) or len(context.strip()) == 0:
            return "No relevant information found in the document."

        system_prompt = (
            "You are an AI assistant that answers ONLY based on the provided document excerpts. "
            "Do not use external knowledge. If the answer is not found, reply with 'Not found in the document.'\n\n"
            "DOCUMENT EXCERPTS:\n" + context
        )

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task.description},
        ]
        return self.llm.call(messages)


## **6. Initialize the Agents**

In this section, we initialize the **LLM model** and create the associated **LLMAgent** responsible for processing the retrieved text and generating meaningful responses.

### **Steps:**
1. **Initialize the LLM Model**:  
   - We initialize the LLM model by specifying the model identifier (`ollama/deepseek-r1:1.5b`) and the base URL (`http://localhost:11434`) where the model is hosted. This model will be used by the **LLMAgent** to generate responses.

2. **Create the LLMAgent**:  
   - The `LLMAgent` is initialized with the LLM model, which enables it to process the retrieved document excerpts.

3. **Define the LLM Task**:  
   - A `Task` is created for the **LLMAgent** with the following attributes:  
     - **Description**: "Analyze and summarize the retrieved text."  
     - **Expected Output**: "A well-structured response based on the retrieved excerpts."  
     - **Agent**: The agent responsible for this task is the `LLMAgent` initialized earlier.

This step sets up the agent that will analyze and summarize the retrieved document content, preparing it for the next steps in the system’s execution pipeline.
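Because the model is served locally, it can help to confirm that the server answers before starting the crew. The following is a minimal sketch under stated assumptions: `ollama_is_reachable` is a hypothetical helper, and probing the `/api/tags` endpoint (which lists locally available models in Ollama's HTTP API) is one possible choice of health check, not something the notebook itself does.

```python
import http.client
import urllib.request


def ollama_is_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server at base_url answers the model-listing endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed URL
        return False


if __name__ == "__main__":
    if not ollama_is_reachable():
        print("Ollama does not appear to be running at http://localhost:11434")
```

A `True` result only shows that the server responds; it does not confirm that the `deepseek-r1:1.5b` model has been pulled.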


## **7. Example Usage**

In this section, we demonstrate how the agents work together to handle a user query. The query is first refined by the **QueryRefinementAgent**, the **PDFRetrievalAgent** then fetches relevant document excerpts, and the **LLMAgent** finally summarizes them.

### **Steps:**
1. **Define the Query and the Refinement Task**:  
   - A user query, such as "What are the prerequisites to run the project?", is defined and used as the description of a task for the **QueryRefinementAgent**. The expected output is "A well-structured and precise query for document retrieval."

2. **Create the Retrieval Task**:  
   - A task is created for the **PDFRetrievalAgent**, asking it to retrieve the most relevant excerpts based on the refined query. The expected output is "A list of the most relevant document excerpts."

3. **Define the Crew**:  
   - A **Crew** is initialized to manage the execution of the agents and tasks.  
     - **Agents**: The **QueryRefinementAgent**, the **PDFRetrievalAgent**, and the **LLMAgent** are included in the crew.  
     - **Tasks**: The refinement task, the retrieval task, and the summarization task are included, in that order.  
     - **Context Flow**: The `context_flow` mapping records which task's output each task consumes: `retrieval_task` receives the result of `query_refinement_task`, and `llm_task` receives the result of `retrieval_task`.

4. **Execute the Crew**:  
   - The `crew.kickoff()` function initiates the tasks, and the agents execute their respective roles sequentially.  
   - The result of the execution is printed, showing the raw output.

This part demonstrates how the agents and tasks are orchestrated to handle the user query, retrieve relevant information, and generate a meaningful response.
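The hand-off between tasks can be pictured with a minimal sketch in plain Python. This is an illustration of the sequential pattern only, not CrewAI's internal implementation; `run_sequential` and the three toy step functions are hypothetical stand-ins for the crew and its agents.

```python
from typing import Callable, Optional

# A step receives its own task description plus the previous step's output
Step = Callable[[str, Optional[str]], str]


def run_sequential(steps: list[tuple[str, Step]]) -> Optional[str]:
    """Run steps in order, feeding each step's output to the next as context."""
    context = None
    for description, step in steps:
        context = step(description, context)
    return context


def refine(description, context):
    # Toy refinement: normalize whitespace and ensure a trailing question mark
    return description.strip().rstrip("?") + "?"


def retrieve(description, context):
    # Toy retrieval: echoes the query it would search for
    return f"[excerpts matching: {context}]"


def answer(description, context):
    # Toy generation: shows that the answer is built from the retrieved context
    return f"Answer grounded in {context}"


final_output = run_sequential([
    ("what are the prerequisites to run the project", refine),
    ("Retrieve excerpts for the refined query.", retrieve),
    ("Analyze and summarize the retrieved text.", answer),
])
print(final_output)
```

Note that only the first step uses its own description as input; the later steps work on the context they receive, which is the behavior the context flow is meant to achieve.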


In [16]:
# Connect to the locally hosted model served by Ollama
llm = LLM(model="ollama/deepseek-r1:1.5b", base_url="http://localhost:11434")

# Example usage
query = "What are the prerequisites to run the project?"
query_refinement_agent = QueryRefinementAgent(llm)
query_refinement_task = Task(
    description=query,
    expected_output="A well-structured and precise query for document retrieval.",
    agent=query_refinement_agent
)

retrieval_task = Task(
    description="Retrieve the most relevant excerpts from the uploaded PDF based on the refined query.",
    expected_output="A list of the most relevant document excerpts.",
    agent=pdf_retrieval_agent
)



llm_agent = LLMAgent(llm)
llm_task = Task(
    description="Analyze and summarize the retrieved text.",
    expected_output="A well-structured response based on the retrieved excerpts.",
    agent=llm_agent
)


crew = Crew(
    agents=[query_refinement_agent, pdf_retrieval_agent, llm_agent],
    tasks=[query_refinement_task, retrieval_task, llm_task],
    verbose=True,
    process=Process.sequential,
    context_flow={"retrieval_task": "query_refinement_task", "llm_task": "retrieval_task"},
)

result = crew.kickoff()
print(result.raw)




<think>
Okay, so I'm looking at the user's query. They provided a document excerpt about the Nested Retrieval-Augmented Generation (RAG) system created by napkin.ai. The task is to analyze and summarize that text.

First, I need to understand what RAG entails. From the user's message, it seems like RAG combines retrieval from external sources with augmented learning using AI. It uses a system called the YoutubeLoader method which probably extracts video transcripts from YouTube URLs. Then it splits this transcript into manageable chunks using RecursiveCharacterTextSplitter, which divides text into segments of 512 characters each. These chunks are then embedded using HuggingFaceEmbeddings and stored in a FAISS vector database.

The RAG system offers functionalities like summarization and interactive Q&A powered by LangChain, Llama 3.2, and Gradio. It's designed to make the video retrieval process more efficient, especially for various users who might be students, content creators, or pr

## **8. Extract and Remove `<think>` Content**

In this section, we process the raw result output by extracting and removing specific content enclosed within `<think>` tags. This ensures that only the relevant parts of the response are shown to the user.

### **Steps:**
1. **Extract Content Inside `<think>` Tags**:  
   - The `extract_think_content` function uses a regular expression to find the first `<think>`...`</think>` block and extract its content. If a block is found, its content is returned with surrounding whitespace stripped; otherwise, an empty string is returned.

2. **Remove Content Inside `<think>` Tags**:  
   - The `remove_think_content` function uses regular expressions to remove everything between `<think>` and `</think>` tags. This is useful to clean up the result and ensure only the relevant content remains.

3. **Process the Raw Result**:  
   - The `think_content` variable stores any extracted content from the `<think>` tags, while the `response` variable stores the cleaned response with the `<think>` content removed.

4. **Print the Final Response**:  
   - The cleaned response (without `<think>` content) is printed, ensuring the final answer is concise and relevant.

This step processes the raw response to clean and refine the output, making it ready for presentation to the user.
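The approach described in this section assumes that every `<think>` tag is closed. If the model output is cut off mid-reasoning, the closing tag is missing and the reasoning text would remain in the response. The following is a minimal sketch of a more defensive variant; `strip_reasoning` is a hypothetical helper, and dropping everything after an unclosed tag is an assumption about the desired behavior.

```python
import re


def strip_reasoning(text: str) -> str:
    """Remove <think> blocks, including a trailing block that was never closed."""
    # Remove complete <think>...</think> blocks first
    cleaned = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # If an opening tag remains, the output was truncated: drop the rest
    cleaned = re.sub(r"<think>.*", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()


print(strip_reasoning("<think>plan</think>The answer is 42."))     # → The answer is 42.
print(repr(strip_reasoning("Intro <think>unfinished reasoning")))  # → 'Intro'
```

An empty result from this function is a useful signal that the model never got past its reasoning and that the generation limit may need to be raised.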


In [9]:
import re


# Function to return the text inside the first <think>...</think> block
def extract_think_content(text):
    match = re.search(r'<think>(.*?)</think>', text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


# Function to remove everything between <think> and </think> tags
def remove_think_content(text):
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)


In [None]:
think_content = extract_think_content(result.raw)
response = remove_think_content(result.raw)
print(response)

## **Summary**

This notebook demonstrates the creation and orchestration of agents to process a PDF document and generate meaningful insights from user queries. The system performs the following key steps:

1. **PDF Processing**: The PDF document is loaded, split into manageable chunks, and stored in a FAISS vector store for efficient retrieval using embeddings.
2. **Query Refinement**: A **QueryRefinementAgent** rephrases the user's query to make it clearer and more specific before retrieval.
3. **Retrieval Agent**: A **PDFRetrievalAgent** retrieves the most relevant document excerpts based on the user’s query.
4. **LLM Processing**: The **LLMAgent** processes the retrieved text using a pre-trained language model, generating a well-structured response.
5. **Task Execution**: The **Crew** orchestrates the execution of these agents, ensuring sequential processing of the tasks (refining the query, retrieving excerpts, and summarizing the document content).
6. **Result Refining**: The raw response is refined by extracting or removing content enclosed in `<think>` tags, providing a clean and relevant response to the user.

This system combines retrieval-based and generation-based techniques to answer user queries from the document's own content, which makes it suited to document analysis and summarization.
