# Lesson 3 - Building a RAG Agent with CrewAI

In this lesson, you will build a RAG agent with `CrewAI` by integrating `RagTool` from `crewai_tools` with a `CrewAI` agent. `RagTool` lets you create and query knowledge bases from various data sources, which gives the agent access to specialized context. You will give the RAG tool a PDF file containing details about the insurance coverage provided by a private health insurer. By the end of the lesson, you will have built an insurer agent specialized in answering queries related to health benefits. In the next lessons, you will wrap this agent in an ACP server and make it interact with other ACP agents.

<p style="background-color:#fff6ff; padding:15px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px"> 💻 &nbsp; <b>To Access <code>requirements.txt</code> and the <code>data</code> files:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Open"</em>. For more help, please see the <em>"Appendix – Tips, Help, and Download"</em> Lesson.</p>

## 3.1. Import Libraries

In [1]:
from crewai import Crew, Task, Agent, LLM
from crewai_tools import RagTool

In [2]:
import warnings
warnings.filterwarnings('ignore')

## 3.2. Define the Agent's Large Language Model

You'll now define the large language model that your CrewAI agent will use. The `max_tokens` parameter sets the maximum number of tokens the model can generate in a single response.

**Note**: If you run this notebook locally, you need to define the API key in a **.env** file as follows:
```
# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_API_BASE=<custom-base-url>
OPENAI_ORGANIZATION=<your-org-id>
```
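When running locally, a quick check that the key is visible to the Python process can save a confusing authentication error later. The following is a minimal sketch, assuming the variables from your **.env** file have already been loaded into the environment (for example with a loader such as `python-dotenv`, which this lesson does not show). The helper name `missing_env_vars` is hypothetical and not part of CrewAI.

```python
import os

def missing_env_vars(required: list[str]) -> list[str]:
    """Return the names in `required` that are unset or empty in the environment."""
    return [name for name in required if not os.environ.get(name)]

# OPENAI_API_KEY is the only variable the note above marks as required.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```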

In [3]:
llm = LLM(model="openai/gpt-4", max_tokens=1024)

## 3.3. Define the RAG Tool

For the RAG tool, you can define the model provider and the embedding model in a Python configuration dictionary. You can also define the details of your vector database. If you don't specify one, `RagTool` uses Chroma (ChromaDB) as the default vector database in local/in-memory mode.

In [4]:
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
        }
    },
    "embedding_model": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-ada-002"
        }
    }
}
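
To build intuition for the roles of the embedding model and the vector database, here is a toy sketch of similarity search. It is not how `RagTool` or Chroma is implemented: the "embedding" here is just a word count, whereas a real embedding model such as `text-embedding-ada-002` returns dense float vectors. The sample strings are made up for illustration, apart from the first one, which paraphrases the policy text.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query, most similar first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical stored chunks (illustrative only).
chunks = [
    "the waiting period for health programs is 2 months",
    "ambulance services are covered for emergencies",
    "dental extras include a yearly check-up",
]
best = top_k("waiting period for health programs", chunks, k=1)
print(best)
```

A vector database does the same job at scale: it stores one vector per chunk and returns the chunks whose vectors are closest to the query vector.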

You can then pass the `config` to the `RagTool` and specify the data source from which the knowledge base will be constructed. When embedding your data, the `RagTool` splits your document into chunks and creates an embedding vector for each chunk. You can specify the chunk size (`chunk_size`: number of characters) and how many characters overlap between consecutive chunks (`chunk_overlap`). You can also use the default behavior.

In [5]:
rag_tool = RagTool(config=config,  
                   chunk_size=1200,       
                   chunk_overlap=200,     
                  )
rag_tool.add("../data/gold-hospital-and-premium-extras.pdf", data_type="pdf_file")

Inserting batches in chromadb: 100%|██████████| 1/1 [00:00<00:00, 12.81it/s]
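
To see what `chunk_size` and `chunk_overlap` mean in practice, here is a minimal sketch of fixed-size character chunking with overlap. It only illustrates the idea; the splitter that `RagTool` actually uses may differ (for example, by preferring to break on separators), and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters,
    where each chunk starts chunk_overlap characters before the previous one ends."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small values so the overlap is easy to see by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

With `chunk_size=10` and `chunk_overlap=3`, the last three characters of each chunk reappear at the start of the next one. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.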


In [6]:
rag_tool

RagTool(name='Knowledge base', description="Tool Name: Knowledge base\nTool Arguments: {'query': {'description': None, 'type': 'str'}}\nTool Description: A knowledge base that can be used to answer questions.", args_schema=<class 'abc.RagToolSchema'>, description_updated=False, cache_function=<function BaseTool.<lambda> at 0x7f329a203380>, result_as_answer=False, summarize=False, adapter=EmbedchainAdapter(embedchain_app=<embedchain.app.App object at 0x7f328cf65890>, summarize=False), config={'llm': {'provider': 'openai', 'config': {'model': 'gpt-4'}}, 'embedding_model': {'provider': 'openai', 'config': {'model': 'text-embedding-ada-002'}}})

## 3.4. Define the Insurance Agent

Now that you have defined the `rag_tool`, you can define the CrewAI agent that will assist with insurance coverage queries.

In [7]:
insurance_agent = Agent(
    role="Senior Insurance Coverage Assistant", 
    goal="Determine whether something is covered or not",
    backstory="You are an expert insurance agent designed to assist with coverage queries",
    verbose=True,
    allow_delegation=False,
    llm=llm,
    tools=[rag_tool], 
    max_retry_limit=5
)

## 3.5. Define the Agent Task

Let's now test the insurance agent. To do that, you need to define the agent's task and pass it the query and the agent.

In [8]:
task1 = Task(
        description='What is the waiting period for rehabilitation?',
        expected_output = "A comprehensive response to the user's question",
        agent=insurance_agent
)

## 3.6. Run the Insurance Agent

To run the agent, you need to pass the agent and the task to a Crew object that you can run using the `kickoff` method.

In [9]:
crew = Crew(agents=[insurance_agent], tasks=[task1], verbose=True)
task_output = crew.kickoff()
print(task_output) 

[1m[95m# Agent:[00m [1m[92mSenior Insurance Coverage Assistant[00m
[95m## Task:[00m [92mWhat is the waiting period for rehabilitation?[00m




[1m[95m# Agent:[00m [1m[92mSenior Insurance Coverage Assistant[00m
[95m## Thought:[00m [92mIn order to provide a specific answer, I need to check the waiting period for rehabilitation in our insurance coverage policies.[00m
[95m## Using tool:[00m [92mKnowledge base[00m
[95m## Tool Input:[00m [92m
"{\"query\": \"Waiting period for rehabilitation in insurance coverage\"}"[00m
[95m## Tool Output:[00m [92m
Relevant Content:
Health Programs & Support Our Health and Hospital Substitution programs are designed to support our members on their health journey or on their road to recovery. We have a number of programs to suit a variety of health and recovery needs. HEALTH PROGRAMS Our health programs are designed to help you keep on top of your health and live a healthier life. We have a range of health programs to help you manage a number of different health conditions. The waiting period to recieve health programs is 2 months of continuous cover. HOSPITAL SUBSTITUTION PRO



[1m[95m# Agent:[00m [1m[92mSenior Insurance Coverage Assistant[00m
[95m## Final Answer:[00m [92m
The waiting period for rehabilitation under our insurance coverage is 2 months. Do note, if this is related to a pre-existing condition, there is usually a 12-month waiting period, but this exception does not apply to rehabilitation, hospital psychiatric services, palliative care, and ambulance services. For those services, the waiting period remains 2 months, even if the condition is pre-existing.[00m




The waiting period for rehabilitation under our insurance coverage is 2 months. Do note, if this is related to a pre-existing condition, there is usually a 12-month waiting period, but this exception does not apply to rehabilitation, hospital psychiatric services, palliative care, and ambulance services. For those services, the waiting period remains 2 months, even if the condition is pre-existing.


## 3.7. Resources

- [CrewAI Agents](https://docs.crewai.com/concepts/agents)
- [CrewAI Tasks](https://docs.crewai.com/concepts/tasks)
- [CrewAI RagTool](https://docs.crewai.com/tools/ai-ml/ragtool)
- [Short course on Multi Agents with CrewAI](https://www.deeplearning.ai/short-courses/multi-ai-agent-systems-with-crewai/)

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">
<p> ⬇ &nbsp; <b>Download Notebooks:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Download as"</em> and select <em>"Notebook (.ipynb)"</em>.</p>

</div>