# RAG with DocAgent

In this notebook, we show how DocAgent can, through natural language:

1. Ingest documents from a local file or URL, and
2. Answer questions about them with RAG (retrieval-augmented generation).

## Installation

To get started with DocAgent in AG2, install AG2 with the `rag` extra (the quotes stop shells such as zsh from expanding the brackets):

```bash
pip install "ag2[rag]"
```

Notes:

1. DocAgent only queries ingested documents, so it should not make up information it can't find in them.
2. Answers may be inaccurate for documents that cannot be parsed correctly into Markdown format.

You're all set! Now you can start using DocAgent in AG2.

## Documents supported

The following documents can be ingested:
- PDF
- IMAGE
- DOCX
- HTML
- PPTX
- ASCIIDOC
- MD

You can also have the DocAgent use a web page by giving it a URL to ingest.
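DocAgent works out for itself whether an ingestion target is a web page or a local file (the run logs further down show messages such as "Detected URL" and "Detected file"). If you build ingestion requests programmatically, a pre-check along the lines below can flag inputs that are unlikely to be supported before any LLM call is made. This is a hypothetical sketch: `classify_input` and the extension-to-format mapping are assumptions, not AG2 code, and the document converter DocAgent uses may accept extensions not listed here.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Assumed extensions for each format in the list above (illustrative, not taken from AG2).
SUPPORTED_EXTENSIONS = {
    "PDF": {".pdf"},
    "IMAGE": {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"},
    "DOCX": {".docx"},
    "HTML": {".html", ".htm"},
    "PPTX": {".pptx"},
    "ASCIIDOC": {".adoc", ".asciidoc"},
    "MD": {".md"},
}


def classify_input(path_or_url: str) -> Optional[str]:
    """Return "URL", a format name from the list above, or None if the input looks unsupported."""
    parsed = urlparse(path_or_url)
    # Only http(s) addresses with a host are treated as web pages.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "URL"
    # Anything else is treated as a local file and classified by its extension.
    suffix = Path(path_or_url).suffix.lower()
    for fmt, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return fmt
    return None


print(classify_input("https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html"))  # URL
print(classify_input("./docagent/AMDQ4-2024.pdf"))  # PDF
print(classify_input("notes.txt"))  # None
```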

## Inside the DocAgent

![](docagent_swarm.png)

Before jumping into the code, let's have a look at the swarm inside the DocAgent.

The swarm contains the following agents:

- **Triage Agent**: Decides what type of task to perform from user requests.
- **Task Manager Agent**: Manages the tasks and initiates actions.
- **Data Ingestion Agent**: Ingests the documents.
- **Query Agent**: Answers user questions based on ingested documents.
- **Error Agent**: If anything fails, the error agent will report the problem back.
- **Summary Agent**: Generates a summary of the completed tasks.

When it is the DocAgent's turn to reply, the DocAgent initializes the agents and sets up the context variables, then:

1. **Triage User Requests**: The `Triage Agent` categorizes the tasks into ingestions and queries.
2. **Task Management**: The `Task Manager Agent` manages the tasks and ensures they are executed in the correct sequence.
3. **Data Ingestion**: The `Data Ingestion Agent` processes any document ingestion tasks.
4. **Query Execution**: The `Query Agent` answers any user queries.
5. **Summary Generation**: The `Summary Agent` generates a summary of the completed tasks.

The swarm is dynamic: if only a data ingestion task is required, it performs only that, and likewise when only queries are required. This lets you use the DocAgent repeatedly to ingest, to query, or to do both at the same time.
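To make this ordering concrete, here is a toy, pure-Python planner. It takes triage output in the JSON shape the `DocumentTriageAgent` emits in the runs below and lists the steps in the order described above, skipping stages that have nothing to do. `plan_steps` is hypothetical and not AG2 code; in the real DocAgent these decisions are made by the swarm's agents and recorded in context variables.

```python
import json
from typing import List

# Triage output in the shape produced by the DocumentTriageAgent (copied from the first run below).
TRIAGE_OUTPUT = (
    '{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],'
    '"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}'
)


def plan_steps(triage_json: str) -> List[str]:
    """Toy planner: ingestions first, then queries, then a summary; empty stages are skipped."""
    tasks = json.loads(triage_json)
    steps = [f"ingest: {item['path_or_url']}" for item in tasks.get("ingestions", [])]
    steps += [f"query: {item['query']}" for item in tasks.get("queries", [])]
    if steps:
        steps.append("summarize")
    return steps


for step in plan_steps(TRIAGE_OUTPUT):
    print(step)

# A query-only request skips ingestion entirely.
print(plan_steps('{"ingestions": [], "queries": [{"query_type": "RAG_QUERY", "query": "What was the Q4 2024 GAAP revenue?"}]}'))
```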

## Code example

```python
import os

import autogen
from autogen import AfterWorkOption, ConversableAgent, initiate_swarm_chat
from autogen.agents.experimental import DocAgent
```

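```python
# Read the OpenAI API key from the OPENAI_API_KEY environment variable.
# Set it before starting the notebook, or replace this helper with your own secret store.
def get_api_key(key_name="OPENAI_API_KEY"):
    api_key = os.environ.get(key_name)
    if not api_key:
        raise RuntimeError(f"Please set the {key_name} environment variable.")
    return api_key
```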

```python
config_list = [
    {
        "model": "gpt-4o",
        "api_key": get_api_key(),
    },
]
llm_config = {
    "config_list": config_list,
}
```

### Ingesting local documents and answering questions

```python
# Create a document agent and ask it to ingest the document and answer the question
document_agent = DocAgent(llm_config=llm_config, collection_name="toast_report")
document_agent.run(
    "could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?",
    max_turns=1,
)
```

```
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Creating new collection toast_report in the database.

user (to DocAgent):

could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------
_User (to chat_manager):

could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?

--------------------------------------------------------------------------------

Next speaker: DocumentTriageAgent

DocumentTriageAgent (to chat_manager):

{"ingestions":[{"path_or_url":"../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf"}],"queries":[{"query_type":"RAG_QUERY","query":"What is the fiscal year 2024 financial summary?"}]}

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Sugg

INFO:autogen.agents.experimental.document_agent.document_agent:initiate_tasks context_variables

_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_m1UFBPY2u2EvIzhbSZ3HmLx9) *****
Updated context variables with task decisions
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_rR8fcWNlFEjVh9OWVXDRpPwm): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****
Arguments:
{}
*******************************************************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor

>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_DoclingDocIngestAgent...
Call ID: call_rR8fcWNlFEjVh9OWVXDRpPwm
Input arguments

INFO:autogen.agents.experimental.document_agent.document_utils:Error when checking if ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf is a valid URL: Invalid URL.
INFO:autogen.agents.experimental.document_agent.document_utils:Detected file. Returning file path...
INFO:docling.document_converter:Going to convert document batch...
INFO:easyocr.easyocr:Download complete
INFO:easyocr.easyocr:Download complete.
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
INFO:docling.pipeline.base_pipeline:Processing document Toast_financial_report.pdf
INFO:docling.document_converter:Finished converting document Toast_financial_report.pdf in 64.66 sec.
INFO:autogen.agents.experimental.document_agent.parser_utils:Document converted in 64.66 seconds.
DEBUG:autogen.agents.experimental.document_agent.parser_utils:Document Toast_financial_re

_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_yxRcfqfYoZ0omOnwkD5Xefh9) *****
Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\u2612' in position 126: ordinal not in range(128): '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: ErrorManagerAgent

>>>>>>>> USING AUTO REPLY...
ErrorManagerAgent (to chat_manager):

The following error(s) have occurred:
- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\u2612' in position 126: ordinal not in range(128): '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'

--------------------------------------------------------------------------------
DocAgent (to user):

The following error(s) have occurred:
- Data Inge

ChatResult(chat_id=None, chat_history=[{'content': 'could you ingest ../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf? What is the fiscal year 2024 financial summary?', 'role': 'assistant', 'name': 'user'}, {'content': "The following error(s) have occurred:\n- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\\u2612' in position 126: ordinal not in range(128): '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'", 'role': 'user', 'name': 'DocAgent'}], summary="The following error(s) have occurred:\n- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\\u2612' in position 126: ordinal not in range(128): '../test/agentchat/contrib/graph_rag/Toast_financial_report.pdf'", cost={'usage_including_cached_inference': {'total_cost': 0}, 'usage_excluding_cached_inference': {'total_cost': 0}}, human_input=[])
```

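In this run, ingestion fails with an `'ascii' codec can't encode character` error. The problem has been reported as https://github.com/ag2ai/ag2/issues/1167.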
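Every ingestion in this notebook fails with `'ascii' codec can't encode character ...`, a `UnicodeEncodeError` raised when text containing non-ASCII characters (`\u2612`, `\u2019`, `\xb7` in these runs) is encoded as ASCII somewhere in the pipeline. The root cause is tracked in the issue above. One plausible contributor, and this is an assumption rather than a confirmed diagnosis, is a Python process whose default encoding is ASCII (for example under a `C`/`POSIX` locale). The standard-library snippet below reproduces the error class and prints the encoding settings of the current process.

```python
import locale
import sys
from typing import Optional

# The characters reported in this notebook's three ingestion errors.
SAMPLE = "\u2612 \u2019 \xb7"


def encoding_error(text: str, encoding: str) -> Optional[str]:
    """Return the UnicodeEncodeError message if `text` can't be encoded, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# ASCII cannot represent these characters; UTF-8 can.
print(encoding_error(SAMPLE, "ascii"))
print(encoding_error(SAMPLE, "utf-8"))

# Encoding settings this Python process falls back to when none is given explicitly.
print("UTF-8 mode:", bool(sys.flags.utf8_mode))
print("Preferred encoding:", locale.getpreferredencoding(False))
```

If the preferred encoding is reported as ASCII, starting Jupyter under a UTF-8 locale or with Python's UTF-8 mode enabled (`PYTHONUTF8=1`, or `python -X utf8`) may help; this has not been verified against DocAgent.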

### Fetching a webpage and answering questions

```python
# Create a document agent and ask it to summarize a web page article
document_agent = DocAgent(llm_config=llm_config, collection_name="news_reports")
document_agent.run(
    "could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?",
    max_turns=1,
)
```

```
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Creating new collection news_reports in the database.

user (to DocAgent):

could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?

--------------------------------------------------------------------------------
_User (to chat_manager):

could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?

--------------------------------------------------------------------------------

Next speaker: DocumentTriageAgent

DocumentTriageAgent (to chat_manager):

{"ingestions":[{"path_or_url":"https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html"}],"queries":[{"query_type":"RAG_QUERY","query":"summarize the article"}]}

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (ca

INFO:autogen.agents.experimental.document_agent.document_agent:initiate_tasks context_variables

_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_T1Z6kx0SQ7TuDkeTSiix0asE) *****
Updated context variables with task decisions
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: TaskManagerAgent

>>>>>>>> USING AUTO REPLY...
TaskManagerAgent (to chat_manager):

***** Suggested tool call (call_Oa6OaNGtOM34igCy2T8SLKU4): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****
Arguments:
{}
*******************************************************************************************************************

--------------------------------------------------------------------------------

Next speaker: _Swarm_Tool_Executor

>>>>>>>> EXECUTING FUNCTION transfer_TaskManagerAgent_to_DoclingDocIngestAgent...
Call ID: call_Oa6OaNGtOM34igCy2T8SLKU4
Input arguments

INFO:autogen.agents.experimental.document_agent.document_utils:Detected URL. Downloading content...
INFO:WDM:Get LATEST chromedriver version for google-chrome
INFO:WDM:Get LATEST chromedriver version for google-chrome
INFO:WDM:There is no [linux64] chromedriver "133.0.6943.141" for browser google-chrome "133.0.6943" in cache
INFO:WDM:Get LATEST chromedriver version for google-chrome
INFO:WDM:WebDriver version 133.0.6943.141 selected
INFO:WDM:Modern chrome version https://storage.googleapis.com/chrome-for-testing-public/133.0.6943.141/linux64/chromedriver-linux64.zip
INFO:WDM:About to download new driver from https://storage.googleapis.com/chrome-for-testing-public/133.0.6943.141/linux64/chromedriver-linux64.zip
INFO:WDM:Driver downloading response is 200
INFO:WDM:Get LATEST chromedriver version for google-chrome
INFO:WDM:Driver has been saved in cache [/home/papagame/.wdm/drivers/chromedriver/linux64/133.0.6943.141]
INFO:docling.document_converter:Going to convert document batch...
INF

_Swarm_Tool_Executor (to chat_manager):

***** Response from calling tool (call_xD4oOU8E1hNVYDspITrpkJIP) *****
Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\u2019' in position 3166: ordinal not in range(128): 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html'
**********************************************************************

--------------------------------------------------------------------------------

Next speaker: ErrorManagerAgent

>>>>>>>> USING AUTO REPLY...
ErrorManagerAgent (to chat_manager):

The following error(s) have occurred:
- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\u2019' in position 3166: ordinal not in range(128): 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html'

--------------------------------------------------------------------------------
DocAgent (to user):

The following

ChatResult(chat_id=None, chat_history=[{'content': "could you read 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html' and summarize the article?", 'role': 'assistant', 'name': 'user'}, {'content': "The following error(s) have occurred:\n- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\\u2019' in position 3166: ordinal not in range(128): 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html'", 'role': 'user', 'name': 'DocAgent'}], summary="The following error(s) have occurred:\n- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\\u2019' in position 3166: ordinal not in range(128): 'https://www.independent.co.uk/space/earth-core-inner-shape-change-b2695585.html'", cost={'usage_including_cached_inference': {'total_cost': 0}, 'usage_excluding_cached_inference': {'total_cost': 0}}, human_input=[])
```

## Multiple DocAgents in a Swarm

Now we're going to use multiple DocAgents, each responsible for its own data.

An `nvidia_agent` will ingest NVIDIA's financial report and query it. Similarly, an `amd_agent` will do the same with AMD's financial report.

Although a single agent could ingest and query both documents, we want to ensure that queries about one company aren't tainted by the other company's document. So we keep them separate and give each agent a unique `collection_name`, which in turn creates an individual data store (see more on [Chroma collections](https://cookbook.chromadb.dev/core/collections/)).
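The isolation that separate collections provide can be illustrated with a toy, pure-Python model. Everything in it (`store`, `ingest`, `query`, and the placeholder chunks) is hypothetical and far simpler than Chroma, which searches by embedding similarity rather than word overlap; the only point is that a query scoped to one collection never sees chunks stored under another name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Toy stand-in for a vector store: collection name -> text chunks.
# Word overlap replaces embedding search only to keep the sketch dependency-free.
store: Dict[str, List[str]] = defaultdict(list)


def words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def ingest(collection: str, chunks: List[str]) -> None:
    store[collection].extend(chunks)


def query(collection: str, question: str) -> Optional[str]:
    """Return the chunk in `collection` that shares the most words with `question`."""
    q = words(question)
    return max(store.get(collection, []), key=lambda chunk: len(q & words(chunk)), default=None)


# Placeholder chunks (not real report contents).
ingest("amd-demo", ["AMD placeholder text about quarterly revenue"])
ingest("nvidia-demo", ["NVIDIA placeholder text about quarterly revenue"])

# Each query only searches its own collection, so the other company's text cannot be returned.
print(query("amd-demo", "What was quarterly revenue?"))
print(query("nvidia-demo", "What was quarterly revenue?"))
```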

```python
# Ensure the OPENAI_API_KEY is set in the environment
llm_config = {"model": "gpt-4o-mini", "api_type": "openai", "cache_seed": None}

# Create agents for the NVIDIA and AMD documents
# Each agent has a unique collection_name so that data and queries are run in different vector store spaces
nvidia_agent = DocAgent(
    name="nvidia_agent",
    llm_config=llm_config,
    collection_name="nvidia-demo",
)

amd_agent = DocAgent(
    name="amd_agent",
    llm_config=llm_config,
    collection_name="amd-demo",
)
```

```
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Creating new collection nvidia-demo in the database.
INFO:autogen.agents.experimental.document_agent.docling_query_engine:Creating new collection amd-demo in the database.
```

```python
# A financial analyst agent who will direct the DocAgents to ingest documents and answer questions
# The financial analyst will also summarize the results and terminate the conversation
analyst = ConversableAgent(
    name="financial_analyst",
    system_message=(
        "You are a financial analyst working with two specialist agents, amd_agent who handles all AMD documents and queries, and nvidia_agent who handles all NVIDIA documents and queries. "
        "Each agent knows how to load documents and answer questions from the document regarding their respective companies. "
        "Only mention one of the two agents at a time, prioritize amd_agent. You will be able to engage each agent separately in subsequent iterations. "
        "CRITICAL - Work with ONLY ONE agent at a time and provide (a) an indication for them to take action by saying '[Next to speak is ...]' together with (b) documents they need to ingest and (c) queries they need to run, if any. "
        "DO NOT provide instructions that include the mention of both agents in the one response. "
        "When all documents have been ingested and all queries have been answered, provide a summary and add 'TERMINATE' to the end of your summary. "
        "The summary should contain detailed bullet points (multiple per query if needed) and grouped by each query. After the summary provide a one line conclusion. "
        "Add the term '(This is not financial advice)' at the end of your conclusion. "
        "If there are errors, list them and say 'TERMINATE'. "
        "If there are no errors, do not say 'TERMINATE' until each agent has run their queries and provided their answers."
    ),
    is_termination_msg=lambda x: x.get("content", "") and "terminate" in x.get("content", "").lower(),
    llm_config=llm_config,
)
```
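The termination lambda above matches "terminate" case-insensitively anywhere in a message, so ordinary prose such as "I will not terminate yet" would also end the chat. Since the analyst's system message asks for the capitalized token 'TERMINATE', a stricter check is one option. The function below is a suggestion, not what produced the output in this notebook; it could be passed as `is_termination_msg=is_termination_msg` to the analyst (and likewise in `swarm_manager_args` when starting the swarm).

```python
import re
from typing import Any, Dict

# Matches TERMINATE only as a standalone, upper-case word.
_TERMINATE = re.compile(r"\bTERMINATE\b")


def is_termination_msg(message: Dict[str, Any]) -> bool:
    """Return True when the message content contains the word TERMINATE in capitals."""
    content = message.get("content") or ""
    # Non-text content (e.g. multimodal lists) never terminates the chat in this sketch.
    if not isinstance(content, str):
        return False
    return bool(_TERMINATE.search(content))
```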

```python
# Initiate the swarm (change the file paths in the messages if needed)
result, _, _ = initiate_swarm_chat(
    initial_agent=analyst,
    agents=[analyst, nvidia_agent, amd_agent],
    messages=(
        "Use the amd_agent to load AMD's 4th quarter 2024 report from "
        "./docagent/AMDQ4-2024.pdf "
        "and use the nvidia_agent to load NVIDIA's 3rd quarter 2025 report from "
        "./docagent/NVIDIAQ3-2025.pdf. "
        "Ask 'amd_agent' to ingest the AMD document and answer two queries (a) what AMD did in regards to AI and (b) what was the Q4 2024 GAAP revenue. "
        "Ask 'nvidia_agent' to ingest the NVIDIA document and answer two queries (a) what NVIDIA did in regards to AI and (b) what was Q3 2025 GAAP revenue."
    ),
    swarm_manager_args={
        "llm_config": llm_config,
        "system_message": "You are managing a financial analyst and two specialist company agents. After amd_agent or nvidia_agent, select the financial_analyst to speak next.",
        "is_termination_msg": lambda x: x.get("content", "") and "terminate" in x.get("content", "").lower(),
    },
    after_work=AfterWorkOption.SWARM_MANAGER,
)
```
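Relative paths such as `./docagent/AMDQ4-2024.pdf` are resolved against the working directory of the process running the notebook, so a missing or misplaced file only shows up once the swarm reaches ingestion. A quick pre-flight check, run before starting the swarm, can catch that earlier. `missing_files` is a hypothetical helper, not part of AG2.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that are not existing files, resolved against the current working directory."""
    return [p for p in paths if not Path(p).is_file()]


# The two reports referenced in the swarm message.
missing = missing_files(["./docagent/AMDQ4-2024.pdf", "./docagent/NVIDIAQ3-2025.pdf"])
if missing:
    print("Not found (check the working directory or fix the paths):", missing)
else:
    print("All documents found.")
```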

[33m_User[0m (to chat_manager):

Use the amd_agent to load AMD's 4th quarter 2024 report from ./docagent/AMDQ4-2024.pdf and use the nvidia_agent to load NVIDIA's 3rd quarter 2025 report from ./docagent/NVIDIAQ3-2025.pdf. Ask 'amd_agent' to ingest the AMD document and answer two queries (a) what AMD did in regards to AI and (b) what was the Q4 2024 GAAP revenue.Ask 'nvidia_agent' to ingest the NVIDIA document and answer two queries (a) what NVIDIA did in regards to AI and (b) what was Q3 2025 GAAP revenue.

--------------------------------------------------------------------------------
[32m
Next speaker: financial_analyst
[0m
[31m
>>>>>>>> USING AUTO REPLY...[0m
[33mfinancial_analyst[0m (to chat_manager):

[Next to speak is amd_agent]  
Documents to ingest:  
- ./docagent/AMDQ4-2024.pdf  

Queries to run:  
1. What did AMD do in regards to AI?  
2. What was the Q4 2024 GAAP revenue?  

--------------------------------------------------------------------------------
[32m
Next s

INFO:autogen.agents.experimental.document_agent.document_agent:initiate_tasks context_variables


[33m_Swarm_Tool_Executor[0m (to chat_manager):

[32m***** Response from calling tool (call_WWJBcraaRd10mUCxiUNeXgLx) *****[0m
Updated context variables with task decisions
[32m**********************************************************************[0m

--------------------------------------------------------------------------------
[32m
Next speaker: TaskManagerAgent
[0m
[31m
>>>>>>>> USING AUTO REPLY...[0m
[33mTaskManagerAgent[0m (to chat_manager):

[32m***** Suggested tool call (call_o7No5hR8ncV1IB1YVSJaCTMd): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****[0m
Arguments: 
{}
[32m*******************************************************************************************************************[0m
[32m***** Suggested tool call (call_tw1qypxDXrXp3N3v4caSkBCH): transfer_TaskManagerAgent_to_DoclingDocIngestAgent *****[0m
Arguments: 
{}
[32m*******************************************************************************************************************[0m

----

INFO:autogen.agents.experimental.document_agent.document_utils:Error when checking if ./docagent/AMDQ4-2024.pdf is a valid URL: Invalid URL.
INFO:autogen.agents.experimental.document_agent.document_utils:Detected file. Returning file path...
INFO:docling.document_converter:Going to convert document batch...
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
INFO:docling.utils.accelerator_utils:Accelerator device: 'cuda:0'
INFO:docling.pipeline.base_pipeline:Processing document AMDQ4-2024.pdf
INFO:docling.document_converter:Finished converting document AMDQ4-2024.pdf in 10.84 sec.
INFO:autogen.agents.experimental.document_agent.parser_utils:Document converted in 10.84 seconds.
DEBUG:autogen.agents.experimental.document_agent.parser_utils:Document AMDQ4-2024.pdf converted.
Saved markdown output to: /home/papagame/projects/wgong/AG2/ag2/notebook/parsed_docs
DEBUG:autogen.agents.experimental.document_agent.parser_utils:item-0 at level 0: unspecified: group _root_
  item-1 at

[33m_Swarm_Tool_Executor[0m (to chat_manager):

[32m***** Response from calling tool (call_V028uLJbil9mxafODqmNc0cX) *****[0m
Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\xb7' in position 5086: ordinal not in range(128): './docagent/AMDQ4-2024.pdf'
[32m**********************************************************************[0m

--------------------------------------------------------------------------------
[32m
Next speaker: ErrorManagerAgent
[0m
[31m
>>>>>>>> USING AUTO REPLY...[0m
[33mErrorManagerAgent[0m (to chat_manager):

The following error(s) have occurred:
- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\xb7' in position 5086: ordinal not in range(128): './docagent/AMDQ4-2024.pdf'

--------------------------------------------------------------------------------
[33mamd_agent[0m (to chat_manager):

The following error(s) have occurred:
- Data Ingestion Task Failed, Error 'ascii' codec can't encode character '\xb7

INFO:openai._base_client:Retrying request to /chat/completions in 0.455995 seconds


[32m
Next speaker: financial_analyst
[0m
[31m
>>>>>>>> USING AUTO REPLY...[0m
[33mfinancial_analyst[0m (to chat_manager):

There was an error in processing the AMD document. I will not proceed with the NVIDIA agent yet. 

[Next to speak is amd_agent]  
Documents to ingest:  
- Please try loading a different document or a different method to ingest the AMD Q4 2024 report.  

Please advise on how to proceed regarding the AMD document ingestion.

--------------------------------------------------------------------------------
[32m
Next speaker: _User
[0m


Replying as _User. Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type 'exit' to end the conversation:  exit


### Tips for DocAgent

- When asking for information from ingested documents, be precise. For example, asking for the quarter's revenue in the previous example could retrieve a number of different revenue values, so we ask for the "GAAP revenue" for a specific quarter.
- If you have ingested documents in previous runs and only need to query them, tell the DocAgent explicitly that it doesn't need to ingest the documents you refer to.
- Ensure that any files to be ingested are accessible to the process running the DocAgent.
- Ingestion takes time, so use `collection_name` to reuse collections that already contain the ingested documents.
- You can review the Markdown files in the `parsed_docs` folder to see how well the conversion to Markdown worked. This helps when investigating query issues.
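Two of these tips can be turned into small habits in code. Both helpers below are hypothetical and not part of AG2: `query_only_message` wraps a precise question with an explicit instruction not to ingest anything, and `list_parsed_markdown` lists converted documents so you can inspect them. The folder name `parsed_docs` (relative to the working directory) and the `.md` extension are assumptions based on the "Saved markdown output to" log line in the swarm run.

```python
from pathlib import Path
from typing import List


def query_only_message(question: str) -> str:
    """Build a prompt asking DocAgent to answer from already-ingested documents only."""
    return (
        "The documents you need are already ingested, so do not ingest anything. "
        f"Answer this question from the ingested documents: {question}"
    )


def list_parsed_markdown(folder: str = "parsed_docs") -> List[Path]:
    """List the Markdown files in `folder` (assumed to be where converted documents are saved)."""
    root = Path(folder)
    return sorted(root.glob("*.md")) if root.is_dir() else []


print(query_only_message("What was AMD's Q4 2024 GAAP revenue?"))
for md_file in list_parsed_markdown():
    print(md_file, md_file.stat().st_size, "bytes")
```

For example, in a later session you could create `DocAgent(llm_config=llm_config, collection_name="amd-demo")` and pass `query_only_message(...)` to its `run` method, as in the earlier cells.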