
# Multi-Document Agents (V1)

In this guide, you will learn how to set up a multi-document agent over the LlamaIndex documentation.

This extends the V0 multi-document agents with the following additional features:
- Reranking during document (tool) retrieval
- Query planning tool that the agent can use to plan


We do this with the following architecture:

- Set up a "document agent" over each Document: each doc agent can do QA/summarization within its doc.
- Set up a top-level agent over this set of document agents. It does tool retrieval and then uses chain-of-thought (CoT) reasoning over the retrieved tools to answer a question.
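The two-level control flow can be sketched in plain Python. This is a toy under stated assumptions: each "document agent" is just a function, and tool retrieval is approximated by word overlap between the query and each tool's description (the notebook itself uses embeddings, a reranker, and an LLM). All names and data here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "document agent" is any function from query to answer
DocAgent = Callable[[str], str]


def overlap_score(query: str, description: str) -> int:
    """Toy relevance score: number of lowercase words shared with the description."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def top_level_answer(
    query: str,
    tools: Dict[str, Tuple[str, DocAgent]],
    top_k: int = 2,
) -> List[str]:
    """Retrieve the top_k document tools by description, then delegate the query."""
    ranked = sorted(
        tools.items(),
        key=lambda item: overlap_score(query, item[1][0]),
        reverse=True,
    )
    return [f"{name}: {agent(query)}" for name, (_, agent) in ranked[:top_k]]


doc_tools = {
    "tool_a": ("crash in the parser with an assertion error", lambda q: "answer A"),
    "tool_b": ("memory leak in the cache layer", lambda q: "answer B"),
    "tool_c": ("null pointer exception in the parser", lambda q: "answer C"),
}
print(top_level_answer("why does the parser crash", doc_tools, top_k=1))
# ['tool_a: answer A']
```

Only the retrieved subset of tools is ever shown to the top-level agent, which is what lets this design scale past the number of tools that fit in one prompt.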

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [46]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [47]:
cd '/content/drive/My Drive/Code Documentation Project/Issue_Report/'

/content/drive/My Drive/Code Documentation Project/Issue_Report


In [1]:
!pip install llama-index

Installing collected packages: urllib3, mypy-extensions, marshmallow, jsonpointer, h11, deprecated, aiostream, typing-inspect, jsonpatch, httpcore, tiktoken, langsmith, httpx, dataclasses-json, openai, langchain, llama-index
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.7
    Uninstalling urllib3-2.0.7:
      Successfully uninstalled urllib3-2.0.7
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
Successfully installed aiostream-0.5.2 dataclasses-json-0.5.14 deprecated-1.2.14 h11-0.14.0 httpcore-1.0.2 httpx-0.25.1 jsonpatch-1.33 jsonpointer-2.4 langchain-0.0.335 langsmith-0.0.64 llama-index-0.8.69.post2 marshmallow-3.20.1 mypy-extensions-1.0.0 openai-1.2.4 tiktoken-0.5.1 typing-inspect-0.9.0 urllib3-1.26.18


In [2]:
%load_ext autoreload
%autoreload 2

## Setup and Download Data

In this section, we'll load in the LlamaIndex documentation.

In [None]:
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}

In [5]:
!pip install llama_hub

Installing collected packages: retrying, html2text, atlassian-python-api, llama_hub
Successfully installed atlassian-python-api-3.41.3 html2text-2020.1.16 llama_hub-0.0.44 retrying-1.3.4


In [6]:
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
from llama_index.llms import OpenAI
from llama_index import ServiceContext

In [7]:
reader = UnstructuredReader()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [50]:
all_files_gen = Path("./issue1/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]
len(all_files)

15

In [8]:
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]

In [9]:
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]

In [10]:
len(all_html_files)

478

In [None]:
!pip install unstructured

In [117]:
import os
import random
from llama_index import Document

# Synthetic labels: each issue report is tagged with a randomly chosen exception type
exceptions = ['segmentation error', 'assertion error', 'null pointer exception']
# TODO: set to higher value if you want more docs
doc_limit = 100

docs = []
for idx, f in enumerate(all_files):
    if idx >= doc_limit:
        break
    print(f"Idx {idx}/{len(all_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded index tuned for the LlamaIndex HTML pages, where everything before
    # it is the shared ToC. Plain-text issue reports may need start_idx = 0.
    start_idx = 72

    exception = random.choice(exceptions)
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={
            "path": str(f),
            "issue_number": os.path.basename(f).split('.')[0],
            "Exception Type": exception,
        },
    )
    print(loaded_doc.metadata)
    docs.append(loaded_doc)

Idx 0/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18305.txt', 'issue_number': '18305', 'Exception Type': 'assertion error'}
Idx 1/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18306.txt', 'issue_number': '18306', 'Exception Type': 'segmentation error'}
Idx 2/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18307.txt', 'issue_number': '18307', 'Exception Type': 'null pointer exception'}
Idx 3/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18308.txt', 'issue_number': '18308', 'Exception Type': 'assertion error'}
Idx 4/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18309.txt', 'issue_number': '18309', 'Exception Type': 'assertion error'}
Idx 5/15
{'path': '/content/drive/MyDrive/Code Documentation Project/Issue_Report/issue1/18316.txt', 'issue_number': '18316', 'Exception Type': 'null pointer exception'}
Idx 6
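The loaders above slice off the first `start_idx` elements because, for the LlamaIndex HTML pages, those elements are the shared table of contents. A minimal sketch of that step (the helper `build_doc_record` and the toy element lists are hypothetical) also shows a pitfall: if a file yields no more than `start_idx` elements, as a short plain-text issue report may, the joined text is empty. That would be consistent with the `'Empty Response'` tool description printed later, though this is an inference rather than something the notebook verifies.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def build_doc_record(
    path: Path, elements: List[str], start_idx: int = 72
) -> Tuple[str, Dict[str, str]]:
    """Join the elements after the ToC offset and build a metadata dict."""
    text = "\n\n".join(elements[start_idx:])
    metadata = {"path": str(path), "issue_number": path.stem}
    return text, metadata


# HTML page: 72 shared ToC elements followed by the page body
page = [f"toc-{i}" for i in range(72)] + ["Body paragraph 1", "Body paragraph 2"]
text, meta = build_doc_record(Path("docs/page.html"), page)
print(repr(text))  # 'Body paragraph 1\n\nBody paragraph 2'

# Short plain-text issue report: fewer elements than the offset, so nothing survives
issue = ["Title: crash on startup", "Steps to reproduce: run the app"]
text, meta = build_doc_record(Path("issue1/18305.txt"), issue)
print(repr(text), meta["issue_number"])  # '' 18305
```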

In [None]:
from llama_index import Document

# TODO: set to higher value if you want more docs
doc_limit = 100

docs = []
for idx, f in enumerate(all_html_files):
    if idx >= doc_limit:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded Index. Everything before this is ToC for all pages
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)

### Define LLM + Service Context

In [15]:
import openai
import os
from getpass import getpass

# Read the key from the environment (or prompt for it) instead of hardcoding it
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

## Building Multi-Document Agents

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

In [16]:
from llama_index import VectorStoreIndex, SummaryIndex

In [17]:
import nest_asyncio

nest_asyncio.apply()

### Build Document Agent for each Document

In this section we define "document agents" for each document.

We define both a vector index (for semantic search) and a summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function-calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each loaded document (here, each issue report).
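Inside each document agent, GPT-4's function calling decides which of the two tools to call. As a rough, hypothetical stand-in for that decision, the sketch below uses a keyword heuristic while reusing the notebook's tool naming scheme (`vector_tool_{file_base}` / `summary_tool_{file_base}`); the real agent does not route this way, and `SUMMARY_HINTS` is an invented list.

```python
from typing import Dict

# Hypothetical cue words for summarization-style questions
SUMMARY_HINTS = ("summarize", "summary", "overview", "main points")


def make_tool_names(file_base: str) -> Dict[str, str]:
    """Tool names follow the pattern used in build_agent_per_doc."""
    return {
        "vector": f"vector_tool_{file_base}",
        "summary": f"summary_tool_{file_base}",
    }


def pick_tool(query: str, file_base: str) -> str:
    """Toy heuristic: summarization-style queries go to the summary tool,
    everything else goes to the vector (specific facts) tool."""
    names = make_tool_names(file_base)
    q = query.lower()
    if any(hint in q for hint in SUMMARY_HINTS):
        return names["summary"]
    return names["vector"]


print(pick_tool("Summarize this issue report", "issue1_18305"))
# summary_tool_issue1_18305
print(pick_tool("Which line raised the assertion error?", "issue1_18305"))
# vector_tool_issue1_18305
```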

In [118]:
from llama_index.agent import OpenAIAgent
from llama_index import load_index_from_storage, StorageContext
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.node_parser import SimpleNodeParser
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/issue1/{file_base}"
    summary_out_path = f"./data/issue1/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/issue1/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes, service_context=service_context)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
            service_context=service_context,
        )

    # build summary index
    summary_index = SummaryIndex(nodes, service_context=service_context)

    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize"
    )

    # extract a summary (cached to disk so re-runs skip the LLM call)
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        with open(summary_out_path, "wb") as fh:
            pickle.dump(summary, fh)
    else:
        with open(summary_out_path, "rb") as fh:
            summary = pickle.load(fh)

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description=f"Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description=f"Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.txt` part of issue reports.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):
    node_parser = SimpleNodeParser.from_defaults()

    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # ID is the parent directory name + file stem, e.g. issue1_18305
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
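`build_agent_per_doc` caches both the persisted vector index and the one-line summary so that re-running the notebook skips the expensive calls. The summary cache follows a compute-or-load pattern; here is a minimal standalone sketch of it (the `load_or_compute` helper and `fake_summary` function are hypothetical) that also closes its file handles.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_compute(cache_path: str, compute: Callable[[], Any]) -> Any:
    """Return the pickled value at cache_path; on a miss, compute, save, and return it."""
    path = Path(cache_path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    path.parent.mkdir(parents=True, exist_ok=True)
    value = compute()
    with path.open("wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def fake_summary() -> str:
    # Stand-in for the expensive summary_query_engine.aquery(...) call
    calls.append(1)
    return "Assertion error raised while parsing the config file."


cache_file = Path(tempfile.mkdtemp()) / "data" / "issue1_18305_summary.pkl"
first = load_or_compute(str(cache_file), fake_summary)
second = load_or_compute(str(cache_file), fake_summary)
print(first == second, len(calls))  # True 1
```

One consequence of this design: a poor summary (such as `'Empty Response'`) is reused on every later run until its `.pkl` file is deleted.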

In [119]:
agents_dict, extra_info_dict = await build_agents(docs)


issue1_18305
issue1_18306
issue1_18307
issue1_18308
issue1_18309
issue1_18316
issue1_18318
issue1_18320
issue1_18321
issue1_18322
issue1_18326
issue1_18328
issue1_18329
issue1_18324
issue1_18304


### Build Retriever-Enabled OpenAI Agent

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This `FnRetrieverOpenAIAgent` performs tool retrieval before tool use (unlike a default agent, which puts all tools in the prompt).

**Improvements from V0**: We make the following improvements compared to the "base" version in V0.

- Adding in reranking: we use the Cohere reranker to better filter the candidate set of document tools.
- Adding in a query planning tool: we add an explicit query planning tool (`compare_tool`, built on `SubQuestionQueryEngine`) that's dynamically created based on the set of retrieved tools.
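The reranking improvement is a two-stage retrieval: the object index returns a wide candidate set (`similarity_top_k=10` in the code below), and the reranker keeps only the best few (`CohereRerank(top_n=5)`). A minimal sketch, assuming toy 2-d vectors in place of embeddings and a hypothetical score table standing in for the Cohere rerank model:

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    tool_vecs: Dict[str, Sequence[float]],
    rerank_score: Callable[[str], float],
    top_k: int = 10,
    top_n: int = 5,
) -> List[str]:
    # Stage 1: cheap vector similarity over every tool, keep the top_k candidates
    candidates = sorted(
        tool_vecs, key=lambda name: cosine(query_vec, tool_vecs[name]), reverse=True
    )[:top_k]
    # Stage 2: a (usually more expensive) relevance model reorders only the shortlist
    return sorted(candidates, key=rerank_score, reverse=True)[:top_n]


tool_vecs = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0], "t4": [0.7, 0.7]}
toy_rerank = {"t1": 0.2, "t2": 0.9, "t3": 1.0, "t4": 0.5}  # hypothetical scores
print(retrieve_then_rerank([1.0, 0.0], tool_vecs, toy_rerank.__getitem__, top_k=3, top_n=2))
# ['t2', 't4']  (t3 never reaches the reranker because stage 1 dropped it)
```

The reranker can only reorder what stage 1 lets through, so `similarity_top_k` should stay comfortably larger than `top_n`.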


In [120]:
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)

In [121]:
print(all_tools[0].metadata)

ToolMetadata(description='Empty Response', name='tool_issue1_18305', fn_schema=<class 'llama_index.tools.types.DefaultToolFnSchema'>)


In [122]:
# define an "object" index and retriever over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import (
    ObjectIndex,
    SimpleToolNodeMapping,
    ObjectRetriever,
)
from llama_index.retrievers import BaseRetriever
from llama_index.indices.postprocessor import CohereRerank
from llama_index.tools import QueryPlanTool
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4-0613")

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=10)


# define a custom retriever with reranking
class CustomRetriever(BaseRetriever):
    def __init__(self, vector_retriever, postprocessor=None):
        self._vector_retriever = vector_retriever
        self._postprocessor = postprocessor or CohereRerank(top_n=5)

    def _retrieve(self, query_bundle):
        retrieved_nodes = self._vector_retriever.retrieve(query_bundle)
        filtered_nodes = self._postprocessor.postprocess_nodes(
            retrieved_nodes, query_bundle=query_bundle
        )

        return filtered_nodes


# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(self, retriever, object_node_mapping, all_tools, llm=None):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")

    def retrieve(self, query_bundle):
        nodes = self._retriever.retrieve(query_bundle)
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_sc = ServiceContext.from_defaults(llm=self._llm)
        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, service_context=sub_question_sc
        )
        sub_question_description = f"""\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]
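`CustomObjectRetriever` returns the reranked tools plus one extra `compare_tool`, which is why the retrieval check that follows reports 6 tools (5 kept by `CohereRerank(top_n=5)` plus the compare tool). The compare tool wraps `SubQuestionQueryEngine`, which splits a query into per-tool sub-questions, answers each with its tool, and synthesizes the results. A plain-Python sketch of that fan-out/fan-in shape, assuming the sub-question list is given by hand (the real engine generates it with an LLM) and that synthesis is simple concatenation; all names are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


def answer_with_sub_questions(
    sub_questions: List[Tuple[str, str]], tools: Dict[str, Tool]
) -> str:
    """Send each (tool_name, question) pair to its tool, then combine the answers."""
    parts = []
    for tool_name, question in sub_questions:
        parts.append(f"[{tool_name}] {question} -> {tools[tool_name](question)}")
    # Stand-in for the LLM synthesis step: simply join the intermediate answers
    return "\n".join(parts)


def tools_for_agent(retrieved: List[str]) -> List[str]:
    """Mirror CustomObjectRetriever.retrieve: reranked tools plus the compare tool."""
    return retrieved + ["compare_tool"]


issue_tools = {
    "tool_issue1_18305": lambda q: "answer from 18305",
    "tool_issue1_18306": lambda q: "answer from 18306",
}
print(answer_with_sub_questions(
    [("tool_issue1_18305", "What failed?"), ("tool_issue1_18306", "What failed?")],
    issue_tools,
))
print(len(tools_for_agent(["t1", "t2", "t3", "t4", "t5"])))  # 6
```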

In [24]:
import os
from getpass import getpass
if "COHERE_API_KEY" not in os.environ:  # avoid hardcoding secrets in the notebook
    os.environ["COHERE_API_KEY"] = getpass("Cohere API key: ")

In [26]:
!pip install cohere

Installing collected packages: fastavro, cohere
Successfully installed cohere-4.34 fastavro-1.8.2


In [125]:
custom_node_retriever = CustomRetriever(vector_node_retriever)

# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
    custom_node_retriever, tool_mapping, all_tools, llm=llm
)

In [126]:
tmps = custom_obj_retriever.retrieve("hello")
print(len(tmps))

6


In [127]:
from llama_index.agent import FnRetrieverOpenAIAgent, ReActAgent

top_agent = FnRetrieverOpenAIAgent.from_retriever(
    custom_obj_retriever,
    system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    llm=llm,
    verbose=True,
)

# top_agent = ReActAgent.from_tools(
#     tool_retriever=custom_obj_retriever,
#     system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

# """,
#     llm=llm,
#     verbose=True,
# )

### Define Baseline Vector Store Index

As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.

We retrieve the 4 most similar nodes (`similarity_top_k=4`).
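For intuition, the baseline amounts to flattening every document's nodes into one pool and taking the k most similar to the query, with no per-document routing. A toy sketch, assuming hand-made 2-d vectors in place of embeddings (all data hypothetical):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_top_k(
    per_doc_nodes: Dict[str, List[Tuple[str, Sequence[float]]]],
    query_vec: Sequence[float],
    k: int = 4,
) -> List[str]:
    # Flatten every document's nodes into a single pool (no per-document routing)
    pool = [node for nodes in per_doc_nodes.values() for node in nodes]
    ranked = sorted(pool, key=lambda node: cosine(query_vec, node[1]), reverse=True)
    return [node_id for node_id, _ in ranked[:k]]


per_doc = {
    "doc_a": [("a1", [1.0, 0.0]), ("a2", [0.95, 0.05]), ("a3", [0.9, 0.1])],
    "doc_b": [("b1", [0.85, 0.15]), ("b2", [0.0, 1.0])],
}
print(flat_top_k(per_doc, [1.0, 0.0], k=2))  # ['a1', 'a2']
```

With `k=2`, both slots go to `doc_a`, so a comparison question that needs `doc_b` gets no context from it; this is one plausible reason a flat index struggles on cross-document questions.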

In [128]:
all_nodes = [
    n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]

In [None]:
all_nodes[0]

In [130]:
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

## Running Example Queries

Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.

In [168]:
response = base_query_engine.query(
    "List out all performance-related issues")
print(str(response))

There are no specific performance-related issues mentioned in the given context information.


In [32]:
response = top_agent.query(
    "Tell me about the different types of evaluation in LlamaIndex"
)

In [33]:
print(response)

There are two types of evaluation in LlamaIndex:

1. LLM-based evaluation: This type of evaluation is facilitated by the BaseEvaluator class. It involves running evaluations using query strings, retrieved contexts, and generated response strings. The BaseEvaluator class provides methods for managing prompts and customizing the evaluation logic.

2. Retrieval-based evaluation: This type of evaluation is facilitated by the BaseRetrievalEvaluator class. It involves evaluating the performance of retrievers using metrics. The BaseRetrievalEvaluator class includes a field for specifying the metrics to be evaluated.

Both types of evaluation allow for customization and can be used to assess the performance of different components in the LlamaIndex framework.


In [138]:
# baseline
response = base_query_engine.query(
    "Tell me about the different types of evaluation in LlamaIndex"
)
print(str(response))

There is no information provided in the given context about LlamaIndex or any types of evaluation related to it.


In [139]:
response = base_query_engine.query(
    "Can you compare the tree index and list index at a very high-level?"
)
print(str(response))

Yes, at a very high-level, the tree index and list index are both used to organize and access data. However, they have different structures and characteristics. A tree index is typically used in hierarchical data structures and allows for efficient searching, insertion, and deletion operations. It organizes data in a tree-like structure with nodes and branches. On the other hand, a list index is used in linear data structures and provides sequential access to elements. It stores data in a linear sequence and allows for efficient traversal and indexing operations. Overall, while both the tree index and list index serve the purpose of organizing and accessing data, they have different structures and are suited for different types of data and operations.


In [140]:
response = top_agent.query(
    "Compare the content in the contributions page vs. index page."
)

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: compare_tool with args: {
  "input": "content in the contributions page vs. index page"
}
Generated 2 sub questions.
[tool_issue1_18308] Q: What is the content in the contributions page?
STARTING TURN 1
---------------

[tool_issue1_18308] Q: What is the content in the index page?
STARTING TURN 1
---------------

=== Calling Function ===
Calling function: vector_tool_issue1_18308 with args: {
  "input": "contributions page"
}
Got output: The contributions page is not mentioned in the given context information.

STARTING TURN 2
---------------

=== Calling Function ===
Calling function: vector_tool_issue1_18308 with args: {
  "input": "index page content"
}
Got output: The given context information does not provide any information about an index page or its content. Therefore, it is not possible to answer the query based on the provided context.

STARTING TURN 

In [36]:
print(response)

LlamaIndex supports two types of evaluation: LLM-based evaluation and retrieval-based evaluation. LLM-based evaluation involves evaluating the generated response using a powerful language model like GPT-4. Retrieval-based evaluation, on the other hand, involves evaluating the ranked list of items generated by the retrievers. Both types of evaluation modules are available in LlamaIndex.


In [38]:
response = top_agent.query(
    "Can you compare the tree index and list index at a very high-level?"
)

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: compare_tool with args: {
"input": "Compare the tree index and list index at a high level."
}
Generated 4 sub questions.
[tool_indices_tree] Q: What is the purpose of the Tree Index?
STARTING TURN 1
---------------

[tool_indices_list] Q: What is the purpose of the List Index?
STARTING TURN 1
---------------

[tool_indices_tree] Q: What are the differences between the Tree Index and the List Index?
STARTING TURN 1
---------------

[tool_indices_list] Q: What are the differences between the Tree Index and the List Index?
STARTING TURN 1
---------------

=== Calling Function ===
Calling function: summary_tool_indices_tree with args: {
  "input": "purpose of Tree Index"
}
=== Calling Function ===
Calling function: vector_tool_indices_list with args: {
  "input": "Tree Index vs List Index"
}
=== Calling Function

BadRequestError: ignored

In [39]:
print(str(response))

LlamaIndex supports two types of evaluation: LLM-based evaluation and retrieval-based evaluation. LLM-based evaluation involves evaluating the generated response using a powerful language model like GPT-4. Retrieval-based evaluation, on the other hand, involves evaluating the ranked list of items generated by the retrievers. Both types of evaluation modules are available in LlamaIndex.


In [45]:
response = top_agent.query(
    "which is the best agent comparing retrieval"
)
print(str(response))

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: compare_tool with args: {
  "input": "Which agent is the best for comparing retrieval?"
}
Generated 5 sub questions.
[tool_low_level_oss_ingestion_retrieval] Q: What is the performance of Llama 2 compared to other models on retrieval benchmarks?
STARTING TURN 1
---------------

[tool_query_retrievers] Q: What are the different types of retrievers in the LLAMA Index API?
STARTING TURN 1
---------------

[tool_retrievers_list] Q: What retrievers are available in the SummaryIndex?
STARTING TURN 1
---------------

[tool_retrievers_table] Q: What retrievers are available for keyword tables?
STARTING TURN 1
---------------

[tool_low_level_retrieval] Q: How can I build a standard retriever against a vector database using Pinecone?
STARTING TURN 1
---------------

=== Calling Function ===
