# Lesson 4: Building a Multi-Document Agent

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.

In [7]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

papers = ["pdfs/" + p for p in papers]
papers

['pdfs/metagpt.pdf', 'pdfs/longlora.pdf', 'pdfs/selfrag.pdf']

In [8]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: pdfs/metagpt.pdf
Getting tools for paper: pdfs/longlora.pdf
Getting tools for paper: pdfs/selfrag.pdf


In [9]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [10]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [11]:
len(initial_tools)

6
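The count of 6 comes from two tools per paper (one vector tool and one summary tool) across three papers. The sketch below reproduces only that bookkeeping with placeholder strings instead of real tool objects. The `summary_tool_<stem>` pattern matches the names that appear in the agent's function-call logs in this lesson; the `vector_tool_<stem>` pattern and the `paper_to_tool_names` dictionary are assumptions made for illustration.

```python
from pathlib import Path

papers = ["pdfs/metagpt.pdf", "pdfs/longlora.pdf", "pdfs/selfrag.pdf"]

# Placeholder names standing in for the real tool objects.
# "vector_tool_<stem>" is an assumed naming pattern, not confirmed by the lesson.
paper_to_tool_names = {
    paper: [
        f"vector_tool_{Path(paper).stem}",
        f"summary_tool_{Path(paper).stem}",
    ]
    for paper in papers
}

# Same flattening comprehension as the notebook cell: papers outer, tools inner.
tool_names = [t for paper in papers for t in paper_to_tool_names[paper]]

print(len(tool_names))
print(tool_names)
```

Because the outer loop runs over `papers`, the flattened list keeps each paper's two tools adjacent and in paper order.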

In [12]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [13]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the experiments is the PG19 test split.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation results indicate that the models achieve better perplexity with longer context sizes, showcasing the effectiveness of the efficient fine-tuning method. The models demonstrate improved efficiency and comparable performance to full attention or full fine-tuning baselines as the context size increases. Additionally, the results highlight the models' state-of-the-art performance on various benchmarks and tasks, emphasizing their effectiveness, generalization capabilities, and efficiency improvements compared to existi

In [14]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of large language models through retrieval on demand and self-reflection. It involves training language models to learn to retrieve, generate, and critique text passages and their own generation using special tokens called reflection tokens. This framework enables the model to tailor its behavior at test time by leveraging reflection tokens, leading to significant improvements in performance, factuality, and citation accuracy compared to other models. Additionally, Self-RAG focuses on browser-assisted question-answering with a particular emphasis on evaluating the relevance, supportiveness, and usefulness of the generated responses. It uses reflection tokens to assess whether the output is fully supported by the evidence provi

## 2. Set up an agent over 11 papers

### Download 11 ICLR papers

In [15]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]
papers = ["pdfs/" + p for p in papers]
papers

['pdfs/metagpt.pdf',
 'pdfs/longlora.pdf',
 'pdfs/loftq.pdf',
 'pdfs/swebench.pdf',
 'pdfs/selfrag.pdf',
 'pdfs/zipformer.pdf',
 'pdfs/values.pdf',
 'pdfs/finetune_fair_diffusion.pdf',
 'pdfs/knowledge_card.pdf',
 'pdfs/metra.pdf',
 'pdfs/vr_mcl.pdf']

To download these papers yourself, uncomment and run the code below:


    # for url, paper in zip(urls, papers):
    #     !wget "{url}" -O "{paper}"
    
    
**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.
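If `wget` is not available, a rough pure-Python equivalent can be sketched with the standard library. The helper name `download_papers` and its `fetch` parameter are hypothetical and not part of the lesson's code; the sketch also assumes the server accepts plain `urllib` requests, which may not always hold.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_papers(urls, papers, fetch=urlretrieve):
    """Download each URL to its paired local path.

    Files that already exist are skipped. `fetch` is an injectable
    (hypothetical) hook so the pairing logic can be tried offline.
    Returns the list of paths that were actually downloaded.
    """
    downloaded = []
    for url, paper in zip(urls, papers):
        target = Path(paper)
        if target.exists():
            continue  # already present, e.g. shipped with the lesson
        # Make sure the "pdfs/" directory exists before writing into it.
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(target))
        downloaded.append(str(target))
    return downloaded
```

Calling `download_papers(urls, papers)` with the two lists defined above would then fetch only the files that are missing locally.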

In [16]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: pdfs/metagpt.pdf
Getting tools for paper: pdfs/longlora.pdf
Getting tools for paper: pdfs/loftq.pdf
Getting tools for paper: pdfs/swebench.pdf
Getting tools for paper: pdfs/selfrag.pdf
Getting tools for paper: pdfs/zipformer.pdf
Getting tools for paper: pdfs/values.pdf
Getting tools for paper: pdfs/finetune_fair_diffusion.pdf
Getting tools for paper: pdfs/knowledge_card.pdf
Getting tools for paper: pdfs/metra.pdf
Getting tools for paper: pdfs/vr_mcl.pdf


### Extend the Agent with Tool Retrieval

In [18]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [19]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [20]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [21]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [22]:
tools[2].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_metagpt', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
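Conceptually, the object retriever ranks tools by how similar the query is to each tool's metadata and returns the top `k`. The sketch below illustrates that idea with a toy bag-of-words cosine similarity swapped in for a real embedding model; it is not how `ObjectIndex` is implemented. The tool descriptions, `embed`, `cosine`, and `retrieve_top_k` are all hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, tool_descriptions, k=3):
    """Return the names of the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(q, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical descriptions; the real ones come from the tools' metadata.
tool_descriptions = {
    "summary_tool_metagpt": "Holistic summary of the metagpt paper",
    "summary_tool_swebench": "Holistic summary of the swebench paper",
    "summary_tool_longlora": "Holistic summary of the longlora paper",
    "summary_tool_zipformer": "Holistic summary of the zipformer paper",
}

top = retrieve_top_k("Summarize metagpt and swebench", tool_descriptions, k=2)
print(top)
```

With a small `k`, a tool that the query needs can fall outside the retrieved set, so it is worth inspecting the retrieved tools for a few representative queries before handing the retriever to an agent.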

In [23]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [24]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metra with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-Bench"}
=== Function Output ===
The evaluation dataset used in SWE-Bench consists of task instances constructed from pull requests that are merged, resolve issues in the repository, and introduce new tests. Each task instance includes the codebase, problem statement aggregated from related issues, test patch, and gold patch. The dataset is validated through execution-based verification to ensure usability and non-triviality of the solutions. It is designed to be easily updatable with new task instances based on 

In [25]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA paper"}
=== Function Output ===
The LongLoRA paper introduces an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with minimal accuracy compromise. It utilizes Shifted Sparse Attention (S2-Attn) to approximate the standard self-attention pattern during training, allowing for significant extension of context window while reducing GPU memory cost and training time compared to standard full fine-tuning. By combining improved LoRA with S2-Attn, LongLoRA achieves strong empirical results on various tasks and model sizes. Additionally, the paper focuses on context scaling through contrastive training, utilizing DeepSpeed and Flash-Attention2 to maximize context length experiments and highlighting the impact of grou