<span style="color:blue">

We want to build a RAPTOR-powered question-answering system over publicly available arXiv research papers in the finance domain, using an OpenAI LLM from a local system.

To do this, we shall build a RAG application with a RAPTOR retriever on top of the OpenAI API and ingest the documents into it. Once the RAG pipeline is ready, we shall send it questions and read back its responses.

So the steps are as follows:

1. Download two research papers from arXiv.
2. Ingest the documents into a Chroma vector database using the RAPTOR method.
3. Define the retriever.
4. Define the RAG pipeline.
5. Ask questions to the RAG application.

</span>

# Initialize modules

In [1]:
import nest_asyncio
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
    
nest_asyncio.apply()
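
The cell above falls back to a hard-coded placeholder key, which is easy to commit by accident. A minimal sketch of an alternative, assuming an interactive session: prompt for the key only when the environment variable is missing. `ensure_api_key` and its `prompt_fn` parameter are hypothetical names used for illustration, not part of any library.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting once if it is missing.

    `ensure_api_key` is an illustrative helper, not a library function.
    """
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so it does not land in cell output
        key = prompt_fn(f"Enter {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} is required")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` before the cells below would leave an existing key untouched and ask for one otherwise.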

# 1. Download Data

In [2]:
!wget https://arxiv.org/pdf/2309.13064 -O ./invest_lm.pdf
!wget https://arxiv.org/pdf/2306.12659 -O ./instruct_fingpt.pdf

--2025-03-10 17:06:55--  https://arxiv.org/pdf/2309.13064
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.3.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 230648 (225K) [application/pdf]
Saving to: ‘./invest_lm.pdf’


2025-03-10 17:06:55 (4.95 MB/s) - ‘./invest_lm.pdf’ saved [230648/230648]

--2025-03-10 17:06:55--  https://arxiv.org/pdf/2306.12659
Resolving arxiv.org (arxiv.org)... 151.101.3.42, 151.101.131.42, 151.101.195.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.3.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 247127 (241K) [application/pdf]
Saving to: ‘./instruct_fingpt.pdf’


2025-03-10 17:06:55 (5.40 MB/s) - ‘./instruct_fingpt.pdf’ saved [247127/247127]



# 2. Ingest Finance research papers

## 2.1 Load documents

In [3]:
from llama_index.core import SimpleDirectoryReader

fin_documents = SimpleDirectoryReader(input_files=["./invest_lm.pdf","./instruct_fingpt.pdf"]).load_data()
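
Before ingesting, it can help to confirm that both PDFs were actually parsed. A small sketch, assuming each loaded document exposes a `metadata` dict with a `file_name` key (the PDF reader typically yields one document per page, but treat that as an assumption). `pages_per_file` is a hypothetical helper name.

```python
from collections import Counter


def pages_per_file(documents):
    """Count loaded document objects per source file.

    Assumes each document has a `metadata` dict; a missing `file_name`
    is reported under "unknown" rather than raising.
    """
    return Counter(doc.metadata.get("file_name", "unknown") for doc in documents)
```

With the cell above, `print(pages_per_file(fin_documents))` should list both file names with a non-zero count each.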

## 2.2 Instantiate vectordb

In [4]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./finance_knowledge_db")
collection = client.get_or_create_collection("fin_raptor")

vector_store = ChromaVectorStore(chroma_collection=collection)
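
Because `PersistentClient` keeps the collection on disk, re-running the ingestion cells would likely add the same nodes a second time. A minimal guard, assuming only that the collection object offers Chroma's `count()` method; `collection_is_empty` is a hypothetical helper name.

```python
def collection_is_empty(collection):
    """True when the collection holds no records yet.

    Works with any object that has a `count()` method returning an int,
    which is what a Chroma collection provides.
    """
    return collection.count() == 0
```

One way to use it is to run the ingestion in section 2.4 only when `collection_is_empty(collection)` is true, and otherwise go straight to the retriever in section 3.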

## 2.3 Define the summary module

In [5]:
from llama_index.llms.openai import OpenAI
from llama_index.packs.raptor.base import SummaryModule

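# Implicit string concatenation avoids the run of indentation spaces that a
# backslash continuation would embed in the middle of the prompt.
summary_prompt = (
    "As a professional summarizer, create a concise and comprehensive summary of the provided text, "
    "be it an article, post, conversation, or passage with as much detail as possible."
)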

summary_module = SummaryModule(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1), summary_prompt=summary_prompt, num_workers=16
)

## 2.4 Define the RaptorPack and ingest the documents

In [6]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.packs.raptor import RaptorPack


raptor_pack = RaptorPack(
    fin_documents,
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
    summary_module=summary_module,  # used for generating summaries
)

Generating embeddings for level 0.
Performing clustering for level 0.




Generating summaries for level 0 with 10 clusters.
Level 0 created summaries/clusters: 10
Generating embeddings for level 1.
Performing clustering for level 1.
Generating summaries for level 1 with 1 clusters.
Level 1 created summaries/clusters: 1
Generating embeddings for level 2.
Performing clustering for level 2.
Generating summaries for level 2 with 1 clusters.
Level 2 created summaries/clusters: 1
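
The log above shows the tree being built bottom-up: chunks are grouped, each group is summarized, and the summaries become the next level. The toy sketch below illustrates only that recursion. It is not the pack's implementation: it groups neighbours by position instead of clustering on embeddings, stops as soon as a single root remains, and uses a stand-in `toy_summarize` instead of an LLM call. All names here are hypothetical.

```python
def build_tree(chunks, summarize, group_size=3):
    """Return a list of levels: level 0 holds the chunks, and each later
    level holds one summary per group of nodes from the level below."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Positional grouping is a simplification of real clustering.
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def toy_summarize(texts):
    # Stand-in for the LLM summary: keep the first word of each text.
    return " ".join(text.split()[0] for text in texts)


chunks = [f"chunk{i} body text" for i in range(7)]
tree = build_tree(chunks, toy_summarize)
print([len(level) for level in tree])  # [7, 3, 1]
```

Every node in every level is what gets embedded and stored, which is why retrieval can later match either a raw chunk or a summary.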


## 2.5 Test retrieval using collapsed mode

In [7]:
nodes = raptor_pack.run("What baselines is InvestLM compared against?", mode="collapsed")
print(len(nodes))
print(nodes[0].text)

2
Baselines. We compare InvestLM with three state-
of-the-art commercial models, GPT-3.5, GPT-4
and Claude-2. OpenAI’s GPT-3.5 and GPT-4 are
large language models tuned with reinforcement
learning from human feedback (RLHF) (Ouyang
et al., 2022). Anthropic’s Claude-2 is a large lan-
guage model that can take up to 100K tokens in the
user’s prompt. 3 Responses from all baselines are
sampled throughout August 2023.
We manually write 30 test questions that are
related to financial markets and investment. For
each question, we generate a single response from
InvestLM and the three commercial models. We
then ask the financial experts to compare InvestLM
responses to each of the baselines and label which
response is better or whether neither response is
significantly better than the other.
In addition to the expert evaluation, we also con-
duct a GPT-4 evaluation, following the same pro-
tocol used in (Zhou et al., 2023). Specifically, we
send GPT-4 with exactly the same instructions and
dat
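
In collapsed mode, leaf chunks and summaries from every level compete in one flat pool, and the overall top-k is returned, which matches the two nodes printed above. A toy sketch of that idea with hand-made 2-D vectors; the node dictionaries, `cosine`, and `collapsed_retrieve` are illustrative assumptions, not the pack's code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collapsed_retrieve(nodes, query_vec, top_k=2):
    """Rank every node, leaf or summary, in a single flat pool."""
    ranked = sorted(nodes, key=lambda n: cosine(n["vec"], query_vec), reverse=True)
    return ranked[:top_k]


nodes = [
    {"id": "leaf-a", "level": 0, "vec": [1.0, 0.0]},
    {"id": "leaf-b", "level": 0, "vec": [0.0, 1.0]},
    {"id": "summary-ab", "level": 1, "vec": [0.7, 0.7]},
]
hits = collapsed_retrieve(nodes, query_vec=[0.9, 0.1], top_k=2)
print([n["id"] for n in hits])  # ['leaf-a', 'summary-ab']
```

Here a summary outranks one of the leaves, so the result mixes levels; that is the defining behaviour of the collapsed view.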

## 2.6 Test retrieval using tree_traversal mode

In [8]:
nodes = raptor_pack.run(
    "What baselines is InvestLM compared against?", mode="tree_traversal"
)
print(len(nodes))
print(nodes[0].text)

Retrieved parent IDs from level 2: ['bd04e6bd-2b35-4cd7-a0c0-805a7d7d96cd']
Retrieved 1 from parents at level 2.
Retrieved parent IDs from level 1: ['0900c503-731b-4da1-9ad9-ec74cc600235']
Retrieved 2 from parents at level 1.
Retrieved parent IDs from level 0: ['881c7015-ad05-4454-9b0b-b52412c5b08f', '973bf4b8-70ce-4230-bb5a-7ae84b83b98f']
Retrieved 4 from parents at level 0.
4
Baselines. We compare InvestLM with three state-
of-the-art commercial models, GPT-3.5, GPT-4
and Claude-2. OpenAI’s GPT-3.5 and GPT-4 are
large language models tuned with reinforcement
learning from human feedback (RLHF) (Ouyang
et al., 2022). Anthropic’s Claude-2 is a large lan-
guage model that can take up to 100K tokens in the
user’s prompt. 3 Responses from all baselines are
sampled throughout August 2023.
We manually write 30 test questions that are
related to financial markets and investment. For
each question, we generate a single response from
InvestLM and the three commercial models. We
then ask the fi
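
Tree traversal starts at the top level and descends, which is why the log reports 1, then 2, then 4 retrieved nodes with `similarity_top_k=2`. The toy sketch below reproduces that progression under one assumption inferred from those counts: the top-k children are kept under each selected parent. The tree layout, vectors, and function names are hypothetical, and the pack's actual logic may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tree_traverse(root_ids, nodes, query_vec, top_k=2):
    """Walk from the top level down, keeping top_k children per kept parent."""
    def best(ids):
        ranked = sorted(ids, key=lambda i: cosine(nodes[i]["vec"], query_vec), reverse=True)
        return ranked[:top_k]

    frontier = best(root_ids)
    while any(nodes[i]["children"] for i in frontier):
        next_frontier = []
        for parent in frontier:
            children = nodes[parent]["children"]
            # A node without children is already a leaf, so it is carried along.
            next_frontier.extend(best(children) if children else [parent])
        frontier = next_frontier
    return frontier


nodes = {
    "root": {"vec": [0.5, 0.5], "children": ["s1", "s2"]},
    "s1": {"vec": [0.8, 0.2], "children": ["a", "b", "c"]},
    "s2": {"vec": [0.2, 0.8], "children": ["d", "e", "f"]},
    "a": {"vec": [1.0, 0.0], "children": []},
    "b": {"vec": [0.9, 0.1], "children": []},
    "c": {"vec": [0.0, 1.0], "children": []},
    "d": {"vec": [0.5, 0.5], "children": []},
    "e": {"vec": [0.1, 0.9], "children": []},
    "f": {"vec": [-1.0, 0.0], "children": []},
}
leaves = tree_traverse(["root"], nodes, query_vec=[1.0, 0.0], top_k=2)
print(leaves)  # ['a', 'b', 'd', 'e']
```

Unlike the collapsed view, only leaf nodes come back here, and a weak branch (`s2`) still contributes its two best children.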

# 3. Define the Retriever

In [9]:
from llama_index.packs.raptor import RaptorRetriever
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.openai import OpenAI
import chromadb

# Reconnect to the collection persisted during ingestion (section 2.2)
client = chromadb.PersistentClient(path="./finance_knowledge_db")
collection = client.get_or_create_collection("fin_raptor")
vector_store = ChromaVectorStore(chroma_collection=collection)

retriever = RaptorRetriever(
    [],  # no new documents: the tree is already stored in the vector store
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # same embedding model as used at ingestion
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="tree_traversal",  # sets default mode
)

# 4. Define the Query Engine (RAG)

In [10]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    retriever, llm=OpenAI(model="gpt-4o-mini", temperature=0.1)
)

# 5. Test the Query Engine

In [11]:
query = "What baselines is InvestLM compared against?"
response = query_engine.query(query)
print(str(response))

InvestLM is compared against three state-of-the-art commercial models: GPT-3.5, GPT-4, and Claude-2.
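
To check which retrieved passages the answer rests on, the response's source nodes can be listed alongside their scores. A small sketch, assuming each item in `response.source_nodes` exposes `score` and `text` attributes; `format_sources` is a hypothetical helper name.

```python
def format_sources(source_nodes, max_chars=80):
    """Return one line per retrieved node: rank, score, and a short snippet."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        # Collapse the line breaks left over from PDF extraction.
        snippet = " ".join(item.text.split())[:max_chars]
        lines.append(f"{rank}. score={score} {snippet}")
    return lines
```

After the query above, `print("\n".join(format_sources(response.source_nodes)))` would show the passages passed to the LLM.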
