# Lesson 4: Auto-merging Retrieval

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import utils

import os
import openai
openai.api_key = utils.get_openai_api_key()

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.calls[-1].rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.calls[-1].rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


In [4]:
#from llama_index import SimpleDirectoryReader
from llama_index.core import SimpleDirectoryReader


documents = SimpleDirectoryReader(
    input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()

In [5]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

41 

<class 'llama_index.core.schema.Document'>
Doc ID: 68216949-af03-4f95-bf68-1db94674681b
Text: PAGE 1 Founder, DeepLearning.AI Collected Insights from Andrew
Ng How to  Build Your Career in AI A Simple Guide


## Auto-merging retrieval setup

In [6]:
#from llama_index import Document
from llama_index.core import Document


document = Document(text="\n\n".join([doc.text for doc in documents]))

In [8]:
#from llama_index.node_parser import HierarchicalNodeParser
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes


# Create a hierarchical node parser with three chunk levels (2048, 512 and 128 tokens)
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)
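The parser splits the text at the largest size, re-splits each of those chunks at the next size, and so on. Each smaller chunk keeps a link to the chunk it came from. The sketch below mimics that idea in plain Python. It assumes sizes are counted in whitespace-separated words; LlamaIndex counts tokens and respects sentence boundaries, so real chunks will differ. `Node`, `hierarchical_split` and `get_leaves` are hypothetical names, not the library's API.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def split_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def hierarchical_split(text: str, chunk_sizes: List[int]) -> List[Node]:
    """Return every node (all levels) in depth-first order, with parent/child links."""
    counter = itertools.count()
    all_nodes: List[Node] = []

    def recurse(chunk: str, level: int, parent: Optional[Node]) -> None:
        for piece in split_words(chunk, chunk_sizes[level]):
            node = Node(f"n{next(counter)}", piece, parent.node_id if parent else None)
            if parent is not None:
                parent.child_ids.append(node.node_id)
            all_nodes.append(node)
            if level + 1 < len(chunk_sizes):  # re-split at the next, smaller size
                recurse(piece, level + 1, node)

    recurse(text, 0, None)
    return all_nodes


def get_leaves(nodes: List[Node]) -> List[Node]:
    """Leaves are the nodes with no children, i.e. the smallest chunks."""
    return [n for n in nodes if not n.child_ids]


text = " ".join(f"w{i}" for i in range(20))
nodes = hierarchical_split(text, [8, 4, 2])
leaves = get_leaves(nodes)
nodes_by_id = {n.node_id: n for n in nodes}
print(len(nodes), len(leaves))                # 18 10
print(nodes_by_id[leaves[0].parent_id].text)  # w0 w1 w2 w3
```

With `chunk_sizes=[2048, 512, 128]` the same recursion yields three levels. Only the smallest (leaf) chunks are embedded later. Their parents are reached through the ID lookup, as `nodes_by_id` does in the cell below.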

In [9]:
nodes = node_parser.get_nodes_from_documents([document])

In [11]:
#from llama_index.node_parser import get_leaf_nodes

leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[30].text)

Of course, I also encourage learning driven by curiosity. If something interests you, go ahead 
and learn it regardless of how useful it might turn out to be!  Maybe this will lead to a creative 
spark or technical breakthrough.
How much math do you need to know to be a machine learning engineer?


In [12]:
nodes_by_id = {node.node_id: node for node in nodes}

parent_node = nodes_by_id[leaf_nodes[30].parent_node.node_id]
print(parent_node.text)

On some days, maybe you’ll end up studying for an 
hour or longer.

PAGE 12
Should You 
Learn Math to 
Get a Job in AI? 
CHAPTER 3
LEARNING

PAGE 13
Should you Learn Math to Get a Job in AI? CHAPTER 3
Is math a foundational skill for AI? It’s always nice to know more math! But there’s so much to 
learn that, realistically, it’s necessary to prioritize. Here’s how you might go about strengthening 
your math background.
To figure out what’s important to know, I find it useful to ask what you need to know to make 
the decisions required for the work you want to do. At DeepLearning.AI, we frequently ask, 
“What does someone need to know to accomplish their goals?” The goal might be building a 
machine learning model, architecting a system, or passing a job interview.
Understanding the math behind algorithms you use is often helpful, since it enables you to 
debug them. But the depth of knowledge that’s useful changes over time. As machine learning 
techniques mature and become more reliabl

### Building the index

In [13]:
#from llama_index.llms import OpenAI
from llama_index.llms.openai import OpenAI


llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [19]:
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
import os

def _ensure_embed_model(embed_model):
    # Accept an embedding object or a string like "local:BAAI/bge-small-en-v1.5"
    if hasattr(embed_model, "get_text_embedding"):
        return embed_model
    if isinstance(embed_model, str):
        if embed_model.startswith("local:"):
            return HuggingFaceEmbedding(model_name=embed_model.split("local:", 1)[1])
        return HuggingFaceEmbedding(model_name=embed_model)  # treat as HF model name
    raise ValueError("embed_model must be an embedding object or model name string.")


def build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index",
    chunk_sizes=None,
):
    chunk_sizes = chunk_sizes or [2048, 512, 128]
    embed = _ensure_embed_model(embed_model)

    # If save_dir already exists, load the persisted index as-is
    # (documents and chunk_sizes are ignored in that case)
    if os.path.exists(save_dir):
        storage = StorageContext.from_defaults(persist_dir=save_dir)
        return load_index_from_storage(storage, llm=llm, embed_model=embed)

    # 1) Parse the hierarchy; only the leaf (smallest) chunks are embedded
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)

    # 2) The docstore holds ALL nodes so the retriever can look up parents
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    # 3) Embed the leaves and persist the index (no ServiceContext needed)
    index = VectorStoreIndex(
        leaf_nodes,
        storage_context=storage_context,
        llm=llm,
        embed_model=embed,
    )
    index.storage_context.persist(persist_dir=save_dir)
    return index
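`build_automerging_index` builds and persists the index only when `save_dir` does not exist. Otherwise it loads whatever was saved there, whatever `documents` or `chunk_sizes` are passed in. A minimal stdlib sketch of that build-once, load-thereafter pattern makes the consequence visible. The `build_or_load` and `toy_build` helpers are hypothetical, and a JSON file stands in for the persisted index.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def build_or_load(save_dir: str, build: Callable[[], Dict]) -> Dict:
    """Build and persist on the first call; afterwards load from save_dir.

    Like the os.path.exists(save_dir) check above, this does NOT compare
    the current inputs with whatever was saved earlier.
    """
    path = os.path.join(save_dir, "index.json")
    if not os.path.exists(save_dir):
        index = build()
        os.makedirs(save_dir)
        with open(path, "w") as f:
            json.dump(index, f)
        return index
    with open(path) as f:
        return json.load(f)


def toy_build(chunk_sizes: List[int]) -> Callable[[], Dict]:
    # Hypothetical stand-in for parsing documents and embedding leaf nodes
    return lambda: {"chunk_sizes": chunk_sizes}


with tempfile.TemporaryDirectory() as tmp:
    save_dir = os.path.join(tmp, "merging_index")
    first = build_or_load(save_dir, toy_build([2048, 512, 128]))
    second = build_or_load(save_dir, toy_build([2048, 512]))  # different inputs...
    print(first["chunk_sizes"], second["chunk_sizes"])        # ...same saved result
```

This is why the two- and three-layer experiments below each use their own `save_dir` (`merging_index_0`, `merging_index_1`).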

In [18]:
"""
from llama_index import ServiceContext

auto_merging_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    node_parser=node_parser,
)
"""

In [20]:
"""
from llama_index import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

automerging_index = VectorStoreIndex(
    leaf_nodes, storage_context=storage_context, service_context=auto_merging_context
)

automerging_index.storage_context.persist(persist_dir="./merging_index")
"""

In [21]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
automerging_index = build_automerging_index(
    documents,                                 # list[Document]
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index",
    chunk_sizes=[2048, 512, 128],
)


2025-08-30 17:08:54,011 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-08-30 17:08:56,227 - INFO - 1 prompt is loaded, with the key: query


In [22]:
"""
# Optional: if a persisted index already exists, load it;
# otherwise, build it and persist it.

import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage

if not os.path.exists("./merging_index"):
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    automerging_index = VectorStoreIndex(
            leaf_nodes,
            storage_context=storage_context,
            service_context=auto_merging_context
        )

    automerging_index.storage_context.persist(persist_dir="./merging_index")
else:
    automerging_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./merging_index"),
        service_context=auto_merging_context
    )
"""

### Defining the retriever and running the query engine

In [23]:
"""
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.retrievers import AutoMergingRetriever
from llama_index.query_engine import RetrieverQueryEngine

automerging_retriever = automerging_index.as_retriever(
    similarity_top_k=12
)

retriever = AutoMergingRetriever(
    automerging_retriever, 
    automerging_index.storage_context, 
    verbose=True
)

rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")

auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[rerank]  # the auto-merging retriever, not the base one
)
"""

from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import RetrieverQueryEngine

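# Base retriever: the 12 leaf chunks most similar to the query
automerging_retriever = automerging_index.as_retriever(similarity_top_k=12)

# Wraps the base retriever: when enough sibling chunks are retrieved, it
# replaces them with their parent node, looked up in the docstore
retriever = AutoMergingRetriever(
    automerging_retriever,
    automerging_index.storage_context,
    verbose=True,  # log each merge
)

# Cross-encoder reranker keeps the 6 most relevant nodes
rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")

auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever,  # pass the auto-merging retriever, not the base one
    node_postprocessors=[rerank],
)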
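Taken together, the engine retrieves the top-k leaf chunks, merges sibling groups into their parents, then reranks and keeps `top_n`. The sketch below shows that control flow in plain Python under stated assumptions. The merge rule is "retrieved children / total children above a threshold"; LlamaIndex's `AutoMergingRetriever` exposes a `simple_ratio_thresh` parameter for this, but the exact comparison and default may differ by version. Merging is repeated until nothing changes, so a chunk can climb more than one level. A word-overlap score stands in for the `BAAI/bge-reranker-base` cross-encoder. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: Tuple[str, ...] = ()


def merge_once(nodes: List[Node], by_id: Dict[str, Node], thresh: float) -> Tuple[List[Node], bool]:
    """One pass: swap sibling groups for their parent when enough siblings are present."""
    present = {n.node_id for n in nodes}
    out: List[Node] = []
    seen = set()
    changed = False
    for node in nodes:
        parent = by_id.get(node.parent_id) if node.parent_id else None
        keep = node
        if parent is not None and parent.child_ids:
            ratio = sum(c in present for c in parent.child_ids) / len(parent.child_ids)
            if ratio > thresh:
                keep, changed = parent, True
        if keep.node_id not in seen:  # merged siblings collapse into one parent
            seen.add(keep.node_id)
            out.append(keep)
    return out, changed


def auto_merge(nodes: List[Node], by_id: Dict[str, Node], thresh: float = 0.5) -> List[Node]:
    """Repeat merging until stable."""
    changed = True
    while changed:
        nodes, changed = merge_once(nodes, by_id, thresh)
    return nodes


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / max(len(q), 1)


def rerank(query: str, nodes: List[Node], top_n: int) -> List[Node]:
    """Keep the top_n nodes by score (a cross-encoder would score them in practice)."""
    return sorted(nodes, key=lambda n: overlap_score(query, n.text), reverse=True)[:top_n]


# Tiny two-level hierarchy: parents P and Q with four leaf children each
leaves = {f"{p}{i}": Node(f"{p}{i}", f"{p} chunk {i}", parent_id=p) for p in "PQ" for i in range(4)}
parents = {
    "P": Node("P", "networking builds a support community", child_ids=tuple(f"P{i}" for i in range(4))),
    "Q": Node("Q", "math background for machine learning", child_ids=tuple(f"Q{i}" for i in range(4))),
}
by_id = {**leaves, **parents}

retrieved = [leaves["P0"], leaves["P1"], leaves["P2"], leaves["Q0"]]  # "top-k" leaves
merged = auto_merge(retrieved, by_id)
print([n.node_id for n in merged])                                          # ['P', 'Q0']
print([n.node_id for n in rerank("networking community", merged, top_n=1)])  # ['P']
```

Three of P's four children were retrieved (ratio 0.75), so they collapse into P. This is analogous to the "Merging 3 nodes into parent node" lines in the log below. A single child of Q is kept as-is.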


In [24]:
auto_merging_response = auto_merging_engine.query(
    "What is the importance of networking in AI?"
)

2025-08-30 17:13:01,004 - INFO - > Merging 3 nodes into parent node.
> Parent node id: 84d8f26e-725b-4e02-bd9b-74289acff8e3.
> Parent node text: PAGE 35
Keys to Building a Career in AI CHAPTER 10
The path to career success in AI is more compl...

2025-08-30 17:13:01,004 - INFO - > Merging 1 nodes into parent node.
> Parent node id: bc95952a-474d-4e44-b358-ad526ebee589.
> Parent node text: PAGE 35
Keys to Building a Career in AI CHAPTER 10
The path to career success in AI is more compl...



2025-08-30 17:13:04,484 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [26]:
#from llama_index.response.notebook_utils import display_response
from llama_index.core.response.notebook_utils import display_response

display_response(auto_merging_response)


**`Final Response:`** Networking in AI is crucial as it helps in building a strong professional community and support system. By connecting with others in the field, individuals can gain valuable insights, advice, and opportunities that can propel their career forward. Additionally, networking allows individuals to stay updated on industry trends, collaborate on projects, and receive mentorship from experienced professionals. Building a network in AI can open doors to new job opportunities and help individuals navigate their career path more effectively.

## Putting it all Together

In [27]:
"""
import os

from llama_index import (
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes
from llama_index import StorageContext, load_index_from_storage
from llama_index.retrievers import AutoMergingRetriever
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import RetrieverQueryEngine


def build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index",
    chunk_sizes=None,
):
    chunk_sizes = chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)
    merging_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    if not os.path.exists(save_dir):
        automerging_index = VectorStoreIndex(
            leaf_nodes, storage_context=storage_context, service_context=merging_context
        )
        automerging_index.storage_context.persist(persist_dir=save_dir)
    else:
        automerging_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=merging_context,
        )
    return automerging_index


def get_automerging_query_engine(
    automerging_index,
    similarity_top_k=12,
    rerank_top_n=6,
):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(
        base_retriever, automerging_index.storage_context, verbose=True
    )
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    auto_merging_engine = RetrieverQueryEngine.from_args(
        retriever, node_postprocessors=[rerank]
    )
    return auto_merging_engine
"""

In [29]:
#from llama_index.llms import OpenAI

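# ./merging_index already exists from In [21], so this call loads that persisted
# index (built from `documents`) rather than indexing [document]; see the
# "Loading all indices" log below.
index = build_automerging_index(
    [document],
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    save_dir="./merging_index",
)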


2025-08-30 17:18:44,240 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5
2025-08-30 17:18:46,905 - INFO - 1 prompt is loaded, with the key: query
2025-08-30 17:18:47,036 - INFO - Loading all indices.


Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/index_store.json.


In [30]:
from utils import get_automerging_query_engine

query_engine = get_automerging_query_engine(index, similarity_top_k=6)

#automerging_query_engine = get_automerging_query_engine(
#    automerging_index,
#)

## TruLens Evaluation

In [32]:
from trulens_eval import Tru

Tru().reset_database()

2025-08-30 17:20:53,029 - INFO - Context impl SQLiteImpl.
2025-08-30 17:20:53,030 - INFO - Will assume non-transactional DDL.
Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]


### Two layers

In [33]:
auto_merging_index_0 = build_automerging_index(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index_0",
    chunk_sizes=[2048, 512],
)

In [34]:
auto_merging_engine_0 = get_automerging_query_engine(
    auto_merging_index_0,
    similarity_top_k=12,
    rerank_top_n=6,
)

In [35]:
from utils import get_prebuilt_trulens_recorder

tru_recorder = get_prebuilt_trulens_recorder(
    auto_merging_engine_0,
    app_id='app_0'
)

instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.base.embeddings.base.BaseEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.TransformComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.BaseComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'pydantic.main.BaseModel'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base

In [36]:
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline; each non-empty line is one question
        item = line.strip()
        if item:
            eval_questions.append(item)

In [37]:
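def run_evals(eval_questions, tru_recorder, query_engine):
    # Query the engine once per question inside the recorder's context, so
    # TruLens logs each call as a record and runs its feedback functions on it
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)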
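`run_evals` relies on the recorder being a context manager: calls made inside the `with` block are captured as records. The `ToyRecorder` class below is a hypothetical, stdlib-only illustration of that pattern, not how TruLens is implemented. TruLens instruments the query engine itself, so `query_engine.query(...)` is captured directly. For simplicity, the toy routes calls through the recorder and stores only question, answer and latency.

```python
import time
from typing import Callable, Dict, List


class ToyRecorder:
    """Hypothetical stand-in for a TruLens recorder: logs calls made inside `with`."""

    def __init__(self, app: Callable[[str], str]):
        self.app = app
        self.records: List[Dict] = []
        self._recording = False

    def __enter__(self) -> "ToyRecorder":
        self._recording = True
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._recording = False
        return False  # let exceptions propagate

    def query(self, question: str) -> str:
        start = time.perf_counter()
        answer = self.app(question)
        if self._recording:  # only calls inside the context are recorded
            self.records.append({
                "input": question,
                "output": answer,
                "latency": time.perf_counter() - start,
            })
        return answer


def toy_run_evals(questions: List[str], recorder: ToyRecorder) -> None:
    # Same loop shape as run_evals: one recorded query per question
    for question in questions:
        with recorder:
            recorder.query(question)


recorder = ToyRecorder(lambda q: q.upper())  # stand-in for a query engine
toy_run_evals(["what is rag?", "why rerank?"], recorder)
print([r["output"] for r in recorder.records])  # ['WHAT IS RAG?', 'WHY RERANK?']
```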

In [38]:
run_evals(eval_questions, tru_recorder, auto_merging_engine_0)

> Merging 1 nodes into parent node.
> Parent node id: 759576fe-3ecf-424b-9e19-c1af1c0ab747.
> Parent node text: PAGE 26
If you’re considering a role switch, a startup can be an easier place to do it than a big...

> Merging 1 nodes into parent node.
> Parent node id: d07d4c21-d33c-458f-954f-bdda8c12d205.
> Parent node text: PAGE 25
Finding a job has a few predictable steps that include selecting the companies to which y...

> Merging 1 nodes into parent node.
> Parent node id: 44e8a07d-45b4-466d-92ef-f6501ce4219b.
> Parent node text: PAGE 33
Choose who to work with. It’s tempting to take a position because of the projects you’ll ...

> Merging 1 nodes into parent node.
> Parent node id: 538b4792-5c97-4070-be81-2d431688abab.
> Parent node text: PAGE 23
Each project is only one step on a longer journey, hopefully one that has a positive impa...

> Merging 1 nodes into parent node.
> Parent node id: f70b01a2-5dc8-47ef-935f-14f2327b7b5e.
> Parent node text: PAGE 29
If you’re preparing to s

In [39]:
from trulens_eval import Tru

Tru().get_leaderboard(app_ids=[])

| app_name | app_version | Answer Relevance | Context Relevance | latency | total_cost |
|---|---|---|---|---|---|
| app_0 | base | 0.982456 | 0.222222 | 1.924546 | 0.003636 |
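Conceptually, each feedback column in the leaderboard is a per-app average of per-record feedback scores. Below is a minimal sketch of that aggregation. It uses made-up records and hypothetical metric names (not the notebook's data), and assumes records without a score are skipped while an app with no scores gets an empty value (`None`).

```python
from statistics import mean
from typing import Dict, List, Optional

METRICS = ("answer_relevance", "context_relevance")


def leaderboard(records: List[Dict]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-app mean of each feedback metric; None when an app has no scores yet."""
    table: Dict[str, Dict[str, List[float]]] = {}
    for rec in records:
        row = table.setdefault(rec["app_id"], {m: [] for m in METRICS})
        for m in METRICS:
            if rec.get(m) is not None:  # skip feedback that has not been computed
                row[m].append(rec[m])
    return {
        app: {m: (mean(scores) if scores else None) for m, scores in row.items()}
        for app, row in table.items()
    }


# Made-up per-record scores, not the notebook's results
records = [
    {"app_id": "app_a", "answer_relevance": 1.0, "context_relevance": 0.25},
    {"app_id": "app_a", "answer_relevance": 0.5, "context_relevance": None},
    {"app_id": "app_b"},  # e.g. feedback still pending
]
for app, row in leaderboard(records).items():
    print(app, row)
```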


In [40]:
Tru().run_dashboard()

Starting dashboard ...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Dashboard started at http://localhost:54220 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

### Three layers

In [41]:
auto_merging_index_1 = build_automerging_index(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index_1",
    chunk_sizes=[2048, 512, 128],
)

In [42]:
auto_merging_engine_1 = get_automerging_query_engine(
    auto_merging_index_1,
    similarity_top_k=12,
    rerank_top_n=6,
)


In [43]:
tru_recorder = get_prebuilt_trulens_recorder(
    auto_merging_engine_1,
    app_id='app_1'
)

instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.base.embeddings.base.BaseEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.TransformComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.BaseComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'pydantic.main.BaseModel'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base

In [44]:
run_evals(eval_questions, tru_recorder, auto_merging_engine_1)

> Merging 4 nodes into parent node.
> Parent node id: f655159a-2b9d-4187-84a9-0d7089feef4e.
> Parent node text: PAGE 26
If you’re considering a role switch, a startup can be an easier place to do it than a big...

> Merging 4 nodes into parent node.
> Parent node id: f0f168ad-5f2c-4d35-a50c-a2eb53bca682.
> Parent node text: PAGE 25
Finding a job has a few predictable steps that include selecting the companies to which y...

> Merging 1 nodes into parent node.
> Parent node id: ed9e26e3-682c-4aeb-8491-171f3a4cca2e.
> Parent node text: PAGE 26
If you’re considering a role switch, a startup can be an easier place to do it than a big...

> Merging 1 nodes into parent node.
> Parent node id: feb2f0e2-2e37-490f-b777-c1fd856d7344.
> Parent node text: PAGE 25
Finding a job has a few predictable steps that include selecting the companies to which y...

> Merging 5 nodes into parent node.
> Parent node id: e4e4c7c2-1478-45c9-a72e-677850c58e6a.
> Parent node text: PAGE 27
There’s a lot we don’t k

In [45]:
from trulens_eval import Tru

Tru().get_leaderboard(app_ids=[])

| app_name | app_version | Answer Relevance | Context Relevance | latency | total_cost |
|---|---|---|---|---|---|
| app_0 | base | 0.986111 | 0.222222 | 1.924546 | 0.003636 |
| app_1 | base |  |  | 1.646087 | 0.001863 |


In [46]:
Tru().run_dashboard()

Starting dashboard ...
Dashboard already running at path:   Local URL: http://localhost:54220



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>