# Ensemble Retrieval Guide - Feedback Rerankers

When building a RAG application, there are often many retrieval parameters and strategies to choose from (chunk size, or vector vs. keyword vs. hybrid search, for instance).

This example builds on work from LlamaIndex on ensemble retrieval over different chunk sizes and different indices, and extends the reranker so that it can use any feedback function.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/frameworks/llama_index/ensemble_retrieval_feedback_reranker.ipynb)
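
To make the idea concrete before bringing in the libraries, here is a minimal sketch of ensemble retrieval with a feedback-style reranker in plain Python. Everything in it is a hypothetical stand-in: `ensemble_retrieve`, `keyword_overlap`, and the two toy retrievers are illustrative names, not LlamaIndex or TruLens APIs, and the word-overlap score only stands in for an LLM-based relevance feedback function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a "retriever" maps a query to a list of text chunks,
# and a "feedback function" maps (question, statement) to a relevance score.
Retriever = Callable[[str], List[str]]
Feedback = Callable[[str, str], float]


def keyword_overlap(question: str, statement: str) -> float:
    """Toy feedback function: fraction of question words found in the statement."""
    q_words = set(question.lower().split())
    s_words = set(statement.lower().split())
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def ensemble_retrieve(
    question: str,
    retrievers: Dict[str, Retriever],
    feedback: Feedback,
    top_n: int = 2,
) -> List[Tuple[float, str, str]]:
    """Pool candidates from every retriever, score them, and keep the top_n."""
    scored = []
    for name, retrieve in retrievers.items():
        for chunk in retrieve(question):
            scored.append((feedback(question, chunk), name, chunk))
    # Highest score first; the sort is stable, so ties keep retriever order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


retrievers = {
    "small_chunks": lambda q: ["YC funded startups in batches"],
    "large_chunks": lambda q: ["Paul Graham started YC in 2005 to fund startups"],
}
results = ensemble_retrieve("who started YC", retrievers, keyword_overlap)
for score, name, chunk in results:
    print(f"{score:.2f} {name}: {chunk}")
```

The rest of the notebook follows the same shape: several retrievers feed one pool of candidates, and a single scoring step decides the final order.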

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "..."

## Setup

In [2]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          Starting another event loop to make async queries results in nested event loops.
#          This is normally not allowed, so we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
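
The restriction that `nest_asyncio` lifts can be reproduced with the standard library alone. The sketch below is meant to be run as a plain Python script (outside Jupyter, and without `nest_asyncio` applied); `inner` and `outer` are illustrative names only.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    coro = inner()
    try:
        # Starting a second event loop while one is already running is rejected.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

In a notebook the outer loop is the one Jupyter already runs, which is why the patch is applied once at the top.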

In [3]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().handlers = []
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import (
    VectorStoreIndex,
    ListIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext,
    SimpleKeywordTableIndex,
)
from llama_index.response.notebook_utils import display_response
from llama_index.llms import OpenAI



## Load Data

We first load Paul Graham's essay from the web as a list of Documents. Further down, the Documents are split into Nodes at several chunk sizes, and a vector index is built for each size.

In [4]:
# load Paul Graham's essay from the web

from llama_index import VectorStoreIndex, SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)

In [5]:
# initialize service context (set chunk size)
llm = OpenAI(model="gpt-4")
chunk_sizes = [128, 256, 512, 1024]
service_contexts = []
nodes_list = []
vector_indices = []
query_engines = []
for chunk_size in chunk_sizes:
    print(f"Chunk Size: {chunk_size}")
    service_context = ServiceContext.from_defaults(chunk_size=chunk_size, llm=llm)
    service_contexts.append(service_context)
    nodes = service_context.node_parser.get_nodes_from_documents(documents)

    # add chunk size to nodes to track later
    for node in nodes:
        node.metadata["chunk_size"] = chunk_size
        node.excluded_embed_metadata_keys = ["chunk_size"]
        node.excluded_llm_metadata_keys = ["chunk_size"]

    nodes_list.append(nodes)

    # build vector index
    vector_index = VectorStoreIndex(nodes)
    vector_indices.append(vector_index)

    # query engines
    query_engines.append(vector_index.as_query_engine())

Chunk Size: 128
Chunk Size: 256
Chunk Size: 512
Chunk Size: 1024
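
To see why the chunk size matters for what a retriever can return, here is a toy chunker. It is only a sketch under simplifying assumptions: it splits on whitespace-separated words with no overlap, whereas the node parser used above works on tokens and has its own splitting rules. `chunk_words` and the placeholder text are hypothetical.

```python
from typing import Dict, List


def chunk_words(text: str, chunk_size: int) -> List[Dict]:
    """Split text into chunks of at most chunk_size words and tag each with its size."""
    words = text.split()
    return [
        {
            "text": " ".join(words[i : i + chunk_size]),
            "metadata": {"chunk_size": chunk_size},
        }
        for i in range(0, len(words), chunk_size)
    ]


text = "word " * 1000  # placeholder document of 1000 words
for size in [128, 256, 512, 1024]:
    chunks = chunk_words(text, size)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller chunks give many narrow candidates, larger chunks give few broad ones, which is the trade-off the ensemble is meant to resolve per query.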


## Set up a Retriever Tool for each vector index

In [6]:
# try ensemble retrieval

from llama_index.tools.retriever_tool import RetrieverTool

retriever_tools = []
for chunk_size, vector_index in zip(chunk_sizes, vector_indices):
    retriever_tool = RetrieverTool.from_defaults(
        retriever=vector_index.as_retriever(),
        description=f"Retrieves relevant context from Paul Graham's essay (chunk size {chunk_size})",
    )
    retriever_tools.append(retriever_tool)

In [7]:
from llama_index.selectors.pydantic_selectors import PydanticMultiSelector
from llama_index.retrievers import RouterRetriever


retriever = RouterRetriever(
    selector=PydanticMultiSelector.from_defaults(llm=llm, max_outputs=4),
    retriever_tools=retriever_tools
)

In [8]:
nodes = await retriever.aretrieve(
    "Describe and summarize Paul Graham's journey to founding YC"
)

Selecting retriever 0: This choice retrieves relevant context from Paul Graham's essay with a manageable chunk size, which may contain information about his journey to founding YC..
Selecting retriever 1: This choice retrieves a larger context from the essay, which could provide more detailed information about Paul Graham's journey to founding YC..
Selecting retriever 2: This choice retrieves an even larger context from the essay, which could provide a comprehensive understanding of Paul Graham's journey to founding YC..
Selecting retriever 3: This choice retrieves the largest context from the essay, which could provide the most detailed and comprehensive understanding of Paul Graham's journey to founding YC..
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=23 request_id=933485d101cc05e4799998cce4dc3bcb response_code=200
message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=38 request_id=eb0292eea29b71e0abfdc7bb70082

In [9]:
for node in nodes:
    print(node.node.metadata["chunk_size"])
    print(node.node.get_text())

128
two:
writing essays and working on YC.  
  
YC was different from other kinds of work I've done. Instead of deciding for
myself what to work on, the problems came to me. Every 6 months there was a
new batch of startups, and their problems, whatever they were, became our
problems. It was very engaging work, because their problems were quite varied,
and the good founders were very effective. If you were trying to learn the
most you could about startups in the shortest possible
128
set of customers almost entirely from among
their batchmates.  
  
I had not originally intended YC to be a full-time job. I was going to do
three things: hack, write essays, and work on YC. As YC grew, and I grew more
excited about it, it started to take up a lot more than a third of my
attention. But for the first few years I was still able to work on other
things.  
  
In the summer of 2006, Robert and I started working on a new version of
256
Now lots
of startups get their initial set of customers almos

## Set up Reranker

Here we'll subclass LLM Reranker, replacing the reranking prompt with our own.

In [10]:
from llama_index.indices.postprocessor import LLMRerank
from trulens_eval.feedback_prompts import QS_RELEVANCE

In [33]:
choice_select_prompt = (
    "A list of documents is shown below. Each document has a number next to it along "
    "with a summary of the document. A question is also provided. \n"
    "Respond with the numbers of the documents "
    "you should consult to answer the question, in order of its score.\n"
    "Do not include any documents that are not relevant to the question. \n"
    f"""The score definition is {QS_RELEVANCE.replace("{question}","{query_str}").replace("{statement}","{context_str}")}\n"""
    "Example format: \n"
    "Document 1:\n<summary of document 1>\n\n"
    "...\n\n"
    "Document 10:\n<summary of document 10>\n\n"
    "Question: <question>\n"
    "Answer:\n"
    "Doc: 9, Relevance: 7\n"
    "Doc: 7, Relevance: 3\n\n"
)
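
The prompt asks the LLM to answer in lines of the form `Doc: 9, Relevance: 7`. As a rough illustration of how such an answer turns into a ranking, the sketch below parses that format with a regular expression. `parse_doc_relevance` is a hypothetical helper written for this example; it is not the parser LlamaIndex uses internally, and its signature is not claimed to match what `parse_choice_select_answer_fn` expects.

```python
import re
from typing import List, Tuple


def parse_doc_relevance(answer: str) -> List[Tuple[int, float]]:
    """Parse lines like 'Doc: 9, Relevance: 7' into (doc_number, score) pairs."""
    pattern = re.compile(r"Doc:\s*(\d+),\s*Relevance:\s*(\d+(?:\.\d+)?)")
    pairs = [(int(doc), float(score)) for doc, score in pattern.findall(answer)]
    # Sort by descending relevance so the best document comes first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


example_answer = "Doc: 7, Relevance: 3\nDoc: 9, Relevance: 7\n"
ranked = parse_doc_relevance(example_answer)
print(ranked)  # [(9, 7.0), (7, 3.0)]
```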

In [12]:
from typing import Callable, List, Optional, Union
from llama_index.prompts.prompts import QuestionAnswerPrompt

class Feedback_Rerank(LLMRerank):
    """LLM-based reranker with a different choice select prompt."""

    def __init__(
        self,
        choice_select_prompt: Optional[Union[str, QuestionAnswerPrompt]] = None,
        choice_batch_size: int = 10,
        format_node_batch_fn: Optional[Callable] = None,
        parse_choice_select_answer_fn: Optional[Callable] = None,
        service_context: Optional[ServiceContext] = None,
        top_n: int = 10,
    ) -> None:
        # Wrap a plain template string in a QuestionAnswerPrompt so the
        # parent class always receives a prompt object.
        if isinstance(choice_select_prompt, str):
            choice_select_prompt = QuestionAnswerPrompt(choice_select_prompt)
        super().__init__(
            choice_select_prompt=choice_select_prompt,
            choice_batch_size=choice_batch_size,
            format_node_batch_fn=format_node_batch_fn,
            parse_choice_select_answer_fn=parse_choice_select_answer_fn,
            service_context=service_context,
            top_n=top_n,
        )

# Usage example: rerank with the feedback-based prompt string defined above
feedback_reranker = Feedback_Rerank(choice_select_prompt)

In [13]:
# define RetrieverQueryEngine
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever, node_postprocessors=[feedback_reranker])

In [14]:
response = query_engine.query(
    "Describe and summarize Paul Graham's journey to founding YC"
)

Selecting retriever 0: This choice might provide a concise summary of Paul Graham's journey to founding YC, but it might not contain all the necessary details due to the small chunk size..
Selecting retriever 1: This choice might provide a more detailed summary of Paul Graham's journey to founding YC, but it might still miss some important details due to the medium chunk size..
Selecting retriever 2: This choice is likely to provide a comprehensive summary of Paul Graham's journey to founding YC due to the large chunk size..
Selecting retriever 3: This choice is likely to provide the most detailed and comprehensive summary of Paul Graham's journey to founding YC due to the largest chunk size. However, it might also include unnecessary details..


In [32]:
display_response(response, show_source=True)

**`Final Response:`** Paul Graham's journey to founding Y Combinator began with his work in the programming language Arc. He gradually stopped working on Arc due to lack of time and lack of interest, and instead focused on writing essays and working on YC. YC was different from other kinds of work he had done, as the problems came to him and he was able to learn a lot about startups in a short amount of time. He worked hard, even at the parts he didn't like, and was motivated by the idea that his hard work would set the upper bound for how hard everyone else worked. 

In 2010, Robert Morris gave him unsolicited advice to make sure YC wasn't the last cool thing he did. Paul Graham then created the Summer Founders Program, which offered $6k per founder in return for 6% equity. This was twice as good as the deal he and Julian had taken, and Jessica provided free air conditioners to the founders. He quickly realized that funding startups in batches was more convenient and beneficial for both parties, and as YC grew, he noticed other advantages of scale, such as the alumni community and startups becoming one another's customers.

---

**`Source Node 1/3`**

**Node ID:** 6c4018b2-4ab3-48b0-ad03-3f2d93ad0749<br>**Similarity:** 10.0<br>**Text:** to work a good deal _in_ Arc, I gradually stopped working _on_ Arc,
partly because I didn't have ...<br>

---

**`Source Node 2/3`**

**Node ID:** 3d8a614f-e0e8-4d19-b7c7-1ccc16b95768<br>**Similarity:** 8.0<br>**Text:** and from those
we picked 8 to fund. They were an impressive group. That first batch included
redd...<br>

---

**`Source Node 3/3`**

**Node ID:** 68a3594e-d50a-4ef0-9d32-dca0565d5c7b<br>**Similarity:** 5.0<br>**Text:** Aaron
Swartz, who had already helped write the RSS spec and would a few years later
become a mart...<br>

In [26]:
# compute the reciprocal rank of the first node from each chunk size in the combined ranking
import pandas as pd

def mrr_all(metadata_values, metadata_key, source_nodes):
    # source_nodes is a ranked list; for each metadata value, find the
    # position of the first node that carries that value
    value_to_mrr_dict = {}
    for metadata_value in metadata_values:
        mrr = 0
        for idx, source_node in enumerate(source_nodes):
            if source_node.node.metadata[metadata_key] == metadata_value:
                mrr = 1 / (idx + 1)
                break

        # store the reciprocal rank (0 if the value never appears in the ranking)
        value_to_mrr_dict[metadata_value] = mrr

    # with a single query, the "mean" reciprocal rank is just this one value
    df = pd.DataFrame(value_to_mrr_dict, index=["MRR"])
    return df

In [27]:
# Compute the Mean Reciprocal Rank for each chunk size (higher is better).
# In the output below, chunk size 512 has the highest-ranked result.
print("Mean Reciprocal Rank for each Chunk Size")
mrr_all(chunk_sizes, "chunk_size", response.source_nodes)

Mean Reciprocal Rank for each Chunk Size


|     | 128 | 256      | 512 | 1024 |
|-----|-----|----------|-----|------|
| MRR | 0   | 0.333333 | 1.0 | 0    |
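
The numbers in this table can be reproduced by hand. The sketch below assumes the three reranked source nodes had chunk sizes 512, 512, and 256, in that order. That ordering is inferred from the table rather than printed anywhere in the output, so treat it as an assumption. `reciprocal_rank_per_value` is a standalone, hypothetical re-statement of the same logic on plain integers.

```python
from typing import Dict, List


def reciprocal_rank_per_value(
    values: List[int], ranked_values: List[int]
) -> Dict[int, float]:
    """Return 1 / (position of first hit) for each value, or 0.0 if it never appears."""
    scores = {}
    for value in values:
        scores[value] = 0.0
        for position, ranked_value in enumerate(ranked_values, start=1):
            if ranked_value == value:
                scores[value] = 1 / position
                break
    return scores


# Assumed ranking of source-node chunk sizes, chosen to be consistent with the table.
ranked_chunk_sizes = [512, 512, 256]
scores = reciprocal_rank_per_value([128, 256, 512, 1024], ranked_chunk_sizes)
print(scores)
```

A chunk size that first appears at rank 3 scores 1/3, one at rank 1 scores 1.0, and sizes that never make it into the reranked list score 0.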


## Compare Against Baseline

Compare against a baseline query engine that uses only chunk size 1024 and returns the top 2 nodes (k=2).

In [28]:
query_engine_1024 = query_engines[-1]

In [30]:
response_1024 = query_engine_1024.query(
    "Describe and summarize the journey of Paul Graham to founding YC"
)

In [31]:
display_response(response_1024, show_source=True, source_length=500)

**`Final Response:`** Paul Graham founded YC in 2005 with the intention of providing seed investments to startups and helping them in the same way that Julian had helped him. He and his co-founders funded YC with their own money, and created the batch model of funding a bunch of startups all at once, twice a year, and then spending three months focusing intensively on helping them. They invited 225 applicants to interview for the Summer Founders Program, and from those they picked 8 to fund. 

The deal for startups was based on a combination of the deal they had taken and what MIT grad students got for the summer. They invested $6k per founder, which in the typical two-founder case was $12k, in return for 6%. YC grew quickly, and Paul noticed other advantages of scale, such as the alumni becoming a tight community, dedicated to helping one another, and the startups becoming one another's customers. 

Paul had not originally intended YC to be a full-time job, but it eventually took up most of his attention. In 2006, he and Robert started working on a new version of Arc, and Paul wrote Hacker News in it. Hacker News was a source of stress for Paul, but it was also good for YC. In 2010, Paul decided to make YC his full-time job.

---

**`Source Node 1/2`**

**Node ID:** beb0fa9c-e983-4107-b47b-96527ee0f579<br>**Similarity:** 0.8423639909762414<br>**Text:** with bylaws and stock and all that stuff, how on earth did you
do that? Our plan was not only to make seed investments, but to do for
startups everything Julian had done for us.  
  
YC was not organized as a fund. It was cheap enough to run that we funded it
with our own money. That went right by 99% of readers, but professional
investors are thinking "Wow, that means they got all the returns." But once
again, this was not due to any particular insight on our part. We didn't know
how VC firm...<br>

---

**`Source Node 2/2`**

**Node ID:** 4f124164-e957-4e21-aa3e-c521b7534bdc<br>**Similarity:** 0.8359661885485107<br>**Text:** at once, but being part of a
batch was better for the startups too. It solved one of the biggest problems
faced by founders: the isolation. Now you not only had colleagues, but
colleagues who understood the problems you were facing and could tell you how
they were solving them.  
  
As YC grew, we started to notice other advantages of scale. The alumni became
a tight community, dedicated to helping one another, and especially the
current batch, whose shoes they remembered being in. We also no...<br>