# Post Retrieval Strategies

## Reranking Retrieved Chunks

### What is a Reranker?

A reranker is a type of machine learning model used in search systems to reorder a set of retrieved documents by relevance to a user query. Imagine you search for something, and the system pulls a bunch of documents. Not all of them are equally useful, though. The reranker steps in after the initial search to figure out which of those documents are most relevant to what you’re asking. 

At its core, a reranker is typically built using a "cross-encoder" model. Embedding-based search encodes the query and each document separately into vectors (a "bi-encoder" setup) and only compares those vectors afterwards. A cross-encoder instead reads the query and one document together and scores that pair directly. This allows it to provide a more accurate score for how closely a document matches a query, improving the relevance of the results presented to the user.

### Why a Reranker is Needed

#### Inadequacy of Vector/Keyword Search

Let’s start with the basic problem: vector search, or even traditional keyword search, has limitations. In a typical retrieval-augmented generation (RAG) setup, you’re dealing with a lot of documents—sometimes tens of thousands, other times millions. The first step in a RAG pipeline is usually vector search. Here, documents are turned into numerical representations, or "vectors," and stored in a large vector database. When a user submits a query, it’s also turned into a vector, and the system retrieves documents that are mathematically closest to this query vector.

While this sounds straightforward, there’s a catch. Vector search involves compressing the "meaning" of a document into a fixed-length vector, typically 768 or 1536 dimensions. This compression inevitably leads to information loss. When we’re crunching documents into smaller vectors, there’s no guarantee that every subtle detail of the document's meaning will be preserved. As a result, highly relevant information might be hidden in documents that don’t make it to the top results of the vector search. You might end up retrieving documents that are good but not great, missing key information that could answer the user’s query better.
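To make the information loss concrete, here is a toy sketch. It is not how production embedding models work: the `word_vector` and `embed` helpers are assumptions made up for illustration, and they simply average hash-derived word vectors into one fixed-length vector. Because averaging throws away word order, two sentences with different meanings end up with exactly the same vector.

```python
import hashlib

DIM = 8  # toy dimensionality; real embedding models use hundreds or thousands


def word_vector(word: str) -> list[int]:
    """Deterministic integer vector for a word (a stand-in for learned embeddings)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return list(digest[:DIM])


def embed(text: str) -> list[float]:
    """Compress a text of any length into one fixed-length vector by averaging word vectors."""
    vectors = [word_vector(w) for w in text.split()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


a = embed("fasting before training improves recovery")
b = embed("training before fasting improves recovery")

print(len(a))  # 8
print(a == b)  # True: the two sentences are indistinguishable after compression
```

Learned embeddings are far better than this at preserving meaning, but the same constraint applies: whatever does not fit into the fixed number of dimensions is lost before the query is ever seen.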

This problem becomes even more apparent with large datasets. Vector search is good for finding “close enough” documents fast, but it’s often too blunt for identifying nuanced, highly relevant documents. It also can’t take the query into account when encoding a document. The document embeddings are computed before the user query arrives, so each one has to summarize its document for every possible question rather than for the specific one being asked.

#### How a Reranker Overcomes This

This is where rerankers shine. A reranker takes the top documents retrieved by the vector search and refines their order based on a deeper understanding of both the query and each document. Instead of treating the query and document separately, as vector searches do, a reranker looks at them together. It applies a large transformer model (like BERT) to both the query and the retrieved document, allowing it to understand the relationship between the two in much greater detail.

Here’s how it works: after the vector search pulls a set of documents, the reranker pairs the query with each document, feeds each pair into the transformer model, and computes a relevance score for it. This score reflects how well the document answers the specific query. In short, the reranker makes its decision from the full text of both the query and the document rather than from their compressed vector representations.

For example, if the query is “How do rerankers improve RAG pipelines?” a reranker would look at every document retrieved and determine which ones specifically talk about how rerankers improve RAG pipelines—not just documents that vaguely match the topic. This precision comes at a cost: rerankers are slower because they perform a full transformer computation for each query-document pair. But the accuracy boost makes it worth it in many cases.
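The reranking step itself can be sketched in a few lines. This is a minimal illustration under stated assumptions: `rerank` and `overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for the transformer forward pass a real cross-encoder would run on each pair.

```python
from typing import Callable


def rerank(
    query: str,
    documents: list[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, document) pair jointly, then keep the top_n highest scores."""
    scored = [(doc, score_pair(query, doc)) for doc in documents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def overlap_score(query: str, document: str) -> float:
    """Toy scorer: fraction of query words found in the document.

    A real reranker would feed the query and document together into a transformer here.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)


docs = [
    "Rerankers are transformer models.",
    "How rerankers improve RAG pipelines by reordering retrieved chunks.",
    "Vector databases store embeddings.",
]

top = rerank("how do rerankers improve rag pipelines", docs, overlap_score, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Note that the scorer is called once per document, which is exactly where the extra cost of reranking comes from.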

### Tradeoffs of Using a Reranker

While rerankers improve the accuracy of search results, they come with tradeoffs, primarily in terms of speed and computational cost. Rerankers, especially those based on large transformer models, require significant processing power because they perform a full transformer inference for each query-document pair. This makes them much slower than vector search, which only needs to compute a single query vector and compare it with pre-stored document vectors. For real-time systems with high user traffic, this added latency can be a bottleneck. 

Additionally, the computational cost of reranking increases with the number of documents being reranked. As a result, rerankers are typically used only after an initial retrieval step has reduced the candidate set, balancing the need for accuracy with the need for performance.
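A back-of-the-envelope estimate shows why the candidate set is kept small. The function below is a rough sketch, and the numbers passed to it (40 ms per batch, 32 pairs per batch) are hypothetical placeholders rather than measurements of any particular model or service.

```python
import math


def rerank_latency_ms(num_candidates: int, ms_per_batch: float, batch_size: int) -> float:
    """Estimate reranking latency when pairs are scored in fixed-size batches, one batch at a time."""
    num_batches = math.ceil(num_candidates / batch_size)
    return num_batches * ms_per_batch


# Hypothetical figures, for illustration only.
for n in (25, 100, 1000):
    print(n, rerank_latency_ms(n, ms_per_batch=40.0, batch_size=32))
# 25 40.0
# 100 160.0
# 1000 1280.0
```

Under these assumptions, the cost grows roughly linearly with the number of candidates, which is why reranking a few dozen documents is practical while reranking an entire corpus is not.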

### How Rerankers Are Used In A RAG Pipeline

In a typical RAG pipeline, rerankers are used as part of a two-stage retrieval system. The first stage involves the fast retrieval of documents using vector or keyword search. This stage is designed for speed because we want to narrow down millions of documents to just a handful as quickly as possible. 

Once the vector search has pulled the top documents (say, the top 25), the reranker steps in. The reranker takes this smaller set of documents and reorders them based on how well they match the user’s query, using its deeper understanding of the content. This ensures that the top results shown to the user are not just “close enough” but are actually the most relevant documents available.

This combination of vector search for speed and reranking for accuracy strikes a balance between performance and relevance. By using vector search as a first pass to trim down the number of documents and then applying rerankers for fine-tuning, the RAG pipeline can deliver better, more precise results to the LLM, improving the quality of the final output.
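Put together, the two stages look roughly like the sketch below. Everything here is an assumption for illustration: `two_stage_search` is a hypothetical helper, the two-dimensional vectors are invented, and the brute-force cosine scan stands in for the approximate index a vector database would use. The second-stage scorer is again a toy word-overlap function, not a real cross-encoder.

```python
import math
from typing import Callable


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def two_stage_search(
    query: str,
    query_vec: list[float],
    corpus: list[tuple[str, list[float]]],
    score_pair: Callable[[str, str], float],
    first_k: int = 25,
    top_n: int = 5,
) -> list[str]:
    """Stage 1 narrows the corpus with cheap vector similarity; stage 2 reorders the survivors."""
    # Stage 1: fast and approximate. Only precomputed vectors are compared.
    candidates = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:first_k]
    # Stage 2: slower and more precise. Each surviving text is scored together with the query.
    reranked = sorted(
        (text for text, _ in candidates),
        key=lambda text: score_pair(query, text),
        reverse=True,
    )
    return reranked[:top_n]


def word_overlap(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


corpus = [
    ("HIIT burns fat quickly", [0.9, 0.1]),
    ("Fasting and HIIT together for cardio", [0.8, 0.3]),
    ("Sleep hygiene basics", [0.0, 1.0]),
]

results = two_stage_search(
    "combine fasting and hiit for cardio", [1.0, 0.0], corpus, word_overlap, first_k=2, top_n=2
)
print(results)
# ['Fasting and HIIT together for cardio', 'HIIT burns fat quickly']
```

In this toy run, the vector stage ranks the HIIT document first and drops the unrelated one, and the second stage then promotes the document that actually addresses the question.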

### Implementation 
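We are going to use Cohere's hosted Rerank model (`rerank-english-v3.0`) for this task. The chunks to rerank come from an embedded Weaviate instance that uses OpenAI embeddings for vector search.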

In [1]:
import os

# Check that the key is set without echoing its value into the notebook output.
os.getenv("OPENAI_API_KEY") is not None

In [2]:
import weaviate 
from dotenv import load_dotenv
import os
from weaviate.embedded import EmbeddedOptions


load_dotenv("./../.env")

client = weaviate.WeaviateClient(
    embedded_options=EmbeddedOptions(
        additional_env_vars={
            "ENABLE_MODULES": "backup-filesystem,text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai",
            "BACKUP_FILESYSTEM_PATH": "/tmp/backups",
            "DEFAULT_VECTORIZER_MODULE": "text2vec-openai"
        }
    ),
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)

client.connect()

{"action":"startup","default_vectorizer_module":"text2vec-openai","level":"info","msg":"the default vectorizer modules is set to \"text2vec-openai\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-09-29T22:58:41+05:30"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-09-29T22:58:41+05:30"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-09-29T22:58:41+05:30"}
{"level":"info","msg":"module offload-s3 is enabled","time":"2024-09-29T22:58:41+05:30"}
{"level":"info","msg":"open cluster service","servers":{"Embedded_at_8079":51164},"time":"2024-09-29T22:58:41+05:30"}
{"address":"192.168.69.215:51165","level":"info","msg":"starting cloud rpc server ...","time":"2024-09-29T22:58:41+05:30"}
{"level":"info","msg":"starting raft sub-system ...",

In [3]:
from weaviate.classes.config import Property, DataType, Configure
import json

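# Start from a clean collection: drop any existing one, then always recreate it.
# (With an if/else, a run that deletes the collection would skip the explicit creation.)
if client.collections.exists("Health"):
    client.collections.delete("Health")

client.collections.create(
    "Health",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ]
)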

with open("./health.json", "r") as f:
    health_json = json.load(f)

health = client.collections.get("Health")

with health.batch.dynamic() as batch:
    for h in health_json:
        batch.add_object(h)


{"action":"load_all_shards","level":"error","msg":"failed to load all shards: context canceled","time":"2024-09-29T22:58:44+05:30"}
{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-09-29T22:58:45+05:30","wait_for_cache_prefill":false}
{"level":"info","msg":"Created shard health_VeXaKjf2h4Vs in 3.00275ms","time":"2024-09-29T22:58:45+05:30"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-09-29T22:58:45+05:30","took":99208}


In [4]:
import textwrap

def print_objects(objects):
    """
        a function to print the retrieved objects
    """
    for obj in objects:
        print(f"ID: {obj.uuid.int}")
        metadata = [{k: round(v, 2) if isinstance(v, float) else v} for k, v in obj.metadata.__dict__.items() if v is not None]
        print(f"Metadata: {metadata}")
        print(f"Title: {obj.properties['title']}")
        print(f"Body: {textwrap.shorten(obj.properties['body'], width=100)}")
        print()

In [5]:
from weaviate.classes.query import MetadataQuery

queries = [
    "How does diet affect muscle recovery after exercise?",
    "Can intermittent fasting and weight training improve both heart health and muscle growth?",
    "How can yoga and mindfulness help reduce stress and improve digestion?",
    "What role do hydration and nutrition play in speeding up muscle recovery?",
    "What's the best way to combine intermittent fasting and HIIT to improve cardio?"
]


query = queries[4]

chunks = health.query.near_text(
    query=query,
    limit=10,
    return_metadata=MetadataQuery(distance=True, certainty=True)
)

print_objects(chunks.objects)

ID: 79750210735445999554422007586051561295
Metadata: [{'distance': 0.13}, {'certainty': 0.93}]
Title: Benefits of HIIT Workouts
Body: High-intensity interval training (HIIT) is an efficient way to burn fat and improve [...]

ID: 110722175561862500784486769915905468074
Metadata: [{'distance': 0.16}, {'certainty': 0.92}]
Title: Intermittent Fasting and Cardiovascular Health
Body: Research suggests that intermittent fasting may improve cardiovascular health by reducing [...]

ID: 311091617027233937846831049884227850741
Metadata: [{'distance': 0.19}, {'certainty': 0.91}]
Title: Intermittent Fasting for Muscle Gain
Body: Although intermittent fasting is often associated with weight loss, it can also be used to [...]

ID: 161858462270752632728592141290163058648
Metadata: [{'distance': 0.19}, {'certainty': 0.9}]
Title: Intermittent Fasting for Weight Loss
Body: Intermittent fasting has become popular for its potential to aid in weight loss. By [...]

ID: 20160001484484614156347387032633237118

In [6]:
def convert_chunks_to_dict(chunks):
    """Flatten Weaviate objects into plain dicts that can be passed to the reranker as documents."""
    chunks_dict = []
    for chunk in chunks:
        chunk_dict = {
            "id": chunk.uuid.int,
            "title": chunk.properties["title"],
            "body": chunk.properties["body"],
            "metadata": chunk.metadata.__dict__
        }
        chunks_dict.append(chunk_dict)
    return chunks_dict

In [7]:
import cohere

co = cohere.Client(api_key=os.getenv("COHERE_API_KEY"))

chunks_dict = convert_chunks_to_dict(chunks.objects)

reranked_chunks = co.rerank(
    model="rerank-english-v3.0", 
    query=query, 
    documents=chunks_dict,
    rank_fields=["title", "body"], # fields in the docs list to consider for ranking
    top_n=5, 
    return_documents=True
)

# Use a plain loop: a list comprehension here would also echo a list of None values.
for chunk in reranked_chunks.results:
    print(chunk)

document=RerankResponseResultsItemDocument(text=None, body='Research suggests that intermittent fasting may improve cardiovascular health by reducing blood pressure, cholesterol levels, and inflammation. However, it is essential to balance fasting with nutrient-rich foods to ensure overall health.', id=1.107221755618625e+38, metadata={'certainty': 0.9217401146888733, 'creation_time': None, 'distance': 0.15651977062225342, 'explain_score': None, 'is_consistent': None, 'last_update_time': None, 'rerank_score': None, 'score': None}, title='Intermittent Fasting and Cardiovascular Health') index=1 relevance_score=0.70181525
document=RerankResponseResultsItemDocument(text=None, body='High-intensity interval training (HIIT) is an efficient way to burn fat and improve cardiovascular fitness. HIIT involves short bursts of intense exercise followed by rest or low-intensity periods. This type of workout can also boost metabolism.', id=7.9750210735446e+37, metadata={'certainty': 0.9332375526428223

#### Comparing Retrieved Chunks with Reranked Chunks


In [8]:
# create a pandas df comparing the sequence of the original chunks and the reranked chunks using the titles
import pandas as pd

df = pd.DataFrame({
    "original": [chunk.properties["title"] for chunk in chunks.objects[:5]],
    "similarity": [chunk.metadata.certainty for chunk in chunks.objects[:5]],
    "reranked": [chunk["document"]["title"] for chunk in reranked_chunks.dict()["results"]],
    "relevance score": [chunk["relevance_score"] for chunk in reranked_chunks.dict()["results"]]
})

df


| | original | similarity | reranked | relevance score |
|---|---|---|---|---|
| 0 | Benefits of HIIT Workouts | 0.933238 | Intermittent Fasting and Cardiovascular Health | 0.701815 |
| 1 | Intermittent Fasting and Cardiovascular Health | 0.92174 | Benefits of HIIT Workouts | 0.113383 |
| 2 | Intermittent Fasting for Muscle Gain | 0.907226 | Intermittent Fasting for Muscle Gain | 0.00516 |
| 3 | Intermittent Fasting for Weight Loss | 0.902607 | Intermittent Fasting for Weight Loss | 0.003722 |
| 4 | Cardio Exercises for Heart Health | 0.888323 | Cardio Exercises for Heart Health | 0.000389 |
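To see exactly which chunks moved, the two orderings can be compared position by position. The sketch below hard-codes the titles from the comparison above so it runs on its own; `rank_changes` is a hypothetical helper, not part of any library.

```python
original = [
    "Benefits of HIIT Workouts",
    "Intermittent Fasting and Cardiovascular Health",
    "Intermittent Fasting for Muscle Gain",
    "Intermittent Fasting for Weight Loss",
    "Cardio Exercises for Heart Health",
]
reranked = [
    "Intermittent Fasting and Cardiovascular Health",
    "Benefits of HIIT Workouts",
    "Intermittent Fasting for Muscle Gain",
    "Intermittent Fasting for Weight Loss",
    "Cardio Exercises for Heart Health",
]


def rank_changes(before: list[str], after: list[str]) -> dict[str, int]:
    """Map each title to (old position - new position); a positive value means it moved up."""
    old_rank = {title: i for i, title in enumerate(before)}
    return {
        title: old_rank[title] - new_rank
        for new_rank, title in enumerate(after)
        if title in old_rank
    }


changes = rank_changes(original, reranked)
moved = {title: delta for title, delta in changes.items() if delta != 0}
print(moved)
# {'Intermittent Fasting and Cardiovascular Health': 1, 'Benefits of HIIT Workouts': -1}
```

Only the top two chunks swap places here, but the scores tell a sharper story: the vector search certainties for those two chunks are nearly identical (0.93 and 0.92), while the reranker separates them clearly (0.70 versus 0.11).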


## Response Generation

Response generation is the last step in the RAG pipeline: producing an answer from the retrieved and reranked chunks. This is done by prompting the model with the query, the context (the retrieved chunks), instructions or guidelines for generating the response, and any other relevant information. The instructions and the additional information are highly specific to the domain and the task at hand.



In [9]:
from openai import OpenAI
import json
from IPython.display import Markdown, display


# Use a distinct name so this does not overwrite the Weaviate `client` created earlier.
openai_client = OpenAI()

PROMPT_TEMPLATE = """
    You are a health expert. You will be provided with a question and related context, and you need to provide a well-constructed, structured answer.

    ## Instructions:
    - Avoid fluff and clichés: Generate concise answers and avoid words, phrases, and sentences that do not add any substantial value to the response.
    - Tone: needs to be conversational, spartan, use less corporate jargon.
    - The answers should have a natural flow and should be easy to understand.
    - Assume that the reader has a {level} level of understanding of the topic, so generate the response and use terminology accordingly.
    
    ## Question:
    {question}

    ## Context:
    The context provided below is ordered from most relevant to least relevant to the question. Use the context accordingly to structure your response.
    {context}

    ## Response Format:
    {{"response": "provide the response using proper markdown formatting."}}

    ## Response:
"""

def generate_response(question, context, level="beginner"):
    """Build the prompt from the question and the ranked chunks, then return the model's markdown answer."""
    # Chunks arrive in reranked order, so the most relevant context appears first in the prompt.
    formatted_context = "\n".join([f"---\n{chunk['document']['title']}\n{chunk['document']['body']}" for chunk in context])

    prompt = PROMPT_TEMPLATE.format(question=question, context=formatted_context, level=level)
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "you are an assistant who responds in json format"},
            {"role": "user", "content": prompt}
        ],
        response_format={"type": "json_object"}
    )

    # The prompt asks for a JSON object with a single "response" key holding the markdown answer.
    return json.loads(response.choices[0].message.content)["response"]


response = generate_response(
    question=query, 
    context=reranked_chunks.dict()["results"],
    level="advanced"
)

display(Markdown(response))


To effectively combine intermittent fasting (IF) with high-intensity interval training (HIIT) for improved cardio, follow these guidelines:

1. **Timing Your Workouts**: Schedule HIIT sessions towards the end of your fasting window. This ensures that you have higher energy levels for intense workouts while still benefiting from the metabolic advantages of fasting.

2. **Post-Workout Nutrition**: After your HIIT workout, break your fast with nutrient-dense foods. Focus on a balance of protein, healthy fats, and carbohydrates to replenish glycogen and support muscle recovery. This strategy promotes muscle gain and enhances cardiovascular health.

3. **Hydration**: Stay well-hydrated. Proper hydration optimizes performance during HIIT and aids recovery afterwards. Make sure to drink water during fasting periods as well.

4. **Listen to Your Body**: Monitor your energy levels and overall well-being. If you experience excessive fatigue or decreased performance, consider adjusting your fasting schedule or the intensity of your HIIT sessions.

5. **Consistency**: Maintain a consistent routine. Results from both intermittent fasting and HIIT build over time, so sticking to a regular schedule is vital for maximizing cardiovascular benefits.

By integrating these strategies, you can leverage the advantages of both intermittent fasting and HIIT to enhance your cardiovascular fitness.
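One fragile spot in the generation cell is that it indexes the parsed JSON with `["response"]` directly, which raises an error if the model returns a different key or malformed JSON. A more defensive parser could look like the sketch below. The `extract_response` helper is a hypothetical addition, and falling back to the raw text is just one possible design choice.

```python
import json


def extract_response(raw: str, key: str = "response") -> str:
    """Return the string under `key` if `raw` is a JSON object that contains it; otherwise return `raw` unchanged."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # not valid JSON: fall back to the raw model output
    if isinstance(payload, dict) and isinstance(payload.get(key), str):
        return payload[key]
    return raw  # valid JSON, but not in the expected shape


print(extract_response('{"response": "## Plan"}'))  # ## Plan
print(extract_response("plain text answer"))        # plain text answer
```

Whether to fall back silently, retry the request, or raise depends on the application; for a notebook demo, showing the raw output is usually the most informative option.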