## Reranking Hybrid Search Strategies
Re-ranking is a second-stage step in the retrieval part of a RAG pipeline, in which we:
1. First use a fast retriever (such as BM25, FAISS, or a hybrid of the two) to fetch the top-k documents quickly.
2. Then use a more accurate but slower model (such as a cross-encoder or an LLM) to re-score and reorder those documents by relevance to the query.

This ensures that the most relevant documents appear at the top, which improves the final answer from the LLM.
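
The two stages can be sketched in plain Python before bringing in any libraries. This is a toy illustration only, not the pipeline built in the cells that follow: `cheap_score` (word overlap) stands in for the fast retriever and `careful_score` (share of document words that match the query) stands in for the slower model. Both function names and both scoring rules are assumptions made up for this example.

```python
def tokenize(text):
    """Lowercase and split on whitespace (deliberately crude)."""
    return text.lower().split()

def cheap_score(query, doc):
    """Stage 1 stand-in: number of distinct query words found in the document."""
    return len(set(tokenize(query)) & set(tokenize(doc)))

def careful_score(query, doc):
    """Stage 2 stand-in: fraction of document words that are query words.
    A real system would call a cross-encoder or an LLM here."""
    doc_tokens = tokenize(doc)
    query_tokens = set(tokenize(query))
    hits = sum(1 for tok in doc_tokens if tok in query_tokens)
    return hits / len(doc_tokens) if doc_tokens else 0.0

def retrieve_then_rerank(query, corpus, k=3, top_n=2):
    # Stage 1: keep the k best candidates under the cheap score
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    # Stage 2: reorder only those candidates with the more expensive score
    return sorted(candidates, key=lambda d: careful_score(query, d), reverse=True)[:top_n]

corpus = [
    "langchain has docs on memory and tools and also on many other unrelated topics",
    "memory and tools",
    "faiss is a library for nearest neighbor search",
    "bm25 is a ranking function",
]
results = retrieve_then_rerank("langchain memory and tools", corpus)
for rank, doc in enumerate(results, 1):
    print(rank, doc)
```

The first stage prefers the long document because it contains more of the query words, while the second stage moves the short, focused document to the top. The expensive scorer only ever sees `k` candidates, which is what keeps the approach affordable.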

In [1]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain_core.output_parsers import StrOutputParser 



In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

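# load_dotenv() has already placed OPENAI_API_KEY in the environment; stop early if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")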

In [11]:
## Load text file

loader = TextLoader("../langchain_sample.txt")
raw_docs = loader.load()

# Split text into document chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=40)
docs = splitter.split_documents(raw_docs)
docs



[Document(metadata={'source': '../langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(metadata={'source': '../langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(metadata={'source': '../langchain_sample.txt'}, page_content='Retrieval-Augmented Generation (RAG) is a powerful technique where external knowledge is retrieved and passed into the prompt to ground LLM responses. LangChain makes it easy to implement RAG using vector databases like FAISS, Chroma, and Pinecone.\nBM25 is a tra
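
To get a feel for what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window chunker. It is an assumption-laden stand-in, not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries to break on separators such as paragraphs and newlines, whereas the hypothetical `sliding_chunks` below cuts at fixed character offsets.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary is still seen whole in at least one chunk.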

In [13]:
## User Query
query = "How can I use langchain to build an application with memory and tools"


In [15]:
### FAISS with Hugging Face embeddings

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings


embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(docs, embedding_model)
# Fetch up to 8 candidates; the re-ranking step further down reorders them
retriever = vector_store.as_retriever(search_kwargs={"k": 8})
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000002405EE18830>, search_kwargs={'k': 8})
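
Conceptually, the retriever embeds the query and returns the `k` stored vectors closest to it. The sketch below does this by brute force over tiny hand-written 2-D vectors. It assumes Euclidean distance as the metric, which may not match the index configuration in every setup, and the helper names `squared_l2` and `top_k` are invented for this example; FAISS itself uses optimized index structures rather than a Python loop.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query_vec, doc_vecs, k):
    """Return the positions of the k vectors nearest to query_vec."""
    order = sorted(range(len(doc_vecs)), key=lambda i: squared_l2(query_vec, doc_vecs[i]))
    return order[:k]

# Toy "embeddings"; real ones have hundreds of dimensions
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.2], doc_vecs, k=2)
print(nearest)
```

Nearness in embedding space is only a rough proxy for relevance, which is why the candidates are passed to a re-ranker afterwards.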

In [153]:
## OpenAI Embedding

from langchain_openai import OpenAIEmbeddings, ChatOpenAI

emb_op = OpenAIEmbeddings()
vector_store_openai = FAISS.from_documents(docs, emb_op)
retriever_openAI = vector_store_openai.as_retriever(search_kwargs={"k": 8})
retriever_openAI

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000240E51981D0>, search_kwargs={'k': 8})

In [155]:
## Prompt and LLM setup
from langchain_groq import ChatGroq

# Optional alternative: a Groq-hosted model (needs GROQ_API_KEY in the environment)
# llm = ChatGroq(
#     model="llama-3.1-8b-instant",
#     api_key=os.getenv("GROQ_API_KEY")
# )

llm = ChatOpenAI(
    model="gpt-4o-mini",   
    temperature=0.2
)

llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x0000024037647A10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x00000240E67C6B40>, root_client=<openai.OpenAI object at 0x00000240E5281370>, root_async_client=<openai.AsyncOpenAI object at 0x00000240E683F380>, model_name='gpt-4o-mini', temperature=0.2, model_kwargs={}, openai_api_key=SecretStr('**********'))

In [157]:
## Prompt Template

promt = PromptTemplate.from_template("""
You are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user's question.

User Question: "{question}"

Documents:
{documents}

Instructions:
- Think about the relevance of each document to the user's question.
- Return a list of document indices in ranked order, starting from the most relevant.

Output format: comma-separated document indices (e.g., 2,1,3,0,...)

""")

In [159]:
retriver_docs = retriever.invoke(query)
retriver_docs

[Document(id='fa9d1d36-b76c-4590-87cb-8df7b36909eb', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='643e72a3-8825-4309-ac72-f92f67a34af0', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='b9f2ddec-67ec-483a-9291-e776ff51c9cf', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, 

In [161]:
chain = promt | llm | StrOutputParser()
chain

PromptTemplate(input_variables=['documents', 'question'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user\'s question.\n\nUser Question: "{question}"\n\nDocuments:\n{documents}\n\nInstructions:\n- Think about the relevance of each document to the user\'s question.\n- Return a list of document indices in ranked order, starting from the most relevant.\n\nOutput format: comma-separated document indices (e.g., 2,1,3,0,...)\n\n')
| ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x0000024037647A10>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x00000240E67C6B40>, root_client=<openai.OpenAI object at 0x00000240E5281370>, root_async_client=<openai.AsyncOpenAI object at 0x00000240E683F380>, model_name='gpt-4o-mini', temperature=0.2, model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutpu

In [163]:
# Number the documents from 1 so the LLM can refer to each one by index
doc_line = [f"{i+1}. {doc.page_content}" for i, doc in enumerate(retriver_docs)]
formated_docs = "\n".join(doc_line)
formated_docs

'1. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.\n2. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.\n3. LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.\n4. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed ind

In [165]:
response = chain.invoke({"question": query, "documents": formated_docs})
response

'2,1,3,4,5,6'

In [197]:
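# Convert the 1-based indices in the response to 0-based list positions, skipping non-numeric tokens
indice = [int(x.strip()) - 1 for x in response.split(",") if x.strip().isdigit()]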
indice

[1, 0, 2, 3, 4, 5]
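
LLM output is not guaranteed to be a clean comma-separated list. The model may add a sentence, repeat an index, or return `0` (the prompt's example output includes a `0` even though the documents are numbered from 1). A slightly more defensive parser might look like the sketch below; `parse_ranking` is a hypothetical helper, not part of LangChain.

```python
import re

def parse_ranking(response, num_docs):
    """Turn a string such as '2, 1, 3' into 0-based positions.
    Non-numeric text, out-of-range numbers and repeats are skipped."""
    positions = []
    for token in re.findall(r"\d+", response):
        pos = int(token) - 1  # documents were numbered from 1 in the prompt
        if 0 <= pos < num_docs and pos not in positions:
            positions.append(pos)
    return positions

parsed = parse_ranking("Ranking: 2, 1, 3, 3, 0, 9", num_docs=4)
print(parsed)
```

Because it extracts every run of digits, this version would also pick up numbers that appear inside explanatory text, so it works best when the prompt asks for indices only.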

In [199]:
retriver_docs

[Document(id='fa9d1d36-b76c-4590-87cb-8df7b36909eb', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='643e72a3-8825-4309-ac72-f92f67a34af0', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='b9f2ddec-67ec-483a-9291-e776ff51c9cf', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, 

In [201]:
# Reorder the retrieved documents, ignoring any index that falls outside the list
reranked_docs = [retriver_docs[i] for i in indice if 0 <= i < len(retriver_docs)]
reranked_docs

[Document(id='643e72a3-8825-4309-ac72-f92f67a34af0', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='fa9d1d36-b76c-4590-87cb-8df7b36909eb', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='b9f2ddec-67ec-483a-9291-e776ff51c9cf', metadata={'source': '../langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, 
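
Note that the list comprehension above keeps only the documents the LLM mentioned, so any retrieved document missing from the response is silently dropped. If that is not what you want, one option is to append the leftovers in their original retrieval order. The sketch below uses plain strings in place of `Document` objects, and `apply_ranking` is a hypothetical helper introduced for illustration.

```python
def apply_ranking(items, positions):
    """Reorder items by positions, then append any items the ranking left out,
    keeping their original (first-stage) order."""
    seen = set()
    ordered = []
    for pos in positions:
        if 0 <= pos < len(items) and pos not in seen:
            seen.add(pos)
            ordered.append(items[pos])
    # Fall back to retrieval order for anything the ranking did not cover
    ordered.extend(item for idx, item in enumerate(items) if idx not in seen)
    return ordered

reordered = apply_ranking(["doc A", "doc B", "doc C", "doc D"], [1, 0])
print(reordered)
```

Whether to keep or drop unranked documents is a design choice: keeping them preserves recall, while dropping them keeps the final context shorter.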

In [203]:
print("\n Final Reranked Results:")
for i, doc in enumerate(reranked_docs,1):
    print(f"\nRank {i}: \n{doc.page_content}")


 Final Reranked Results:

Rank 1: 
LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.

Rank 2: 
LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.

Rank 3: 
LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.

Rank 4: 
FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional