### Hybrid Retrieval (Dense + Sparse) with LangChain v1.1
- Blend semantic vectors and keyword scoring to boost recall and precision

**What we will build**
- Dense retriever with HuggingFace embeddings (semantic match)
- Sparse retriever with BM25 (keyword match)
- Ensemble retriever to merge both
- Agent-driven RAG flow using LangChain v1.1 `create_agent`

**Prereqs**
- `GROQ_API_KEY` in environment
- `langchain`, `langchain-classic`, `langchain-community`, `langchain-text-splitters`, `langchain-huggingface`, `langchain-groq`, `faiss-cpu`, `rank_bm25`, `wikipedia`, `python-dotenv` installed

In [11]:
import os
from dotenv import load_dotenv

from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
from langchain_classic.retrievers import EnsembleRetriever
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage
from langchain_groq import ChatGroq
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore

load_dotenv()

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "")
if not os.environ["GROQ_API_KEY"]:
    raise ValueError("Set GROQ_API_KEY in your environment before running")

In [7]:
# Load a small corpus from Wikipedia to keep the demo self-contained
loader = WikipediaLoader(query="Transformer (deep learning)", load_max_docs=8)
documents = loader.load()
print(f"Loaded {len(documents)} articles")
print("First doc preview:\n", documents[0].page_content[:240], "...")

Loaded 7 articles
First doc preview:
 In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via ...





In [8]:
# Chunk the corpus so retrieval is focused and fits the LLM context
splitter = RecursiveCharacterTextSplitter(chunk_size=900, chunk_overlap=150)
docs = splitter.split_documents(documents)
print(f"Split into {len(docs)} chunks")
print("Chunk preview:\n", docs[0].page_content[:200], "...")

Split into 40 chunks
Chunk preview:
 In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and e ...
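
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width character chunking with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and sentences). The helper name `sliding_chunks` and the toy string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last three characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either chunk.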


In [9]:
# HuggingFace embeddings for dense retrieval (runs locally, no extra API cost)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

test_vector = embeddings.embed_query(docs[0].page_content)
print("Vector dimension:", len(test_vector))

Vector dimension: 384
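
Why `normalize_embeddings=True` matters: for unit-length vectors, squared L2 distance equals `2 - 2 * cosine_similarity`, so ranking by L2 distance gives the same order as ranking by cosine similarity. The sketch below checks that identity on two made-up vectors in pure Python; the helper names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
identity_gap = abs(squared_l2(a, b) - (2 - 2 * cosine(a, b)))
print("Identity holds:", identity_gap < 1e-9)
```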


In [12]:
# Dense retriever: FAISS vector store + embedding model
# Build an empty flat L2 index sized to the embedding dimension
index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

# from_documents is a classmethod that builds a brand-new store and ignores the
# index above, so add the chunks to the existing store instead
vector_store.add_documents(docs)
dense_retriever = vector_store.as_retriever(search_kwargs={"k": 4})
print("Dense retriever ready (semantic search)")

Dense retriever ready (semantic search)


In [14]:
# Sparse retriever: BM25 ranks by keyword overlap
sparse_retriever = BM25Retriever.from_documents(docs)
sparse_retriever.k = 4
print("Sparse retriever ready (keyword search)")

Sparse retriever ready (keyword search)
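
To show what "ranks by keyword overlap" means, here is a minimal BM25 scoring sketch in pure Python. It assumes whitespace tokenization, lowercasing, the common defaults `k1=1.5` and `b=0.75`, and a smoothed IDF that is always positive. The library behind `BM25Retriever` may differ in these details, so treat this as an illustration of the idea rather than its implementation. The toy corpus is made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in corpus against query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents that contain each term at least once
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher weight than common ones
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalized
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


toy_corpus = [
    "transformers use attention instead of recurrence",
    "recurrent networks process one token at a time",
    "bananas are rich in potassium",
]
toy_scores = bm25_scores("attention transformers", toy_corpus)
best = max(range(len(toy_corpus)), key=toy_scores.__getitem__)
print("Best match:", toy_corpus[best])
```

Only the first document shares tokens with the query, so the other two score zero. This is the failure mode the dense retriever covers: a query for "self-attention" would miss a chunk that only says "attention".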


#### Why hybrid?
- Dense: great for meaning and paraphrases, can miss exact phrasing
- Sparse: great for precise terms, can miss synonyms
- Ensemble: weighted blend to keep both semantic coverage and lexical precision

In [15]:
# The hybrid approach: combine both retrievers
# EnsembleRetriever fuses the two ranked lists with weighted Reciprocal Rank Fusion (RRF).
# It works on rank positions, not raw scores, so BM25 and vector scores need no rescaling:
# final_score(doc) = sum over retrievers of weight_i / (rank_i(doc) + c), with c = 60 by default
# weights follow the order of `retrievers` (dense first, sparse second)
# Hybrid retriever combines both signals
hybrid_retriever = EnsembleRetriever(
    retrievers=[dense_retriever, sparse_retriever],
    weights=[0.5, 0.5],  # tune these for your corpus
)
print("Hybrid retriever assembled (dense + sparse)")

Hybrid retriever assembled (dense + sparse)
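
`EnsembleRetriever` merges results by rank position using weighted Reciprocal Rank Fusion rather than by blending raw scores. The sketch below reproduces that idea in pure Python, assuming the commonly used constant `c = 60`; the function name `weighted_rrf` and the document IDs are hypothetical, and the real retriever also handles deduplication of `Document` objects.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (smaller rank) contribute more
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_c", "doc_a", "doc_d"]

fused = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.5, 0.5])
sparse_heavy = weighted_rrf([dense_ranking, sparse_ranking], weights=[0.1, 0.9])
print("Equal weights:", fused)
print("Sparse-heavy:", sparse_heavy)
```

Because the fused list is the union of both inputs, a hybrid query can return more than `k` documents: two retrievers with `k=4` yield anywhere from 4 to 8 unique chunks.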


In [16]:
# Quick probe to see what hybrid returns
query = "How do transformers differ from RNNs?"
results = hybrid_retriever.invoke(query)

print(f"Query: {query}")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:160]}...")

Query: How do transformers differ from RNNs?
1. The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google. The predecessors of transformers were...
2. Modern transformers overcome this problem, but unlike RNNs, they require computation time that is quadratic in the size of the context window. The linearly scal...
3. == Views ==
Shazeer said about artificial general intelligence that he doesn't "particularly care about AGI in the sense of wanting something that can do absolu...
4. In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numeri...
5. === Key components ===
Selective state spaces (SSM): The core of Mamba, SSMs are recurrent models that selectively process information based on the current inpu...
6. A key breakthrough was LSTM (1995), an RNN which used various innovations to overcome the vanishing gradient problem,

In [17]:
# Compare dense vs sparse vs hybrid for the same query
test_query = "semantic vector search databases"

dense_only = dense_retriever.invoke(test_query)
sparse_only = sparse_retriever.invoke(test_query)
hybrid = hybrid_retriever.invoke(test_query)

print(f"Query: {test_query}\n")
print("Dense:")
for i, doc in enumerate(dense_only, 1):
    print(f" {i}. {doc.page_content[:120]}...")

print("\nSparse:")
for i, doc in enumerate(sparse_only, 1):
    print(f" {i}. {doc.page_content[:120]}...")

print("\nHybrid:")
for i, doc in enumerate(hybrid, 1):
    print(f" {i}. {doc.page_content[:120]}...")

Query: semantic vector search databases

Dense:
 1. × P × C {\displaystyle \mathbb {R} ^{P\times P\t...
 2. Sometimes, alignment can be multiple-to-multiple. For example, the English phrase look it up corresponds to cherchez-le....
 3. == History ==
On June 11, 2018, OpenAI researchers and engineers published a paper called "Improving Language Understand...
 4. In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechani...

Sparse:
 1. × P × C {\displaystyle \mathbb {R} ^{P\times P\t...
 2. == Career ==
Noam Shazeer joined Google in 2000. One of his first major achievements was improving the spelling correcto...
 3. Sometimes, alignment can be multiple-to-multiple. For example, the English phrase look it up corresponds to cherchez-le....
 4. === Attention with seq2seq ===

The idea of

### Build an agentic RAG flow (LangChain `create_agent`)
- Tool: the hybrid retriever exposed via a `@tool` wrapper
- Agent: `create_agent` orchestrates model + tool calls
- Model: ChatGroq (LLM) drives tool selection and final answer

In [18]:
# Groq LLM (temperature=0 reduces run-to-run variation for repeatable demos)
llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0)

In [19]:
# Turn the retriever into a tool that the agent can call
@tool("hybrid_search", description="Hybrid dense+sparse retrieval over the wiki corpus")
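def hybrid_search(query: str) -> str:
    """Retrieve semantically and lexically relevant chunks for a user question."""
    # Return plain text so the model sees chunk content rather than Document reprs
    chunks = hybrid_retriever.invoke(query)
    return "\n\n".join(doc.page_content for doc in chunks)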

tools = [hybrid_search]
print("Tool registered:", tools[0].name)

Tool registered: hybrid_search


In [20]:
# System prompt keeps the agent grounded in retrieved context
system_prompt = (
    "You are a concise technical assistant. Use the hybrid_search tool to fetch context "
    "before answering. If nothing relevant is found, say you do not have context."
)

agent = create_agent(
    model=llm,
    tools=tools,
    system_prompt=system_prompt,
)

print("Agent ready with tools:", [t.name for t in tools])

Agent ready with tools: ['hybrid_search']


In [22]:
# Ask a question and let the agent decide how to use the hybrid retriever
user_question = "What advantage do transformers have over RNNs?"

response = agent.invoke({
    "messages": [
        HumanMessage(content=user_question)
    ]
})

# pretty_print() writes to stdout and returns None, so print the message content directly
print(response["messages"][-1].content)

Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). They are also able to operate in parallel over all tokens in a sequence, whereas RNNs operate one token at a time from first to last. This allows transformers to be more efficient and scalable for large-scale natural language processing tasks.


In [23]:
# Quick weight sweep to see how ranking changes
def preview_weights(dense_weight: float, sparse_weight: float, user_query: str):
    test_retriever = EnsembleRetriever(
        retrievers=[dense_retriever, sparse_retriever],
        weights=[dense_weight, sparse_weight],
    )
    docs_out = test_retriever.invoke(user_query)
    print(f"Weights dense/sparse: {dense_weight}/{sparse_weight}")
    for i, doc in enumerate(docs_out[:2], 1):
        print(f" {i}. {doc.page_content[:120]}...")
    print()


sample_query = "LangChain retrieval methods"
preview_weights(0.7, 0.3, sample_query)
preview_weights(0.5, 0.5, sample_query)
preview_weights(0.3, 0.7, sample_query)

Weights dense/sparse: 0.7/0.3
 1. == History ==
On June 11, 2018, OpenAI researchers and engineers published a paper called "Improving Language Understand...
 2. === Predecessors ===
For many years, sequence modelling and generation was done by using plain recurrent neural networks...

Weights dense/sparse: 0.5/0.5
 1. == History ==
On June 11, 2018, OpenAI researchers and engineers published a paper called "Improving Language Understand...
 2. SSD Layer: The main contribution of structured state space duality in Mamba-2 is through the SSD layer. In Mamba-1, the ...

Weights dense/sparse: 0.3/0.7
 1. SSD Layer: The main contribution of structured state space duality in Mamba-2 is through the SSD layer. In Mamba-1, the ...
 2. === Mamba-2 ===
Mamba-2 serves as a successor to Mamba by introducing a new theoretical and computational framework call...



### Notes and production tips
- Tune `k` per retriever and weight blend based on evaluation, not guesses
- Normalize embeddings so that L2 distance in FAISS ranks results the same way as cosine similarity (done above)
- For larger corpora, switch FAISS to IVF or move to managed vector DBs
- Cache frequent queries and reuse embeddings to keep cost down
- Keep prompts short; hybrid retrieval already improves recall
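
One way to tune `k` and the weights by evaluation is to label a few queries with the chunk IDs that should come back, then compare recall@k across settings. The sketch below is a minimal version of that loop. The chunk IDs, the relevance labels, and the two "runs" are made-up placeholders; in practice the retrieved IDs would come from calling each candidate retriever.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical results for one labeled query under two weight settings
relevant = {"c1", "c2"}
eval_runs = {
    "dense=0.7/sparse=0.3": ["c1", "c4", "c9", "c2"],
    "dense=0.3/sparse=0.7": ["c7", "c1", "c8", "c5"],
}

report = {name: recall_at_k(ids, relevant, k=4) for name, ids in eval_runs.items()}
for name, score in report.items():
    print(f"{name}: recall@4 = {score:.2f}")
```

Averaging this metric over a few dozen labeled queries gives a far more stable signal than eyeballing the top results of a single query.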