# Custom Retrievers and Reranking

Retrieval quality is often the bottleneck in RAG systems. This notebook covers advanced retrieval techniques including custom retrievers, reranking, and fusion strategies.

## Learning Objectives

By the end of this notebook, you will:
1. Build custom retrievers for specific needs
2. Implement reranking to improve retrieval quality
3. Use hybrid search (vector + keyword)
4. Apply retrieval fusion strategies
5. Evaluate retrieval performance

---

In [None]:
# Setup
import nest_asyncio
nest_asyncio.apply()

from dotenv import load_dotenv
load_dotenv()

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    Settings,
)
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from typing import List

# Configure
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

print("✓ Setup complete!")

In [None]:
# Load and index documents
documents = SimpleDirectoryReader("../data/sample_docs").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)

print(f"\n✓ Loaded {len(documents)} documents and built index!")

## 1. Understanding the Retriever Interface

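All retrievers in LlamaIndex inherit from `BaseRetriever` and implement the `_retrieve` method. Callers use the public `retrieve` method, which accepts a query string (or a `QueryBundle`) and returns a list of `NodeWithScore` objects. The default vector retriever shows the interface in action: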

In [None]:
# Default vector retriever
default_retriever = index.as_retriever(
    similarity_top_k=5,
)

# Test retrieval
query = "What is machine learning?"
retrieved_nodes = default_retriever.retrieve(query)

print(f"Query: {query}")
print(f"\nRetrieved {len(retrieved_nodes)} nodes:\n")

for i, node in enumerate(retrieved_nodes):
    print(f"Node {i+1}:")
    print(f"  Score: {node.score:.4f}")
    print(f"  Text: {node.text[:100]}...")
    print()
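
Conceptually, `similarity_top_k=5` asks for the five nodes whose embeddings are closest to the query embedding. The sketch below illustrates that idea in plain Python. It assumes cosine similarity and uses hypothetical three-dimensional toy vectors (`toy_vectors`, `toy_query`); it is not how LlamaIndex implements retrieval internally.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings (real embeddings have hundreds of dimensions)
toy_vectors = {
    "ml_intro": [0.9, 0.1, 0.0],
    "python_basics": [0.1, 0.9, 0.0],
    "ai_ethics": [0.6, 0.0, 0.8],
}
toy_query = [1.0, 0.0, 0.0]

top_two = top_k(toy_query, toy_vectors, k=2)
for doc_id, score in top_two:
    print(f"{doc_id}: {score:.4f}")
```

A real vector store avoids this exhaustive scan with an approximate nearest-neighbor index, but the ranking criterion is the same.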

## 2. Building a Custom Retriever

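Let's create a custom retriever that over-fetches from the vector index, then keeps only nodes that meet a minimum similarity score and contain at least one required keyword in their text. (Despite its name, `MetadataFilterRetriever` filters on node text, not on metadata fields.)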

In [None]:
from typing import Optional

from llama_index.core.retrievers import VectorIndexRetriever

class MetadataFilterRetriever(BaseRetriever):
    """Custom retriever that filters vector results by score threshold and required keywords."""
    
    def __init__(
        self,
        index: VectorStoreIndex,
        similarity_top_k: int = 5,
        required_keywords: Optional[List[str]] = None,
        score_threshold: float = 0.0,
    ):
        super().__init__()
        self._index = index
        self._similarity_top_k = similarity_top_k
        self._required_keywords = required_keywords or []
        self._score_threshold = score_threshold
        
        # Create base retriever
        self._base_retriever = VectorIndexRetriever(
            index=index,
            similarity_top_k=similarity_top_k * 2,  # Retrieve more for filtering
        )
    
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Get initial results
        nodes = self._base_retriever.retrieve(query_bundle)
        
        # Apply filters
        filtered_nodes = []
        for node in nodes:
            # Score threshold filter (a missing score is treated as 0.0)
            if (node.score or 0.0) < self._score_threshold:
                continue
            
            # Keyword filter (optional)
            if self._required_keywords:
                text_lower = node.text.lower()
                has_keyword = any(kw.lower() in text_lower for kw in self._required_keywords)
                if not has_keyword:
                    continue
            
            filtered_nodes.append(node)
        
        # Return top k after filtering
        return filtered_nodes[:self._similarity_top_k]

print("✓ Custom retriever class defined!")

In [None]:
# Use the custom retriever
custom_retriever = MetadataFilterRetriever(
    index=index,
    similarity_top_k=3,
    required_keywords=["learning"],  # Must contain "learning"
    score_threshold=0.5,  # Minimum similarity score
)

query = "How do systems improve over time?"
filtered_nodes = custom_retriever.retrieve(query)

print(f"Query: {query}")
print("Filter: Must contain 'learning', score >= 0.5")
print(f"\nRetrieved {len(filtered_nodes)} nodes after filtering:\n")

for i, node in enumerate(filtered_nodes):
    print(f"Node {i+1}: score={node.score:.4f}")
    print(f"  {node.text[:150]}...\n")

## 3. Reranking Retrieved Results

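Reranking re-scores and reorders retrieved results with a stronger model. A cross-encoder reads the query and each node together and scores the pair jointly, rather than comparing precomputed embeddings. Because that is slower than vector search, it is applied only to the small candidate set returned by the first-stage retriever. This often improves the ordering of the top results noticeably.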

In [None]:
from llama_index.core.postprocessor import SentenceTransformerRerank

# Create a reranker using a cross-encoder model
# (requires the `sentence-transformers` package; the model is downloaded on first use)
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
    top_n=3,  # Return top 3 after reranking
)

print("✓ Reranker initialized!")

In [None]:
# Compare results with and without reranking
query = "What are the ethical concerns with AI systems?"

# Without reranking
retriever_no_rerank = index.as_retriever(similarity_top_k=5)
nodes_no_rerank = retriever_no_rerank.retrieve(query)

print("WITHOUT Reranking:")
print("-" * 50)
for i, node in enumerate(nodes_no_rerank[:3]):
    print(f"{i+1}. Score: {node.score:.4f}")
    print(f"   {node.text[:100]}...\n")

In [None]:
# With reranking
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine_with_rerank = RetrieverQueryEngine.from_args(
    retriever=retriever_no_rerank,
    node_postprocessors=[reranker],
)

# Get reranked nodes manually for comparison
query_bundle = QueryBundle(query_str=query)
reranked_nodes = reranker.postprocess_nodes(
    nodes_no_rerank,
    query_bundle=query_bundle,
)

print("\nWITH Reranking:")
print("-" * 50)
for i, node in enumerate(reranked_nodes):
    print(f"{i+1}. Score: {node.score:.4f}")
    print(f"   {node.text[:100]}...\n")
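
The two-stage pattern above (retrieve a candidate set cheaply, then re-score each candidate against the query) can be sketched without any model. In this sketch `token_overlap_score` is a hypothetical stand-in for the cross-encoder and `first_stage` is made-up data; only the control flow mirrors what a reranker does.

```python
from typing import Callable, List, Tuple


def token_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder.

    Returns the fraction of query tokens that also appear in the passage.
    """
    query_tokens = set(query.lower().split())
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & passage_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Re-score every candidate against the query and keep the top_n."""
    scored = [(passage, score_fn(query, passage)) for passage in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up first-stage results, in the order a vector retriever might return them
first_stage = [
    "neural networks learn layered representations",
    "bias and fairness are ethical concerns for ai systems",
    "ai systems need large datasets",
]

reranked = rerank(
    "ethical concerns with ai systems", first_stage, token_overlap_score, top_n=2
)
for passage, score in reranked:
    print(f"{score:.2f}  {passage}")
```

The reranker can only reorder what the first stage returned, so `similarity_top_k` should stay comfortably larger than `top_n`.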

## 4. LLM-Based Reranking

You can also use an LLM to rerank results. An LLM can capture nuance that a small cross-encoder misses, at the cost of extra LLM calls, latency, and token spend:

In [None]:
from llama_index.core.postprocessor import LLMRerank

# LLM-based reranker
llm_reranker = LLMRerank(
    llm=Settings.llm,
    choice_batch_size=5,
    top_n=3,
)

print("✓ LLM Reranker initialized!")

In [None]:
# Compare LLM reranking
query = "How can machine learning be applied to real-world problems?"

# Get initial results
initial_nodes = retriever_no_rerank.retrieve(query)

print("Initial retrieval (vector similarity):")
for i, node in enumerate(initial_nodes[:3]):
    print(f"  {i+1}. {node.text[:80]}...")

# LLM rerank
print("\nAfter LLM Reranking:")
llm_reranked = llm_reranker.postprocess_nodes(
    initial_nodes,
    query_bundle=QueryBundle(query_str=query),
)

for i, node in enumerate(llm_reranked):
    print(f"  {i+1}. {node.text[:80]}...")
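
`choice_batch_size` controls how many candidates the LLM sees per prompt, so the cost of LLM reranking grows with the size of the candidate set. The sketch below estimates the number of LLM calls, assuming one call per batch; `batch_candidates` and `toy_candidates` are hypothetical names used only for illustration.

```python
from typing import List, Sequence


def batch_candidates(candidates: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split candidates into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(candidates[start:start + batch_size])
        for start in range(0, len(candidates), batch_size)
    ]


# Twelve placeholder candidates with a batch size of 5, as configured above
toy_candidates = [f"node-{i}" for i in range(12)]
batches = batch_candidates(toy_candidates, batch_size=5)

print(f"{len(toy_candidates)} candidates -> {len(batches)} LLM calls")
for batch in batches:
    print(f"  batch of {len(batch)}: {batch[0]} .. {batch[-1]}")
```

Smaller batches mean shorter prompts but more calls; larger batches cut the call count but ask the LLM to compare more passages at once.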

## 5. Hybrid Search (Vector + BM25)

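Vector search matches meaning but can miss exact terms such as identifiers or rare names; BM25 matches exact tokens but misses paraphrases. Combining the two often yields better results than either alone: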

In [None]:
# BM25Retriever ships in a separate package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever

# Extract nodes from the index for BM25
docstore = index.docstore
nodes = list(docstore.docs.values())

# Create BM25 retriever (keyword-based)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5,
)

# Create vector retriever
vector_retriever = index.as_retriever(similarity_top_k=5)

print("✓ Both retrievers ready!")

In [None]:
# Compare retrieval methods
query = "supervised learning algorithms"

print(f"Query: '{query}'\n")

# BM25 (keyword)
bm25_nodes = bm25_retriever.retrieve(query)
print("BM25 (Keyword) Results:")
for i, node in enumerate(bm25_nodes[:3]):
    print(f"  {i+1}. {node.text[:80]}...")

# Vector (semantic)
vector_nodes = vector_retriever.retrieve(query)
print("\nVector (Semantic) Results:")
for i, node in enumerate(vector_nodes[:3]):
    print(f"  {i+1}. {node.text[:80]}...")
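
To see why BM25 favors exact term matches, here is a minimal sketch of the Okapi BM25 scoring formula in plain Python. It assumes whitespace tokenization and the commonly used parameters `k1=1.5` and `b=0.75`; `bm25_scores` and `toy_corpus` are hypothetical, and the `BM25Retriever` used above may tokenize and parameterize differently.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: str, corpus: List[str], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Return one Okapi BM25 score per document in corpus."""
    if not corpus:
        return []

    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq: Counter = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # Terms absent from the document contribute nothing
            df = doc_freq[term]
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term frequency saturates (k1) and is normalized by length (b)
            length_norm = 1.0 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


toy_corpus = [
    "supervised learning algorithms use labeled data",
    "python functions and methods",
    "unsupervised learning finds structure without labels",
]
toy_scores = bm25_scores("supervised learning algorithms", toy_corpus)
for doc, score in zip(toy_corpus, toy_scores):
    print(f"{score:.4f}  {doc}")
```

Note that "unsupervised" does not match "supervised" here: BM25 compares whole tokens, which is exactly the kind of gap that semantic search fills.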

In [None]:
from llama_index.core.retrievers import QueryFusionRetriever

# Create a fusion retriever that combines both
hybrid_retriever = QueryFusionRetriever(
    retrievers=[
        vector_retriever,
        bm25_retriever,
    ],
    similarity_top_k=5,
    num_queries=1,  # Don't generate additional queries
    mode="reciprocal_rerank",  # Fusion method
)

print("✓ Hybrid retriever created!")

In [None]:
# Test hybrid retrieval
query = "Python functions and methods"

hybrid_nodes = hybrid_retriever.retrieve(query)

print(f"Query: '{query}'")
print("\nHybrid (Vector + BM25 Fusion) Results:")
for i, node in enumerate(hybrid_nodes):
    print(f"  {i+1}. Score: {node.score:.4f}")
    print(f"     {node.text[:100]}...\n")
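
The `reciprocal_rerank` mode is based on reciprocal rank fusion (RRF): each retriever contributes `1 / (k + rank)` for every document it returns, and the contributions are summed. A minimal sketch, assuming 1-based ranks and the commonly used constant `k = 60`; the document IDs and rankings are made up, and the library's exact constants may differ.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: float = 60.0) -> Dict[str, float]:
    """Fuse several ranked lists of document IDs into one score per document.

    The returned dict is ordered from highest to lowest fused score.
    """
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


# Made-up rankings from two retrievers
vector_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_b", "doc_d", "doc_a"]

fused_scores = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
for doc_id, score in fused_scores.items():
    print(f"{doc_id}: {score:.5f}")
```

RRF uses only ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales. It also means fused scores are small numbers (around 0.03 here) that should not be compared with raw similarity scores.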

## 6. Query Expansion for Better Recall

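With `num_queries` greater than 1, `QueryFusionRetriever` asks the LLM to generate variations of the original query, runs each query against every retriever, and fuses the results. This improves recall when the user's wording differs from the wording in the documents: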

In [None]:
# Query fusion with query generation
fusion_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever],
    similarity_top_k=5,
    num_queries=4,  # Total queries to run: the original plus generated variations
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
)

print("✓ Query fusion retriever ready!")

In [None]:
# Test with query expansion
query = "How do AI systems make decisions?"

print(f"Original Query: '{query}'\n")
print("(Watch for generated query variations...)\n")

expanded_nodes = fusion_retriever.retrieve(query)

print(f"\nRetrieved {len(expanded_nodes)} nodes after fusion:")
for i, node in enumerate(expanded_nodes):
    print(f"  {i+1}. Score: {node.score:.4f} - {node.text[:80]}...")

## 7. Custom Node Postprocessors

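Custom postprocessors let you adjust scores or filter nodes after retrieval. This one boosts the score of nodes that contain specific keywords, then re-sorts: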

In [None]:
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from typing import Optional

class KeywordBoostPostprocessor(BaseNodePostprocessor):
    """Boost scores for nodes containing specific keywords."""
    
    boost_keywords: List[str] = []
    boost_factor: float = 1.5
    
    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        for node in nodes:
            text_lower = node.text.lower()
            # Check if any boost keyword is present
            for keyword in self.boost_keywords:
                if keyword.lower() in text_lower:
                    # Treat a missing score as 0.0 so the multiplication cannot fail
                    node.score = (node.score or 0.0) * self.boost_factor
                    break
        
        # Re-sort by new scores
        return sorted(nodes, key=lambda x: x.score, reverse=True)

print("✓ Custom postprocessor defined!")

In [None]:
# Use the custom postprocessor
keyword_booster = KeywordBoostPostprocessor(
    boost_keywords=["neural", "deep learning"],
    boost_factor=2.0,
)

# Create query engine with postprocessor
boosted_engine = RetrieverQueryEngine.from_args(
    retriever=vector_retriever,
    node_postprocessors=[keyword_booster],
)

query = "How do advanced AI systems work?"
print(f"Query: {query}")
print("Boosting nodes with 'neural' or 'deep learning'\n")

response = boosted_engine.query(query)
print(f"Response: {response}")

## 8. Retrieval Evaluation

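Measure retrieval quality with metrics such as hit rate and mean reciprocal rank (MRR). The cells below compute a keyword-based hit rate as a simple proxy that needs no labeled data; `RetrieverEvaluator` supports labeled evaluation when you have ground-truth node IDs for each question: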

In [None]:
from llama_index.core.evaluation import RetrieverEvaluator

# Generate evaluation questions
eval_questions = [
    "What is machine learning?",
    "How does Python handle exceptions?",
    "What are neural networks?",
    "What is object-oriented programming?",
]

# We'll use a simple evaluation approach
def evaluate_retriever(retriever, questions, expected_keywords):
    """Simple retrieval evaluation based on keyword presence."""
    results = []
    
    for question, keywords in zip(questions, expected_keywords):
        nodes = retriever.retrieve(question)
        
        # Check if any retrieved node contains expected keywords
        hit = False
        for node in nodes:
            if any(kw.lower() in node.text.lower() for kw in keywords):
                hit = True
                break
        
        results.append({
            "question": question,
            "hit": hit,
            "num_nodes": len(nodes),
            "top_score": nodes[0].score if nodes else 0,
        })
    
    return results

# Expected keywords for each question
expected = [
    ["machine learning", "learn"],
    ["exception", "error", "try", "except"],
    ["neural", "network", "deep"],
    ["object", "class", "inheritance"],
]

print("Evaluating retriever...\n")

In [None]:
# Evaluate different retrievers
retrievers_to_eval = {
    "Vector": vector_retriever,
    "BM25": bm25_retriever,
    "Hybrid": hybrid_retriever,
}

print("Retrieval Evaluation Results")
print("=" * 60)

for name, retriever in retrievers_to_eval.items():
    results = evaluate_retriever(retriever, eval_questions, expected)
    hit_rate = sum(r["hit"] for r in results) / len(results)
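    # Scores are on different scales per retriever (cosine similarity, BM25, fused ranks),
    # so compare the average top score within a retriever, not across retrievers.
    avg_score = sum(r["top_score"] for r in results) / len(results)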
    
    print(f"\n{name} Retriever:")
    print(f"  Hit Rate: {hit_rate:.1%}")
    print(f"  Avg Top Score: {avg_score:.4f}")
    
    for r in results:
        status = "✓" if r["hit"] else "✗"
        print(f"    {status} {r['question'][:40]}... (score: {r['top_score']:.3f})")
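
Hit rate only records whether a relevant node appeared at all; MRR also rewards ranking it near the top. The sketch below computes both from ranked document IDs, assuming you have labeled relevant IDs for each question. The `toy_runs` data and the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def first_relevant_rank(retrieved_ids: List[str], relevant_ids: Set[str]) -> Optional[int]:
    """Return the 1-based rank of the first relevant result, or None if there is none."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return rank
    return None


def hit_rate_and_mrr(runs: List[Tuple[List[str], Set[str]]]) -> Tuple[float, float]:
    """Compute hit rate and mean reciprocal rank over (retrieved, relevant) pairs."""
    if not runs:
        return 0.0, 0.0
    ranks = [first_relevant_rank(retrieved, relevant) for retrieved, relevant in runs]
    hits = sum(1 for rank in ranks if rank is not None)
    # A miss contributes a reciprocal rank of 0
    reciprocal_sum = sum(1.0 / rank for rank in ranks if rank is not None)
    return hits / len(runs), reciprocal_sum / len(runs)


# Made-up results: (retrieved IDs in ranked order, labeled relevant IDs)
toy_runs = [
    (["d1", "d2", "d3"], {"d1"}),  # relevant at rank 1
    (["d4", "d5", "d6"], {"d6"}),  # relevant at rank 3
    (["d7", "d8"], {"d9"}),        # miss
    (["d2", "d1"], {"d1"}),        # relevant at rank 2
]

toy_hit_rate, toy_mrr = hit_rate_and_mrr(toy_runs)
print(f"Hit rate: {toy_hit_rate:.2%}")
print(f"MRR: {toy_mrr:.4f}")
```

Two retrievers with the same hit rate can have very different MRR, which matters when only the first few nodes fit into the LLM's context.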

## 9. Summary

You've learned advanced retrieval techniques in LlamaIndex:

### Key Takeaways

| Technique | When to Use | Impact |
|-----------|-------------|--------|
| **Custom Retriever** | Special filtering logic | Flexibility |
| **Reranking** | Improve top results | Quality |
| **Hybrid Search** | Balance semantic + keyword | Recall |
| **Query Expansion** | Improve recall | Coverage |
| **Postprocessors** | Custom scoring logic | Customization |

### Best Practices

1. **Start with vector retrieval** as baseline
2. **Add reranking** for quality-critical applications
3. **Use hybrid search** when exact terms matter
4. **Evaluate systematically** to measure improvements
5. **Profile latency** - complex retrieval adds time

### Next Steps

In the next notebook, we'll build chat engines with conversational memory.

---

## Exercises

1. **Custom filter**: Create a retriever that filters by document source

2. **Ensemble reranking**: Combine multiple reranking strategies

3. **Ablation study**: Measure impact of each component on quality

4. **Latency analysis**: Profile retrieval time for different configurations

In [None]:
# Exercise space
# Build your custom retrieval pipeline here!