# Complete Guide to Retrieval Methods in LangChain

## What is Retrieval?
Retrieval is the process of finding the most relevant documents from a large collection based on a user's question. Think of it like a smart search engine that understands meaning, not just keywords.

**Why do we need different retrieval methods?**
- Different situations require different approaches
- Some methods are faster, others are more accurate
- Your choice depends on your specific use case

In this tutorial, we'll explore various retrieval methods and learn when to use each one.

## 1. Setup and Installation

Before we start, let's install the required packages and set up our environment.

**Why these packages?**
- `langchain_community`: Community-contributed components
- `langchain`: Core LangChain functionality
- `pypdf`: For reading PDF documents
- `langchain-openai`: OpenAI integrations (embeddings, LLMs)
- `chromadb`: Vector database for storing embeddings
- `lark`: Parser for query construction
- `scikit-learn`: Required by the TF-IDF and SVM retrievers used in section 10

In [None]:
# Install required packages
!pip install langchain_community
!pip install langchain
!pip install pypdf
!pip install langchain-openai
!pip install chromadb
!pip install lark
!pip install scikit-learn

### Setting Up Environment Variables

**Why do we need API keys?**
- LangSmith: For tracing and debugging our applications
- OpenAI: For embeddings (converting text to vectors) and language models

**What are embeddings?**
Embeddings are numerical representations of text that capture semantic meaning. Words/sentences with similar meanings have similar embeddings.
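
To make "similar embeddings" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are made-up values for illustration only; real embedding models return much longer vectors, and `toy_embeddings` is a hypothetical name, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional "embeddings" (hypothetical values, not model output).
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.8, 0.2, 0.1],
    "mushroom": [0.0, 0.2, 0.9],
}

sim_close = cosine_similarity(toy_embeddings["car"], toy_embeddings["automobile"])
sim_far = cosine_similarity(toy_embeddings["car"], toy_embeddings["mushroom"])
print(f"car vs automobile: {sim_close:.2f}")
print(f"car vs mushroom:   {sim_far:.2f}")
```

The two related words score close to 1, while the unrelated pair scores close to 0.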

In [None]:
import os
os.environ['LANGSMITH_TRACING'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = '<your-api-key>'  # Replace with your actual key
os.environ['OPENAI_API_KEY'] = '<your-api-key>'     # Replace with your actual key

## 2. Basic Vector Database Setup

**What is a Vector Database?**
A vector database stores documents as vectors (numerical representations) that capture their meaning. This allows us to find similar documents based on semantic similarity, not just keyword matching.

**Why Chroma?**
- Easy to use and set up
- Works well for small to medium datasets
- Supports persistence (saves data to disk)
- Good for learning and prototyping

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Create embeddings object - this converts text to vectors
# Think of this as a translator that turns words into numbers
embedding = OpenAIEmbeddings()

# Set up the vector database directory
# This is where our vectors will be stored on disk
persist_directory = 'docs/chroma/'

# Create or load existing vector database
# If the directory exists, it loads the existing data
# If not, it creates a new empty database
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)

# Check how many documents are in our database
print(f"Number of documents in database: {vectordb._collection.count()}")

## 3. Basic Similarity Search

**What is Similarity Search?**
Similarity search finds documents that are most similar to your query. It works by:
1. Converting your question to a vector
2. Comparing it with all document vectors
3. Returning the most similar ones
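
The three steps above can be sketched in plain Python. This is only an illustration: `toy_embed` is a hypothetical stand-in that counts words, whereas a real embedding model captures meaning, and a vector database uses an index instead of comparing against every document one by one.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, documents, k=2):
    query_vec = toy_embed(query)                                        # step 1: embed the question
    scored = [(cosine(query_vec, toy_embed(d)), d) for d in documents]  # step 2: compare with every document
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]                               # step 3: keep the k best

corpus = [
    "death cap mushrooms are poisonous",
    "large white mushrooms grow in forests",
    "python is a programming language",
]
hits = top_k("large white mushrooms", corpus, k=2)
print(hits)
```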

**When to use it?**
- When you want quick, straightforward results
- For general question-answering
- When you don't need diverse results

Let's start with a simple example using mushroom data to understand how similarity search works.

In [None]:
# Create sample documents about mushrooms
# These are our "knowledge base" - like having a small encyclopedia
texts = [
    "The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).",
    "A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.",
    "A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.",
]

# Create a small vector database from these texts
# This converts our text into vectors and stores them
smalldb = Chroma.from_texts(texts, embedding=embedding)

# Define our question
question = "Tell me about all-white mushrooms with large fruiting bodies"

# Perform similarity search - find the 2 most similar documents
# k=2 means "give me the top 2 most similar results"
results = smalldb.similarity_search(question, k=2)

print("🔍 Similarity Search Results:")
for i, doc in enumerate(results, 1):
    print(f"\nResult {i}:")
    print(f"{doc.page_content}")
    print(f"Metadata: {doc.metadata}")

**What happened?** 
The search found documents that mention "large fruiting body" and "all-white" mushrooms, which are most relevant to our question. Notice how it understood the meaning, not just exact word matches!

## 4. Maximum Marginal Relevance (MMR)

**What is the problem with regular similarity search?**
Sometimes similarity search returns very similar documents - like getting the same information repeated multiple times. This isn't very helpful!

**What is MMR?**
Maximum Marginal Relevance (MMR) tries to solve this by:
1. Finding relevant documents (like similarity search)
2. **AND** ensuring diversity among the results
3. Balancing relevance vs. diversity
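
Here is a minimal sketch of the greedy selection behind MMR, using made-up 2-d vectors. The `lambda_mult` name mirrors the parameter LangChain exposes on its MMR search; everything else (the `mmr` function, the toy vectors) is illustrative. One simplification: a vector store first narrows the pool to `fetch_k` candidates, while this sketch considers every document.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents chosen greedily by MMR.
    lambda_mult=1.0 is pure relevance; lambda_mult=0.0 is pure diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to anything already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy vectors (hypothetical): docs 0 and 1 are near-duplicates, doc 2 covers a different angle.
query = [1.0, 0.2]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]

relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
balanced = mmr(query, docs, k=2, lambda_mult=0.5)
print("relevance only:", relevance_only)
print("balanced:      ", balanced)
```

With relevance only, both near-duplicates are returned. With the balanced setting, the second pick switches to the document that adds something new.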

**When to use MMR?**
- When you want diverse information
- When regular search gives you repetitive results
- When you need a comprehensive view of a topic

In [None]:
# Use MMR to get diverse results
# fetch_k=3 means "fetch the 3 most similar candidates first"
# k=2 means "from those candidates, return 2 that balance relevance and diversity"
mmr_results = smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)

print("🎯 Maximum Marginal Relevance Results:")
for i, doc in enumerate(mmr_results, 1):
    print(f"\nResult {i}:")
    print(f"{doc.page_content}")
    print(f"Metadata: {doc.metadata}")

**Notice the difference:** With these three texts, MMR typically returns one result about physical characteristics and another about toxicity. This provides a more complete picture than two near-identical descriptions!

## 5. Comparing Regular vs MMR Search

Using real documents, let's see how regular search can return duplicate content while MMR provides variety.

**Why are we loading PDFs?**
- Real documents are more complex than our simple mushroom examples
- PDFs often contain repetitive information
- This makes the difference between similarity search and MMR more obvious

**What is text splitting?**
Large documents need to be split into smaller chunks because:
- Most AI models have token limits
- Smaller chunks are easier to match with queries
- Better retrieval accuracy

In [None]:
# Import the text splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader

source_path = r"<your-source-path>"  # Replace with your actual path

# Create a list of PDF files to load
# Note: We're intentionally adding the same PDF twice to show how retrieval handles duplicates
loaders = [
    # First lecture PDF (added twice on purpose to simulate messy real-world data)
    PyPDFLoader(rf"{source_path}/MachineLearning-Lecture01.pdf"),
    PyPDFLoader(rf"{source_path}/MachineLearning-Lecture01.pdf"),  # Duplicate!
    # Additional lecture PDFs
    PyPDFLoader(rf"{source_path}/MachineLearning-Lecture02.pdf"),
    PyPDFLoader(rf"{source_path}/MachineLearning-Lecture03.pdf")
]

# Create an empty list to store all our document pages
docs = []

# Load each PDF and add its pages to our docs list
for loader in loaders:
    # Each PDF might have multiple pages, so we extend (not append) the list
    docs.extend(loader.load())

print(f"📄 Loaded {len(docs)} pages from {len(loaders)} PDFs")

### Text Splitting Strategy

**Why these specific settings?**
- `chunk_size=1500`: Each chunk is at most 1500 characters (roughly 200-300 English words)
- `chunk_overlap=150`: 150 characters overlap between chunks to maintain context

**What does overlap do?**
Overlap ensures that if a sentence or concept spans across chunks, it won't be completely lost.
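
To see what overlap does, here is a simplified sliding-window splitter. It is a sketch under one big assumption: it cuts at fixed character positions, whereas `RecursiveCharacterTextSplitter` also tries to break at paragraph, line, and word boundaries. `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-size sliding window over characters (simplified illustration)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for c in chunks:
    print(c)
```

Each chunk starts with the last three characters of the previous one, so text that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.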

In [None]:
# Create a text splitter with specific settings
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,      # Maximum chunk length in characters
    chunk_overlap=150     # Overlap between chunks to maintain context
)

# Split all documents into smaller chunks
splits = text_splitter.split_documents(docs)

print(f"📄 Created {len(splits)} chunks from {len(docs)} pages")

# Create a new vector database with our document chunks
vectordb = Chroma.from_documents(
    documents=splits,              # Our document chunks
    embedding=embedding,           # The embedding model to convert text to vectors
    persist_directory=persist_directory  # Where to save the database
)

print(f"💾 Vector database created with {vectordb._collection.count()} documents")

### Comparing Search Methods

Now let's test both similarity search and MMR on the same question to see the difference:

In [None]:
# Test with a question about MATLAB
question = "what did they say about matlab?"

print(f"Question: {question}")
print("\n" + "="*50 + "\n")

# Regular similarity search
docs_ss = vectordb.similarity_search(question, k=3)

print("📋 Regular Similarity Search - First 100 characters:")
for i, doc in enumerate(docs_ss, 1):
    print(f"Result {i}: {doc.page_content[:100]}...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print()

print("\n" + "="*50 + "\n")

# MMR search
docs_mmr = vectordb.max_marginal_relevance_search(question, k=3)

print("🎯 MMR Search - First 100 characters:")
for i, doc in enumerate(docs_mmr, 1):
    print(f"Result {i}: {doc.page_content[:100]}...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print()

**Key Insight:** Regular search might return very similar results (especially since we have duplicate PDFs), while MMR provides more diverse information from different sources or contexts.

## 6. Using Metadata for Specific Searches

**What is Metadata?**
Metadata is extra information about each document, like:
- Source file name
- Page number
- Creation date
- Author

**Why use metadata filtering?**
- Search within specific documents or sources
- Filter by date ranges
- Find information from particular authors
- Make searches more precise

**When to use metadata filtering?**
- When you know the source you want to search in
- When you need information from a specific time period
- When you want to compare information across different sources
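
Conceptually, metadata filtering narrows the candidate set before ranking. The sketch below shows that idea with plain dictionaries. The file names are placeholders, and counting query terms is a stand-in for vector similarity; `filter_then_search` is a hypothetical helper.

```python
def filter_then_search(chunks, query_terms, metadata_filter, k=3):
    """Keep chunks whose metadata matches every key/value in metadata_filter,
    then rank the survivors by how many query terms they contain."""
    allowed = [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]

    def score(chunk):
        words = chunk["text"].lower().split()
        return sum(words.count(term) for term in query_terms)

    return sorted(allowed, key=score, reverse=True)[:k]

# Placeholder chunks (hypothetical text and file names).
chunks = [
    {"text": "Linear regression fits a line", "metadata": {"source": "lecture01.pdf", "page": 3}},
    {"text": "Regression with regression trees", "metadata": {"source": "lecture03.pdf", "page": 0}},
    {"text": "Course logistics and MATLAB", "metadata": {"source": "lecture03.pdf", "page": 1}},
]
results = filter_then_search(chunks, ["regression"], {"source": "lecture03.pdf"}, k=1)
print(results[0]["metadata"])
```

The chunk from `lecture01.pdf` also mentions regression, but it never reaches the ranking step because the filter removes it first.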

In [None]:
# Search only in a specific lecture using metadata filter
question = "what did they say about regression in the third lecture?"

print(f"Question: {question}")
print("\n🔍 Searching only in Lecture 3...")

# Filter results to only include documents from lecture 3
# This is like saying "only search in this specific file"
docs = vectordb.similarity_search(
    question,
    k=3,
    # The filter matches documents with this exact source path
    filter={"source": rf"{source_path}/MachineLearning-Lecture03.pdf"}
)

print("📚 Search Results from Lecture 3 Only:")
for i, doc in enumerate(docs, 1):
    print(f"\nResult {i}:")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Page: {doc.metadata.get('page', 'Unknown')}")
    print(f"Content preview: {doc.page_content[:200]}...")

**What happened?** 
We filtered results to only show content from the third lecture, making our search more specific. This is useful when you want to find information from a particular source or time period.

## 7. Self-Query Retriever (Smart Filtering)

**What's the problem with manual filtering?**
In the previous example, we had to manually specify the exact filter. But what if users ask questions like "What did they say about regression in the third lecture?" - they don't want to specify technical filters!

**What is Self-Query Retriever?**
Self-Query Retriever is smart because it:
1. Analyzes the user's question
2. Automatically determines what filters to apply
3. Extracts the actual search query
4. Applies filters based on the question's context
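
The steps above boil down to turning one question into two things: a search string and a filter. The sketch below fakes that split with a regular expression so you can see the shape of the output. This is only an illustration: the real retriever asks an LLM to do this, and both `toy_query_constructor` and the file-name pattern are assumptions modelled on this tutorial's PDFs.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3}

def toy_query_constructor(question):
    """Split a question into (search_text, metadata_filter).
    Only understands phrases like 'in the third lecture' (hypothetical rule)."""
    match = re.search(r"\s*in the (first|second|third) lecture", question, flags=re.IGNORECASE)
    if not match:
        return question.strip(" ?"), {}
    number = ORDINALS[match.group(1).lower()]
    # Remove the filtering phrase so only the topic is left for the search.
    search_text = (question[:match.start()] + question[match.end():]).strip(" ?")
    return search_text, {"source": f"MachineLearning-Lecture{number:02d}.pdf"}

query, metadata_filter = toy_query_constructor(
    "what did they say about regression in the third lecture?"
)
print(query)
print(metadata_filter)
```

The search text no longer contains "third lecture", which matters because those words say nothing about regression and would only add noise to the similarity search.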

**When to use Self-Query?**
- When users ask questions that include filtering criteria
- When you want a more natural search experience
- When you have structured metadata that users might reference

In [None]:
from langchain_openai import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

# Define what metadata fields are available
# This tells the AI what filters it can use
metadata_field_info = [
    AttributeInfo(
        name="source",
        description=f"The lecture the chunk is from, should be one of `{source_path}/MachineLearning-Lecture01.pdf`, `{source_path}/MachineLearning-Lecture02.pdf`, or `{source_path}/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]

# Describe what the documents contain
# This gives context to the AI about what it's searching through
document_content_description = "Lecture notes from machine learning classes"

# Create the language model for filtering
# This AI model will understand the question and create appropriate filters
llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)

# Create the self-query retriever
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True  # Show us what it's doing
)

print("🤖 Self-Query Retriever created successfully!")

### Testing Self-Query Retriever

Watch how the AI automatically understands "third lecture" and applies the appropriate filter:

In [None]:
# Test the self-query retriever
question = "what did they say about regression in the third lecture?"

print(f"Question: {question}")
print("\n🔍 Processing...")

# Get relevant documents - the AI will automatically filter for lecture 3
# Watch the verbose output to see how it constructs the query!
docs = retriever.get_relevant_documents(question)

print("\n📚 Results:")
for i, doc in enumerate(docs, 1):
    print(f"\nResult {i}:")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Page: {doc.metadata.get('page', 'Unknown')}")
    print(f"Content preview: {doc.page_content[:200]}...")

**Amazing!** The AI automatically understood that "third lecture" means we want to filter by the third lecture source file. No manual filter specification needed!

Open your LangSmith dashboard to inspect the trace and see how the query and filter were constructed under the hood.

## 8. Contextual Compression

**What's the problem with long documents?**
Sometimes retrieved documents contain a lot of irrelevant information mixed with the relevant parts. This:
- Wastes tokens (costs more money)
- Makes it harder for the AI to find the important information
- Can lead to poorer responses

**What is Contextual Compression?**
Contextual compression:
1. Retrieves documents normally
2. Uses an AI model to extract only the relevant parts
3. Returns compressed, focused content
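
As a rough picture of step 2, the sketch below keeps only the sentences that share a keyword with the question. Keyword overlap is a crude stand-in for the LLM extractor, and the sample chunk is invented for illustration; `toy_compress` is a hypothetical helper.

```python
import re

STOPWORDS = frozenset({"what", "did", "they", "say", "about", "the", "a"})

def toy_compress(document, query):
    """Keep only sentences that share a non-stopword term with the query."""
    query_terms = {w for w in re.findall(r"\w+", query.lower()) if w not in STOPWORDS}
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if query_terms & set(re.findall(r"\w+", s.lower()))]
    return " ".join(kept)

# Invented chunk standing in for a retrieved document.
retrieved_chunk = (
    "Welcome to the class. Homework is due on Fridays. "
    "We will use MATLAB for the programming assignments. "
    "Office hours are posted online."
)
compressed = toy_compress(retrieved_chunk, "what did they say about matlab?")
print(compressed)
```

Only one of the four sentences survives, which is the kind of reduction compression aims for before the text is passed to the final model.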

**When to use compression?**
- When your documents are long and contain mixed information
- When you want to reduce token usage
- When you need more focused, relevant responses
- When working with limited context windows

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Function to display results nicely
def pretty_print_docs(docs):
    separator = "\n" + "-" * 100 + "\n"
    content = separator.join([f"Document {i+1}:\n\n{d.page_content}" for i, d in enumerate(docs)])
    print(content)

# Create the compressor
# This AI model will read the full document and extract only relevant parts
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
compressor = LLMChainExtractor.from_llm(llm)

# Wrap our vector database with compression
# This creates a "smart" retriever that compresses results
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,      # The AI that does the compression
    base_retriever=vectordb.as_retriever()  # Our regular retriever
)

print("🗜️ Compression retriever created!")

### Testing Compression

Let's see how compression extracts only the relevant parts:

In [None]:
# Test compression
question = "what did they say about matlab?"

print(f"Question: {question}")
print("\n🔍 Getting compressed results...")

# Get compressed documents
compressed_docs = compression_retriever.get_relevant_documents(question)

print("\n📄 Compressed Results:")
pretty_print_docs(compressed_docs)

print("\n" + "="*50 + "\n")

# Compare with uncompressed results
print("📄 Original (Uncompressed) Results for comparison:")
uncompressed_docs = vectordb.similarity_search(question, k=2)
for i, doc in enumerate(uncompressed_docs, 1):
    print(f"\nOriginal Document {i}:")
    print(f"{doc.page_content[:500]}...")  # First 500 characters

**Notice:** The compression removed irrelevant text and kept only the parts that answer our question about MATLAB. This makes the results more focused and saves tokens!

Open your LangSmith dashboard to inspect the trace and see the LLM calls that compression makes under the hood.

## 9. Combining Techniques

**Why combine techniques?**
Different techniques solve different problems:
- MMR provides diversity
- Compression provides focus
- Combined: diverse AND focused results

**Best practices for combining:**
- Start with the base retrieval method (similarity or MMR)
- Add compression to reduce noise
- Use filtering when you need specific sources
- Test different combinations to see what works best

In [None]:
# Combine compression with MMR
# This gives us both diverse and concise results
compression_retriever_mmr = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")  # Use MMR instead of similarity
)

# Test the combined approach
question = "what did they say about matlab?"

print(f"Question: {question}")
print("\n🔍 Getting compressed + MMR results...")

compressed_docs = compression_retriever_mmr.get_relevant_documents(question)

print("\n📄 Compressed + MMR Results:")
pretty_print_docs(compressed_docs)

**Best of both worlds:** We get diverse results (MMR) that are also concise (compression). This is often the best approach for complex queries!

## 10. Alternative Retrieval Methods

**Beyond embeddings:** Not all retrieval needs to use embeddings. Sometimes traditional methods work better!

**TF-IDF (Term Frequency-Inverse Document Frequency):**
- **What it is:** Measures how important a word is to a document
- **How it works:** Frequent words in a document but rare overall = important
- **When to use:** When you want exact keyword matching
- **Pros:** Fast, no need for embeddings, good for specific terms
- **Cons:** Misses semantic meaning ("car" vs "automobile")
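
The scoring idea can be sketched in a few lines. This version uses raw term frequency and `idf = log(N / df)`; libraries such as scikit-learn add smoothing and normalization on top, so treat the numbers as illustrative. `tfidf_scores` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(query, documents):
    """Score each document by summing tf * idf over the query terms."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term in counts:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores.append(score)
    return scores

docs = [
    "the course uses matlab for homework",
    "the course covers regression and classification",
    "the homework is due on friday",
]
scores = tfidf_scores("the matlab", docs)
print(scores)
```

The word "the" appears in every document, so its idf is zero and it contributes nothing. Only the rare word "matlab" separates the first document from the rest.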

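**SVM (Support Vector Machine):**
- **What it is:** A retriever that ranks document embeddings with a linear SVM
- **How it works:** For each query, it fits an SVM with the query embedding as the only positive example and every document embedding as a negative, then ranks documents by their distance from the decision boundary
- **When to use:** As an alternative to nearest-neighbor ranking on small, in-memory collections
- **Pros:** No vector database or labeled training data needed
- **Cons:** Still needs embeddings, and fits a new model for every query, so it slows down as the collection grows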

In [None]:
from langchain_community.retrievers import SVMRetriever, TFIDFRetriever  # both need scikit-learn
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a PDF document
loader = PyPDFLoader(rf"{source_path}/MachineLearning-Lecture01.pdf")
pages = loader.load()

# Combine all pages into one text
# This creates a single string from all pages
all_page_text = [p.page_content for p in pages]
joined_page_text = " ".join(all_page_text)

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,    # Each chunk is about 1500 characters
    chunk_overlap=150   # 150 characters overlap between chunks
)
splits = text_splitter.split_text(joined_page_text)

print(f"📄 Created {len(splits)} text chunks from the PDF")

# Create different types of retrievers
print("\n🔧 Creating retrievers...")

# SVM Retriever - ranks chunk embeddings with a linear SVM
# Note: The chunks are embedded now; the SVM itself is fitted at query time
svm_retriever = SVMRetriever.from_texts(splits, embedding)

# TF-IDF Retriever - uses word frequency statistics
# Note: This doesn't need embeddings!
tfidf_retriever = TFIDFRetriever.from_texts(splits)

print("✅ All retrievers created!")

### Testing Alternative Retrievers

Let's test each retriever to see how they perform:

In [None]:
# Test SVM retriever
question = "What are major topics for this class?"

print(f"Question: {question}")
print("\n🤖 SVM Retriever Result:")

docs_svm = svm_retriever.get_relevant_documents(question)
print(f"Content preview: {docs_svm[0].page_content[:300]}...")
print(f"\nNumber of results: {len(docs_svm)}")

print("\n" + "="*50 + "\n")

# Test TF-IDF retriever
question = "what did they say about matlab?"

print(f"Question: {question}")
print("\n📊 TF-IDF Retriever Result:")

docs_tfidf = tfidf_retriever.get_relevant_documents(question)
print(f"Content preview: {docs_tfidf[0].page_content[:300]}...")
print(f"\nNumber of results: {len(docs_tfidf)}")

print("\n" + "="*50 + "\n")

# Compare with embedding-based search
print("🔍 Embedding-based Similarity Search (for comparison):")
docs_embedding = vectordb.similarity_search(question, k=1)
print(f"Content preview: {docs_embedding[0].page_content[:300]}...")

**Comparison insights:**
- **TF-IDF:** Likely found exact matches for "matlab" keyword
- **SVM:** Ranked chunks by fitting a linear SVM that separates the query embedding from the chunk embeddings
- **Embedding:** Found semantically similar content (might catch variations like "MATLAB", "programming environment", etc.)

Each method has strengths depending on your use case!

## 11. Choosing the Right Method: Decision Guide

**How do you choose which method to use?** Here's a practical guide:

### Start with these questions:

1. **Do you need exact keyword matching?** → Use TF-IDF
2. **Do you need semantic understanding?** → Use embedding-based methods
3. **Are results too similar?** → Add MMR
4. **Are documents too long?** → Add compression
5. **Do you need source-specific results?** → Use metadata filtering or self-query

### Method Comparison Table

| Method | Best For | Pros | Cons | When to Use |
|--------|----------|------|------|-------------|
| **Similarity Search** | Quick, simple searches | Fast, easy to understand | May return similar results | General Q&A, simple queries |
| **MMR** | Diverse results needed | Reduces redundancy | Slightly slower | When you need comprehensive coverage |
| **Metadata Filtering** | Source-specific searches | Very precise | Requires good metadata | Known source constraints |
| **Self-Query** | Natural language filtering | User-friendly | Needs more setup | Complex queries with filters |
| **Compression** | Long documents | Saves tokens, focused results | Additional processing time | Detailed documents, cost control |
| **TF-IDF** | Keyword matching | No embeddings needed | Misses semantic meaning | Exact term searches |
| **SVM** | Alternative ranking over embeddings | No vector database or labeled data needed | Fits a new SVM per query, slow on large collections | Small in-memory collections |

## 12. Practical Examples: Common Use Cases

Let's look at some real-world scenarios and which methods work best:

### Use Case 1: Customer Support Chatbot
**Scenario:** Help customers find answers in product documentation
**Best approach:** Similarity Search + Compression
**Why:** Fast responses, focused answers, handles varied ways of asking questions

In [None]:
# Example: Customer support scenario
customer_question = "How do I reset my password?"

print(f"Customer Question: {customer_question}")
print("\n🎧 Customer Support Response:")

# Use compression for focused, helpful answers
support_docs = compression_retriever.get_relevant_documents(customer_question)

# In a real chatbot, you'd feed this to an LLM for a natural response
print("Retrieved information:")
for i, doc in enumerate(support_docs[:1], 1):  # Just show first result
    print(f"\nRelevant content: {doc.page_content[:200]}...")

### Use Case 2: Research Assistant
**Scenario:** Help researchers find diverse information on a topic
**Best approach:** MMR + Self-Query
**Why:** Diverse results, can handle complex queries with filters

In [None]:
# Example: Research scenario
research_question = "What are the different approaches to machine learning mentioned in these lectures?"

print(f"Research Question: {research_question}")
print("\n🔬 Research Assistant Response:")

# Use MMR for diverse information
research_docs = vectordb.max_marginal_relevance_search(research_question, k=3)

print("Diverse research findings:")
for i, doc in enumerate(research_docs, 1):
    print(f"\nApproach {i}: {doc.page_content[:150]}...")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

### Use Case 3: Legal Document Search
**Scenario:** Find specific legal precedents or clauses
**Best approach:** TF-IDF + Metadata Filtering
**Why:** Exact term matching important, need source-specific results

In [None]:
# Example: Legal scenario (simulated)
legal_question = "regression analysis"

print(f"Legal Search Term: {legal_question}")
print("\n⚖️ Legal Document Search:")

# Use TF-IDF for exact term matching
legal_docs = tfidf_retriever.get_relevant_documents(legal_question)

print("Exact term matches:")
for i, doc in enumerate(legal_docs[:2], 1):
    print(f"\nMatch {i}: {doc.page_content[:200]}...")
    # In real legal search, you'd also show metadata like case numbers, dates, etc.

## 13. Performance and Cost Considerations

**Why does this matter?**
Different methods have different costs and speeds. Choose based on your needs:

### Speed Ranking (Fastest to Slowest):
1. **TF-IDF** - No AI calls needed
2. **Similarity Search** - Simple vector comparison
3. **MMR** - Additional diversity calculations
4. **Self-Query** - Requires LLM call for query construction
5. **Compression** - Requires LLM call for each document

### Cost Ranking (Cheapest to Most Expensive):
1. **TF-IDF** - No API calls
2. **Similarity Search** - Only embedding API calls
3. **MMR** - Only embedding API calls
4. **Self-Query** - Embedding + LLM API calls
5. **Compression** - Embedding + LLM API calls for each result
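
The ranking above can be turned into a rough per-query call count. This sketch assumes one embedding call to embed the question and, for compression, one LLM call per retrieved document; it ignores the one-time cost of embedding your documents. `calls_per_query` is a hypothetical helper and `k=4` is just an example value.

```python
def calls_per_query(method, k=4):
    """Rough (embedding_calls, llm_calls) for a single query under the stated assumptions."""
    table = {
        "tfidf": (0, 0),        # pure word statistics, no API calls
        "similarity": (1, 0),   # embed the question once
        "mmr": (1, 0),          # same embedding call, extra math happens locally
        "self_query": (1, 1),   # one LLM call to build the structured query
        "compression": (1, k),  # one LLM call per retrieved document
    }
    return table[method]

for name in ["tfidf", "similarity", "mmr", "self_query", "compression"]:
    embedding_calls, llm_calls = calls_per_query(name, k=4)
    print(f"{name:12s} embedding={embedding_calls} llm={llm_calls}")
```

Because compression scales with `k`, lowering `k` is the most direct lever for cutting its cost.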

### Quality Ranking (a rough guide for semantic search; results vary by dataset):
1. **Compression + MMR** - Best quality, most focused
2. **MMR** - Good quality, diverse results
3. **Similarity Search** - Good quality, may have duplicates
4. **Self-Query** - Good quality, precise filtering
5. **TF-IDF** - Good for keywords, poor for semantic meaning

## 14. Troubleshooting Common Issues

### Problem 1: Results are too similar
**Solution:** Use MMR instead of similarity search

### Problem 2: Results are too long/verbose
**Solution:** Add compression to extract only relevant parts

### Problem 3: Can't find information from specific sources
**Solution:** Use metadata filtering or self-query retriever

### Problem 4: Missing obvious keyword matches
**Solution:** Try TF-IDF retriever or combine with similarity search

### Problem 5: Responses are too slow
**Solution:** Use simpler methods (similarity search, TF-IDF) and reduce k value

### Problem 6: Costs are too high
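**Solution:** Lower `k` so fewer documents reach the LLM, or switch to TF-IDF for keyword queries. Compression shrinks the prompt sent to the final model, but it adds one LLM call per retrieved document, so measure whether it actually saves money for your workload.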

## 15. Best Practices and Tips

### 1. Start Simple, Add Complexity
- Begin with basic similarity search
- Add MMR if results are too similar
- Add compression if documents are too long
- Add filtering if you need source-specific results

### 2. Optimize Your Chunks
- **Chunk size:** 1000-2000 characters usually works well
- **Overlap:** 10-20% of chunk size
- **Test different sizes** for your specific use case

### 3. Use Good Metadata
- Include source, date, author, section
- Make metadata human-readable
- Update metadata when documents change

### 4. Test with Real Queries
- Use actual user questions, not made-up ones
- Test edge cases and typos
- Measure success rate, not just technical metrics

### 5. Monitor and Iterate
- Track which queries work well/poorly
- Adjust chunk sizes based on performance
- Update retrieval methods as your data changes

## 16. Quick Reference: Code Templates

### Basic Setup Template

In [None]:
# Template for basic retrieval setup
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Set up embeddings
embedding = OpenAIEmbeddings()

# 2. Split documents (`documents` is a list of Document objects you have already loaded, e.g. with PyPDFLoader)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150
)
splits = text_splitter.split_documents(documents)

# 3. Create vector database
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory="./chroma_db"
)

# 4. Basic search
results = vectordb.similarity_search("your question", k=3)
print("This template is ready to use!")

### Advanced Retrieval Template

In [None]:
# Template for advanced retrieval with compression and MMR
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

# 1. Create compressor
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

# 2. Create compression retriever with MMR
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")
)

# 3. Use it
results = compression_retriever.get_relevant_documents("your question")
print("Advanced retrieval template ready!")

## 17. Summary: When to Use Each Method

| Method | Best For | Pros | Cons |
|--------|----------|------|------|
| **Similarity Search** | Quick, simple searches | Fast, easy to understand | May return similar results |
| **MMR** | When you need diverse results | Reduces redundancy | Slightly slower |
| **Metadata Filtering** | Specific source/time searches | Very precise | Requires good metadata |
| **Self-Query** | Natural language filtering | User-friendly | Needs more setup |
| **Compression** | When documents are long | Saves tokens, focused results | Additional processing time |
| **TF-IDF** | Traditional keyword matching | No embeddings needed | Misses semantic meaning |
| **SVM** | Ranking embeddings with a per-query SVM | No vector database or labeled data needed | Fits a new SVM per query, slow on large collections |

### Key Takeaways:
1. **Start simple** with similarity search
2. **Add complexity** only when you need it
3. **Choose methods** based on your specific use case
4. **Test with real queries** to validate performance
5. **Monitor and iterate** to improve over time

### Next Steps:
1. Try these methods with your own documents
2. Experiment with different chunk sizes
3. Combine methods for better results
4. Build a complete RAG application

Remember: The best retrieval method depends on your specific use case, data type, and performance requirements. Start with the basics and evolve your approach as you learn what works best for your application!