# **LangChain `MultiQueryRetriever` Quick Reference**

## **Introduction**

The `MultiQueryRetriever` is a powerful tool in the LangChain framework designed to enhance document retrieval by generating multiple queries from a single input query. Using a language model (LLM), it creates alternative versions of the original query, retrieves documents for each version, and returns the unique union of all retrieved documents. This approach helps overcome limitations in traditional retrieval methods, such as those relying solely on distance-based similarity search, by providing a more comprehensive set of results.
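
The flow described above can be sketched without any LangChain dependency. This is a minimal illustration, not the library's implementation: `fake_generate_queries` stands in for the LLM call, `keyword_retriever` stands in for a vector store, and `multi_query_retrieve` is a hypothetical helper name. The `include_original` flag mirrors the option of the same name used later in this article.

```python
from typing import Callable, List

# Toy corpus standing in for a vector store.
CORPUS = [
    "Exercise improves cardiovascular health.",
    "A healthy diet reduces the risk of chronic diseases.",
    "Meditation helps reduce stress and anxiety.",
]
STOP_WORDS = {"a", "are", "does", "for", "how", "is", "of", "the", "what", "why"}


def tokenize(text: str) -> set:
    """Lowercase the text, strip basic punctuation, and drop stop words."""
    return {w.strip("?.,").lower() for w in text.split()} - STOP_WORDS


def fake_generate_queries(question: str) -> List[str]:
    """Hypothetical stand-in for the LLM: returns fixed rewrites."""
    return [
        "How does exercise affect health?",
        "Does a good diet complement exercise?",
    ]


def keyword_retriever(query: str) -> List[str]:
    """Hypothetical stand-in for similarity search: match on shared words."""
    query_words = tokenize(query)
    return [doc for doc in CORPUS if query_words & tokenize(doc)]


def multi_query_retrieve(
    question: str,
    generate: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    include_original: bool = False,
) -> List[str]:
    """Retrieve for every query variant and return the unique union."""
    queries = list(generate(question))
    if include_original:
        queries.append(question)
    unique_docs: List[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in unique_docs:  # keep only the first occurrence
                unique_docs.append(doc)
    return unique_docs


docs = multi_query_retrieve(
    "What are the benefits of exercise?",
    fake_generate_queries,
    keyword_retriever,
    include_original=True,
)
for doc in docs:
    print(doc)
```

Three queries are run here, and the exercise document matches all of them, yet it appears only once in the result.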

This article explores the capabilities of the `MultiQueryRetriever` through practical examples, covering initialization, document retrieval, invocation methods (`invoke`, `batch`, `batch_as_completed`), query generation, and advanced features such as retry mechanisms and lifecycle listeners. Whether you're building a question-answering system, a knowledge base, or a search engine, the `MultiQueryRetriever` can improve the relevance and diversity of your search results.

---

## Preparation

### Installing Required Libraries
This section installs the Python libraries needed to work with LangChain, OpenAI models, and the Chroma vector store. These libraries include:
- `langchain-openai`: Provides integration with OpenAI's embedding and chat models.
- `langchain_community`: Contains community-contributed modules and tools for LangChain.
- `langchain_experimental`: Includes experimental features and utilities for LangChain.
- `langchain-chroma`: Enables integration with the Chroma vector database.
- `chromadb`: The core library for the Chroma vector database.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU "langchain-chroma>=0.1.2"
!pip install -qU chromadb

### Initializing OpenAI Embeddings
This section demonstrates how to securely fetch an OpenAI API key using Kaggle's `UserSecretsClient` and initialize the OpenAI embedding model. The `OpenAIEmbeddings` class is used to create an embedding model instance, which will be used to convert text into numerical embeddings.

Key steps:
1. **Fetch API Key**: The OpenAI API key is securely retrieved using Kaggle's `UserSecretsClient`.
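2. **Initialize Embeddings**: The `OpenAIEmbeddings` class is initialized with the `text-embedding-3-small` model and the fetched API key.
3. **Initialize LLM**: The `ChatOpenAI` class is initialized with the `gpt-4o-mini` model and the same API key; the retriever uses it later to generate query variants.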

This setup ensures that the embedding model is ready for use in downstream tasks, such as caching embeddings or creating vector stores.

In [None]:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from kaggle_secrets import UserSecretsClient

# Fetch API key securely
user_secrets = UserSecretsClient()
my_api_key = user_secrets.get_secret("api-key-openai")

# Initialize OpenAI embeddings
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)

# Initialize LLM
model = ChatOpenAI(model="gpt-4o-mini", api_key=my_api_key)

---

## **1. Initialization and Configuration**

### **Example 1: Basic Initialization**
This example demonstrates how to initialize the `MultiQueryRetriever` from a `Chroma` vector store (backed by `OpenAIEmbeddings`) and the chat model created above. It also adds sample documents to the vector store and retrieves relevant documents for a query.

In [None]:
from langchain_chroma import Chroma
from langchain.retrievers.multi_query import MultiQueryRetriever

# Initialize vector store and embeddings
vectorstore = Chroma(embedding_function=embed)
retriever = vectorstore.as_retriever()

# Initialize MultiQueryRetriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=model,
    include_original=True  # Include the original query
)

# Add documents to the vector store (for demonstration)
documents = [
    "Exercise improves cardiovascular health.",
    "A healthy diet reduces the risk of chronic diseases.",
    "Meditation helps reduce stress and anxiety."
]
vectorstore.add_texts(documents)

# Use the retriever to fetch documents using `invoke`
query = "What are the benefits of exercise?"
relevant_docs = multi_query_retriever.invoke(query)

print("Retrieved Documents:")
for doc in relevant_docs:
    print(doc.page_content)

### **Example 2: Custom Prompt Template**
This example shows how to use a custom prompt template with the `MultiQueryRetriever`. The custom prompt generates alternative versions of the input query, and the retriever fetches documents based on these queries.

In [None]:
from langchain_core.prompts import PromptTemplate

# Define a custom prompt template
custom_prompt = PromptTemplate(
    input_variables=["question"],
    template="Generate 3 different versions of this question, one per line: {question}"
)

# Initialize MultiQueryRetriever with custom prompt
multi_query_retriever_custom = MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=model,
    prompt=custom_prompt,
    include_original=False  # Exclude the original query
)

# Use the custom retriever to fetch documents
query = "How does meditation improve mental health?"
relevant_docs = multi_query_retriever_custom.invoke(query)

print("Retrieved Documents (Custom Prompt):")
for doc in relevant_docs:
    print(doc.page_content)

---

## **2. Document Retrieval**

### **Example 1: Retrieve Documents for a Single Query**
This example retrieves documents relevant to a single query. The older `get_relevant_documents` method is deprecated in recent LangChain releases, so the example calls `invoke` instead.

In [None]:
query = "What are the benefits of exercise?"
relevant_docs = multi_query_retriever.invoke(query)

print("Retrieved Documents:")
for doc in relevant_docs:
    print(doc.page_content)

### **Example 2: Retrieve Documents for Multiple Queries**
This example demonstrates how to retrieve documents for multiple queries in a loop. It processes each query individually and prints the retrieved documents.

In [None]:
queries = [
    "What are the benefits of exercise?",
    "How does meditation improve mental health?"
]

for query in queries:
    relevant_docs = multi_query_retriever.invoke(query)
    print(f"Retrieved Documents for: {query}")
    for doc in relevant_docs:
        print(doc.page_content)

### **Example 3: Retrieve Unique Documents**
This example shows the `unique_union` method, which removes duplicate documents from a list. The retriever already applies it internally to the combined results of all generated queries, so calling it on the returned documents is shown only for illustration.

In [None]:
query = "What are the benefits of exercise?"
relevant_docs = multi_query_retriever.invoke(query)
unique_docs = multi_query_retriever.unique_union(relevant_docs)

print("Unique Retrieved Documents:")
for doc in unique_docs:
    print(doc.page_content)
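
To make the deduplication step concrete, here is a minimal sketch of order-preserving deduplication. It assumes two documents count as duplicates when both their content and metadata are equal. `Doc` and `unique_union_sketch` are hypothetical names used for illustration only, not LangChain classes or methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def unique_union_sketch(documents: List[Doc]) -> List[Doc]:
    """Keep the first occurrence of each document, preserving order."""
    unique: List[Doc] = []
    for doc in documents:
        # Dataclass equality compares page_content and metadata together.
        if doc not in unique:
            unique.append(doc)
    return unique


# Hits as they might arrive from several query variants (illustrative data).
hits = [
    Doc("Exercise improves cardiovascular health."),
    Doc("Meditation helps reduce stress and anxiety."),
    Doc("Exercise improves cardiovascular health."),
]
deduped = unique_union_sketch(hits)
print([d.page_content for d in deduped])
```

The list-membership check is quadratic in the number of documents, which is acceptable for the handful of hits a retriever typically returns.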

---

## **3. Invocation Methods**

### **Example 1: Use `invoke` for Single Query**
This example demonstrates how to use the `invoke` method to retrieve documents for a single query. This is the recommended way to retrieve documents synchronously.

In [None]:
query = "What are the benefits of exercise?"
relevant_docs = multi_query_retriever.invoke(query)

print("Retrieved Documents (via invoke):")
for doc in relevant_docs:
    print(doc.page_content)

### **Example 2: Use `batch` for Multiple Queries**
This example shows how to use the `batch` method to process multiple queries in parallel. It retrieves documents for each query and prints the results.

In [None]:
queries = [
    "What are the benefits of exercise?",
    "How does meditation improve mental health?"
]
batch_results = multi_query_retriever.batch(queries)

print("Batch Results:")
for i, result in enumerate(batch_results):
    print(f"Results for Query {i + 1}:")
    for doc in result:
        print(doc.page_content)

### **Example 3: Use `batch_as_completed` for Parallel Processing**
This example demonstrates how to use the `batch_as_completed` method to process multiple queries in parallel and yield results as they complete.

In [None]:
queries = [
    "What are the benefits of exercise?",
    "How does meditation improve mental health?"
]
for idx, result in multi_query_retriever.batch_as_completed(queries):
    print(f"Results for Query {idx + 1}:")
    for doc in result:
        print(doc.page_content)
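
The "yield results as they complete" behavior can be illustrated with the standard library alone. This is a sketch of the general pattern, not LangChain's implementation: `batch_as_completed_sketch` and `slow_lookup` are hypothetical names, and the sleep durations simply simulate queries that take different amounts of time.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterator, List, Tuple


def batch_as_completed_sketch(
    fn: Callable[[str], str], inputs: List[str]
) -> Iterator[Tuple[int, str]]:
    """Run fn on every input in parallel and yield (index, result) pairs
    in completion order rather than input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        futures = {pool.submit(fn, item): idx for idx, item in enumerate(inputs)}
        for future in as_completed(futures):
            yield futures[future], future.result()


def slow_lookup(query: str) -> str:
    """Hypothetical retrieval call with a query-dependent delay."""
    time.sleep(0.3 if "exercise" in query else 0.05)
    return f"docs for: {query}"


queries = [
    "What are the benefits of exercise?",
    "How does meditation improve mental health?",
]
results = list(batch_as_completed_sketch(slow_lookup, queries))
for idx, result in results:
    print(idx, result)

# The index lets the caller restore the original input order when needed.
ordered = [result for _, result in sorted(results)]
```

Because results arrive in completion order, the index in each pair is what ties a result back to its query.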

---

## **4. Query Generation**

### **Example 1: Generate Queries for a Single Question**
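This example demonstrates how to generate multiple queries from a single input question using the `generate_queries` method. The method expects a `CallbackManagerForRetrieverRun`, so the example builds one by hand. The handler's `on_retriever_start` hook is not expected to fire here, because `generate_queries` only runs the query-generation chain and does not start a retriever run.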


In [None]:
from langchain_core.callbacks.manager import CallbackManagerForRetrieverRun
from langchain_core.callbacks.base import BaseCallbackHandler
import uuid

# Define a question
question = "What are the benefits of a healthy diet?"

# Create a basic callback handler (optional)
class SimpleCallbackHandler(BaseCallbackHandler):
    def on_retriever_start(self, serialized, query, **kwargs):
        print(f"Retriever started with query: {query}")

# Initialize CallbackManagerForRetrieverRun
run_id = uuid.uuid4()  # Generate a unique run ID (a UUID object, not a string)
handlers = [SimpleCallbackHandler()]  # Add your callback handlers
inheritable_handlers = []  # Inheritable handlers (optional)

run_manager = CallbackManagerForRetrieverRun(
    run_id=run_id,
    handlers=handlers,
    inheritable_handlers=inheritable_handlers
)

# Generate queries using MultiQueryRetriever
generated_queries = multi_query_retriever.generate_queries(
    question=question,
    run_manager=run_manager  # Provide the callback manager
)

print("Generated Queries:")
for query in generated_queries:
    print(query)

### **Example 2: Generate Queries for Multiple Questions**
This example demonstrates how to generate queries for multiple questions in a loop. It initializes a new callback manager for each question.


In [None]:
from langchain_core.callbacks.manager import CallbackManagerForRetrieverRun
from langchain_core.callbacks.base import BaseCallbackHandler
import uuid

# Define a list of questions
questions = [
    "What are the benefits of exercise?",
    "How does meditation improve mental health?"
]

# Create a basic callback handler (optional)
class SimpleCallbackHandler(BaseCallbackHandler):
    def on_retriever_start(self, serialized, query, **kwargs):
        print(f"Retriever started with query: {query}")

# Generate queries for each question
for question in questions:
    # Initialize CallbackManagerForRetrieverRun for each question
    run_id = uuid.uuid4()  # Generate a unique run ID (a UUID object, not a string)
    handlers = [SimpleCallbackHandler()]  # Add your callback handlers
    inheritable_handlers = []  # Inheritable handlers (optional)

    run_manager = CallbackManagerForRetrieverRun(
        run_id=run_id,
        handlers=handlers,
        inheritable_handlers=inheritable_handlers
    )

    # Generate queries using MultiQueryRetriever
    generated_queries = multi_query_retriever.generate_queries(
        question=question,
        run_manager=run_manager  # Provide the callback manager
    )
    print(f"Generated Queries for: {question}")
    for query in generated_queries:
        print(query)

---

## **5. Retry Mechanism and Lifecycle Listeners**

### **Example 1: Add Retry Mechanism**
This example demonstrates how to add a retry mechanism to the `MultiQueryRetriever`. With `stop_after_attempt=3`, the operation is attempted at most 3 times in total (the initial call plus up to 2 retries) if an exception occurs.

In [None]:
retriever_with_retry = multi_query_retriever.with_retry(
    retry_if_exception_type=(Exception,),  # Retry on any exception
    stop_after_attempt=3  # Maximum number of attempts, including the first call
)

query = "What are the benefits of exercise?"
relevant_docs = retriever_with_retry.invoke(query)

print("Retrieved Documents (with retry):")
for doc in relevant_docs:
    print(doc.page_content)
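
The attempt-counting behavior can be sketched in plain Python. This is an illustration of the general retry pattern under simplifying assumptions, not LangChain's implementation: `call_with_attempts` and `flaky_retrieve` are hypothetical names, and the sketch omits any waiting between attempts.

```python
from typing import Callable, List, Tuple, Type

calls = {"count": 0}


def call_with_attempts(
    fn: Callable[[], List[str]],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 3,
) -> List[str]:
    """Call fn until it succeeds or max_attempts calls have been made."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
    return []  # unreachable; keeps the return type consistent


def flaky_retrieve() -> List[str]:
    """Hypothetical retrieval call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["Exercise improves cardiovascular health."]


result = call_with_attempts(flaky_retrieve, max_attempts=3)
print(result, "after", calls["count"], "attempts")
```

With a limit of 3 attempts, the third call is the last chance: had `flaky_retrieve` failed a third time, the exception would have propagated to the caller.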

### **Example 2: Add Lifecycle Listeners**
This example shows how to add lifecycle listeners to the `MultiQueryRetriever`. The `on_start` and `on_end` listeners are triggered when the retriever starts and finishes processing a query, respectively.

In [None]:
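def on_start(run_obj):
    # The Run object exposes the input payload as `inputs`
    print(f"Retriever started with inputs: {run_obj.inputs}")

def on_end(run_obj):
    # The Run object exposes the result as `outputs`
    print(f"Retriever finished with outputs: {run_obj.outputs}")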

retriever_with_listeners = multi_query_retriever.with_listeners(
    on_start=on_start,
    on_end=on_end
)

query = "What are the benefits of exercise?"
relevant_docs = retriever_with_listeners.invoke(query)

print("Retrieved Documents (with listeners):")
for doc in relevant_docs:
    print(doc.page_content)

### **Example 3: Combine Retry and Listeners**
This example combines the retry mechanism and lifecycle listeners into a single retriever. The retriever will retry on exceptions and trigger the `on_start` and `on_end` listeners during its execution.

In [None]:
retriever_with_retry_and_listeners = multi_query_retriever.with_retry(
    retry_if_exception_type=(Exception,),
    stop_after_attempt=3
).with_listeners(
    on_start=on_start,
    on_end=on_end
)

query = "What are the benefits of exercise?"
relevant_docs = retriever_with_retry_and_listeners.invoke(query)

print("Retrieved Documents (with retry and listeners):")
for doc in relevant_docs:
    print(doc.page_content)

---

## **6. Good Practices**

### **Key Takeaways**

- **Building a Vector Database**: Load, split, and embed documents to create a searchable vector database.
- **Simple Usage of MultiQueryRetriever**: Use the retriever with a language model to generate multiple queries and retrieve documents.
- **Customizing the Prompt and Output Parser**: Define custom prompts and parsers to tailor the query generation process for specific use cases.

### **Code for Building a Sample Vector Database**
This code demonstrates how to build a vector database using a blog post as the data source. It loads the blog post, splits it into smaller chunks, and creates a vector database using `Chroma` and `OpenAIEmbeddings`.

In [None]:
# Build a sample vectorDB
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load blog post
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# VectorDB
embedding = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
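
To show what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-width chunker. It is not how `RecursiveCharacterTextSplitter` works (that splitter prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper that only illustrates the two size parameters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> List[str]:
    """Cut text into fixed-width slices; consecutive slices share
    chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "x" * 1200  # placeholder text, 1200 characters long
chunks = chunk_text(sample, chunk_size=500, chunk_overlap=0)
print([len(c) for c in chunks])
```

With no overlap, the chunks partition the text exactly; a positive overlap repeats the tail of each chunk at the start of the next, which can help preserve context across chunk boundaries.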

### **Code for Simple Usage of MultiQueryRetriever**
This example shows how to use the `MultiQueryRetriever` with a pre-built vector database. It initializes the retriever with a language model (`ChatOpenAI`) and retrieves documents for a specific question. Logging is enabled to display the generated queries.

In [None]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

# Initialize the LLM
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=my_api_key)

question = "What are the approaches to Task Decomposition?"
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(), llm=model)

# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

unique_docs = retriever_from_llm.invoke(question)
len(unique_docs)

### **Code for Customizing the Prompt and Output Parser**
This example demonstrates how to customize the prompt and output parser for the `MultiQueryRetriever`. It defines a custom prompt template and an output parser to generate multiple queries from a single input question. The retriever is then used to fetch documents based on the generated queries.

In [None]:
from typing import List
from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate

# Output parser will split the LLM result into a list of queries
class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""

    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines

output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Initialize the LLM
model = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=my_api_key)

# Chain
llm_chain = QUERY_PROMPT | model | output_parser

# Other inputs
question = "What are the approaches to Task Decomposition?"

# Run
retriever = MultiQueryRetriever(
    retriever=vectordb.as_retriever(), llm_chain=llm_chain
)  # the chain's output parser already returns a list of query strings

# Results
unique_docs = retriever.invoke(question)
len(unique_docs)
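
The parsing step can be checked in isolation, without calling a model. The sketch below reuses the same splitting logic as a plain function. `parse_lines` is a hypothetical name, and `raw_output` is a made-up string standing in for an LLM response, not actual model output.

```python
from typing import List


def parse_lines(text: str) -> List[str]:
    """Split raw LLM output into one query per non-empty line."""
    lines = text.strip().split("\n")
    return list(filter(None, lines))  # drop empty strings left by blank lines


# Made-up response containing a blank line and a trailing newline.
raw_output = (
    "What is task decomposition?\n"
    "\n"
    "How can tasks be broken into steps?\n"
    "Which methods split a task into subtasks?\n"
)
queries = parse_lines(raw_output)
for q in queries:
    print(q)
```

Note that `filter(None, ...)` removes only empty strings, so a line containing just spaces would still be kept; stripping each line first would be a stricter variant.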

## **Conclusion**

The `MultiQueryRetriever` is a versatile component of the LangChain framework that improves document retrieval through LLM-based query generation. By generating multiple versions of a query and retrieving documents for each, it can return a more comprehensive and diverse set of results than a single similarity search, which makes it a good fit for applications that depend on search quality.

Through the examples provided in this article, we’ve demonstrated how to initialize the retriever, generate queries, retrieve documents, and leverage advanced features like batch processing, retry mechanisms, and lifecycle listeners. These tools help developers build more resilient retrieval systems that can handle complex queries.

Whether you're working on a small project or a large-scale application, the `MultiQueryRetriever` provides the flexibility and power needed to enhance your document retrieval workflows. By integrating these techniques into your projects, you can unlock new possibilities for improving search accuracy, user experience, and system reliability.