# **LangChain `MultiVectorRetriever` Quick Reference**

## **Introduction**

The `MultiVectorRetriever` class stores several embedded representations (vectors) for each document and, when any of them matches a query, returns the full parent document. This is useful for strategies such as:

- **Chunking**: Splitting a document into smaller segments and embedding each segment, which can help in capturing semantic meaning while maintaining context.
- **Summarization**: Creating and embedding summaries of documents alongside the original content, facilitating quicker searches and retrievals.
- **Hypothetical Questions**: Embedding hypothetical questions relevant to the document, enhancing the retriever's ability to match queries with appropriate documents.
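
All three strategies share a two-level lookup: each small representation carries the ID of its parent, and a hit on any of them resolves to that parent. The sketch below illustrates the idea in plain Python, with a toy word-overlap score standing in for embedding similarity. The names `toy_child_index`, `toy_parent_store`, and `retrieve_parents` are hypothetical and are not part of LangChain.

```python
# Toy illustration: word overlap stands in for embedding similarity.
def overlap_score(query: str, text: str) -> int:
    """Count the words shared by the query and the text (case-insensitive)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

# Several small representations can point at the same parent document.
toy_child_index = [
    {"text": "langchain chains prompts and models", "doc_id": "a"},
    {"text": "summary: langchain is an llm framework", "doc_id": "a"},
    {"text": "chroma stores embedding vectors", "doc_id": "b"},
]
toy_parent_store = {
    "a": "(full text of the LangChain document)",
    "b": "(full text of the Chroma document)",
}

def retrieve_parents(query: str, k: int = 2) -> list[str]:
    """Rank child entries, then map the top hits to unique parents in rank order."""
    ranked = sorted(
        toy_child_index,
        key=lambda child: overlap_score(query, child["text"]),
        reverse=True,
    )
    parent_ids: list[str] = []
    for child in ranked[:k]:
        if child["doc_id"] not in parent_ids:
            parent_ids.append(child["doc_id"])
    # Skip IDs that have no stored parent instead of failing.
    return [toy_parent_store[i] for i in parent_ids if i in toy_parent_store]

toy_parents = retrieve_parents("what is langchain")
print(toy_parents)
```

Both top-ranked child entries point at parent `a`, so the parent is returned once rather than twice.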

### Key Features

- **Parallel Invocation**: Because `MultiVectorRetriever` implements LangChain's `Runnable` interface, it supports `batch` for running several queries concurrently and asynchronous methods such as `ainvoke`, which helps when many queries must be processed.
- **Custom Search Types**: Users can specify different search types, such as similarity search or Maximal Marginal Relevance (MMR), to tailor the retrieval process to their needs.
- **Flexible Storage Options**: The retriever supports various storage backends for both the parent documents and their embeddings, allowing for greater flexibility in implementation.
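
To make the difference between plain similarity search and MMR concrete, here is a minimal, self-contained sketch of Maximal Marginal Relevance over toy 2-D vectors. It illustrates the general technique and is not LangChain's implementation. The helper names `cosine` and `mmr`, the vectors, and the `lambda_mult` value are assumptions chosen for the example.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr(query: list[float], candidates: list[list[float]],
        k: int = 2, lambda_mult: float = 0.5) -> list[int]:
    """Pick k candidate indices, trading relevance against redundancy."""
    selected: list[int] = []
    while len(selected) < min(k, len(candidates)):
        best_idx, best_score = -1, -math.inf
        for i, vec in enumerate(candidates):
            if i in selected:
                continue
            relevance = cosine(query, vec)
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(vec, candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected

toy_query = [1.0, 0.0]
toy_vectors = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]

by_similarity = sorted(
    range(len(toy_vectors)),
    key=lambda i: cosine(toy_query, toy_vectors[i]),
    reverse=True,
)[:2]
by_mmr = mmr(toy_query, toy_vectors, k=2, lambda_mult=0.3)
print(by_similarity, by_mmr)
```

Similarity search keeps the two near-duplicate vectors, while MMR with a low `lambda_mult` swaps the second one for the more distinct candidate. In LangChain, the search type is selected through the retriever's `search_type` field, for example `SearchType.mmr` from `langchain.retrievers.multi_vector`.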

### Use Cases

1. **Enhanced Document Retrieval**: By using multiple vectors, the retriever can return more relevant results based on varied representations of a document.
2. **Efficient Information Retrieval**: Ideal for applications requiring quick access to large volumes of information, such as chatbots or search engines that need to understand user queries deeply.

---

## Preparation

### Installing Required Libraries
This section installs the necessary Python libraries for working with LangChain, OpenAI embeddings, and Chroma vector store. These libraries include:
- `langchain-openai`: Provides integration with OpenAI's embedding models.
- `langchain_community`: Contains community-contributed modules and tools for LangChain.
- `langchain_experimental`: Includes experimental features and utilities for LangChain.
- `langchain-chroma`: Enables integration with the Chroma vector database.
- `chromadb`: The core library for the Chroma vector database.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU "langchain-chroma>=0.1.2"
!pip install -qU chromadb

### Initializing OpenAI Embeddings
This section demonstrates how to securely fetch an OpenAI API key using Kaggle's `UserSecretsClient` and initialize the OpenAI embedding model. The `OpenAIEmbeddings` class is used to create an embedding model instance, which will be used to convert text into numerical embeddings.

Key steps:
1. **Fetch API Key**: The OpenAI API key is securely retrieved using Kaggle's `UserSecretsClient`.
2. **Initialize Embeddings**: The `OpenAIEmbeddings` class is initialized with the `text-embedding-3-small` model and the fetched API key.

This setup ensures that the embedding model is ready for use in downstream tasks, such as caching embeddings or creating vector stores.

In [None]:
from langchain_openai import OpenAIEmbeddings
from kaggle_secrets import UserSecretsClient

# Fetch API key securely
user_secrets = UserSecretsClient()
my_api_key = user_secrets.get_secret("api-key-openai")

# Initialize OpenAI embeddings
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)

---

## **1. Document Retrieval**

### **Basic Document Retrieval**
This example demonstrates how to retrieve relevant documents using a query. It initializes a `Chroma` vector store and an `InMemoryByteStore` for storing parent documents. Documents are added to both the vector store and byte store, and a query is used to retrieve relevant documents.

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryByteStore
from langchain_core.load.dump import dumps  # For serializing documents

# Initialize the vector store, byte store, and retriever (reuses `embed` from above)
vectorstore = Chroma(embedding_function=embed)
byte_store = InMemoryByteStore()  # Holds the serialized parent documents
retriever = MultiVectorRetriever(vectorstore=vectorstore, byte_store=byte_store)

# Add documents to the vector store and byte store
documents = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={"doc_id": "1"})
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

# Retrieve relevant documents
query = "What is LangChain?"
results = retriever.invoke(query)
print(results)
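
A byte store only holds raw bytes, which is why the parent documents are serialized before being written and deserialized when read back. The following sketch shows the same round trip using only the standard library. `ToyDocument`, `ToyByteStore`, `encode_doc`, and `decode_doc` are hypothetical stand-ins, and JSON is an assumed wire format chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class ToyDocument:
    page_content: str
    metadata: dict = field(default_factory=dict)

class ToyByteStore:
    """Minimal key-to-bytes store exposing mset and mget."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def mset(self, pairs: list[tuple[str, bytes]]) -> None:
        for key, value in pairs:
            if not isinstance(value, bytes):
                raise TypeError("values must be bytes")
            self._data[key] = value

    def mget(self, keys: list[str]) -> list[Optional[bytes]]:
        # Missing keys come back as None rather than raising.
        return [self._data.get(key) for key in keys]

def encode_doc(doc: ToyDocument) -> bytes:
    return json.dumps(asdict(doc)).encode("utf-8")

def decode_doc(raw: bytes) -> ToyDocument:
    return ToyDocument(**json.loads(raw.decode("utf-8")))

toy_store = ToyByteStore()
original = ToyDocument("LangChain is a framework.", {"doc_id": "1"})
toy_store.mset([(original.metadata["doc_id"], encode_doc(original))])

found, missing = toy_store.mget(["1", "unknown"])
restored = decode_doc(found)
print(restored, missing)
```

Because a lookup for an unknown ID yields `None`, a retriever built on such a store has to skip missing parents instead of assuming every vector hit resolves to a document.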

### **Retrieval with Metadata Filtering**
This example shows how to retrieve documents while filtering by metadata. Documents with specific metadata (e.g., `language: Python`) are added to the vector store and byte store. The retriever is then used to fetch documents that match the metadata filter.

In [None]:
# Add documents with metadata
documents = [
    Document(page_content="LangChain supports Python.", metadata={"doc_id": "2", "language": "Python"}),
    Document(page_content="LangChain also supports JavaScript.", metadata={"doc_id": "3", "language": "JavaScript"}),
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

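# Retrieve documents with metadata filtering.
# search_kwargs is a retriever field that is forwarded to the vector store search;
# keyword arguments passed directly to invoke() are not forwarded to that search.
query = "What languages does LangChain support?"
retriever.search_kwargs = {"filter": {"language": "Python"}}
results = retriever.invoke(query)
retriever.search_kwargs = {}  # Reset so later examples run unfiltered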
print(results)
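
One point worth keeping in mind is that a metadata filter is evaluated against the entries held in the vector store, so the filtered field has to be present on those embedded entries. The sketch below mimics a simple equality filter in plain Python. `matches`, `toy_children`, and `parent_ids_for` are hypothetical names, and real vector stores may support richer filter syntax.

```python
def matches(metadata: dict, flt: dict) -> bool:
    """True when every key in the filter equals the entry's metadata value."""
    return all(metadata.get(key) == value for key, value in flt.items())

toy_children = [
    {"text": "LangChain supports Python.",
     "metadata": {"doc_id": "2", "language": "Python"}},
    {"text": "LangChain also supports JavaScript.",
     "metadata": {"doc_id": "3", "language": "JavaScript"}},
]

def parent_ids_for(flt: dict) -> list[str]:
    """Return the parent IDs of the entries that pass the filter."""
    return [
        child["metadata"]["doc_id"]
        for child in toy_children
        if matches(child["metadata"], flt)
    ]

python_only = parent_ids_for({"language": "Python"})
print(python_only)
```

An empty filter lets every entry through, which is why clearing the filter afterwards restores unfiltered behavior.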

---

## **2. Batch Processing**

### **Batch Retrieval**
This example demonstrates how to retrieve documents for multiple queries in a batch. Documents are added to the vector store and byte store, and a list of queries is processed in a single batch. The results for all queries are returned together.

In [None]:
# Add documents to the vector store
documents = [
    Document(page_content="LangChain is a framework for LLM applications.", metadata={"doc_id": "4"}),
    Document(page_content="OpenAI provides powerful language models.", metadata={"doc_id": "5"}),
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

# Batch retrieval
queries = ["What is LangChain?", "What does OpenAI provide?"]
results = retriever.batch(queries)
print(results)

### **Batch Retrieval with Custom Config**
This example shows how to use a custom configuration for batch retrieval. The `max_concurrency` parameter is used to control the number of parallel retrieval operations. This is useful for optimizing performance when processing a large number of queries.

In [None]:
# Batch retrieval with custom config
queries = ["What is LangChain?", "What does OpenAI provide?"]
results = retriever.batch(queries, config={"max_concurrency": 2})
print(results)
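
Conceptually, a concurrency cap like this amounts to mapping the queries over a bounded pool of workers while keeping the results in input order. Here is a minimal sketch with the standard library, assuming a thread pool and a hypothetical `fake_retrieve` function in place of a real retriever.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def batch_retrieve(
    retrieve: Callable[[str], list[str]],
    queries: list[str],
    max_concurrency: Optional[int] = None,
) -> list[list[str]]:
    """Run `retrieve` for each query with at most `max_concurrency` threads."""
    if not queries:
        return []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # map() returns results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(retrieve, queries))

def fake_retrieve(query: str) -> list[str]:
    return [f"document for: {query}"]

toy_batch = batch_retrieve(
    fake_retrieve,
    ["What is LangChain?", "What does OpenAI provide?"],
    max_concurrency=2,
)
print(toy_batch)
```

A lower cap reduces pressure on rate-limited embedding or vector store APIs at the cost of longer total runtime.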

---

## **3. Streaming**

### **Streaming Retrieval Results**
This example demonstrates the `stream` interface. Documents are added to the vector store and byte store, and the retriever is then called through `stream`. A retriever produces its whole result at once, so `stream` yields a single chunk containing the full list of retrieved documents rather than one document at a time.

In [None]:
# Add documents to the vector store
documents = [
    Document(page_content="LangChain is a framework for LLM applications.", metadata={"doc_id": "6"}),
    Document(page_content="OpenAI provides powerful language models.", metadata={"doc_id": "7"}),
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

# Stream retrieval results
query = "What is LangChain?"
for result in retriever.stream(query):
    print(result)
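
For a component that computes its whole result in one step, a streaming interface can be pictured as a generator that yields that result once. The sketch below shows this under the assumption of a hypothetical `toy_stream` helper wrapping any `invoke`-style function.

```python
from typing import Callable, Iterator

def toy_stream(
    invoke: Callable[[str], list[str]], query: str
) -> Iterator[list[str]]:
    """Yield the complete result once, as a non-incremental stream would."""
    yield invoke(query)

toy_chunks = list(
    toy_stream(lambda q: ["doc-1", "doc-2"], "What is LangChain?")
)
print(len(toy_chunks), toy_chunks[0])
```

The loop over such a stream runs exactly once, and the single chunk is the full list of documents.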

### **Streaming with Metadata**
This example shows how to read metadata from streamed results. Retrieved `Document` objects already carry their metadata, so no extra parameter is needed; the loop below prints the content and metadata of each document in the streamed chunk.

In [None]:
# Stream retrieval results and print each document's metadata
query = "What does OpenAI provide?"
for chunk in retriever.stream(query):  # Yields one chunk: the full list of documents
    for doc in chunk:
        print(doc.page_content, doc.metadata)

---

## **4. Configuration and Customization**

### **Binding Arguments to the Retriever**
This example demonstrates how to attach search arguments to the retriever. The `search_kwargs` parameter customizes the retrieval process, such as limiting the number of results (`k`), and is forwarded to the vector store search. This allows for flexible configuration of the retriever.

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryByteStore
from langchain_core.load.dump import dumps  # For serializing documents

# Initialize vector store, byte store
vectorstore = Chroma(embedding_function=embed)
byte_store = InMemoryByteStore()
retriever = MultiVectorRetriever(vectorstore=vectorstore, byte_store=byte_store)

# Add documents to the vector store and byte store
documents = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={"doc_id": "1"})
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

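# Configure search arguments on the retriever itself. Keyword arguments attached
# with .bind() reach invoke() but are not forwarded to the vector store search.
custom_retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=byte_store,
    search_kwargs={"k": 3},  # Search for up to 3 matching embedded documents
)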

# Retrieve relevant documents
query = "What is LangChain?"
results = custom_retriever.invoke(query)
print(results)

### **Configurable Alternatives**
This example shows how to switch between retrievers at runtime. A `ConfigurableField` names the configuration key, `default_key` labels the original retriever, and each additional keyword argument registers an alternative under that name. The active retriever is then selected with a configuration parameter, which is useful for testing different retrieval strategies.

In [None]:
from langchain_core.runnables.utils import ConfigurableField

# Create a configurable retriever
configurable_retriever = retriever.configurable_alternatives(
    ConfigurableField(id="retriever"),
    default_key="default",
    alternative_retriever=MultiVectorRetriever(vectorstore=Chroma(embedding_function=embed), byte_store=InMemoryByteStore())
)

# Use the default retriever
print("Using Default Retriever:")
results = configurable_retriever.invoke("What is LangChain?")
print(results)

# Use the alternative retriever (its byte store is empty here, so it returns no parent documents)
print("\nUsing Alternative Retriever:")
results = configurable_retriever.with_config(configurable={"retriever": "alternative_retriever"}).invoke("What is LangChain?")
print(results)
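
The runtime switch can be thought of as a lookup from a configuration value to one of several registered callables. The sketch below models that dispatch in plain Python. `SwitchableRetriever` and its arguments are hypothetical and only loosely mirror the shape of the configuration dictionary used above.

```python
from typing import Callable, Optional

class SwitchableRetriever:
    """Dispatch a query to one of several retrievers chosen by config."""

    def __init__(self, field_id: str, default_key: str,
                 default: Callable[[str], list[str]],
                 **alternatives: Callable[[str], list[str]]) -> None:
        self.field_id = field_id
        self.default_key = default_key
        self.options = {default_key: default, **alternatives}

    def invoke(self, query: str, config: Optional[dict] = None) -> list[str]:
        configurable = (config or {}).get("configurable", {})
        key = configurable.get(self.field_id, self.default_key)
        if key not in self.options:
            raise KeyError(f"unknown retriever option: {key}")
        return self.options[key](query)

switchable = SwitchableRetriever(
    "retriever",
    "default",
    lambda q: ["from default"],
    alternative_retriever=lambda q: ["from alternative"],
)

default_hits = switchable.invoke("What is LangChain?")
alternative_hits = switchable.invoke(
    "What is LangChain?",
    config={"configurable": {"retriever": "alternative_retriever"}},
)
print(default_hits, alternative_hits)
```

Without a configuration value the default is used, so existing call sites keep working when alternatives are added.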

---

## **5. Event Handling and Error Handling**

### **Adding Lifecycle Listeners**
This example demonstrates how to add synchronous lifecycle listeners to the retriever. The `on_start` and `on_end` listeners are used to track the start and end of retrieval operations. This is useful for logging or monitoring the retrieval process.

In [None]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.schema import Document
from langchain.storage import InMemoryByteStore
from langchain_core.load.dump import dumps

# Initialize vector store, byte store, and embeddings
vectorstore = Chroma(embedding_function=embed)
byte_store = InMemoryByteStore()
retriever = MultiVectorRetriever(vectorstore=vectorstore, byte_store=byte_store)

# Add documents to the vector store and byte store
documents = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={"doc_id": "1"})
]

# Add documents to the vector store
vectorstore.add_documents(documents)

# Add parent documents to the byte store (properly serialized)
for doc in documents:
    serialized_doc = dumps(doc)  # Serialize the Document object
    byte_store.mset([(doc.metadata["doc_id"], serialized_doc.encode("utf-8"))])

# Define lifecycle listeners; each receives a Run object describing the call
def on_start(run):
    print(f"Retrieval started with inputs: {run.inputs}")

def on_end(run):
    print(f"Retrieval ended with outputs: {run.outputs}")

# Add listeners to the retriever
listener_retriever = retriever.with_listeners(on_start=on_start, on_end=on_end)

# Invoke the retriever with listeners
results = listener_retriever.invoke("What is LangChain?")
print(results)
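
The listener pattern itself is a thin wrapper that calls hooks before and after the wrapped function. The following sketch shows it with the standard library only. The `toy_with_listeners` helper and the dictionary used as a run record are hypothetical simplifications, not LangChain's `Run` object.

```python
from typing import Callable, Optional

def toy_with_listeners(
    retrieve: Callable[[str], list[str]],
    on_start: Optional[Callable[[dict], None]] = None,
    on_end: Optional[Callable[[dict], None]] = None,
) -> Callable[[str], list[str]]:
    """Wrap `retrieve` so hooks fire before and after each call."""
    def wrapper(query: str) -> list[str]:
        run = {"inputs": {"query": query}, "outputs": None}
        if on_start is not None:
            on_start(run)
        documents = retrieve(query)
        run["outputs"] = {"documents": documents}
        if on_end is not None:
            on_end(run)
        return documents
    return wrapper

toy_events: list[str] = []
listened = toy_with_listeners(
    lambda q: ["LangChain is a framework."],
    on_start=lambda run: toy_events.append(f"start: {run['inputs']}"),
    on_end=lambda run: toy_events.append(f"end: {run['outputs']}"),
)
listened_docs = listened("What is LangChain?")
print(toy_events)
```

The wrapped function's return value is unchanged, so listeners can be added for logging without affecting callers.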

### **Retry on Failure**
This example shows how to add retry logic to handle failures. The `with_retry` method is used to specify the number of retry attempts and the types of exceptions to handle. This ensures that transient failures do not disrupt the retrieval process.

In [None]:
# Add retry logic
retry_retriever = retriever.with_retry(stop_after_attempt=3, retry_if_exception_type=(Exception,))

# Invoke the retriever with retry logic
results = retry_retriever.invoke("What is LangChain?")
print(results)
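
As a rough mental model, retry logic re-runs the call when a listed exception type is raised, waits between attempts, and re-raises once the attempts are exhausted. The sketch below implements that with exponential backoff. `toy_with_retry`, `base_delay`, and `flaky_retrieve` are hypothetical names, and the delay is set to zero so the example finishes instantly.

```python
import time
from typing import Callable

def toy_with_retry(
    func: Callable[[str], list[str]],
    attempts: int = 3,
    retry_on: tuple = (Exception,),
    base_delay: float = 0.0,
) -> Callable[[str], list[str]]:
    """Retry `func` on the given exception types, up to `attempts` calls."""
    def wrapper(query: str) -> list[str]:
        for attempt in range(1, attempts + 1):
            try:
                return func(query)
            except retry_on:
                if attempt == attempts:
                    raise  # Out of attempts: surface the last error
                # Exponential backoff: base, 2x base, 4x base, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
        raise RuntimeError("attempts must be at least 1")
    return wrapper

call_log = {"count": 0}

def flaky_retrieve(query: str) -> list[str]:
    """Fail twice with a transient error, then succeed."""
    call_log["count"] += 1
    if call_log["count"] < 3:
        raise ConnectionError("transient failure")
    return [f"result for {query}"]

retried = toy_with_retry(
    flaky_retrieve, attempts=3, retry_on=(ConnectionError,)
)("What is LangChain?")
print(retried, call_log["count"])
```

Retrying on a narrow exception type such as `ConnectionError` avoids repeating calls that fail for non-transient reasons, for example an invalid API key.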

### **Fallback Retriever**
This example demonstrates how to add a fallback retriever in case of failure. A fallback retriever is defined and attached to the primary retriever. If the primary retriever raises an exception, the fallback retriever is used as a backup; an empty result from the primary retriever does not trigger the fallback. This provides redundancy and improves reliability.

In [None]:
from langchain.retrievers import MultiVectorRetriever

# Create a fallback retriever
fallback_retriever = MultiVectorRetriever(vectorstore=Chroma(embedding_function=embed), byte_store=InMemoryByteStore())

# Add fallback to the retriever
fallback_enabled_retriever = retriever.with_fallbacks([fallback_retriever])
results = fallback_enabled_retriever.invoke("What is LangChain?")
print(results)
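
The fallback behavior can be sketched as trying each candidate in order and returning the first result that does not raise. The helper below is a plain-Python illustration. `toy_with_fallbacks`, `broken_primary`, and `backup` are hypothetical names.

```python
from typing import Callable, Optional

Retrieve = Callable[[str], list[str]]

def toy_with_fallbacks(
    primary: Retrieve,
    fallbacks: list[Retrieve],
    exceptions: tuple = (Exception,),
) -> Retrieve:
    """Return a function that tries `primary`, then each fallback in order."""
    def run(query: str) -> list[str]:
        last_error: Optional[BaseException] = None
        for candidate in [primary, *fallbacks]:
            try:
                return candidate(query)
            except exceptions as err:
                last_error = err  # Remember the error and try the next one
        assert last_error is not None
        raise last_error
    return run

def broken_primary(query: str) -> list[str]:
    raise TimeoutError("vector store unavailable")

def backup(query: str) -> list[str]:
    return [f"backup result for {query}"]

fallback_hits = toy_with_fallbacks(broken_primary, [backup])("What is LangChain?")
empty_hits = toy_with_fallbacks(lambda q: [], [backup])("What is LangChain?")
print(fallback_hits, empty_hits)
```

The second call shows that an empty list counts as success: the backup is never consulted because nothing was raised.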

---

## **6. Best Practices**

### **Example 1: Associating Summaries with a Document for Retrieval**

1. **Summarization Chain**:
   - A chain is created to summarize documents using an LLM (`ChatOpenAI`).
   - The chain takes a document's content, generates a summary, and outputs it as a string.
2. **Batch Summarization**:
   - The summarization chain is applied to a batch of documents (`docs`) with a concurrency limit of 5.
3. **Vector Store and Document Store**:
   - A `Chroma` vector store is initialized to store the summaries.
   - An `InMemoryByteStore` is used to store the original documents.
4. **Retriever Initialization**:
   - A `MultiVectorRetriever` is initialized to link summaries (stored in the vector store) with the original documents (stored in the document store).
5. **Querying**:
   - The retriever is queried with a search term (`"LangChain"`), and it returns the relevant parent documents.

In [None]:
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain.storage import InMemoryByteStore
from langchain.retrievers import MultiVectorRetriever
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
import uuid

# Initialize OpenAI embeddings and LLM
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)
model = ChatOpenAI(model="gpt-4o-mini", api_key=my_api_key)

# Define a list of documents to summarize and retrieve
docs = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={"title": "LangChain Overview"}),
    Document(page_content="OpenAI provides powerful language models like GPT-4.", metadata={"title": "OpenAI Models"}),
    Document(page_content="Chroma is a vector store for embedding-based retrieval.", metadata={"title": "Chroma Vector Store"}),
]

# Define summarization chain
chain = (
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
    | model
    | StrOutputParser()
)

# Generate summaries for documents
summaries = chain.batch(docs, {"max_concurrency": 5})

# Initialize vector store and document store
vectorstore = Chroma(collection_name="summaries", embedding_function=embed)
store = InMemoryByteStore()
id_key = "doc_id"

# Initialize retriever
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)

# Generate unique IDs for documents
doc_ids = [str(uuid.uuid4()) for _ in docs]

# Create summary documents
summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summaries)
]

# Add summaries to vector store and original documents to document store
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

# Query the retriever
retrieved_docs = retriever.invoke("LangChain")
print(retrieved_docs)

### **Example 2: Hypothetical Queries for Improved Retrieval**

1. **Hypothetical Questions Chain**:
   - A chain is created to generate hypothetical questions for a document using an LLM (`ChatOpenAI`).
   - The chain uses a structured output (`HypotheticalQuestions`) to ensure the output is a list of questions.
2. **Batch Question Generation**:
   - The chain is applied to a batch of documents (`docs`) with a concurrency limit of 5.
3. **Vector Store and Document Store**:
   - A `Chroma` vector store is initialized to store the hypothetical questions.
   - An `InMemoryByteStore` is used to store the original documents.
4. **Retriever Initialization**:
   - A `MultiVectorRetriever` is initialized to link hypothetical questions (stored in the vector store) with the original documents (stored in the document store).
5. **Querying**:
   - The retriever is queried with a search term, and it returns the relevant parent documents.

In [None]:
from typing import List
from pydantic import BaseModel, Field
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.storage import InMemoryByteStore
from langchain.retrievers import MultiVectorRetriever
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
import uuid

# Define Pydantic model for hypothetical questions
class HypotheticalQuestions(BaseModel):
    """Generate hypothetical questions."""
    questions: List[str] = Field(..., description="List of questions")

# Initialize OpenAI embeddings and LLM
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)
model = ChatOpenAI(model="gpt-4o-mini", api_key=my_api_key)

# Define chain to generate hypothetical questions
chain = (
    {"doc": lambda x: x.page_content}
    | ChatPromptTemplate.from_template(
        "Generate a list of exactly 3 hypothetical questions that the below document could be used to answer:\n\n{doc}"
    )
    | model.with_structured_output(HypotheticalQuestions)
    | (lambda x: x.questions)
)

# Generate hypothetical questions for documents
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})

# Initialize vector store and document store
vectorstore = Chroma(collection_name="hypo-questions", embedding_function=embed)
store = InMemoryByteStore()
id_key = "doc_id"

# Initialize retriever
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)

# Generate unique IDs for documents
doc_ids = [str(uuid.uuid4()) for _ in docs]

# Create question documents
question_docs = []
for i, question_list in enumerate(hypothetical_questions):
    question_docs.extend(
        [Document(page_content=s, metadata={id_key: doc_ids[i]}) for s in question_list]
    )

# Add questions to vector store and original documents to document store
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))

# Query the retriever
retrieved_docs = retriever.invoke("What is Chroma used for?")
print(retrieved_docs)
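
Since each parent can have several generated questions, it can be worth checking that every embedded entry still points at a stored parent before querying. The sketch below flattens per-parent question lists into child records and looks for orphans. All names and the sample questions are hypothetical, and plain dictionaries stand in for the vector store and document store.

```python
import uuid

toy_parent_texts = ["doc about LangChain", "doc about Chroma"]
toy_questions = [
    ["What is LangChain?", "Who uses LangChain?"],
    ["What is Chroma?"],
]

# One ID per parent, shared by all of that parent's questions.
toy_parent_ids = [str(uuid.uuid4()) for _ in toy_parent_texts]

toy_question_records = [
    {"text": question, "doc_id": parent_id}
    for parent_id, questions in zip(toy_parent_ids, toy_questions)
    for question in questions
]
toy_docstore = dict(zip(toy_parent_ids, toy_parent_texts))

# An orphan is a child whose parent ID is missing from the document store.
orphans = [
    record for record in toy_question_records
    if record["doc_id"] not in toy_docstore
]
print(len(toy_question_records), len(orphans))
```

An orphaned entry would match queries in the vector store but contribute nothing to the final results, which can make retrieval look weaker than it is.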

## **Conclusion**

The `MultiVectorRetriever` is a flexible tool for advanced document retrieval tasks. By indexing multiple representations of each document (such as summaries, chunks, or hypothetical questions) while returning the full parent document, it can improve the relevance of search results. It pairs a vector store for the embedded representations with a document store for the parents, and its search type and search arguments can be adjusted to fit different use cases. This makes it a useful building block for retrieval-augmented generation (RAG) systems, semantic search engines, and document summarization tools.