# Caching Embeddings in LangChain: Boosting Performance for NLP Applications

## Introduction

Embeddings are a core component of modern natural language processing (NLP) systems: they convert text into numerical vectors that models can compare and search. Computing them is expensive, though, especially for large datasets or for texts that are embedded repeatedly. **Caching embeddings** addresses this cost. Precomputed vectors are kept in a key-value store, so a text that has already been embedded is looked up instead of recomputed.

The `CacheBackedEmbeddings` class in LangChain implements this pattern. It wraps an embedding model and caches the model's output in a storage backend of your choice, such as an in-memory store or a disk-based store. This suits applications that embed the same texts repeatedly, such as vector store creation, semantic search, and retrieval-augmented generation (RAG) systems.
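
To make the wrapper idea concrete, here is a minimal sketch that uses only the Python standard library. It is not LangChain's implementation: `ToyCachedEmbedder`, `fake_embed`, and the SHA-256 key scheme are hypothetical stand-ins chosen for illustration.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def fake_embed(texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a real embedding model."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


class ToyCachedEmbedder:
    """Wraps an embedding function and caches results in a dict (assumed design)."""

    def __init__(self, embed_fn: Callable[[List[str]], List[Vector]]) -> None:
        self.embed_fn = embed_fn
        self.store: Dict[str, Vector] = {}
        self.model_calls = 0  # number of texts sent to the underlying model

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        # Only texts whose key is missing from the store are sent to the model.
        missing = [t for t, k in zip(texts, keys) if k not in self.store]
        missing = list(dict.fromkeys(missing))  # de-duplicate, keep order
        if missing:
            self.model_calls += len(missing)
            for text, vector in zip(missing, self.embed_fn(missing)):
                self.store[self._key(text)] = vector
        return [self.store[k] for k in keys]


embedder = ToyCachedEmbedder(fake_embed)
first = embedder.embed_documents(["alpha", "beta"])
second = embedder.embed_documents(["alpha", "beta", "gamma"])
print("Texts sent to the model:", embedder.model_calls)
```

The second call only sends `"gamma"` to the model, because `"alpha"` and `"beta"` are already in the store.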

In this guide, we explore how to use `CacheBackedEmbeddings` to cache embeddings, demonstrate its integration with vector stores like Chroma, and highlight its benefits through practical examples. Whether you're working with large datasets or building real-time NLP applications, caching embeddings can help you achieve faster and more efficient workflows.

---

## Preparation

### Installing Required Libraries
This section installs the Python libraries needed to work with LangChain, OpenAI embeddings, and the Chroma vector store. These libraries include:
- `langchain-openai`: Provides integration with OpenAI's embedding models.
- `langchain_community`: Contains community-contributed modules and tools for LangChain.
- `langchain_experimental`: Includes experimental features and utilities for LangChain.
- `langchain-chroma`: Enables integration with the Chroma vector database.
- `chromadb`: The core library for the Chroma vector database.

In [None]:
!pip install -qU langchain-openai
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU "langchain-chroma>=0.1.2"
!pip install -qU chromadb
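
After installing, it can help to confirm which versions were actually picked up. The following is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the install commands above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names taken from the install commands above.
PACKAGES = [
    "langchain-openai",
    "langchain-community",
    "langchain-experimental",
    "langchain-chroma",
    "chromadb",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {package: version}, with None for packages that are not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```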

### Initializing OpenAI Embeddings
This section demonstrates how to securely fetch an OpenAI API key using Kaggle's `UserSecretsClient` and initialize the OpenAI embedding model. The `OpenAIEmbeddings` class is used to create an embedding model instance, which will be used to convert text into numerical embeddings.

Key steps:
1. **Fetch API Key**: The OpenAI API key is securely retrieved using Kaggle's `UserSecretsClient`.
2. **Initialize Embeddings**: The `OpenAIEmbeddings` class is initialized with the `text-embedding-3-small` model and the fetched API key.

This setup ensures that the embedding model is ready for use in downstream tasks, such as caching embeddings or creating vector stores.

In [None]:
from langchain_openai import OpenAIEmbeddings
from kaggle_secrets import UserSecretsClient

# Fetch the API key securely from Kaggle secrets
user_secrets = UserSecretsClient()
my_api_key = user_secrets.get_secret("api-key-openai")

# Initialize OpenAI embeddings
embed = OpenAIEmbeddings(model="text-embedding-3-small", api_key=my_api_key)

# Alternative: send requests to an OpenAI-compatible endpoint.
# Replace the placeholder with your endpoint URL, and never hard-code the API key.
# embed = OpenAIEmbeddings(model="text-embedding-3-large", base_url="<enter your base URL>",
#                         api_key=my_api_key)
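
Outside Kaggle, `UserSecretsClient` is not available, so the key has to come from somewhere else. One common option is an environment variable. The sketch below assumes the variable is named `OPENAI_API_KEY`; the helpers `resolve_api_key` and `mask` are hypothetical and use only the standard library.

```python
import os


def resolve_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

The result can then be passed to the embedding model, for example `OpenAIEmbeddings(model="text-embedding-3-small", api_key=resolve_api_key())`, and `mask()` keeps the key out of notebook output if you need to log it.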

---

## CacheBackedEmbeddings

### Example 1: Using `embed_documents()`
This example demonstrates how to use the `embed_documents()` function to embed a list of texts. It caches the embeddings if they are not already in the cache.

In [None]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryStore

# Initialize the document embedding store (e.g., InMemoryStore)
document_embedding_store = InMemoryStore()

# Create the CacheBackedEmbeddings instance
cached_embedder = CacheBackedEmbeddings(underlying_embeddings=embed, document_embedding_store=document_embedding_store)

# List of texts to embed
texts = ["Hello, world!", "This is a test.", "Caching embeddings is useful."]

# Embed the documents
embeddings = cached_embedder.embed_documents(texts)

# Print the embeddings
for text, embedding in zip(texts, embeddings):
    print(f"Text: {text}\nEmbedding Length: {len(embedding)}\n")

### Example 2: Using `embed_query()`
This example demonstrates how to use the `embed_query()` function to embed a single query text. It caches the query embedding if a `query_embedding_store` is provided.

In [None]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryStore

# Initialize the document and query embedding stores (e.g., InMemoryStore)
document_embedding_store = InMemoryStore()
query_embedding_store = InMemoryStore()

# Create the CacheBackedEmbeddings instance with query caching enabled
cached_embedder = CacheBackedEmbeddings(
    underlying_embeddings=embed,
    document_embedding_store=document_embedding_store,
    query_embedding_store=query_embedding_store
)

# Query text to embed
query_text = "What is the meaning of life?"

# Embed the query
query_embedding = cached_embedder.embed_query(query_text)

# Print the query embedding
print(f"Query: {query_text}\nEmbedding Length: {len(query_embedding)}\n")

### Example 3: Using `from_bytes_store()`
This example demonstrates how to use the `from_bytes_store()` function to create a `CacheBackedEmbeddings` instance with a byte-based store.

In [None]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore

# Initialize the byte-based document embedding store (e.g., InMemoryByteStore)
document_embedding_cache = InMemoryByteStore()

# Create the CacheBackedEmbeddings instance using from_bytes_store
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embed,
    document_embedding_cache=document_embedding_cache,
    namespace="openai_embeddings"  # Optional namespace to avoid cache collisions
)

# List of texts to embed
texts = ["Caching embeddings with bytes store.", "This is another example."]

# Embed the documents
embeddings = cached_embedder.embed_documents(texts)

# Print the embeddings
for text, embedding in zip(texts, embeddings):
    print(f"Text: {text}\nEmbedding Length: {len(embedding)}\n")
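
The `namespace` argument matters because the same text embedded by two different models yields two different vectors. The sketch below illustrates the idea with a hypothetical key scheme (namespace followed by a SHA-256 digest of the text); it is an assumption for illustration, not LangChain's exact key format.

```python
import hashlib


def cache_key(namespace: str, text: str) -> str:
    """Build a cache key as '<namespace><sha256 of text>' (hypothetical scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


text = "This is another example."
key_small = cache_key("text-embedding-3-small", text)
key_large = cache_key("text-embedding-3-large", text)

# Same text, different namespaces: the keys differ, so a vector cached for
# one model is never returned for the other.
print("Keys differ across namespaces:", key_small != key_large)

# Same text, same namespace: the key is stable, so a repeated lookup hits.
print("Key is stable:", key_small == cache_key("text-embedding-3-small", text))
```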

---

## InMemoryByteStore

### Example 1: Using `mset()` to Store Key-Value Pairs
This example demonstrates how to use the `mset()` function to store multiple key-value pairs in the `InMemoryByteStore`.

In [None]:
from langchain.storage import InMemoryByteStore

# Initialize an empty store
store = InMemoryByteStore()

# Set multiple key-value pairs
store.mset([('key1', b'value1'), ('key2', b'value2'), ('key3', b'value3')])

# Verify the keys and values
print("Store after mset():", store.store)

### Example 2: Using `mget()` to Retrieve Values
This example demonstrates how to use the `mget()` function to retrieve values associated with multiple keys.

In [None]:
from langchain.storage import InMemoryByteStore

# Initialize a store with some data
store = InMemoryByteStore()
store.mset([('key1', b'value1'), ('key2', b'value2'), ('key3', b'value3')])

# Retrieve values for multiple keys
values = store.mget(['key1', 'key2', 'key4'])

# Print the retrieved values ('key4' was never stored, so its slot is None)
print("Retrieved values:", values)
# Output: [b'value1', b'value2', None]
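
Because `mget()` returns `None` in the position of any missing key, a caller can separate cache hits from misses in a single pass. The sketch below shows one way to do that. `DictByteStore` is a hypothetical dict-backed stand-in that mimics this behaviour so the example runs without LangChain, and `split_hits_and_misses` is an illustrative helper.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical stand-in that mimics the mset()/mget() behaviour shown above."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        self.store.update(pairs)

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys yield None, in the same position as the requested key.
        return [self.store.get(k) for k in keys]


def split_hits_and_misses(
    store, keys: Sequence[str]
) -> Tuple[Dict[str, bytes], List[str]]:
    """Return ({key: value} for cached keys, [keys that are not cached])."""
    hits: Dict[str, bytes] = {}
    misses: List[str] = []
    for key, value in zip(keys, store.mget(keys)):
        if value is None:
            misses.append(key)
        else:
            hits[key] = value
    return hits, misses


demo_store = DictByteStore()
demo_store.mset([("key1", b"value1"), ("key2", b"value2")])
hits, misses = split_hits_and_misses(demo_store, ["key1", "key2", "key4"])
print("Hits:", hits)
print("Misses:", misses)
```

Only the keys in `misses` would need to be computed and written back with `mset()`.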

### Example 3: Using `mdelete()` to Delete Keys
This example demonstrates how to use the `mdelete()` function to delete specific keys and their associated values.

In [None]:
from langchain.storage import InMemoryByteStore

# Initialize a store with some data
store = InMemoryByteStore()
store.mset([('key1', b'value1'), ('key2', b'value2'), ('key3', b'value3')])

# Delete specific keys
store.mdelete(['key1', 'key3'])

# Verify the store after deletion
print("Store after mdelete():", store.store)

### Example 4: Combining `mset()`, `mget()`, and `mdelete()`
This example demonstrates how to use `mset()`, `mget()`, and `mdelete()` together.

In [None]:
from langchain.storage import InMemoryByteStore

# Initialize an empty store
store = InMemoryByteStore()

# Set multiple key-value pairs
store.mset([('key1', b'value1'), ('key2', b'value2'), ('key3', b'value3')])

# Retrieve values for multiple keys
values_before_deletion = store.mget(['key1', 'key2', 'key3'])
print("Values before deletion:", values_before_deletion)
# Output: [b'value1', b'value2', b'value3']

# Delete specific keys
store.mdelete(['key1', 'key3'])

# Retrieve values after deletion
values_after_deletion = store.mget(['key1', 'key2', 'key3'])
print("Values after deletion:", values_after_deletion)
# Output: [None, b'value2', None]

### Example 5: Using `yield_keys()` to Iterate Over Keys
This example demonstrates how to use the `yield_keys()` function to iterate over keys in the store, optionally filtered by a prefix.

In [None]:
from langchain.storage import InMemoryByteStore

# Initialize a store with some data
store = InMemoryByteStore()
store.mset([('key1', b'value1'), ('key2', b'value2'), ('key3', b'value3'), ('other_key', b'other_value')])

# Iterate over all keys
print("All keys:")
for key in store.yield_keys():
    print(key)

In [None]:
# Iterate over keys with a specific prefix
print("Keys with prefix 'key':")
for key in store.yield_keys(prefix='key'):
    print(key)
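
Prefix filtering combines naturally with `mdelete()` when a whole group of keys should be cleared at once. The following sketch assumes a store that exposes `yield_keys(prefix=...)` and `mdelete()`. `PrefixDemoStore` is a hypothetical dict-backed stand-in so the example runs on its own, and `delete_by_prefix` is an illustrative helper.

```python
from typing import Dict, Iterator, List, Optional, Sequence


class PrefixDemoStore:
    """Hypothetical dict-backed stand-in exposing yield_keys() and mdelete()."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self.store = dict(data)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in list(self.store):
            if prefix is None or key.startswith(prefix):
                yield key

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self.store.pop(key, None)


def delete_by_prefix(store, prefix: str) -> List[str]:
    """Delete every key starting with `prefix` and return the deleted keys."""
    # Collect the keys first so the store is not modified while iterating.
    doomed = list(store.yield_keys(prefix=prefix))
    store.mdelete(doomed)
    return doomed


demo = PrefixDemoStore({"key1": b"v1", "key2": b"v2", "other_key": b"v3"})
removed = delete_by_prefix(demo, "key")
print("Removed:", removed)
print("Remaining:", list(demo.yield_keys()))
```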

---

## Caching Embeddings

### Part 1: Using `LocalFileStore` for Caching Embeddings

#### Purpose:
This section demonstrates how to cache embeddings on disk using `LocalFileStore` and create a vector store with **Chroma**. The caching mechanism avoids recomputing embeddings for the same text, significantly speeding up repeated operations.

#### Steps:
1. **Initialize the Cache**:
   - A `LocalFileStore` is created to store embeddings on disk in the `./cache/` directory.
   - A `CacheBackedEmbeddings` instance is created using the `from_bytes_store()` method, which wraps the embedding model and caches embeddings in the specified store.

2. **Load and Split Documents**:
   - A document (`state_of_the_union.txt`) is loaded using `TextLoader`.
   - The document is split into smaller chunks using `CharacterTextSplitter`.

3. **Create Chroma Vector Store**:
   - The `Chroma.from_documents()` method is used to create a vector store from the document chunks.
   - The time taken to create the vector store is measured using Python's `time` module.

4. **Reuse Cached Embeddings**:
   - The vector store is created again using the same documents and cached embeddings.
   - The time taken for the second creation is measured to demonstrate the performance improvement due to caching.

5. **Check Cached Embeddings**:
   - The keys of the cached embeddings are printed to verify that embeddings are being stored.

In [None]:
import os
import time
import shutil
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_chroma import Chroma  # provided by the langchain-chroma package installed above
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Define the cache directory
cache_dir = "./cache/"

# Check if the cache folder exists and delete it if it does
if os.path.exists(cache_dir):
    print(f"Deleting existing cache folder: {cache_dir}")
    shutil.rmtree(cache_dir)

# Create a LocalFileStore for caching embeddings
store = LocalFileStore(cache_dir)

# Create a CacheBackedEmbeddings instance
cached_embedder = CacheBackedEmbeddings.from_bytes_store(embed, store, namespace=embed.model)

# Check the cache (it should be empty initially)
print("Initial cache keys:", list(store.yield_keys()))  # Output: []

# Load the document and split it into chunks
raw_documents = TextLoader("/kaggle/input/state-of-the-union-txt/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

# Create the Chroma vector store with the cached embedder
start_time = time.time()  # Start timing
db = Chroma.from_documents(documents, cached_embedder, persist_directory="./chroma_db")
end_time = time.time()    # End timing
print(f"\nTime taken to create vector store (initial): {end_time - start_time:.2f} seconds")

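# Create the vector store again; the embeddings now come from the cache, so it should be much faster.
# Note: this adds the same chunks to ./chroma_db a second time; only the timing matters here.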
start_time = time.time()  # Start timing
db2 = Chroma.from_documents(documents, cached_embedder, persist_directory="./chroma_db")
end_time = time.time()    # End timing
print(f"Time taken to create vector store (cached): {end_time - start_time:.2f} seconds")

# Check some of the cached embeddings
print("\nCached embeddings keys:", list(store.yield_keys())[:5])
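
The cold-versus-warm comparison can also be reproduced without an API key or a vector store. The sketch below simulates a slow embedding call with `time.sleep()` and measures both runs with `time.perf_counter()`. `slow_embed`, `cached_embed`, and the 10 ms per-text delay are hypothetical stand-ins, not measurements of a real model.

```python
import time
from typing import Dict, List


def slow_embed(texts: List[str], delay: float = 0.01) -> List[List[float]]:
    """Hypothetical embedding call with artificial per-text latency."""
    time.sleep(delay * len(texts))
    return [[float(len(t))] for t in texts]


cache: Dict[str, List[float]] = {}


def cached_embed(texts: List[str]) -> List[List[float]]:
    """Embed texts, calling slow_embed only for texts not yet in the cache."""
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    if missing:
        cache.update(zip(missing, slow_embed(missing)))
    return [cache[t] for t in texts]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


chunks = [f"chunk {i}" for i in range(20)]
cold_result, cold_seconds = timed(cached_embed, chunks)  # every chunk is a miss
warm_result, warm_seconds = timed(cached_embed, chunks)  # every chunk is a hit
print(f"cold: {cold_seconds:.3f}s, warm: {warm_seconds:.3f}s")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring short intervals.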

### Part 2: Swapping to `InMemoryByteStore` for Caching Embeddings

#### Purpose:
This section demonstrates how to use an in-memory store (`InMemoryByteStore`) for caching embeddings instead of a disk-based store. It also shows how to retrieve, inspect, and delete cached embeddings.

#### Steps:
1. **Initialize the In-Memory Cache**:
   - An `InMemoryByteStore` is created to store embeddings in memory.
   - A `CacheBackedEmbeddings` instance is created using the in-memory store.

2. **Embed Documents**:
   - A list of texts is embedded using the `embed_documents()` method.
   - The embeddings are cached in the in-memory store.

3. **Print Embeddings**:
   - The embeddings for the texts are printed (only the first 10 values are shown for brevity).

4. **Check Cache Keys**:
   - The keys of the cached embeddings are printed to verify that the embeddings have been stored.

5. **Retrieve Cached Embeddings**:
   - The cached embeddings are retrieved using the `mget()` method and printed. The byte store holds each embedding in serialized form, so only the first 10 bytes of each value are shown for brevity.

6. **Delete Keys from Cache**:
   - Some keys are deleted from the cache using the `mdelete()` method.
   - The remaining cache keys are printed to verify the deletion.

In [None]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import InMemoryByteStore

# Create an in-memory store
store = InMemoryByteStore()

# Create a CacheBackedEmbeddings instance with the in-memory store
cached_embedder = CacheBackedEmbeddings.from_bytes_store(embed, store, namespace=embed.model)

# List of texts to embed
texts = ["Caching embeddings with in-memory store.", "This is another example."]

# Embed the documents
embeddings_list = cached_embedder.embed_documents(texts)

# Print the embeddings
print("Embeddings for the texts:")
for text, embedding in zip(texts, embeddings_list):
    print(f"Text: {text}\nEmbedding (first 10 values): {embedding[:10]}\n")  # Print first 10 values for brevity

# Check the cache keys after embedding
print("Cache keys after embedding:")
cache_keys = list(store.yield_keys())
print(cache_keys)

# Retrieve cached embeddings using mget
print("\nRetrieving cached embeddings:")
cached_embeddings = store.mget(cache_keys)
for key, cached_embedding in zip(cache_keys, cached_embeddings):
    print(f"Key: {key}\nCached embedding (first 10 bytes of the serialized value): {cached_embedding[:10]}\n")  # Values are stored as bytes, so this slices bytes, not floats

# Delete some keys from the cache
keys_to_delete = cache_keys[:1]  # Delete the first key
store.mdelete(keys_to_delete)

# Check the cache keys after deletion
print("Cache keys after deletion:")
print(list(store.yield_keys()))
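
Values read directly from a byte store are raw bytes, so slicing them yields bytes rather than numbers. To inspect the actual vector, the bytes have to be decoded first. The sketch below assumes the vectors are stored as UTF-8 encoded JSON. That format is an assumption made for illustration, and `serialize_vector` and `deserialize_vector` are hypothetical helpers.

```python
import json
from typing import List


def serialize_vector(vector: List[float]) -> bytes:
    """Encode a vector as UTF-8 JSON bytes (assumed storage format)."""
    return json.dumps(vector).encode("utf-8")


def deserialize_vector(raw: bytes) -> List[float]:
    """Decode UTF-8 JSON bytes back into a list of floats."""
    return json.loads(raw.decode("utf-8"))


vector = [0.125, -0.5, 0.75, 1.0]
raw = serialize_vector(vector)

print("First 10 bytes:", raw[:10])                      # a slice of the serialized text
print("First 2 values:", deserialize_vector(raw)[:2])   # a slice of the numeric vector
```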

## Conclusion

Caching embeddings is an effective way to optimize NLP workflows that handle large datasets or repeated computations. With `CacheBackedEmbeddings`, each distinct text is embedded once and looked up afterwards, which reduces the time and resources spent on embedding. Whether you're building a vector store, performing semantic search, or implementing retrieval-augmented generation, caching helps keep your system efficient and responsive.

The flexibility to use different storage backends, such as in-memory stores or disk-based stores, makes `CacheBackedEmbeddings` a versatile tool for a wide range of use cases. Additionally, features like namespace support and query embedding caching further enhance its utility, allowing developers to tailor the caching mechanism to their specific needs.

As NLP systems continue to grow in complexity and scale, techniques like caching embeddings will play an increasingly important role in ensuring optimal performance. By adopting these strategies, you can build faster, more efficient, and more reliable NLP applications that deliver value to users and stakeholders.