# LLM Caching

This notebook covers how to cache the results of individual LLM calls using different caches.

First, let's install some dependencies:

In [None]:
%pip install -qU langchain-openai langchain-community

In [2]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass()

In [None]:
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI

# To make the effect of caching more obvious, we use a slower, older model.
# Caching also supports newer chat models.
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)

## In-Memory Cache

In [3]:
from langchain_community.cache import InMemoryCache

set_llm_cache(InMemoryCache())
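
Conceptually, an exact-match cache stores each response under the pair of prompt and model configuration, and only an identical pair produces a hit. The following is a minimal stdlib-only sketch of that idea, not LangChain's implementation; `ToyExactCache`, `cached_call`, and `fake_llm` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyExactCache:
    """Hypothetical in-memory cache keyed by (prompt, llm_string)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        # Only an identical prompt for the same model configuration hits.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, value: str) -> None:
        self._store[(prompt, llm_string)] = value


def cached_call(
    cache: ToyExactCache,
    llm_string: str,
    prompt: str,
    generate: Callable[[str], str],
) -> str:
    """Return the cached response if present, otherwise generate and store it."""
    hit = cache.lookup(prompt, llm_string)
    if hit is not None:
        return hit
    result = generate(prompt)
    cache.update(prompt, llm_string, result)
    return result


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a slow model call; records every real invocation.
    calls.append(prompt)
    return f"answer #{len(calls)} to {prompt!r}"


cache = ToyExactCache()
first = cached_call(cache, "model-a", "Tell me a joke", fake_llm)
second = cached_call(cache, "model-a", "Tell me a joke", fake_llm)  # cache hit
third = cached_call(cache, "model-b", "Tell me a joke", fake_llm)  # other model: miss
```

Keying on the model configuration as well as the prompt keeps one model's answers from being served for another.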

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 7.57 ms, sys: 8.22 ms, total: 15.8 ms
Wall time: 649 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 551 µs, sys: 221 µs, total: 772 µs
Wall time: 1.23 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

## SQLite Cache

In [6]:
!rm -f .langchain.db

In [None]:
# We can do the same thing with a SQLite cache
from langchain_community.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
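
A database-backed cache follows the same lookup/update pattern as an in-memory one, except that entries live in a table and survive process restarts. The sketch below uses only the standard-library `sqlite3` module; `ToySQLiteCache` and its table layout are assumptions made for illustration and do not reflect the schema `SQLiteCache` actually uses.

```python
import sqlite3
from typing import Optional


class ToySQLiteCache:
    """Hypothetical persistent exact-match cache backed by sqlite3."""

    def __init__(self, database_path: str) -> None:
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT NOT NULL, llm TEXT NOT NULL, response TEXT NOT NULL, "
            "PRIMARY KEY (prompt, llm))"
        )

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm_string),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
                (prompt, llm_string, response),
            )


# ":memory:" keeps the example self-contained; pass a file path to persist.
toy_cache = ToySQLiteCache(":memory:")
miss = toy_cache.lookup("Tell me a joke", "model-a")
toy_cache.update("Tell me a joke", "model-a", "Why did the chicken cross the road?")
hit = toy_cache.lookup("Tell me a joke", "model-a")
```

Because every hit is a database query rather than a dictionary access, a cache hit here can be expected to cost more than an in-memory hit while still avoiding the model call.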

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 12.6 ms, sys: 3.51 ms, total: 16.1 ms
Wall time: 486 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 52.6 ms, sys: 57.7 ms, total: 110 ms
Wall time: 113 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

## Upstash Redis Cache

### Standard Cache
Use [Upstash Redis](https://upstash.com) to cache prompts and responses through a serverless HTTP API.

In [None]:
%pip install -qU upstash_redis

In [11]:
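from langchain.globals import set_llm_cache
from langchain_community.cache import UpstashRedisCache
from upstash_redis import Redis

URL = "<UPSTASH_REDIS_REST_URL>"
TOKEN = "<UPSTASH_REDIS_REST_TOKEN>"

# Register the cache with set_llm_cache, as the other sections do,
# instead of assigning to the legacy langchain.llm_cache attribute.
set_llm_cache(UpstashRedisCache(redis_=Redis(url=URL, token=TOKEN)))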

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 7.56 ms, sys: 2.98 ms, total: 10.5 ms
Wall time: 1.14 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 2.78 ms, sys: 1.95 ms, total: 4.73 ms
Wall time: 82.9 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

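### Semantic Cache

Use [Upstash Vector](https://upstash.com/docs/vector/overall/whatisvector) to run a semantic similarity search over cached prompts and return the most similar cached response. Vectorization is handled automatically by the embedding model selected when the Upstash Vector database is created.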

In [None]:
%pip install upstash-semantic-cache

In [11]:
from langchain.globals import set_llm_cache
from upstash_semantic_cache import SemanticCache

In [12]:
UPSTASH_VECTOR_REST_URL = "<UPSTASH_VECTOR_REST_URL>"
UPSTASH_VECTOR_REST_TOKEN = "<UPSTASH_VECTOR_REST_TOKEN>"

cache = SemanticCache(
    url=UPSTASH_VECTOR_REST_URL, token=UPSTASH_VECTOR_REST_TOKEN, min_proximity=0.7
)

In [15]:
set_llm_cache(cache)

In [16]:
%%time
llm.invoke("Which city is the most crowded city in the USA?")

CPU times: user 28.4 ms, sys: 3.93 ms, total: 32.3 ms
Wall time: 1.89 s


'\n\nNew York City is the most crowded city in the USA.'

In [17]:
%%time
llm.invoke("Which city has the highest population in the USA?")

CPU times: user 3.22 ms, sys: 940 μs, total: 4.16 ms
Wall time: 97.7 ms


'\n\nNew York City is the most crowded city in the USA.'
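
The second prompt above is worded differently from the first, yet it returns the cached answer because its similarity to the stored prompt exceeds `min_proximity`. The sketch below illustrates that thresholding with a deliberately crude stand-in for an embedding model (a bag-of-words count vector). `toy_embed`, `cosine`, and `ToySemanticCache` are hypothetical names, and a real embedding model will score these prompts differently.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache that returns the closest entry above a threshold."""

    def __init__(self, min_proximity: float) -> None:
        self.min_proximity = min_proximity
        self._entries: List[Tuple[Counter, str]] = []

    def update(self, prompt: str, response: str) -> None:
        self._entries.append((toy_embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # A hit requires the best match to reach the configured proximity.
        return best_response if best_score >= self.min_proximity else None


semantic = ToySemanticCache(min_proximity=0.7)
semantic.update(
    "Which city is the most crowded city in the USA?",
    "New York City is the most crowded city in the USA.",
)
similar = semantic.lookup("Which city has the highest population in the USA?")
unrelated = semantic.lookup("Tell me a joke")

strict = ToySemanticCache(min_proximity=0.9)
strict.update("Which city is the most crowded city in the USA?", "New York City")
strict_result = strict.lookup("Which city has the highest population in the USA?")
```

Raising the threshold trades fewer wrong hits for more model calls: with this toy embedding the rephrased question hits at 0.7 but misses at 0.9.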

## Redis Cache

See the main [Redis cache docs](/docs/integrations/caches/redis_llm_caching/) for details.

### Standard Cache
Use [Redis](/docs/integrations/providers/redis) to cache prompts and responses.

In [None]:
%pip install -qU redis

In [None]:
# We can do the same thing with a Redis cache
# (make sure your local Redis instance is running before you run this example)
from langchain_community.cache import RedisCache
from redis import Redis

set_llm_cache(RedisCache(redis_=Redis()))

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

### Semantic Cache
Use [Redis](/docs/integrations/providers/redis) to cache prompts and responses and evaluate hits based on semantic similarity.

In [None]:
%pip install -qU redis

In [10]:
from langchain_community.cache import RedisSemanticCache
from langchain_openai import OpenAIEmbeddings

set_llm_cache(
    RedisSemanticCache(redis_url="redis://localhost:6379", embedding=OpenAIEmbeddings())
)

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s


"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

In [None]:
%%time
# The second time, the prompt is not an exact match, but it is semantically
# similar to the original question, so the cached result is used!
llm.invoke("Tell me one joke")

CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms


"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."

## `GPTCache`

We can use [GPTCache](https://github.com/zilliztech/GPTCache) for exact-match caching or to cache results based on semantic similarity.

Let's start with an example of exact-match caching:

In [None]:
%pip install -qU gptcache

In [5]:
import hashlib

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain_community.cache import GPTCache


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )


set_llm_cache(GPTCache(init_gptcache))

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
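
The `init_gptcache` function above derives the cache directory from a SHA-256 hash of the `llm` string, so each distinct LLM configuration gets its own on-disk cache. A minimal sketch of that naming scheme follows; the two configuration strings are hypothetical examples, not the exact strings LangChain passes in.

```python
import hashlib


def get_hashed_name(name: str) -> str:
    # A hex digest is filesystem-safe and has a fixed length of 64 characters.
    return hashlib.sha256(name.encode()).hexdigest()


# Hypothetical llm strings describing two different model configurations.
dir_a = f"map_cache_{get_hashed_name('model=a, temperature=0.0')}"
dir_b = f"map_cache_{get_hashed_name('model=a, temperature=0.7')}"
dir_a_again = f"map_cache_{get_hashed_name('model=a, temperature=0.0')}"
```

The same configuration always maps to the same directory, while any change to the configuration string leads to a separate cache.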

Now let's show an example of similarity caching:

In [9]:
import hashlib

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain_community.cache import GPTCache


def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()


def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")


set_llm_cache(GPTCache(init_gptcache))

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

In [None]:
%%time
# This is an exact match, so it is found in the cache
llm.invoke("Tell me a joke")

CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

In [None]:
%%time
# This is not an exact match, but it is within the semantic distance, so it hits!
llm.invoke("Tell me joke")

CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

## `MongoDB Atlas` Cache

[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully managed cloud database available in AWS, Azure, and GCP. It has native support for vector search on MongoDB document data.
Use [MongoDB Atlas Vector Search](/docs/integrations/providers/mongodb_atlas) to semantically cache prompts and responses.

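### Standard Cache

The standard cache is a simple cache in MongoDB. It does not use semantic caching, and it does not require an index to be created on the collection before generation.

To import this cache, first install the required dependency: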

```bash
%pip install -qU langchain-mongodb
```

```python
from langchain_mongodb.cache import MongoDBCache
```


To use this cache with your LLMs:
```python
from langchain_core.globals import set_llm_cache

mongodb_atlas_uri = "<YOUR_CONNECTION_STRING>"
COLLECTION_NAME = "<YOUR_CACHE_COLLECTION_NAME>"
DATABASE_NAME = "<YOUR_DATABASE_NAME>"

set_llm_cache(MongoDBCache(
    connection_string=mongodb_atlas_uri,
    collection_name=COLLECTION_NAME,
    database_name=DATABASE_NAME,
))
```


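### Semantic Cache

Semantic caching retrieves cached prompts based on the semantic similarity between the user input and previously cached results. Under the hood, it uses MongoDB Atlas as both a cache and a vector store.
`MongoDBAtlasSemanticCache` inherits from `MongoDBAtlasVectorSearch` and needs an Atlas Vector Search Index to be defined in order to work. See the [usage example](/docs/integrations/vectorstores/mongodb_atlas) for how to set up the index.

To import this cache: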
```python
from langchain_mongodb.cache import MongoDBAtlasSemanticCache
```

To use this cache with your LLMs:
```python
from langchain_core.globals import set_llm_cache

# Use any embedding provider. OpenAIEmbeddings is shown here because other
# sections of this notebook use it; the Atlas Vector Search index dimensions
# must match the embedding model you choose.
from langchain_openai import OpenAIEmbeddings

mongodb_atlas_uri = "<YOUR_CONNECTION_STRING>"
COLLECTION_NAME = "<YOUR_CACHE_COLLECTION_NAME>"
DATABASE_NAME = "<YOUR_DATABASE_NAME>"

set_llm_cache(MongoDBAtlasSemanticCache(
    embedding=OpenAIEmbeddings(),
    connection_string=mongodb_atlas_uri,
    collection_name=COLLECTION_NAME,
    database_name=DATABASE_NAME,
))
```

For more resources on using `MongoDBAtlasSemanticCache`, see [this post](https://www.mongodb.com/blog/post/introducing-semantic-caching-dedicated-mongodb-lang-chain-package-gen-ai-apps).

## `Momento` Cache
Use [Momento](/docs/integrations/providers/momento) to cache prompts and responses.

This requires the `momento` package:

In [None]:
%pip install -qU momento

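You need a Momento auth token to use this class. You can pass it to a `momento.CacheClient` if you want to instantiate the client directly, pass it as the named parameter `auth_token` to `MomentoCache.from_client_params`, or set it as the environment variable `MOMENTO_AUTH_TOKEN`.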

In [9]:
from datetime import timedelta

from langchain_community.cache import MomentoCache

cache_name = "langchain"
ttl = timedelta(days=1)
set_llm_cache(MomentoCache.from_client_params(cache_name, ttl))

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 40.7 ms, sys: 16.5 ms, total: 57.2 ms
Wall time: 1.73 s


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [None]:
%%time
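# The second time, the call hits the cache, so it returns much faster
# When run in the same region as the cache, latencies are single-digit milliseconds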
llm.invoke("Tell me a joke")

CPU times: user 3.16 ms, sys: 2.98 ms, total: 6.14 ms
Wall time: 57.9 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
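
The `ttl` passed to `MomentoCache.from_client_params` above limits how long a cached entry remains valid. How the service enforces expiry is not covered here; the following stdlib-only sketch just illustrates the general idea of a time-to-live cache. `ToyTTLCache` and the simulated clock are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple


class ToyTTLCache:
    """Hypothetical cache whose entries expire after a fixed time-to-live."""

    def __init__(
        self, ttl: timedelta, now: Callable[[], datetime] = datetime.now
    ) -> None:
        self._ttl = ttl
        self._now = now  # injectable clock, so expiry can be tested
        self._store: Dict[str, Tuple[datetime, str]] = {}

    def update(self, prompt: str, response: str) -> None:
        self._store[prompt] = (self._now() + self._ttl, response)

    def lookup(self, prompt: str) -> Optional[str]:
        entry = self._store.get(prompt)
        if entry is None:
            return None
        expires_at, response = entry
        if self._now() >= expires_at:
            del self._store[prompt]  # drop the stale entry
            return None
        return response


# Simulated clock so the example does not need to wait for a day to pass.
clock = {"now": datetime(2024, 1, 1)}
ttl_cache = ToyTTLCache(ttl=timedelta(days=1), now=lambda: clock["now"])
ttl_cache.update("Tell me a joke", "cached joke")
fresh = ttl_cache.lookup("Tell me a joke")
clock["now"] += timedelta(days=2)
expired = ttl_cache.lookup("Tell me a joke")
```

A TTL bounds both storage growth and how stale a cached answer can become.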

## `SQLAlchemy` Cache

You can use `SQLAlchemyCache` to cache with any SQL database supported by `SQLAlchemy`.

### Standard Cache

In [None]:
from langchain_community.cache import SQLAlchemyCache
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
set_llm_cache(SQLAlchemyCache(engine))

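### Custom SQLAlchemy Schemas

You can define your own declarative `SQLAlchemyCache` child class to customize the schema used for caching. For example, to support high-speed full-text prompt indexing with `Postgres`, use: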

In [None]:
from langchain_community.cache import SQLAlchemyCache
from sqlalchemy import Column, Computed, Index, Integer, Sequence, String, create_engine
from sqlalchemy.orm import declarative_base
from sqlalchemy_utils import TSVectorType

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for a full-text-indexed LLM cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence("cache_id"), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(
        TSVectorType(),
        Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True),
    )
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )


engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
set_llm_cache(SQLAlchemyCache(engine, FulltextLLMCache))

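## `Cassandra` Caches

> [Apache Cassandra®](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database. Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).

You can use Cassandra to cache LLM responses, choosing between the exact-match `CassandraCache` and the vector-similarity-based `CassandraSemanticCache`.

Let's see both in action. The next cells guide you through the required setup, and the following cells showcase the two available cache classes.

Required dependency: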

In [None]:
%pip install -qU "cassio>=0.1.4"

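### Connecting to the DB

The Cassandra caches shown on this page can be used with Cassandra as well as with derived databases, such as Astra DB, that use the CQL (Cassandra Query Language) protocol.

> DataStax [Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.

Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you provide different parameters when instantiating the cache (through initialization of a CassIO connection).

#### Connecting to a Cassandra cluster

You first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (for example, with network settings and authentication), but it might look something like this: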

In [1]:
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

You can now set the session, along with your desired keyspace name, as global CassIO parameters:

In [2]:
import cassio

CASSANDRA_KEYSPACE = input("CASSANDRA_KEYSPACE = ")

cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)

CASSANDRA_KEYSPACE =  demo_keyspace


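#### Connecting to Astra DB through CQL

In this case you initialize CassIO with the following connection parameters:

- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`
- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a "Database Administrator" token)
- optionally, a Keyspace name (if omitted, the default keyspace for the database is used)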

In [12]:
import getpass

ASTRA_DB_ID = input("ASTRA_DB_ID = ")
ASTRA_DB_APPLICATION_TOKEN = getpass.getpass("ASTRA_DB_APPLICATION_TOKEN = ")

desired_keyspace = input("ASTRA_DB_KEYSPACE (optional, can be left empty) = ")
if desired_keyspace:
    ASTRA_DB_KEYSPACE = desired_keyspace
else:
    ASTRA_DB_KEYSPACE = None

ASTRA_DB_ID =  01234567-89ab-cdef-0123-456789abcdef
ASTRA_DB_APPLICATION_TOKEN =  ········
ASTRA_DB_KEYSPACE (optional, can be left empty) =  my_keyspace


In [13]:
import cassio

cassio.init(
    database_id=ASTRA_DB_ID,
    token=ASTRA_DB_APPLICATION_TOKEN,
    keyspace=ASTRA_DB_KEYSPACE,
)

### Standard Cache

This avoids invoking the LLM when the supplied prompt is _exactly_ the same as one encountered before:

In [3]:
from langchain_community.cache import CassandraCache
from langchain_core.globals import set_llm_cache

set_llm_cache(CassandraCache())

In [9]:
%%time

print(llm.invoke("Why is the Moon always showing the same side?"))



The Moon is tidally locked with the Earth, which means that its rotation on its own axis is synchronized with its orbit around the Earth. This results in the Moon always showing the same side to the Earth. This is because the gravitational forces between the Earth and the Moon have caused the Moon's rotation to slow down over time, until it reached a point where it takes the same amount of time for the Moon to rotate on its axis as it does to orbit around the Earth. This phenomenon is common among satellites in close orbits around their parent planets and is known as tidal locking.
CPU times: user 92.5 ms, sys: 8.89 ms, total: 101 ms
Wall time: 1.98 s


In [10]:
%%time

print(llm.invoke("Why is the Moon always showing the same side?"))



The Moon is tidally locked with the Earth, which means that its rotation on its own axis is synchronized with its orbit around the Earth. This results in the Moon always showing the same side to the Earth. This is because the gravitational forces between the Earth and the Moon have caused the Moon's rotation to slow down over time, until it reached a point where it takes the same amount of time for the Moon to rotate on its axis as it does to orbit around the Earth. This phenomenon is common among satellites in close orbits around their parent planets and is known as tidal locking.
CPU times: user 5.51 ms, sys: 0 ns, total: 5.51 ms
Wall time: 5.78 ms


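### Semantic Cache

This cache performs a semantic similarity search and returns a hit if it finds a cached entry that is similar enough. For this, you need to provide an `Embeddings` instance of your choice.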

In [14]:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

In [17]:
from langchain_community.cache import CassandraSemanticCache
from langchain_core.globals import set_llm_cache

set_llm_cache(
    CassandraSemanticCache(
        embedding=embedding,
        table_name="my_semantic_cache",
    )
)

In [19]:
%%time

print(llm.invoke("Why is the Moon always showing the same side?"))



The Moon is always showing the same side because of a phenomenon called synchronous rotation. This means that the Moon rotates on its axis at the same rate that it orbits around the Earth, which takes approximately 27.3 days. This results in the same side of the Moon always facing the Earth. This is due to the gravitational forces between the Earth and the Moon, which have caused the Moon's rotation to gradually slow down and become synchronized with its orbit. This is a common occurrence among many moons in our solar system.
CPU times: user 49.5 ms, sys: 7.38 ms, total: 56.9 ms
Wall time: 2.55 s


In [20]:
%%time

print(llm.invoke("How come we always see one face of the moon?"))



The Moon is always showing the same side because of a phenomenon called synchronous rotation. This means that the Moon rotates on its axis at the same rate that it orbits around the Earth, which takes approximately 27.3 days. This results in the same side of the Moon always facing the Earth. This is due to the gravitational forces between the Earth and the Moon, which have caused the Moon's rotation to gradually slow down and become synchronized with its orbit. This is a common occurrence among many moons in our solar system.
CPU times: user 21.2 ms, sys: 3.38 ms, total: 24.6 ms
Wall time: 532 ms


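**Attribution statement:**

>`Apache Cassandra`, `Cassandra` and `Apache` are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.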

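## `Astra DB` Caches

You can easily use [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as an LLM cache, with either the "exact" or the "semantic-based" cache.

Make sure you have a running database (it must be a vector-enabled database to use the semantic cache) and get the required credentials from your Astra dashboard:

- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`
- the Token looks like `AstraCS:6gBhNmsk135....`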

In [3]:
%pip install -qU langchain_astradb

import getpass

ASTRA_DB_API_ENDPOINT = input("ASTRA_DB_API_ENDPOINT = ")
ASTRA_DB_APPLICATION_TOKEN = getpass.getpass("ASTRA_DB_APPLICATION_TOKEN = ")

ASTRA_DB_API_ENDPOINT =  https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com
ASTRA_DB_APPLICATION_TOKEN =  ········


### Standard Cache

This avoids invoking the LLM when the supplied prompt is _exactly_ the same as one encountered before:

In [7]:
from langchain.globals import set_llm_cache
from langchain_astradb import AstraDBCache

set_llm_cache(
    AstraDBCache(
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        token=ASTRA_DB_APPLICATION_TOKEN,
    )
)

In [8]:
%%time

print(llm.invoke("Is a true fakery the same as a fake truth?"))



There is no definitive answer to this question as it depends on the interpretation of the terms "true fakery" and "fake truth". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.
CPU times: user 70.8 ms, sys: 4.13 ms, total: 74.9 ms
Wall time: 2.06 s


In [9]:
%%time

print(llm.invoke("Is a true fakery the same as a fake truth?"))



There is no definitive answer to this question as it depends on the interpretation of the terms "true fakery" and "fake truth". However, one possible interpretation is that a true fakery is a counterfeit or imitation that is intended to deceive, whereas a fake truth is a false statement that is presented as if it were true.
CPU times: user 15.1 ms, sys: 3.7 ms, total: 18.8 ms
Wall time: 531 ms


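### Semantic Cache

This cache performs a semantic similarity search and returns a hit if it finds a cached entry that is similar enough. For this, you need to provide an `Embeddings` instance of your choice.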

In [10]:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

In [11]:
from langchain_astradb import AstraDBSemanticCache

set_llm_cache(
    AstraDBSemanticCache(
        api_endpoint=ASTRA_DB_API_ENDPOINT,
        token=ASTRA_DB_APPLICATION_TOKEN,
        embedding=embedding,
        collection_name="demo_semantic_cache",
    )
)

In [12]:
%%time

print(llm.invoke("Are there truths that are false?"))



There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.
CPU times: user 65.6 ms, sys: 15.3 ms, total: 80.9 ms
Wall time: 2.72 s


In [13]:
%%time

print(llm.invoke("Is is possible that something false can be also true?"))



There is no definitive answer to this question since it presupposes a great deal about the nature of truth itself, which is a matter of considerable philosophical debate. It is possible, however, to construct scenarios in which something could be considered true despite being false, such as if someone sincerely believes something to be true even though it is not.
CPU times: user 29.3 ms, sys: 6.21 ms, total: 35.5 ms
Wall time: 1.03 s


## `Azure Cosmos DB` Semantic Cache

You can use this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) for caching.

In [None]:
from langchain_community.cache import AzureCosmosDBSemanticCache
from langchain_community.vectorstores.azure_cosmos_db import (
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)
from langchain_openai import OpenAIEmbeddings

# Read more about Azure CosmosDB Mongo vCore vector search here: https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search

NAMESPACE = "langchain_test_db.langchain_test_collection"
CONNECTION_STRING = (
    "Please provide your Azure Cosmos DB Mongo vCore vector database connection string"
)

DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

# Default values for these parameters
num_lists = 3
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_IVF
m = 16
ef_construction = 64
ef_search = 40
score_threshold = 0.9
application_name = "LANGCHAIN_CACHING_PYTHON"


set_llm_cache(
    AzureCosmosDBSemanticCache(
        cosmosdb_connection_string=CONNECTION_STRING,
        cosmosdb_client=None,
        embedding=OpenAIEmbeddings(),
        database_name=DB_NAME,
        collection_name=COLLECTION_NAME,
        num_lists=num_lists,
        similarity=similarity_algorithm,
        kind=kind,
        dimensions=dimensions,
        m=m,
        ef_construction=ef_construction,
        ef_search=ef_search,
        score_threshold=score_threshold,
        application_name=application_name,
    )
)

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 45.6 ms, sys: 19.7 ms, total: 65.3 ms
Wall time: 2.29 s


'\n\nWhy was the math book sad? Because it had too many problems.'

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 9.61 ms, sys: 3.42 ms, total: 13 ms
Wall time: 474 ms


'\n\nWhy was the math book sad? Because it had too many problems.'

## `Azure Cosmos DB NoSql` Semantic Cache

You can use this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) for caching.

In [None]:
from typing import Any, Dict

from azure.cosmos import CosmosClient, PartitionKey
from langchain_community.cache import AzureCosmosDBNoSqlSemanticCache
from langchain_openai import OpenAIEmbeddings

HOST = "COSMOS_DB_URI"
KEY = "COSMOS_DB_KEY"

cosmos_client = CosmosClient(HOST, KEY)


def get_vector_indexing_policy() -> dict:
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        "vectorIndexes": [{"path": "/embedding", "type": "diskANN"}],
    }


def get_vector_embedding_policy() -> dict:
    return {
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "dimensions": 1536,
                "distanceFunction": "cosine",
            }
        ]
    }


cosmos_container_properties_test = {"partition_key": PartitionKey(path="/id")}
cosmos_database_properties_test: Dict[str, Any] = {}

set_llm_cache(
    AzureCosmosDBNoSqlSemanticCache(
        cosmos_client=cosmos_client,
        embedding=OpenAIEmbeddings(),
        vector_embedding_policy=get_vector_embedding_policy(),
        indexing_policy=get_vector_indexing_policy(),
        cosmos_container_properties=cosmos_container_properties_test,
        cosmos_database_properties=cosmos_database_properties_test,
    )
)

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 374 ms, sys: 34.2 ms, total: 408 ms
Wall time: 3.15 s


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

In [None]:
%%time
# The second time, the call hits the cache, so it returns much faster
llm.invoke("Tell me a joke")

CPU times: user 17.7 ms, sys: 2.88 ms, total: 20.6 ms
Wall time: 373 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

## `Elasticsearch` Caches

A caching layer for LLMs that uses Elasticsearch.

First install the LangChain integration with Elasticsearch.

In [None]:
%pip install -qU langchain-elasticsearch

### Standard Cache

Use the class `ElasticsearchCache`.

Simple example:

In [None]:
from langchain.globals import set_llm_cache
from langchain_elasticsearch import ElasticsearchCache

set_llm_cache(
    ElasticsearchCache(
        es_url="http://localhost:9200",
        index_name="llm-chat-cache",
        metadata={"project": "my_chatgpt_project"},
    )
)

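The `index_name` parameter can also accept aliases. This makes it possible to use
[ILM: Manage the index lifecycle](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html),
which we suggest considering for managing retention and controlling cache growth.

See the class docstring for all parameters.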

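### Index the generated text

By default, the cached data is not searchable.
Developers can customize how the Elasticsearch document is built in order to add indexed text fields,
for example to hold the text generated by the LLM.

This can be done by subclassing and overriding methods.
The new cache class can also be applied to a pre-existing cache index: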

In [None]:
import json
from typing import Any, Dict, List

from langchain.globals import set_llm_cache
from langchain_core.caches import RETURN_VAL_TYPE
from langchain_elasticsearch import ElasticsearchCache


class SearchableElasticsearchCache(ElasticsearchCache):
    @property
    def mapping(self) -> Dict[str, Any]:
        mapping = super().mapping
        mapping["mappings"]["properties"]["parsed_llm_output"] = {
            "type": "text",
            "analyzer": "english",
        }
        return mapping

    def build_document(
        self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
    ) -> Dict[str, Any]:
        body = super().build_document(prompt, llm_string, return_val)
        body["parsed_llm_output"] = self._parse_output(body["llm_output"])
        return body

    @staticmethod
    def _parse_output(data: List[str]) -> List[str]:
        return [
            json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
            for output in data
        ]


set_llm_cache(
    SearchableElasticsearchCache(
        es_url="http://localhost:9200", index_name="llm-chat-cache"
    )
)

When overriding the mapping and the document building,
please make only additive modifications and keep the base mapping intact.
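
To make the `_parse_output` step above more concrete, the sketch below applies the same key path to a hand-written JSON string. The sample payload is an assumption about the serialized shape of a chat generation, inferred only from the keys that `_parse_output` reads; it was not captured from a real cache index, and other generation types may be serialized differently.

```python
import json
from typing import List


def parse_output(data: List[str]) -> List[str]:
    """Extract the message text from each serialized generation."""
    return [
        json.loads(output)["kwargs"]["message"]["kwargs"]["content"]
        for output in data
    ]


# Hypothetical serialized generation: only the keys read above are included.
sample = json.dumps(
    {"kwargs": {"message": {"kwargs": {"content": "Why did the chicken cross the road?"}}}}
)
parsed = parse_output([sample])
```

The extracted strings are what would be stored in the additional `parsed_llm_output` text field and analyzed for search.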

### Embedding Cache

An Elasticsearch store for caching embeddings.

In [None]:
from langchain_elasticsearch import ElasticsearchEmbeddingsCache

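## LLM-specific optional caching

You can also turn off caching for specific LLMs. In the example below, even though global caching is enabled, we turn it off for a specific LLM.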

In [13]:
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2, cache=False)

In [14]:
%%time
llm.invoke("Tell me a joke")

CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [15]:
%%time
llm.invoke("Tell me a joke")

CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms


'\n\nTwo guys stole a calendar. They got six months each.'
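
The two calls above return different jokes because `cache=False` makes this LLM bypass the global cache entirely. The toy model below sketches that opt-out logic using only the standard library. `ToyLLM` and `GLOBAL_CACHE` are hypothetical, and the real resolution rules in LangChain may include cases not shown here.

```python
from typing import Dict, Optional

# Stand-in for the cache registered globally with set_llm_cache.
GLOBAL_CACHE: Dict[str, str] = {}


class ToyLLM:
    """Hypothetical model whose `cache` flag can opt out of the global cache."""

    def __init__(self, cache: Optional[bool] = None) -> None:
        self.cache = cache
        self.calls = 0

    def invoke(self, prompt: str) -> str:
        use_cache = self.cache is not False
        if use_cache and prompt in GLOBAL_CACHE:
            return GLOBAL_CACHE[prompt]
        # Simulate a real (non-deterministic) model call.
        self.calls += 1
        result = f"response {self.calls}"
        if use_cache:
            GLOBAL_CACHE[prompt] = result
        return result


cached_llm = ToyLLM()
uncached_llm = ToyLLM(cache=False)

cached_outputs = [cached_llm.invoke("Tell me a joke") for _ in range(2)]
uncached_outputs = [uncached_llm.invoke("Tell me a joke") for _ in range(2)]
```

Here the cached model is called once and returns the same text twice, while the uncached model is called on every invocation.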

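## Optional caching in chains

You can also turn off caching for particular nodes in chains. Note that because of certain interfaces, it is often easier to construct the chain first and then edit the LLM afterwards.

As an example, we will load a summarizer map-reduce chain. We will cache results for the map step, but not for the combine step.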

In [10]:
llm = OpenAI(model="gpt-3.5-turbo-instruct")
no_cache_llm = OpenAI(model="gpt-3.5-turbo-instruct", cache=False)

In [11]:
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter()

In [14]:
with open("../how_to/state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

In [15]:
from langchain_core.documents import Document

docs = [Document(page_content=t) for t in texts[:3]]
from langchain.chains.summarize import load_summarize_chain

In [16]:
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)

In [17]:
%%time
chain.invoke(docs)

CPU times: user 176 ms, sys: 23.2 ms, total: 199 ms
Wall time: 4.42 s


{'input_documents': [Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

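When we run it again, it runs substantially faster, but the final answer is different. This is because the map steps use the cache, while the combine step does not.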

In [19]:
%%time
chain.invoke(docs)

CPU times: user 7 ms, sys: 1.94 ms, total: 8.94 ms
Wall time: 1.06 s


{'input_documents': [Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

In [20]:
!rm .langchain.db sqlite.db

rm: sqlite.db: No such file or directory
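
The behavior of the map-reduce example above (faster second run, different final answer) can be reproduced with a toy pipeline in which only the map step is cached. The function names and the fake "summaries" below are hypothetical stand-ins for the LLM calls; this is a sketch of the caching pattern, not of `load_summarize_chain`.

```python
from typing import Dict, List

map_cache: Dict[str, str] = {}
map_calls: List[str] = []
reduce_calls: List[str] = []


def map_step(doc: str) -> str:
    """Cached step: summarize one document (stand-in for the map LLM)."""
    if doc in map_cache:
        return map_cache[doc]
    map_calls.append(doc)
    summary = doc.upper()
    map_cache[doc] = summary
    return summary


def reduce_step(summaries: List[str]) -> str:
    """Uncached step: always runs again (stand-in for the combine LLM)."""
    reduce_calls.append("|".join(summaries))
    # The call counter imitates a model that answers differently each time.
    return f"combined({len(reduce_calls)}): " + " ".join(summaries)


def summarize(docs: List[str]) -> str:
    return reduce_step([map_step(doc) for doc in docs])


docs = ["doc one", "doc two", "doc three"]
first_run = summarize(docs)
second_run = summarize(docs)
```

On the second run the three map results come from the cache, so only the combine step does new work, and its fresh output is why the final answers differ.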


## `OpenSearch` Semantic Cache
Use [OpenSearch](https://python.langchain.com/docs/integrations/vectorstores/opensearch/) as a semantic cache to cache prompts and responses and evaluate hits based on semantic similarity.

In [10]:
from langchain_community.cache import OpenSearchSemanticCache
from langchain_openai import OpenAIEmbeddings

set_llm_cache(
    OpenSearchSemanticCache(
        opensearch_url="http://localhost:9200", embedding=OpenAIEmbeddings()
    )
)

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 39.4 ms, sys: 11.8 ms, total: 51.2 ms
Wall time: 1.55 s


"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything."

In [None]:
%%time
# The second time, the prompt is not an exact match, but it is semantically
# similar to the original question, so the cached result is used!
llm.invoke("Tell me one joke")

CPU times: user 4.66 ms, sys: 1.1 ms, total: 5.76 ms
Wall time: 113 ms


"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything."

## `SingleStoreDB` Semantic Cache

You can use [SingleStore](https://python.langchain.com/docs/integrations/vectorstores/singlestore/) as a semantic cache to cache prompts and responses.

In [None]:
%pip install -qU langchain-singlestore

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_singlestore.cache import SingleStoreSemanticCache

set_llm_cache(
    SingleStoreSemanticCache(
        embedding=OpenAIEmbeddings(),
        host="root:pass@localhost:3306/db",
    )
)

## `Memcached` Cache
You can use [Memcached](https://www.memcached.org/) as a cache for prompts and responses through [pymemcache](https://github.com/pinterest/pymemcache).

This cache requires the pymemcache dependency to be installed:

In [1]:
%pip install -qU pymemcache

In [1]:
from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client

set_llm_cache(MemcachedCache(Client("localhost")))

In [None]:
%%time
# The first time, the result is not yet cached, so the call takes longer
llm.invoke("Tell me a joke")

CPU times: user 32.8 ms, sys: 21 ms, total: 53.8 ms
Wall time: 343 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

In [None]:
%%time
# The second call hits the cache, so it is much faster
llm.invoke("Tell me a joke")

CPU times: user 2.31 ms, sys: 850 µs, total: 3.16 ms
Wall time: 6.43 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'

## `Couchbase` Caches

Use [Couchbase](https://couchbase.com/) as a cache for prompts and responses.

### Standard Cache

The standard cache looks for an exact match of the user prompt.

In [None]:
%pip install -qU langchain_couchbase

In [None]:
# Create the Couchbase connection object
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_couchbase.cache import CouchbaseCache
from langchain_openai import ChatOpenAI

COUCHBASE_CONNECTION_STRING = (
    "couchbase://localhost"  # or "couchbases://localhost" if using TLS
)
DB_USERNAME = "Administrator"
DB_PASSWORD = "Password"

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))

In [None]:
# Specify the bucket, scope, and collection that store the cached documents
BUCKET_NAME = "langchain-testing"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"

set_llm_cache(
    CouchbaseCache(
        cluster=cluster,
        bucket_name=BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
    )
)

In [None]:
%%time
# The first call is not yet in the cache, so it takes longer
llm.invoke("Tell me a joke")

CPU times: user 22.2 ms, sys: 14 ms, total: 36.2 ms
Wall time: 938 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

In [None]:
%%time
# The second call hits the cache, so it is much faster
llm.invoke("Tell me a joke")

CPU times: user 25.9 ms, sys: 15.3 ms, total: 41.3 ms
Wall time: 144 ms


"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

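#### Time to Live (TTL) for cache entries
Cached documents can be deleted automatically after a specified time by passing the `ttl` parameter when initializing the cache.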

In [8]:
from datetime import timedelta

set_llm_cache(
    CouchbaseCache(
        cluster=cluster,
        bucket_name=BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        ttl=timedelta(minutes=5),
    )
)
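
To illustrate what a TTL means for lookups, here is a minimal, standard-library-only sketch. It is not how Couchbase implements expiry (the database removes expired documents itself); the `TTLCache` class and its injectable `clock` argument are hypothetical names used only for this illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy exact-match cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        # The clock is injectable so expiry can be shown without sleeping.
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def update(self, prompt: str, response: str) -> None:
        # Store the response together with its expiry time.
        self._store[prompt] = (self._clock() + self._ttl, response)

    def lookup(self, prompt: str) -> Optional[str]:
        entry = self._store.get(prompt)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            # The entry outlived its TTL: drop it and report a miss.
            del self._store[prompt]
            return None
        return response


# A fake clock stands in for the passage of time.
now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.update("Tell me a joke", "A cached joke")

fresh = cache.lookup("Tell me a joke")  # still within the 5-minute TTL
now[0] = 301.0
expired = cache.lookup("Tell me a joke")  # the TTL has elapsed
print(fresh, expired)
```

Once an entry has expired, the next identical prompt is a cache miss and goes back to the model.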

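### Semantic Cache
Semantic caching lets users retrieve cached responses based on the semantic similarity between the user input and previously cached inputs. Under the hood it uses Couchbase as both a cache and a vector store. It requires an appropriate vector search index to be defined in order to work. See the usage example below for how to set up the index.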

In [None]:
# Create the Couchbase connection object
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_couchbase.cache import CouchbaseSemanticCache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

COUCHBASE_CONNECTION_STRING = (
    "couchbase://localhost"  # or "couchbases://localhost" if using TLS
)
DB_USERNAME = "Administrator"
DB_PASSWORD = "Password"

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))

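Notes:
- The search index for the semantic cache must be defined before the semantic cache is used.
- The optional `score_threshold` parameter of the semantic cache can be used to tune the results of the semantic search.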
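
As a rough illustration of how a score threshold decides between a hit and a miss, the sketch below compares toy embedding vectors with cosine similarity. It is an assumption-laden stand-in, not the Couchbase implementation: the `cosine_similarity` and `semantic_lookup` helpers, the three-dimensional vectors, and the choice of cosine similarity are all hypothetical (the example index in this notebook is configured for `dot_product`).

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_lookup(
    query_vec: Sequence[float],
    entries: List[Tuple[Sequence[float], str]],
    score_threshold: float,
) -> Optional[str]:
    """Return the best-matching cached response, or None if it scores below the threshold."""
    best_score, best_response = -1.0, None
    for vec, response in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= score_threshold else None


# One cached entry: (toy embedding of the original prompt, cached response).
entries = [([1.0, 0.0, 0.0], "cached answer about dog lifespan")]

near = semantic_lookup([0.9, 0.1, 0.0], entries, score_threshold=0.8)  # similar prompt
far = semantic_lookup([0.0, 1.0, 0.0], entries, score_threshold=0.8)  # unrelated prompt
print(near, far)
```

Raising the threshold makes the cache stricter (fewer hits, and fewer wrong ones); lowering it returns cached answers for more loosely related prompts.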

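#### Importing an index into the Full Text Search service

How do you import an index into the Full Text Search service?
 - [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)
     - Click Search -> Add Index -> Import
     - Copy the following index definition into the Import screen
     - Click Create Index to create the index.
 - [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)
     - Copy the index definition to a new file `index.json`
     - Import the file in Capella using the instructions in the documentation.
     - Click Create Index to create the index.

**Example index for the vector search:**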

  ```
  {
    "type": "fulltext-index",
    "name": "langchain-testing._default.semantic-cache-index",
    "sourceType": "gocbcore",
    "sourceName": "langchain-testing",
    "planParams": {
      "maxPartitionsPerPIndex": 1024,
      "indexPartitions": 16
    },
    "params": {
      "doc_config": {
        "docid_prefix_delim": "",
        "docid_regexp": "",
        "mode": "scope.collection.type_field",
        "type_field": "type"
      },
      "mapping": {
        "analysis": {},
        "default_analyzer": "standard",
        "default_datetime_parser": "dateTimeOptional",
        "default_field": "_all",
        "default_mapping": {
          "dynamic": true,
          "enabled": false
        },
        "default_type": "_default",
        "docvalues_dynamic": false,
        "index_dynamic": true,
        "store_dynamic": true,
        "type_field": "_type",
        "types": {
          "_default.semantic-cache": {
            "dynamic": false,
            "enabled": true,
            "properties": {
              "embedding": {
                "dynamic": false,
                "enabled": true,
                "fields": [
                  {
                    "dims": 1536,
                    "index": true,
                    "name": "embedding",
                    "similarity": "dot_product",
                    "type": "vector",
                    "vector_index_optimized_for": "recall"
                  }
                ]
              },
              "metadata": {
                "dynamic": true,
                "enabled": true
              },
              "text": {
                "dynamic": false,
                "enabled": true,
                "fields": [
                  {
                    "index": true,
                    "name": "text",
                    "store": true,
                    "type": "text"
                  }
                ]
              }
            }
          }
        }
      },
      "store": {
        "indexType": "scorch",
        "segmentVersion": 16
      }
    },
    "sourceParams": {}
  }
  ```

In [10]:
BUCKET_NAME = "langchain-testing"
SCOPE_NAME = "_default"
COLLECTION_NAME = "semantic-cache"
INDEX_NAME = "semantic-cache-index"
embeddings = OpenAIEmbeddings()

cache = CouchbaseSemanticCache(
    cluster=cluster,
    embedding=embeddings,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    index_name=INDEX_NAME,
    score_threshold=0.8,
)

set_llm_cache(cache)

In [None]:
%%time
# The first call is not yet in the cache, so it takes longer
print(llm.invoke("How long do dogs live?"))



The average lifespan of a dog is around 12 years, but this can vary depending on the breed, size, and overall health of the individual dog. Some smaller breeds may live longer, while larger breeds may have shorter lifespans. Proper care, diet, and exercise can also play a role in extending a dog's lifespan.
CPU times: user 826 ms, sys: 2.46 s, total: 3.28 s
Wall time: 2.87 s


In [None]:
%%time
# The second prompt is worded differently but is semantically similar,
# so it hits the cache and returns much faster
print(llm.invoke("What is the expected lifespan of a dog?"))



The average lifespan of a dog is around 12 years, but this can vary depending on the breed, size, and overall health of the individual dog. Some smaller breeds may live longer, while larger breeds may have shorter lifespans. Proper care, diet, and exercise can also play a role in extending a dog's lifespan.
CPU times: user 9.82 ms, sys: 2.61 ms, total: 12.4 ms
Wall time: 311 ms


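#### Time to Live (TTL) for cache entries

Cached documents can be deleted automatically after a specified time by passing the `ttl` parameter when initializing the cache.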

In [13]:
from datetime import timedelta

set_llm_cache(
    CouchbaseSemanticCache(
        cluster=cluster,
        embedding=embeddings,
        bucket_name=BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        index_name=INDEX_NAME,
        score_threshold=0.8,
        ttl=timedelta(minutes=5),
    )
)

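## Cache classes: summary table

**Cache** classes are implemented by inheriting the [BaseCache](https://python.langchain.com/api_reference/core/caches/langchain_core.caches.BaseCache.html) class.

This table lists all derived classes with links to the API Reference.


| Namespace | Class 🔻 |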
|------------|---------|
| langchain_astradb.cache | [AstraDBCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBCache.html) |
| langchain_astradb.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/astradb/cache/langchain_astradb.cache.AstraDBSemanticCache.html) |
| langchain_community.cache | [AstraDBCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBCache.html) (deprecated since `langchain-community==0.0.28`) |
| langchain_community.cache | [AstraDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AstraDBSemanticCache.html) (deprecated since `langchain-community==0.0.28`) |
| langchain_community.cache | [AzureCosmosDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.AzureCosmosDBSemanticCache.html) |
| langchain_community.cache | [CassandraCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraCache.html) |
| langchain_community.cache | [CassandraSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.CassandraSemanticCache.html) |
| langchain_couchbase.cache | [CouchbaseCache](https://python.langchain.com/api_reference/couchbase/cache/langchain_couchbase.cache.CouchbaseCache.html) |
| langchain_couchbase.cache | [CouchbaseSemanticCache](https://python.langchain.com/api_reference/couchbase/cache/langchain_couchbase.cache.CouchbaseSemanticCache.html) |
| langchain_elasticsearch.cache | [ElasticsearchCache](https://python.langchain.com/api_reference/elasticsearch/cache/langchain_elasticsearch.cache.AsyncElasticsearchCache.html) |
| langchain_elasticsearch.cache | [ElasticsearchEmbeddingsCache](https://python.langchain.com/api_reference/elasticsearch/cache/langchain_elasticsearch.cache.AsyncElasticsearchEmbeddingsCache.html) |
| langchain_community.cache | [GPTCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.GPTCache.html) |
| langchain_core.caches | [InMemoryCache](https://python.langchain.com/api_reference/core/caches/langchain_core.caches.InMemoryCache.html) |
| langchain_community.cache | [InMemoryCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.InMemoryCache.html) |
| langchain_community.cache | [MomentoCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.MomentoCache.html) |
| langchain_mongodb.cache | [MongoDBAtlasSemanticCache](https://python.langchain.com/api_reference/mongodb/cache/langchain_mongodb.cache.MongoDBAtlasSemanticCache.html) |
| langchain_mongodb.cache | [MongoDBCache](https://python.langchain.com/api_reference/mongodb/cache/langchain_mongodb.cache.MongoDBCache.html) |
| langchain_community.cache | [OpenSearchSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.OpenSearchSemanticCache.html) |
| langchain_community.cache | [RedisSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.RedisSemanticCache.html) |
| langchain_community.cache | [SingleStoreDBSemanticCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.SingleStoreDBSemanticCache.html) |
| langchain_community.cache | [SQLAlchemyCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.SQLAlchemyCache.html) |
| langchain_community.cache | [SQLAlchemyMd5Cache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.SQLAlchemyMd5Cache.html) |
| langchain_community.cache | [UpstashRedisCache](https://python.langchain.com/api_reference/community/cache/langchain_community.cache.UpstashRedisCache.html) |
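
To give a feel for what such a derived class provides, the sketch below mirrors the three methods that `BaseCache` declares as abstract (`lookup`, `update`, `clear`) using only the standard library. The `DictCache` name is hypothetical, and the class deliberately does not import or subclass `langchain_core.caches.BaseCache` so that it runs without LangChain installed; a real cache would inherit from `BaseCache` and store LangChain generation objects as the return value.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


class DictCache:
    """Toy cache keyed on (prompt, llm_string), shaped like the BaseCache interface."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Sequence[Any]] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Any]]:
        # A hit requires both the prompt and the model configuration to match.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: Sequence[Any]) -> None:
        self._store[(prompt, llm_string)] = return_val

    def clear(self, **kwargs: Any) -> None:
        self._store.clear()


cache = DictCache()
cache.update("Tell me a joke", "model-a", ["A cached joke"])

hit = cache.lookup("Tell me a joke", "model-a")  # same prompt, same model settings
other_model = cache.lookup("Tell me a joke", "model-b")  # same prompt, different settings
cache.clear()
after_clear = cache.lookup("Tell me a joke", "model-a")
print(hit, other_model, after_clear)
```

Keying on the model configuration as well as the prompt keeps a response cached for one model or parameter set from being returned for another.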
