# [How to Manage Memory with LangChain](https://blog.imkasen.com/langchain-memory-management/)

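A key feature of chatbots is their ability to use earlier turns of the conversation as context. This state management can take several forms, including:

- Simply stuffing previous messages into the chat model prompt.
- The same, but trimming old messages to reduce the amount of distracting information the model has to process.
- More complex modifications, such as synthesizing summaries for long conversations.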

### Setup

In [1]:
from langchain_openai import ChatOpenAI
chat = ChatOpenAI()

### Message Passing
The simplest form of memory is to pass the chat history messages into a chain. Here is an example:

In [2]:
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | chat

response = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate this sentence from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
response


AIMessage(content='I said "J\'adore la programmation," which means "I love programming" in French.', response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 61, 'total_tokens': 82}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-d653915c-415d-4df3-bd58-bbb661a9e2ad-0', usage_metadata={'input_tokens': 61, 'output_tokens': 21, 'total_tokens': 82})

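We can see that by passing the previous conversation into the chain, the chatbot can use it as context to answer questions. This is the basic concept behind chatbot memory.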
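To make the role of the passed-in history concrete, here is a minimal, library-free sketch. The names `build_prompt` and `echo_model` are hypothetical and are not part of LangChain; `echo_model` is only a stand-in for a chat model that reports what it can see in its prompt.

```python
from typing import List, Tuple

# A message is a (role, content) pair; the roles used here are
# "system", "human" and "ai". This is an assumption of the sketch.
Message = Tuple[str, str]


def build_prompt(system: str, messages: List[Message]) -> List[Message]:
    """Prepend the system instruction to whatever history is supplied."""
    return [("system", system)] + list(messages)


def echo_model(prompt: List[Message]) -> str:
    """Hypothetical stand-in for a chat model: it can only refer to
    earlier AI turns that are actually present in the prompt."""
    previous_ai = [content for role, content in prompt if role == "ai"]
    if previous_ai:
        return f"I said: {previous_ai[-1]}"
    return "I have not said anything yet."


history = [
    ("human", "Translate this sentence from English to French: I love programming."),
    ("ai", "J'adore la programmation."),
    ("human", "What did you just say?"),
]

system = "You are a helpful assistant."
with_context = echo_model(build_prompt(system, history))
without_context = echo_model(build_prompt(system, history[-1:]))

print(with_context)
print(without_context)
```

The stub can only answer the follow-up question when the earlier turns are included in the prompt, which is the whole point of memory.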

### Chat Message History
Storing and passing messages directly as a list is perfectly fine, but we can also use LangChain's built-in message history class to store and load messages.

In [3]:
from langchain.memory import ChatMessageHistory

demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message(
    "Translate this sentence from English to French: I love programming."
)
demo_ephemeral_chat_history.add_ai_message("J'adore la programmation.")

demo_ephemeral_chat_history


InMemoryChatMessageHistory(messages=[HumanMessage(content='Translate this sentence from English to French: I love programming.'), AIMessage(content="J'adore la programmation.")])

We can use it directly to store the conversation turns for our chain:

In [4]:
demo_ephemeral_chat_history = ChatMessageHistory()

input1 = "Translate this sentence from English to French: I love programming."
demo_ephemeral_chat_history.add_user_message(input1)

response = chain.invoke(
    {
        "messages": demo_ephemeral_chat_history.messages,
    }
)
response


AIMessage(content='Je adore la programmation.', response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 39, 'total_tokens': 45}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f618838c-7064-4801-a99f-92e0be5b30e0-0', usage_metadata={'input_tokens': 39, 'output_tokens': 6, 'total_tokens': 45})
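The cell above only records the user message. In a multi-turn loop the model's reply usually needs to be written back to the history as well, so that the next turn can see it. The following is a library-free sketch of that loop; `SimpleHistory`, `chat_turn` and `counting_model` are hypothetical names used for illustration, and the stub model simply reports how many messages it was shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


@dataclass
class SimpleHistory:
    """Hypothetical, library-free stand-in for a chat message history."""

    messages: List[Message] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append(("human", text))

    def add_ai_message(self, text: str) -> None:
        self.messages.append(("ai", text))


def chat_turn(
    history: SimpleHistory,
    user_input: str,
    model: Callable[[List[Message]], str],
) -> str:
    """Record the user input, call the model on the full history,
    then record the reply so the next turn can see it."""
    history.add_user_message(user_input)
    reply = model(history.messages)
    history.add_ai_message(reply)
    return reply


def counting_model(messages: List[Message]) -> str:
    """Stub model: replies with the number of messages it was shown."""
    return f"I can see {len(messages)} message(s)."


history = SimpleHistory()
first = chat_turn(history, "hi", counting_model)
second = chat_turn(history, "what did I say?", counting_model)

print(first)
print(second)
```

On the second turn the stub sees three messages (human, AI, human), which shows that both sides of the first exchange were stored.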

## Memory Types
LangChain includes the following memory types, which are typically used together with LLMs.

### Conversation Buffer
ConversationBufferMemory is an extremely simple form of memory: it keeps the chat messages in a buffer and passes them into the prompt template.

In [9]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

memory


ConversationBufferMemory(chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='hi!'), AIMessage(content="what's up?")]))

Another way to use it:

In [10]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})

memory.load_memory_variables({})


{'history': 'Human: hi\nAI: whats up'}

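In this example, note that load_memory_variables returns a single key named history. This means that your chain (and possibly your input prompt) may expect an input named history. In general, you can control this variable through parameters on the memory class. For example, if you want the memory variables returned under the key chat_history, you can do the following: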

In [16]:
memory = ConversationBufferMemory(memory_key="chat_history")
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

# memory.load_memory_variables({})
memory.load_memory_variables({})["chat_history"]


"Human: hi!\nAI: what's up?"

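One of the most common ways to use memory is to return a list of chat messages. These can be returned either concatenated into a single string (useful when they will be passed into LLMs) or as a list of chat messages (useful when they will be passed into ChatModels).

By default, they are returned as a single string. To return them as a list of messages, set `return_messages=True`.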

In [19]:
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

# memory.load_memory_variables({})
memory.load_memory_variables({})["history"]


[HumanMessage(content='hi!'), AIMessage(content="what's up?")]
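The difference between the two return shapes, and the effect of the memory key, can be sketched in a few lines of plain Python. `render_buffer` and `load_variables` are hypothetical helpers written for this illustration; they mimic the observable output shown above, not LangChain's internal implementation.

```python
from typing import Dict, List, Tuple, Union

Message = Tuple[str, str]  # (role, content), role is "human" or "ai"


def render_buffer(
    messages: List[Message],
    human_prefix: str = "Human",
    ai_prefix: str = "AI",
) -> str:
    """Concatenate messages into one 'Human: ...' / 'AI: ...' string."""
    lines = []
    for role, content in messages:
        prefix = human_prefix if role == "human" else ai_prefix
        lines.append(f"{prefix}: {content}")
    return "\n".join(lines)


def load_variables(
    messages: List[Message],
    memory_key: str = "history",
    return_messages: bool = False,
) -> Dict[str, Union[str, List[Message]]]:
    """Return the history under `memory_key`, as a string or as a list."""
    value = list(messages) if return_messages else render_buffer(messages)
    return {memory_key: value}


messages = [("human", "hi!"), ("ai", "what's up?")]

as_string = load_variables(messages)
as_list = load_variables(messages, memory_key="chat_history", return_messages=True)

print(as_string)
print(as_list)
```

The string form suits a plain text-completion prompt, while the list form keeps the role of each message intact for a chat model.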

Here is an example of using it with a chain:

In [23]:
# from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain.chains.conversation.base import ConversationChain

llm = ChatOpenAI(model="gpt-3.5-turbo")

conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

conversation.predict(input="Hi there!")




> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi there!
AI:

> Finished chain.


'Hello! How can I assist you today?'

### Conversation Buffer Window
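ConversationBufferWindowMemory keeps a list of the conversation's interactions over time, but only uses the last K interactions. This creates a sliding window of the most recent interactions, which keeps the buffer from growing too large.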

In [24]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})


{'history': 'Human: not much you\nAI: not much'}

In [25]:
memory = ConversationBufferWindowMemory(k=1, return_messages=True)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})


{'history': [HumanMessage(content='not much you'),
  AIMessage(content='not much')]}
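The sliding-window behaviour can be sketched with a bounded `collections.deque`. The `WindowMemory` class below is a hypothetical illustration of the idea under that assumption, not the library's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keep only the last `k` (human, ai) exchanges."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.exchanges.append((user_input, ai_output))

    def load(self) -> str:
        """Render the retained exchanges as a single history string."""
        return "\n".join(
            f"Human: {user}\nAI: {ai}" for user, ai in self.exchanges
        )


memory = WindowMemory(k=1)
memory.save_context("hi", "whats up")
memory.save_context("not much you", "not much")
print(memory.load())

wider = WindowMemory(k=2)
wider.save_context("hi", "whats up")
wider.save_context("not much you", "not much")
print(wider.load())
```

With `k=1` only the latest exchange survives, matching the output above; with `k=2` both exchanges are kept.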

### Entity
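Entity memory remembers given facts about specific entities in a conversation. It extracts information about entities (using an LLM) and builds up its knowledge about each entity over time (also using an LLM).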

In [8]:
from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

memory = ConversationEntityMemory(llm=llm)
_input = {"input": "Deven & Sam are working on a hackathon project"}
memory.load_memory_variables(_input)
memory


ConversationEntityMemory(llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x122d45c30>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x122d57340>, openai_api_key=SecretStr('**********'), openai_proxy=''), entity_cache=['Deven', 'Sam'])

In [7]:
ConversationEntityMemory(llm=llm, entity_cache=['Deven', 'Sam'])


ConversationEntityMemory(llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x123b5d6c0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x123b1a0b0>, openai_api_key=SecretStr('**********'), openai_proxy=''), entity_cache=['Deven', 'Sam'])

In [9]:
memory.save_context(
    _input,
    {"output": " That sounds like a great project! What kind of project are they working on?"}
)
memory


ConversationEntityMemory(chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='Deven & Sam are working on a hackathon project'), AIMessage(content=' That sounds like a great project! What kind of project are they working on?')]), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x122d45c30>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x122d57340>, openai_api_key=SecretStr('**********'), openai_proxy=''), entity_cache=['Deven', 'Sam'], entity_store=InMemoryEntityStore(store={'Deven': 'Deven is working on a hackathon project with Sam.', 'Sam': 'Sam is working on a hackathon project with Deven.'}))

In [9]:
from langchain.memory.entity import InMemoryEntityStore
from langchain_core.messages import AIMessage, HumanMessage
from langchain.memory import ChatMessageHistory

llm = ChatOpenAI(model="gpt-3.5-turbo")

ConversationEntityMemory(
    chat_memory = ChatMessageHistory(messages=[HumanMessage(content='Deven & Sam are working on a hackathon project'), AIMessage(content=' That sounds like a great project! What kind of project are they working on?')]),
    llm=llm, entity_cache=['Deven', 'Sam'], entity_store=InMemoryEntityStore(store={'Deven': 'Updated summary: Deven is working on a hackathon project with Sam.', 'Sam': 'Updated summary: Sam is working on a hackathon project with Deven.'}))


ConversationEntityMemory(chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='Deven & Sam are working on a hackathon project'), AIMessage(content=' That sounds like a great project! What kind of project are they working on?')]), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x11fecfb50>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x128105960>, openai_api_key=SecretStr('**********'), openai_proxy=''), entity_cache=['Deven', 'Sam'], entity_store=InMemoryEntityStore(store={'Deven': 'Updated summary: Deven is working on a hackathon project with Sam.', 'Sam': 'Updated summary: Sam is working on a hackathon project with Deven.'}))

In [10]:
memory.load_memory_variables({"input": 'who is Sam'})

{'history': 'Human: Deven & Sam are working on a hackathon project\nAI:  That sounds like a great project! What kind of project are they working on?',
 'entities': {'Sam': 'Sam is working on a hackathon project with Deven.'}}

In [11]:
memory = ConversationEntityMemory(llm=llm, return_messages=True)

_input = {"input": "Deven & Sam are working on a hackathon project"}
memory.load_memory_variables(_input)
memory.save_context(
    _input,
    {"output": " That sounds like a great project! What kind of project are they working on?"}
)
memory.load_memory_variables({"input": 'who is Sam'})


{'history': [HumanMessage(content='Deven & Sam are working on a hackathon project'),
  AIMessage(content=' That sounds like a great project! What kind of project are they working on?')],
 'entities': {'Deven': 'Deven is working on a hackathon project with Sam.',
  'Sam': 'Sam is working on a hackathon project with Deven.'}}
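The data flow of entity memory (extract entities from the input, store facts per entity, return only the entities mentioned in the current query) can be sketched without an LLM. In this sketch the extraction step is replaced by a deliberately naive assumption: any capitalized word counts as an entity. `extract_entities` and `EntityStore` are hypothetical names, and a real setup would delegate both extraction and summarization to a model.

```python
import re
from typing import Dict, List


def extract_entities(text: str) -> List[str]:
    """Naive stand-in for LLM extraction: treat capitalized words
    as entity names, keeping first-seen order without duplicates."""
    seen: List[str] = []
    for name in re.findall(r"\b[A-Z][a-z]+\b", text):
        if name not in seen:
            seen.append(name)
    return seen


class EntityStore:
    """Map each entity name to the raw statements that mention it."""

    def __init__(self) -> None:
        self.facts: Dict[str, List[str]] = {}

    def save(self, text: str) -> None:
        for entity in extract_entities(text):
            self.facts.setdefault(entity, []).append(text)

    def lookup(self, query: str) -> Dict[str, List[str]]:
        """Return stored facts only for entities named in the query."""
        return {
            entity: self.facts[entity]
            for entity in extract_entities(query)
            if entity in self.facts
        }


store = EntityStore()
store.save("Deven & Sam are working on a hackathon project")

print(sorted(store.facts))
print(store.lookup("who is Sam"))
```

As in the first `load_memory_variables` call above, asking about Sam returns only what is known about Sam, even though Deven is also in the store.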

### Conversation Knowledge Graph
This type of memory uses a knowledge graph to recreate memory.

In [12]:
from langchain.memory import ConversationKGMemory

memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})

memory.load_memory_variables({"input": "who is sam"})


{'history': 'On sam: sam is a friend.'}

We can also, in a more modular way, get the current entities from a new message (previous messages are used as context).

In [13]:
memory.get_current_entities("what's Sams favorite color?")

['Sam']

We can also, in a more modular way, get knowledge triplets from a new message (previous messages are used as context).

In [15]:
memory.get_knowledge_triplets("her favorite color is red")

[KnowledgeTriple(subject='sam', predicate='favorite color is', object_='red')]
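Once triplets have been extracted, the storage and lookup side is simple to sketch. The `Triple` and `TripleStore` names below are hypothetical and the store is a plain list rather than a graph library; extraction itself (the part an LLM performs) is assumed to have already happened.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object_: str


class TripleStore:
    """Hold (subject, predicate, object) facts and query them by subject."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, predicate: str, object_: str) -> None:
        # Subjects are lowercased so "Sam" and "sam" refer to the same node.
        triple = Triple(subject.lower(), predicate, object_)
        if triple not in self.triples:
            self.triples.append(triple)

    def about(self, entity: str) -> List[str]:
        """Return every stored fact about `entity` as a short sentence."""
        key = entity.lower()
        return [
            f"{t.subject} {t.predicate} {t.object_}"
            for t in self.triples
            if t.subject == key
        ]

    def summary(self, entity: str) -> str:
        """Render the facts in an 'On <entity>: ...' style history line."""
        facts = self.about(entity)
        return f"On {entity}: {'. '.join(facts)}." if facts else ""


store = TripleStore()
store.add("sam", "is", "a friend")
store.add("Sam", "favorite color is", "red")

print(store.summary("sam"))
print(repr(store.summary("deven")))
```

Only facts whose subject matches the queried entity are returned, which is why the history above contains just the line about sam.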

### Conversation Summary
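Now let's look at a slightly more complex type of memory, ConversationSummaryMemory. This type of memory creates a summary of the conversation over time, which is useful for condensing information that accumulates as the conversation goes on. Conversation summary memory summarizes the conversation as it happens and stores the current summary in memory. That summary can then be used to inject the conversation so far into a prompt or chain. This memory is most useful for longer conversations, where keeping the full message history in the prompt verbatim would take up too many tokens.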

In [16]:
from langchain.memory import ConversationSummaryMemory, ChatMessageHistory

memory = ConversationSummaryMemory(llm=llm)
memory.save_context({"input": "hi"}, {"output": "whats up"})

memory.load_memory_variables({})

{'history': 'The human greets the AI with a simple "hi," and the AI responds by asking "what\'s up."'}

We can also call the predict_new_summary method directly:

In [17]:
messages = memory.chat_memory.messages
messages

[HumanMessage(content='hi'), AIMessage(content='whats up')]

In [18]:
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)

'The human greets the AI with "hi" and the AI responds with "what\'s up."'
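The shape of progressive summarization (previous summary plus new lines in, new summary out) can be sketched with the LLM call replaced by a stub. `SummaryMemory` and `stub_summarizer` are hypothetical names; the stub merely appends the new content, whereas a real summarizer would compress it.

```python
from typing import Callable, List

Summarizer = Callable[[str, List[str]], str]


def stub_summarizer(previous_summary: str, new_lines: List[str]) -> str:
    """Hypothetical stand-in for an LLM call. It appends the content of
    the new lines to the previous summary instead of condensing it."""
    content = "; ".join(line.split(": ", 1)[1] for line in new_lines)
    return f"{previous_summary} Then: {content}.".strip()


class SummaryMemory:
    """Store one running summary string instead of the full transcript."""

    def __init__(self, summarizer: Summarizer) -> None:
        self.summarizer = summarizer
        self.summary = ""

    def save_context(self, user_input: str, ai_output: str) -> None:
        new_lines = [f"Human: {user_input}", f"AI: {ai_output}"]
        # Fold the new exchange into the existing summary.
        self.summary = self.summarizer(self.summary, new_lines)


memory = SummaryMemory(stub_summarizer)
memory.save_context("hi", "whats up")
after_first = memory.summary
memory.save_context("not much you", "not much")

print(after_first)
print(memory.summary)
```

Each call receives the previous summary as input, so the prompt size depends on the summary length rather than on the number of turns.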

You can easily initialize ConversationSummaryMemory from a ChatMessageHistory. A summary is generated automatically during loading.

In [19]:
history = ChatMessageHistory()
history.add_user_message("hi")
history.add_ai_message("hi there!")

memory = ConversationSummaryMemory.from_messages(
    llm=llm,
    chat_memory=history,
    return_messages=True
)
memory

ConversationSummaryMemory(llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x11fecfb50>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x128105960>, openai_api_key=SecretStr('**********'), openai_proxy=''), chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='hi there!')]), return_messages=True, buffer='The human greets the AI with a simple "hi", and the AI responds with a friendly "hi there!".')

In [20]:
from langchain_core.chat_history import InMemoryChatMessageHistory

llm = ChatOpenAI(model="gpt-3.5-turbo")

ConversationSummaryMemory(llm=llm, chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='hi there!')]), return_messages=True, buffer='Current summary: \nThe human greeted the AI. The AI returned the greeting. \n')


ConversationSummaryMemory(llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1287e11b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x128800fd0>, openai_api_key=SecretStr('**********'), openai_proxy=''), chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='hi there!')]), return_messages=True, buffer='Current summary: \nThe human greeted the AI. The AI returned the greeting. \n')

In [21]:
memory.buffer


'The human greets the AI with a simple "hi", and the AI responds with a friendly "hi there!".'

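You can speed up initialization by reusing a previously generated summary, and avoid regenerating it by initializing the memory directly.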

In [23]:
memory = ConversationSummaryMemory(
    llm=llm,
    buffer="The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.",
    chat_memory=history,
    return_messages=True
)
memory

ConversationSummaryMemory(llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x1287e11b0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x128800fd0>, openai_api_key=SecretStr('**********'), openai_proxy=''), chat_memory=InMemoryChatMessageHistory(messages=[HumanMessage(content='hi'), AIMessage(content='hi there!')]), return_messages=True, buffer='The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.')

### Conversation Token Buffer
ConversationTokenBufferMemory keeps a buffer of recent interactions in memory, and uses token length rather than the number of interactions to decide when to flush interactions.

In [27]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})


{'history': 'AI: not much'}

In [28]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

memory.load_memory_variables({})


{'history': 'Human: hi\nAI: whats up\nHuman: not much you\nAI: not much'}
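The pruning rule can be sketched as "drop the oldest message until the total fits". The sketch below assumes a whitespace word count as the tokenizer, which is only a rough stand-in for the model-specific token counting a real setup would use, so the limit values differ from those in the cells above. `count_tokens` and `prune_to_limit` are hypothetical names.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Assumption of this sketch: one whitespace-separated word is one
    token. Real tokenizers will give different counts."""
    return len(text.split())


def prune_to_limit(messages: List[str], max_token_limit: int) -> List[str]:
    """Drop messages from the front until the total fits the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)  # the oldest message goes first
    return kept


messages = [
    "Human: hi",
    "AI: whats up",
    "Human: not much you",
    "AI: not much",
]

tight = prune_to_limit(messages, max_token_limit=5)
loose = prune_to_limit(messages, max_token_limit=100)

print(tight)
print(loose)
```

Pruning works on individual messages rather than on whole exchanges, which is why a tight limit can leave an AI message without the human message that preceded it.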

### Conversation Summary Buffer
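ConversationSummaryBufferMemory combines the two approaches. It keeps a buffer of recent interactions in memory, but rather than simply discarding old interactions it compiles them into a summary and uses both the buffer and the summary. It also uses token length rather than the number of interactions to decide when to move old interactions out of the buffer.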

In [30]:
from langchain.memory import ConversationSummaryBufferMemory
from pprint import pprint

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

pprint(memory.load_memory_variables({}))


{'history': 'System: The human greets the AI with a simple "hi." The AI '
            'responds by asking "what\'s up," and the human replies with "not '
            'much, you?"\n'
            'AI: not much'}
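Combining the two ideas, messages that fall out of the token budget are folded into a summary instead of being discarded. The sketch below again assumes whitespace word counts as tokens and uses a stub in place of the LLM summarizer; `prune_with_summary`, `stub_summarizer` and `render` are hypothetical names.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Assumption: whitespace-separated words stand in for tokens."""
    return len(text.split())


def stub_summarizer(previous_summary: str, pruned: List[str]) -> str:
    """Hypothetical stand-in for an LLM: joins the pruned messages
    onto the previous summary instead of condensing them."""
    return (previous_summary + " " + " | ".join(pruned)).strip()


def prune_with_summary(
    messages: List[str],
    summary: str,
    max_token_limit: int,
    summarizer: Callable[[str, List[str]], str],
) -> Tuple[str, List[str]]:
    """Move the oldest messages out of the buffer until it fits, and
    fold everything that was moved out into the running summary."""
    kept = list(messages)
    pruned: List[str] = []
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        pruned.append(kept.pop(0))
    if pruned:
        summary = summarizer(summary, pruned)
    return summary, kept


def render(summary: str, kept: List[str]) -> str:
    """Put the summary first as a system line, then the recent buffer."""
    lines = [f"System: {summary}"] if summary else []
    return "\n".join(lines + kept)


messages = [
    "Human: hi",
    "AI: whats up",
    "Human: not much you",
    "AI: not much",
]
summary, kept = prune_with_summary(messages, "", 5, stub_summarizer)
history = render(summary, kept)
print(history)
```

The rendered history has the same shape as the output above: a system line carrying the summary of older turns, followed by the messages still in the buffer.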
