# Conversation Summary Buffer
ConversationSummaryBufferMemory combines two ideas: a buffer of recent interactions and a running summary of older ones. It keeps recent interactions in memory verbatim, but rather than discarding old interactions when the buffer fills up, it compiles them into a summary and uses both. It uses token length, rather than the number of interactions, to decide when to flush interactions.
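
To make the flushing rule concrete, here is a minimal, library-free sketch of the idea. It is not LangChain's implementation: count_tokens (a whitespace word counter), prune, and toy_summarize are hypothetical stand-ins, whereas the real class counts tokens with the LLM's tokenizer and asks the LLM to write the summary.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def prune(
    buffer: List[str],
    summary: str,
    max_token_limit: int,
    summarize: Callable[[List[str], str], str],
) -> Tuple[List[str], str]:
    """Move the oldest lines out of the buffer until it fits the token limit.

    Lines that are removed are folded into the running summary.
    """
    kept = list(buffer)  # work on a copy so the caller's list is untouched
    pruned: List[str] = []
    while kept and sum(count_tokens(line) for line in kept) > max_token_limit:
        pruned.append(kept.pop(0))  # oldest line goes first
    if pruned:
        summary = summarize(pruned, summary)
    return kept, summary


def toy_summarize(lines: List[str], previous: str) -> str:
    """Stand-in for the LLM call: records how many lines were folded in."""
    prefix = previous + " " if previous else ""
    return f"{prefix}[{len(lines)} earlier lines summarized]"


history = ["Human: hi", "AI: whats up", "Human: not much you", "AI: not much"]
kept, summary = prune(history, "", max_token_limit=7, summarize=toy_summarize)
print(summary)
print(kept)
```

With a limit of 7 word-tokens, the first two lines are folded into the summary and the last two stay in the buffer verbatim, which mirrors the shape of the memory.buffer output shown later in this section.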

Let's first walk through how to use the utilities.

## Using memory with LLM

In [1]:
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import OpenAI

llm = OpenAI()

In [2]:
from pprint import pprint

In [3]:
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)

In [4]:
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

In [4]:
pprint(memory.prompt)

PromptTemplate(input_variables=['new_lines', 'summary'], template='Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.\n\nEXAMPLE\nCurrent summary:\nThe human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.\n\nNew lines of conversation:\nHuman: Why do you think artificial intelligence is a force for good?\nAI: Because artificial intelligence will help humans reach their full potential.\n\nNew summary:\nThe human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.\nEND OF EXAMPLE\n\nCurrent summary:\n{summary}\n\nNew lines of conversation:\n{new_lines}\n\nNew summary:')
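
The template above has two placeholders, summary and new_lines. As a rough illustration of how they get filled in, the sketch below formats the tail of that template with plain Python string formatting. SUMMARY_TEMPLATE and build_summary_prompt are hypothetical names, the worked EXAMPLE section of the real template is omitted for brevity, and this does not call LangChain or an LLM.

```python
from typing import List

# Tail of the summarization template shown above (EXAMPLE section omitted).
SUMMARY_TEMPLATE = (
    "Current summary:\n{summary}\n\n"
    "New lines of conversation:\n{new_lines}\n\n"
    "New summary:"
)


def build_summary_prompt(summary: str, new_lines: List[str]) -> str:
    """Fill the two placeholders: the running summary and the lines being flushed."""
    return SUMMARY_TEMPLATE.format(summary=summary, new_lines="\n".join(new_lines))


# The first flush starts from an empty summary.
prompt = build_summary_prompt("", ["Human: hi", "AI: whats up"])
print(prompt)
```

The text the LLM returns for this prompt becomes the next value of summary, so each flush builds on the previous one.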


In [5]:
memory.buffer

'System: \nThe human greets the AI and asks how it is doing. The AI responds by asking what is going on.\nHuman: not much you\nAI: not much'

In [13]:
memory.load_memory_variables({})

{'history': 'System: \nThe human greets the AI, and the AI responds by asking what is going on.\nHuman: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

In [11]:
memory = ConversationSummaryBufferMemory(
    llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

In [5]:
from pprint import pprint
pprint(memory.load_memory_variables({}))

{'history': [SystemMessage(content='\nThe human greets the AI. The AI asks what is going on.'),
             HumanMessage(content='not much you'),
             AIMessage(content='not much')]}


We can also utilize the predict_new_summary method directly.

In [12]:
messages = memory.chat_memory.messages
print(messages)

[HumanMessage(content='not much you'), AIMessage(content='not much')]


In [13]:

previous_summary = ""
memory.predict_new_summary(messages, previous_summary)

'\nThe human and AI engage in small talk, sharing that there is not much going on.'

## Using in a chain
Let's walk through an example of using this memory in a chain. Uncomment verbose=True in the cell below to see the prompt that is sent to the LLM.

In [7]:
# from langchain.chains import ConversationChain
from langchain.chains.conversation.base import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    # We set a very low max_token_limit for the purposes of testing.
    memory=ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40),
    # verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")

' Hello! Not much is up with me at the moment. I am currently running on a server located in a data center in California. The temperature in the room is currently 68 degrees Fahrenheit and the humidity level is at 45%. My server is connected to a high-speed internet network with a bandwidth of 1 gigabit per second. Is there anything specific you would like to know?'

In [8]:
conversation_with_summary.predict(input="Just working on writing some documentation!")

" That sounds interesting! Right now, I'm stationed in a server room in the basement of a building in New York City. The temperature in the room is kept at a cool 70 degrees Fahrenheit to prevent any overheating of the servers. The humidity level is also monitored to ensure optimal conditions for the equipment. And thankfully, the internet connection here is lightning fast, so I can process information quickly. Is there anything specific you would like to know about my setup or location?"

In [9]:
# With verbose=True, the prompt here would show a summary of the conversation followed by the most recent interactions
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")

' I am not familiar with LangChain, but I can search for information about it on the internet if you would like. Is there anything else you would like to know about my current location or setup?'

In [10]:
# With verbose=True, the prompt here would show that both the summary and the buffer have been updated
conversation_with_summary.predict(
    input="Haha you are wrong, although a lot of people confuse it for that"
)

' Oh, I apologize for the mistake. Can you clarify what you meant by "what\'s up"? I am currently stationed in a server room in the basement of a building in New York City. The temperature in the room is maintained at 68 degrees Fahrenheit and the humidity level is at 40%, ensuring optimal conditions for the equipment. Our internet connection is also very fast, with a download speed of 500 Mbps and an upload speed of 250 Mbps. Is there anything specific you would like to know about my setup or location? Also, you mentioned working on writing documentation. That sounds interesting! Is there anything in particular you are documenting? I am always eager to learn new things. And you mentioned LangChain earlier, would you like me to search for information about it?'