## Memory types in LangChain
This notebook showcases different memory types in LangChain and how to use them in a chatbot.  
In this example we work with Amazon Bedrock and Anthropic's foundation model (FM) Claude V2.  

Start by importing the LangChain and boto3 libraries.

In [None]:
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

#### Set up the Bedrock client and model parameters
Next, we create our boto3 client and the LangChain Bedrock object, together with some model parameters.

In [None]:
bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1",
)

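# Sampling parameters passed through to Claude V2 on Bedrock
model_kwargs = {
    "max_tokens_to_sample": 300,  # upper bound on the length of each reply
    "temperature": 0.5,
    "top_p": 0.5,
}

chat = Bedrock(
    # LangChain only uses this profile when it creates its own client; because an
    # explicit client is passed below, that client's credentials are used instead.
    credentials_profile_name="bedrock",
    model_id="anthropic.claude-v2",
    model_kwargs=model_kwargs,
    client=bedrock_runtime,
)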
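For reference, here is a sketch of the raw request body that Claude V2 on Bedrock works with. The `Bedrock` class above builds an equivalent body for you, so this is only meant to show where `model_kwargs` ends up. `build_claude_v2_body` is a hypothetical helper, not part of LangChain or boto3, and the prompt normalization it performs is a simplified assumption of the Human/Assistant format Claude V2 expects.

```python
import json


# Hypothetical helper (not LangChain or boto3): builds the JSON body for the
# Claude V2 text-completion API on Bedrock, using the same parameters as model_kwargs.
def build_claude_v2_body(prompt: str, max_tokens: int = 300,
                         temperature: float = 0.5, top_p: float = 0.5) -> str:
    # Simplified assumption: Claude V2 expects the prompt to start with a
    # "\n\nHuman:" turn and to end with "\n\nAssistant:" so the model answers next.
    if not prompt.startswith("\n\nHuman:"):
        prompt = "\n\nHuman: " + prompt.lstrip()
    if not prompt.rstrip().endswith("Assistant:"):
        prompt = prompt.rstrip() + "\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    })


body = build_claude_v2_body("Can cats fly?")
print(body)

# With a configured client, the raw call would look roughly like:
# response = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
# completion = json.loads(response["body"].read())["completion"]
```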

#### Prompt Template
Next, we create our prompt template:

In [None]:
prompt_template = """Assistant: The following is a friendly conversation between a knowledgeable helpful assistant and a customer.
The assistant is talkative and provides lots of specific details from its context.

Conversation history:
{history}

Current conversation:
Human: {input}
Assistant:"""

PROMPT = PromptTemplate(
        input_variables=["history", "input"], template=prompt_template
    )
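To see what the model actually receives, the template can be filled in by hand. A `PromptTemplate` with the default f-string format substitutes `{history}` and `{input}` much like Python's `str.format`; the history string below is made up for illustration.

```python
# Same template text as above
prompt_template = """Assistant: The following is a friendly conversation between a knowledgeable helpful assistant and a customer.
The assistant is talkative and provides lots of specific details from its context.

Conversation history:
{history}

Current conversation:
Human: {input}
Assistant:"""

# Made-up history, rendered the way buffer memory writes it ("<prefix>: <text>")
history = "Human: Hi there!\nAssistant: Hello! How can I help you today?"

# Roughly what PROMPT.format(history=..., input=...) produces
filled = prompt_template.format(history=history, input="Can cats fly?")
print(filled)
```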

### Memory types
LangChain offers a wide range of memory options. The most common one, `ConversationBufferMemory`, stores all previous inputs and outputs and passes the full history to the prompt.  
We configure the memory as shown below. Note that when working with Claude V2 we set `ai_prefix` to "Assistant" (the default is "AI") and `human_prefix` to "Human", so that the stored history uses the same Human/Assistant turn labels that Claude expects.

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(ai_prefix="Assistant", human_prefix="Human")
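Conceptually, buffer memory just appends every exchange and renders all of it as the `{history}` text. The class below is a simplified, hypothetical stand-in (not LangChain's implementation) that shows why the prefixes matter: each message is written as `<prefix>: <text>`, so with `ai_prefix="Assistant"` the history uses the same turn labels as the prompt template.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Hypothetical, simplified stand-in for ConversationBufferMemory."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "Assistant"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns: List[Tuple[str, str]] = []  # (human input, AI output) pairs

    def save_context(self, human: str, ai: str) -> None:
        # Every exchange is kept; nothing is ever dropped.
        self.turns.append((human, ai))

    def load_history(self) -> str:
        # Each message becomes "<prefix>: <text>", one per line. The colon is
        # added here, so the prefix itself should be "Human", not "Human:".
        lines = []
        for human, ai in self.turns:
            lines.append(f"{self.human_prefix}: {human}")
            lines.append(f"{self.ai_prefix}: {ai}")
        return "\n".join(lines)


memory = SimpleBufferMemory()
memory.save_context("Hi there!", "Hello! How can I help?")
memory.save_context("Can cats fly?", "No, cats cannot fly.")
print(memory.load_history())
```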

#### Conversation chain
Then we configure our ConversationChain to use our prompt, llm and memory like this:

In [None]:
conversation = ConversationChain(
    prompt=PROMPT,
    llm=chat,
    verbose=True,
    memory=memory,
)
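On each call to `predict`, the chain loads the history from memory, fills the prompt, calls the LLM and saves the new exchange back to memory. The sketch below walks through those four steps in plain Python; `run_turn` and `fake_llm` are hypothetical names, and the fake model stands in for Bedrock so the example runs offline.

```python
from typing import Callable, List, Tuple

TEMPLATE = "Conversation history:\n{history}\n\nCurrent conversation:\nHuman: {input}\nAssistant:"


def run_turn(llm: Callable[[str], str], turns: List[Tuple[str, str]], user_input: str) -> str:
    # 1. Load memory: render the stored exchanges as the history string.
    history = "\n".join(f"Human: {h}\nAssistant: {a}" for h, a in turns)
    # 2. Fill the prompt template with the history and the new input.
    prompt = TEMPLATE.format(history=history, input=user_input)
    # 3. Call the model.
    reply = llm(prompt)
    # 4. Save the new exchange back to memory for the next turn.
    turns.append((user_input, reply))
    return reply


# Hypothetical fake LLM: reports how many "Human:" lines it was shown.
def fake_llm(prompt: str) -> str:
    return f"I saw {prompt.count('Human:')} human message(s)."


turns: List[Tuple[str, str]] = []
print(run_turn(fake_llm, turns, "Hi there!"))
print(run_turn(fake_llm, turns, "Can cats fly?"))
```

Because the second prompt contains the first exchange, the fake model sees two human messages on the second turn, which is exactly the effect memory has on the real chain.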

#### Query
We can now test our chatbot with a few queries:

In [None]:
conversation.predict(input="Hi there!")

In [None]:
conversation.predict(input="Can cats fly?")

#### ConversationBufferWindowMemory
What if we don't want to return the whole conversation history, but only the k most recent interactions?  
ConversationBufferWindowMemory comes to the rescue!
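The idea is easy to sketch: remember only the last `k` exchanges. The hypothetical class below uses a `deque` with `maxlen=k`, which drops the oldest exchange automatically. This is a swapped-in simplification; LangChain's `ConversationBufferWindowMemory` keeps the full message list and slices off the last `k` exchanges when it loads the history.

```python
from collections import deque
from typing import Deque, Tuple


class SimpleWindowMemory:
    """Hypothetical, simplified stand-in for ConversationBufferWindowMemory."""

    def __init__(self, k: int = 1, human_prefix: str = "Human", ai_prefix: str = "Assistant"):
        # A deque with maxlen=k silently discards the oldest exchange on overflow.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def load_history(self) -> str:
        return "\n".join(
            f"{self.human_prefix}: {h}\n{self.ai_prefix}: {a}" for h, a in self.turns
        )


memory = SimpleWindowMemory(k=1)
memory.save_context("Hi there!", "Hello!")
memory.save_context("Can cats fly?", "No, they can't.")
print(memory.load_history())  # only the most recent exchange survives
```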

In [None]:
from langchain.memory import ConversationBufferWindowMemory
memory = None   # Reset memory
conversation = None # Reset conversation
memory = ConversationBufferWindowMemory(k=1, ai_prefix="Assistant", human_prefix="Human")  # keep only the last exchange

conversation = ConversationChain(
    prompt=PROMPT,
    llm=chat,
    verbose=True,
    memory=memory,
)

print(conversation.predict(input="Hi there!"))
print(conversation.predict(input="Can cats fly?"))
print(conversation.predict(input="Cool, can you tell me more about cats?"))
print(conversation.predict(input="Have we greeted each other yet?"))

#### ConversationSummaryMemory: keeping token usage down
As we can see, only the most recent interaction (`k=1`) is included in the history.  
What if we want a longer memory but still want to keep our token usage down?  
That's where `ConversationSummaryMemory` comes in handy.  
It summarizes the previous conversation and adds that summary to the prompt instead of the full transcript.

The summarization is handled by the LLM itself, so we pass the model to the `ConversationSummaryMemory` object through its `llm` parameter.
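A minimal sketch of the idea, assuming a `summarize(previous_summary, new_lines)` callable in place of the LLM: instead of appending each exchange, the memory folds it into a running summary, and only that summary is injected as `{history}`. `SimpleSummaryMemory` and `toy_summarizer` are hypothetical; the toy summarizer just keeps the human's messages so the example runs without a model.

```python
from typing import Callable


class SimpleSummaryMemory:
    """Hypothetical, simplified stand-in for ConversationSummaryMemory."""

    def __init__(self, summarize: Callable[[str, str], str]):
        # summarize(previous_summary, new_lines) -> new_summary.
        # In LangChain this role is played by the LLM passed as llm=...
        self.summarize = summarize
        self.summary = ""

    def save_context(self, human: str, ai: str) -> None:
        new_lines = f"Human: {human}\nAssistant: {ai}"
        # Fold the new exchange into the running summary instead of storing it verbatim.
        self.summary = self.summarize(self.summary, new_lines)

    def load_history(self) -> str:
        return self.summary


# Hypothetical toy summarizer (a real setup would prompt the LLM):
# it keeps only what the human said and drops the assistant's replies.
def toy_summarizer(previous: str, new_lines: str) -> str:
    first_line = new_lines.splitlines()[0]  # "Human: ..."
    question = first_line[len("Human: "):]
    return f"{previous} The human said '{question}'.".strip()


memory = SimpleSummaryMemory(toy_summarizer)
memory.save_context("Hi there!", "Hello!")
memory.save_context("Can cats fly?", "No, cats cannot fly.")
print(memory.load_history())
```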

In [None]:
from langchain.memory import ConversationSummaryMemory

memory = None   # Reset memory
conversation = None # Reset conversation
memory = ConversationSummaryMemory(llm=chat, human_prefix="Human", ai_prefix="Assistant")

conversation = ConversationChain(
    prompt=PROMPT,
    llm=chat,
    verbose=False,
    memory=memory,
)

print(conversation.predict(input="Hi there!"))
print(conversation.predict(input="Can cats fly?"))
print(conversation.predict(input="Cool, can you tell me more about cats?"))
print(conversation.predict(input="Have we greeted each other yet?"))
print(memory.buffer)  # the running summary that is injected as {history}

Note that we set `verbose` to `False` this time. LangChain's default summarization prompt hardcodes the AI prefix as "AI", which triggers a lot of warnings in the console when used with Claude V2.
The end result is fine, though.

#### ConversationSummaryBufferMemory
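`ConversationSummaryBufferMemory` combines summary memory with buffer memory: recent interactions stay verbatim in a buffer, while older ones are summarized. The `max_token_limit` parameter sets how many tokens the buffer may hold; once the limit is exceeded, the oldest messages are removed from the buffer and folded into the summary.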
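The pruning logic can be sketched as follows. This is a hypothetical simplification: tokens are approximated by whitespace-separated words (LangChain asks the LLM for a token count instead), whole exchanges are pruned rather than individual messages, and `max_token_limit` is set to 12 so that pruning happens after only two exchanges.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words as a rough token proxy.
    return len(text.split())


class SimpleSummaryBufferMemory:
    """Hypothetical, simplified stand-in for ConversationSummaryBufferMemory."""

    def __init__(self, summarize: Callable[[str, str], str], max_token_limit: int = 25):
        self.summarize = summarize
        self.max_token_limit = max_token_limit
        self.buffer: List[Tuple[str, str]] = []  # recent (human, ai) exchanges, verbatim
        self.summary = ""

    @staticmethod
    def _render(turns: List[Tuple[str, str]]) -> str:
        return "\n".join(f"Human: {h}\nAssistant: {a}" for h, a in turns)

    def save_context(self, human: str, ai: str) -> None:
        self.buffer.append((human, ai))
        pruned: List[Tuple[str, str]] = []
        # Move the oldest exchanges out of the buffer until it fits the limit.
        while self.buffer and count_tokens(self._render(self.buffer)) > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            # Fold everything that was pushed out into the running summary.
            self.summary = self.summarize(self.summary, self._render(pruned))

    def load_history(self) -> str:
        parts = [f"Summary: {self.summary}"] if self.summary else []
        if self.buffer:
            parts.append(self._render(self.buffer))
        return "\n".join(parts)


# Hypothetical toy summarizer: keeps only what the human said.
def toy_summarizer(previous: str, new_lines: str) -> str:
    notes = [f"Human said '{line[len('Human: '):]}'."
             for line in new_lines.splitlines() if line.startswith("Human: ")]
    return " ".join([previous] + notes).strip()


memory = SimpleSummaryBufferMemory(toy_summarizer, max_token_limit=12)
memory.save_context("Hi there!", "Hello! How can I help you today?")  # 11 words, fits
memory.save_context("Can cats fly?", "No, cats cannot fly.")  # 20 words, oldest exchange pruned
print(memory.load_history())
```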

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

memory = None   # Reset memory
conversation = None # Reset conversation
memory = ConversationSummaryBufferMemory(llm=chat, max_token_limit=25, human_prefix="Human", ai_prefix="Assistant")

conversation = ConversationChain(
    prompt=PROMPT,
    llm=chat,
    verbose=False,
    memory=memory,
)

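print(conversation.predict(input="Hi there!"))
print(conversation.predict(input="Can cats fly?"))
print(conversation.predict(input="Cool, can you tell me more about cats?"))
print(conversation.predict(input="Have we greeted each other yet?"))
print(memory.moving_summary_buffer)  # summary of the exchanges pruned from the buffer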