# Conversational Memory for OpenAI & Llama - LangChain #3

*Memory is the key technique that lets a chatbot draw on what has already been said in our conversations when it responds. Memory has its issues, however: as we will uncover, it can be detrimental if not properly looked after, but hugely beneficial if it has...*

# **Chapter 3: Conversational Memory**

1. LangChain uses its own `conversational` template, which we will use to feed our LLMs.
2. Memory has a direct impact on `tokens`, which we will try to reduce.
3. We will go over five memory techniques, highlighting the strengths and weaknesses of each with our LLMs.

## *Section 1: Setup*

*In this section we will import all the libraries needed to make this program run and add our LLMs.*

We first need to import any initial libraries, such as OpenAI, and then set our API keys.

In [204]:
import openai

# OpenAI key to use the model
OPENAI_API_KEY = ""
openai.api_key = OPENAI_API_KEY

***ollama pull llama3.2:1b-instruct-fp16*** - Run this in a terminal to download the Llama model we are using.

In [205]:
from langchain_ollama.chat_models import ChatOllama

# Here we can list a couple of models to choose from.
openai_model = "gpt-4o-mini"
llama_model = "llama3.2:1b-instruct-fp16"

# Choose between which model you would like to use
llm_model = openai_model

Now we need to connect the correct chat function depending on which model we are using.

In [206]:
from langchain_community.chat_models import ChatOpenAI

In [207]:
if llm_model == llama_model:
    llm = ChatOllama(temperature=0.0, model=llama_model)
    creative_llm = ChatOllama(temperature=0.9, model=llama_model)
elif llm_model == openai_model:
    llm = ChatOpenAI(temperature=0.0, model=openai_model, openai_api_key = OPENAI_API_KEY)
    creative_llm = ChatOpenAI(temperature=0.9, model=openai_model, openai_api_key = OPENAI_API_KEY)
else:
    raise ValueError("You need to apply the correct name to the model you are referring to")

We don't want to worry about warnings so we will ignore them.

In [208]:
import warnings
warnings.filterwarnings('ignore')

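Use ***uv add tiktoken*** to install `tiktoken`. The function below counts the number of tokens used for each chat request sent to our LLMs. Note that `get_openai_callback` tracks usage for OpenAI calls, so the count may not be meaningful when the Llama model is selected.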

In [209]:
import tiktoken
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.predict(input=query)  # Use predict instead of run
        print(f'Spent a total of {cb.total_tokens} tokens')
    return result

## *Section 2: Conversation Chain*

*In this section we will go over LangChain's conversation chain.*

First we want to import the corresponding libraries used for conversation chains.

In [210]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

Now we want to create a memory, which is just a buffer that holds data, in this case the previous conversations.

In [211]:
memory = ConversationBufferMemory()
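To make the idea of "just a buffer" concrete, here is a minimal plain-Python sketch of what such a memory does. `SimpleBufferMemory` and `toy_memory` are hypothetical names made up for illustration; this is not LangChain's implementation, only an approximation of the behaviour we will see in the outputs further down.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer: stores every exchange verbatim."""

    def __init__(self, human_prefix="Human", ai_prefix="AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.messages = []  # list of (speaker, text) tuples, oldest first

    def save_context(self, inputs, outputs):
        # Record one human turn and the AI reply that followed it.
        self.messages.append((self.human_prefix, inputs["input"]))
        self.messages.append((self.ai_prefix, outputs["output"]))

    @property
    def buffer(self):
        # Flatten the stored turns into the text block placed in the prompt.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.messages)


toy_memory = SimpleBufferMemory()
toy_memory.save_context({"input": "Hi"}, {"output": "What's up"})
toy_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(toy_memory.buffer)
```

Nothing is ever removed from this buffer, so the text sent to the model grows with every turn.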

Afterwards we need to set up our chain, which takes our model and the memory we just made. We can also set `verbose` to check whether the chain is behaving as we would expect.

In [None]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

Now we can enter some text and see the inputs and outputs with verbose!

In [212]:
conversation.predict(input="Hi, my name is Josh")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Josh
AI:

> Finished chain.


"Hello, Josh! It's great to meet you! How's your day going so far?"

In [213]:
conversation.predict(input="What is 9+10?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI:

> Finished chain.


"9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?"

In [214]:
conversation.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?
Human: What is my name?
AI:

> Finished chain.


"Your name is Josh! It's nice to remember names; it makes conversations feel more personal. Do you have any hobbies or interests you'd like to share, Josh?"

If we would like to see exactly what the AI is reading from, we can print the memory buffer to look at all the past exchanges.

In [215]:
print(memory.buffer)

Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?
Human: What is my name?
AI: Your name is Josh! It's nice to remember names; it makes conversations feel more personal. Do you have any hobbies or interests you'd like to share, Josh?


This is another way we can look at the history.

In [216]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Josh\nAI: Hello, Josh! It's great to meet you! How's your day going so far?\nHuman: What is 9+10?\nAI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?\nHuman: What is my name?\nAI: Your name is Josh! It's nice to remember names; it makes conversations feel more personal. Do you have any hobbies or interests you'd like to share, Josh?"}

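To reset the memory, we simply assign a new `ConversationBufferMemory` to the variable. Note that a chain created earlier still holds the old memory object, so the chain needs to be rebuilt (or its memory cleared in place with `clear()`) to really start fresh.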

In [217]:
memory = ConversationBufferMemory()
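One thing worth checking when resetting this way is which object an existing chain is still holding. The sketch below uses two hypothetical toy classes (`ToyMemory` and `ToyChain`, not LangChain classes) to show the plain Python behaviour: rebinding a variable does not change the object a chain already stored.

```python
class ToyMemory:
    """Hypothetical memory object that just holds a list of turns."""

    def __init__(self):
        self.turns = []

    def clear(self):
        # Empty the stored turns in place.
        self.turns.clear()


class ToyChain:
    """Hypothetical chain that keeps a reference to the memory it was given."""

    def __init__(self, memory):
        self.memory = memory


memory_ref = ToyMemory()
toy_chain = ToyChain(memory=memory_ref)
memory_ref.turns.append("Human: Hi, my name is Josh")

# Rebinding the name creates a fresh object; the chain still points at the old one.
memory_ref = ToyMemory()
print(toy_chain.memory is memory_ref)  # False
print(len(toy_chain.memory.turns))     # 1

# Clearing in place (or building a new chain) resets what the chain sees.
toy_chain.memory.clear()
print(len(toy_chain.memory.turns))     # 0
```

This is why each section below builds a new `ConversationChain` right after creating a new memory.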

Here we can 'force' text into the AI's memory with `save_context`, which we will play around with later on...

In [255]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

print(memory.buffer)

Human: Hi
AI: What's up


In [256]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

## *Section 3: Conversation Summary Memory*

*In this section we will go over LangChain's conversation summary memory.*

Now again import the corresponding libraries; in this case we are using conversation summary memory.

In [220]:
from langchain.memory import ConversationSummaryMemory

This is basically the same as before, however now we are passing our LLM in to the memory as well as the conversation chain, because the memory uses the LLM to write the summary.

In [221]:
memory = ConversationSummaryMemory(llm = llm)
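To see why the memory needs an LLM of its own, here is a rough plain-Python sketch of the summary idea. Everything in it is an assumption made for illustration: `ToySummaryMemory` is a made-up class, and `stub_summarizer` is a stand-in that simply clips each line to six words where a real summary memory would ask the LLM to rewrite the summary.

```python
def stub_summarizer(previous_summary, new_lines):
    """Hypothetical stand-in for the LLM call that refreshes the summary."""
    # Keep only the first six words of each new line to mimic compression.
    clipped = " / ".join(
        " ".join(line.split()[:6]) for line in new_lines.splitlines()
    )
    return f"{previous_summary} {clipped}".strip()


class ToySummaryMemory:
    """Keeps one running summary instead of the raw conversation."""

    def __init__(self, summarize):
        self.summarize = summarize  # callable(previous_summary, new_lines) -> str
        self.summary = ""
        self.llm_calls = 0  # how many times the summarizer was invoked

    def save_context(self, inputs, outputs):
        new_lines = f"Human: {inputs['input']}\nAI: {outputs['output']}"
        # Every saved exchange triggers one extra summarization call.
        self.summary = self.summarize(self.summary, new_lines)
        self.llm_calls += 1

    def load_memory_variables(self):
        # Only the summary is handed back as history.
        return {"history": self.summary}


toy_summary = ToySummaryMemory(stub_summarizer)
toy_summary.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hello, James! It's great to meet you! How's your day going so far?"},
)
toy_summary.save_context(
    {"input": "What is 9 + 10?"},
    {"output": "The answer to 9 + 10 is 19!"},
)
print(toy_summary.load_memory_variables()["history"])
print(f"Summarizer calls: {toy_summary.llm_calls}")
```

The trade-off to keep in mind is that the history stays compact, but each turn costs an additional call to produce the updated summary.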

In [222]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)

Now we are going to test the memory recall of this conversation with some questions and answers.

In [223]:
conversation.predict(input="Hi, my name is James")

"Hello, James! It's great to meet you! How's your day going so far?"

In [224]:
conversation.predict(input="What is 9 + 10?")

"The answer to 9 + 10 is 19! It's a simple addition problem, but it's always good to double-check your math. How's your day going, James?"

In [225]:
conversation.predict(input="What is my name?")

"Your name is James! It's nice to meet you, James. How's your day going so far?"

## *Section 4: Conversation Buffer Window Memory*

*In this section we will go over LangChain's conversation buffer window memory.*

Once again, we import the libraries we need for conversation buffer window memory.

In [226]:
from langchain.memory import ConversationBufferWindowMemory

The setup is the same as before, but this time we pass `k=1`. Here `k` is the number of most recent exchanges (one human message plus the AI reply) that the memory keeps, as showcased below, and this directly limits token usage.

In [227]:
# The window memory only trims by count, so it does not need an LLM.
memory = ConversationBufferWindowMemory(k=1)

In [228]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [229]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}
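The windowing behaviour can be sketched in plain Python with a bounded queue. `ToyWindowMemory` is a hypothetical class written for illustration and is not how LangChain implements it; it only reproduces the "keep the last `k` exchanges" effect shown above.

```python
from collections import deque


class ToyWindowMemory:
    """Keeps only the k most recent (human, ai) exchanges."""

    def __init__(self, k=1):
        # A deque with maxlen discards the oldest item once it is full.
        self.exchanges = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self):
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


toy_window = ToyWindowMemory(k=1)
toy_window.save_context({"input": "Hi"}, {"output": "What's up"})
toy_window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(toy_window.load_memory_variables())
```

With `k=1` the first exchange is gone as soon as the second one is saved, which matches the history printed above.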

Now we also want to test the conversation when programmed this way.

In [230]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=False
)

In [231]:
conversation.predict(input="Hi, my name is James")

"Hi James! It's great to meet you! I'm here to chat about anything you'd like. What’s on your mind today?"

In [232]:
conversation.predict(input="What is 9+10?")

"9 + 10 equals 19! If you have any more math questions or anything else you'd like to discuss, feel free to ask!"

In [233]:
conversation.predict(input="What is my name?")

'I’m not sure what your name is since I don’t have that information. But I’d love to know it if you’d like to share! What else can I help you with today?'

As you can see, the main downside is that with `k=1` the name has dropped out of the window by the time we ask for it, which is annoying if we want long conversations that go back to topics we discussed earlier. It is, however, very useful if we need to cut down on tokens, as can be seen below!

In [234]:
count_tokens(conversation, "How are you today?")

Spent a total of 145 tokens


'I’m doing great, thank you for asking! I’m always here and ready to chat. How about you? How’s your day going?'
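To get a feel for why the window saves tokens, we can compare how much history each approach would carry per turn. This is only a back-of-the-envelope sketch: `approx_tokens` is a hypothetical helper that counts whitespace-separated words, which is not the same as real model tokens, and the three short exchanges are sample data.

```python
def approx_tokens(text):
    # Crude proxy: whitespace-separated words, not real model tokens.
    return len(text.split())


def history_size(pairs):
    # Total approximate size of a list of (human, ai) exchanges.
    return sum(approx_tokens(human) + approx_tokens(ai) for human, ai in pairs)


sample_turns = [
    ("AI is what?!", "Amazing!"),
    ("Backpropagation is what?", "Beautiful!"),
    ("Chatbots are what?", "Charming!"),
]

full_sizes = []    # full buffer: everything said so far
window_sizes = []  # k=1 window: only the latest exchange
for turn in range(1, len(sample_turns) + 1):
    so_far = sample_turns[:turn]
    full_sizes.append(history_size(so_far))
    window_sizes.append(history_size(so_far[-1:]))

print("Full buffer:", full_sizes)
print("Window k=1: ", window_sizes)
```

The full buffer keeps growing with every turn, while the window stays roughly flat, at the cost of forgetting everything outside it.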

## *Section 5: Conversation Token Buffer Memory*

*In this section we will go over LangChain's conversation token buffer memory.*

Let's go again, this time with conversation token buffer memory.

In [235]:
from langchain.memory import ConversationTokenBufferMemory

Now we set the maximum number of tokens the memory may hold. This is super useful, however a full request can still use more tokens than this limit, because the prompt template and the query itself also contain tokens.

In [None]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=70)
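The pruning idea behind a token limit can be sketched without any LLM at all. In the sketch below, `ToyTokenBufferMemory` is a hypothetical class and `approx_tokens` is an assumed word-count proxy; the real memory asks the model's tokenizer for the count, so the exact cut-off points will differ.

```python
def approx_tokens(text):
    # Crude proxy: whitespace-separated words, not real model tokens.
    return len(text.split())


class ToyTokenBufferMemory:
    """Drops the oldest messages once the buffer exceeds a token budget."""

    def __init__(self, max_token_limit, token_counter=approx_tokens):
        self.max_token_limit = max_token_limit
        self.token_counter = token_counter
        self.messages = []  # "Human: ..." / "AI: ..." strings, oldest first

    def _total(self):
        return sum(self.token_counter(message) for message in self.messages)

    def save_context(self, inputs, outputs):
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        # Remove messages from the front until the buffer fits the budget.
        while self.messages and self._total() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self):
        return {"history": "\n".join(self.messages)}


toy_token_memory = ToyTokenBufferMemory(max_token_limit=12)
toy_token_memory.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
toy_token_memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
toy_token_memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})
print(toy_token_memory.load_memory_variables()["history"])
```

With this toy budget of 12 words, the first exchange no longer fits after the third one is saved, so only the last two remain.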

Here we are forcing 'fixed' exchanges into our memory. If you run this yourself and change the token limit, you can see that a lower limit also drops the oldest sections of the memory.

In [251]:
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [None]:
memory.load_memory_variables({})

In [252]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

In [254]:
conversation.predict(input="Dandilions are what?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: AI is what?!
AI: Amazing!
Human: Backpropagation is what?
AI: Beautiful!
Human: Chatbots are what?
AI: Charming!
Human: Dandilions are what?
AI:

> Finished chain.


'Dandelions are delightful! They are bright yellow flowers that belong to the Asteraceae family and are often seen as a symbol of resilience and adaptability. Dandelions are known for their fluffy seed heads, which children love to blow and make wishes on. Interestingly, every part of the dandelion is edible, and they are often used in salads, teas, and even as a natural remedy for various ailments. Plus, they play a crucial role in supporting pollinators like bees!'

## *Section 6: Conversation Summary Buffer Memory*

*In this section we will go over LangChain's conversation summary buffer memory.*

One last time, now with conversation summary buffer memory.

In [241]:
from langchain.memory import ConversationSummaryBufferMemory

Here we are feeding in a schedule that the AI will use to inform us of anything we need to know.

In [242]:
# create a long string
schedule = "There is a meeting at 8am with James. \
You will need your cursor AI course prepared. \
9am-12pm have time to do more with Llama and Langchain \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"}, 
                    {"output": f"{schedule}"})

In [243]:
memory.load_memory_variables({})

{'history': "System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies that it's cool.\nHuman: What is on the schedule today?\nAI: There is a meeting at 8am with James. You will need your cursor AI course prepared. 9am-12pm have time to do more with Llama and Langchain At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo."}

In [244]:
conversation = ConversationChain(
    llm=llm, 
    memory = memory,
    verbose=True
)

As you can see from the previous snippets, the memory uses the token limit to decide when to summarise what has happened rather than keep the exact details (it keeps the exact text only while it can afford to do so). This is especially useful for a general conversation, where the topic is needed to understand the question being asked, however it is not good when the exact details matter.
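The combination of the two ideas can be sketched as follows: recent messages are kept word for word until the budget is exceeded, and whatever gets pushed out is folded into a summary. All names here are hypothetical (`ToySummaryBufferMemory`, `stub_summarize`, `approx_tokens`), the word count is only a proxy for tokens, and the stub just notes how many messages were condensed where a real memory would ask the LLM for a summary.

```python
def approx_tokens(text):
    # Crude proxy: whitespace-separated words, not real model tokens.
    return len(text.split())


def stub_summarize(previous_summary, dropped_messages):
    """Hypothetical stand-in for the LLM summarisation call."""
    note = f"[{len(dropped_messages)} earlier message(s) condensed]"
    return f"{previous_summary} {note}".strip()


class ToySummaryBufferMemory:
    """Keeps recent messages verbatim and folds older ones into a summary."""

    def __init__(self, max_token_limit, summarize=stub_summarize,
                 token_counter=approx_tokens):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.token_counter = token_counter
        self.summary = ""
        self.messages = []  # verbatim recent messages, oldest first

    def _total(self):
        return sum(self.token_counter(message) for message in self.messages)

    def save_context(self, inputs, outputs):
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        pruned = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while self.messages and self._total() > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            # Whatever was pushed out is merged into the running summary.
            self.summary = self.summarize(self.summary, pruned)

    def load_memory_variables(self):
        lines = [f"System: {self.summary}"] if self.summary else []
        return {"history": "\n".join(lines + self.messages)}


toy_sb_memory = ToySummaryBufferMemory(max_token_limit=16)
toy_sb_memory.save_context({"input": "Hello"}, {"output": "What's up"})
toy_sb_memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
toy_sb_memory.save_context(
    {"input": "What is on the schedule today?"},
    {"output": "There is a meeting at 8am with James."},
)
print(toy_sb_memory.load_memory_variables()["history"])
```

As in the real output above, the small talk ends up in a `System:` summary line while the latest exchange is still available word for word.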

In [245]:
conversation.predict(input="What would be a good demo to show?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies that it's cool.
Human: What is on the schedule today?
AI: There is a meeting at 8am with James. You will need your cursor AI course prepared. 9am-12pm have time to do more with Llama and Langchain At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
Human: What would be a good demo to show?
AI:[0m

[1m> Finished chain.[0m


"A great demo to show would be a real-time text generation using a large language model (LLM). You could set up a scenario where you input a prompt related to the customer's interests or industry, and then showcase how the model generates coherent and contextually relevant responses. \n\nAnother option could be demonstrating a chatbot that can handle customer inquiries, showcasing its ability to understand and respond to various questions. You could also highlight features like sentiment analysis or summarization of long texts, which can be particularly impressive. \n\nIf you have access to any recent advancements or specific applications of LLMs, like code generation or creative writing, those could also be engaging to demonstrate. Just make sure to tailor the demo to what you think would resonate most with your customer!"

In [246]:
memory.load_memory_variables({})

{'history': "System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies that it's cool. The human then asks about the schedule for the day, and the AI outlines a series of meetings and tasks, including a meeting with James, preparation for a cursor AI course, and a lunch meeting with a customer to discuss the latest in AI. The human inquires about a good demo to show during the meeting, and the AI suggests demonstrating real-time text generation using a large language model (LLM), a chatbot for customer inquiries, or showcasing recent advancements like code generation or creative writing, emphasizing the importance of tailoring the demo to the customer's interests."}