### LangChain Essentials

# Conversational Memory for OpenAI & Llama - LangChain #3

*A chatbot's ability to talk to us with predictions and a wealth of knowledge about our conversations is bounded by its memory. However, memory has issues, as we will uncover: it can be detrimental if not properly looked after, but hugely beneficial if it has...*

# **Chapter 3: Conversational Memory**

1. LangChain uses its own `conversational` template, which we will use to feed our LLMs.
2. Memory has a direct impact on `tokens`, which we will try to reduce.
3. We will go over five memory techniques, highlighting the strengths and weaknesses of each with our LLMs.

## Choosing your Model

This example is split into two versions. The [Ollama version] allows us to run our LLM locally without needing any external services or API keys. The OpenAI version, shown here, uses the OpenAI API and requires an OpenAI API key.

## Initializing OpenAI's gpt-4o-mini

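We start by initializing the OpenAI model, which is fine-tuned for instruction following. The model itself is hosted by OpenAI, so nothing is pulled to our machine; we only need the client library, which we install by switching to our terminal and executing: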

pip install openai

Once the library has finished installing, we initialize the model in LangChain using the `ChatOpenAI` class:

In [1]:
import os

# OpenAI key to use the model
OPENAI_API_KEY = ""

openai_model = "gpt-4o-mini"

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [2]:
from langchain_openai import ChatOpenAI

# For normal accurate responses
llm = ChatOpenAI(temperature=0.0, model=openai_model, openai_api_key=OPENAI_API_KEY)

# For unique creative responses
creative_llm = ChatOpenAI(temperature=0.9, model=openai_model, openai_api_key=OPENAI_API_KEY)

## Counting Tokens

One of the most important things about memory is the trade-off between cost and the accuracy of the results. The more 'memory' we feed into the model, the more tokens each request uses, and therefore the more it costs to run. To keep track of this we will use a token counter to see how much we are spending on each conversation.
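
To make the cost relationship concrete, here is a minimal sketch that replays a short conversation and estimates how large the prompt becomes when every previous turn is resent. The `rough_token_count` and `prompt_sizes` helpers are assumptions made for illustration: counting whitespace-separated words is only a crude stand-in for a real tokenizer such as `tiktoken`, so the absolute numbers will not match what the API reports.

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def prompt_sizes(turns: list[tuple[str, str]]) -> list[int]:
    """Estimate the prompt size of each turn when the full history is resent."""
    history: list[str] = []
    sizes: list[int] = []
    for human, ai in turns:
        # The prompt for this turn is the whole history plus the new human message.
        prompt = "\n".join(history + [f"Human: {human}", "AI:"])
        sizes.append(rough_token_count(prompt))
        # Once the model has replied, both messages become part of the history.
        history.extend([f"Human: {human}", f"AI: {ai}"])
    return sizes


turns = [
    ("Hi, my name is Josh", "Hello, Josh!"),
    ("What is 9+10?", "9 + 10 equals 19!"),
    ("What is my name?", "Your name is Josh!"),
]
sizes = prompt_sizes(turns)
print(sizes)  # each turn is more expensive than the one before it
```

The prompt grows with every turn even though each new question is short, which is the cost the memory types in this chapter try to keep under control.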

We don't want to worry about warnings so we will ignore them.

In [3]:
import warnings
warnings.filterwarnings('ignore')

Install `tiktoken` with `uv add tiktoken`. The helper below then reports the number of tokens used for each chat request sent to our LLMs.

In [4]:
import tiktoken
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    """Run `query` through `chain` and print the total tokens the request used."""
    with get_openai_callback() as cb:
        result = chain.predict(input=query)  # Use predict instead of run
        print(f'Spent a total of {cb.total_tokens} tokens')
    return result

## Conversation Chain

*In this section we will go over LangChain's conversation chain.*

First we want to import the corresponding libraries used for conversation chains.

In [5]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

Now we want to create a memory, which is just a buffer that holds data: in this case, the previous turns of the conversation.

In [6]:
memory = ConversationBufferMemory()

Afterwards we need to set up our chain, which takes our model and the memory we just made. We can also set `verbose=True` to check that the chain is behaving as we would expect.

In [7]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

Now we can enter some text and see the inputs and outputs with verbose!

In [8]:
conversation.predict(input="Hi, my name is Josh")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Josh
AI:

> Finished chain.


"Hello, Josh! It's great to meet you! How's your day going so far?"

In [9]:
conversation.predict(input="What is 9+10?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI:

> Finished chain.


"9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?"

In [10]:
conversation.predict(input="What is my name?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?
Human: What is my name?
AI:

> Finished chain.


"Your name is Josh! It's nice to have a name to associate with our conversation. Do you have any hobbies or interests you'd like to share, Josh?"

To see exactly what the AI is reading from, we can inspect the memory's buffer, which holds the whole conversation so far.

In [11]:
print(memory.buffer)

Human: Hi, my name is Josh
AI: Hello, Josh! It's great to meet you! How's your day going so far?
Human: What is 9+10?
AI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?
Human: What is my name?
AI: Your name is Josh! It's nice to have a name to associate with our conversation. Do you have any hobbies or interests you'd like to share, Josh?


This is another way we can look at the history.

In [12]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Josh\nAI: Hello, Josh! It's great to meet you! How's your day going so far?\nHuman: What is 9+10?\nAI: 9 + 10 equals 19! It's a simple addition problem, but it's always good to double-check your math. Do you enjoy math, or is there another subject you prefer?\nHuman: What is my name?\nAI: Your name is Josh! It's nice to have a name to associate with our conversation. Do you have any hobbies or interests you'd like to share, Josh?"}
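
The `history` value returned above is what gets substituted into the prompt seen in the verbose output: a fixed preamble, the stored history, then the new input. A minimal sketch of that formatting step follows. It uses plain `str.format` rather than LangChain's own prompt classes, and the `TEMPLATE` constant and `build_prompt` helper are names made up for this illustration; only the wording is copied from the verbose output.

```python
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, it truthfully says it "
    "does not know.\n\n"
    "Current conversation:\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)


def build_prompt(history: str, user_input: str) -> str:
    """Fill the template with the stored history and the new human message."""
    return TEMPLATE.format(history=history, input=user_input)


history = "Human: Hi, my name is Josh\nAI: Hello, Josh! It's great to meet you!"
prompt = build_prompt(history, "What is my name?")
print(prompt)
```

Because the prompt ends with `AI:`, the model simply continues the transcript, and anything missing from `history` is invisible to it.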

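To reset the memory, we simply assign a fresh `ConversationBufferMemory` to it. Note that the existing `conversation` chain still holds the old memory object; calling `memory.clear()` instead would empty it in place.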

In [13]:
memory = ConversationBufferMemory()

We can also 'force' exchanges into the memory with `save_context`, without calling the model. We will play around with this later on...

In [14]:
memory.save_context({"input": "Hi"}, 
                    {"output": "What's up"})

print(memory.buffer)

Human: Hi
AI: What's up


In [15]:
memory.save_context({"input": "Not much, just hanging"}, 
                    {"output": "Cool"})

memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}
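
To show how little is going on inside a buffer memory, here is a pure-Python sketch that mimics the three things used above: `save_context`, `buffer`, and `load_memory_variables`. `SimpleBufferMemory` is a hypothetical toy class written for this illustration, not LangChain's implementation, which stores message objects and supports more options.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer: stores every exchange verbatim."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # Record one human/AI exchange with the same prefixes seen in the outputs above.
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")

    @property
    def buffer(self) -> str:
        # The whole conversation as one newline-separated string.
        return "\n".join(self.messages)

    def load_memory_variables(self, _inputs: dict) -> dict:
        # The chain reads the conversation under the "history" key.
        return {"history": self.buffer}

    def clear(self) -> None:
        # Empty the buffer in place.
        self.messages.clear()


memory_sketch = SimpleBufferMemory()
memory_sketch.save_context({"input": "Hi"}, {"output": "What's up"})
memory_sketch.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(memory_sketch.load_memory_variables({}))
```

Nothing is ever dropped, so recall is perfect but the history (and the token count) grows without bound.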

## Conversation Summary Memory

*In this section we will go over LangChain's conversation summary memory.*

Again we import the corresponding library; in this case we are using conversation summary memory.

In [16]:
from langchain.memory import ConversationSummaryMemory

This is basically the same as before, however now we are passing our LLM to the memory as well as to the conversation chain, because the memory uses the model to summarise the conversation.

In [17]:
memory = ConversationSummaryMemory(llm=llm)
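
To see why the memory itself needs a model, here is a minimal sketch of the idea: after each exchange the raw text is handed to a summariser and only the updated summary is kept. `SimpleSummaryMemory` and `toy_summarizer` are hypothetical names for this illustration. The toy summariser just clips each line to five words so that the example runs offline; in the real memory that step is an extra LLM call.

```python
from typing import Callable


def toy_summarizer(summary: str, new_lines: str) -> str:
    """Stand-in for the LLM call (assumption): clip each new line to five words."""
    clipped = [" ".join(line.split()[:5]) for line in new_lines.splitlines()]
    return (summary + " " + " | ".join(clipped)).strip()


class SimpleSummaryMemory:
    """Toy summary memory: keeps one running summary instead of raw messages."""

    def __init__(self, summarizer: Callable[[str, str], str]) -> None:
        self.summarizer = summarizer
        self.summary = ""

    def save_context(self, inputs: dict, outputs: dict) -> None:
        new_lines = f"Human: {inputs['input']}\nAI: {outputs['output']}"
        # The raw exchange is discarded; only the updated summary survives.
        self.summary = self.summarizer(self.summary, new_lines)

    def load_memory_variables(self, _inputs: dict) -> dict:
        return {"history": self.summary}


summary_memory = SimpleSummaryMemory(toy_summarizer)
summary_memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hello, James! It's great to meet you!"},
)
summary_memory.save_context(
    {"input": "What is 9 + 10?"},
    {"output": "The answer is 19!"},
)
print(summary_memory.load_memory_variables({}))
```

The trade-off is that details survive only if the summariser keeps them, and every saved exchange costs one more summarisation step.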

In [18]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

Now we are going to test the memory recall of this conversation with some questions and answers.

In [19]:
conversation.predict(input="Hi, my name is James")

"Hello, James! It's great to meet you! How's your day going so far?"

In [20]:
conversation.predict(input="What is 9 + 10?")

"The answer to 9 + 10 is 19! It's a simple addition problem, but it's always good to double-check your math. How's your day going, James?"

In [21]:
conversation.predict(input="What is my name?")

"Your name is James! It's nice to meet you, James. How's your day going so far?"

## Conversation Buffer Window Memory

*In this section we will go over LangChain's conversation buffer window memory.*

Once again, we import the libraries we need for conversation buffer window memory.

In [22]:
from langchain.memory import ConversationBufferWindowMemory

Same setup as before, but this time we pass `k=1`. Here `k` is the number of most recent exchanges (one human message plus the AI's reply) that the memory keeps, as showcased below. This directly limits token usage.

In [23]:
# Keep only the most recent exchange; this memory type does not use an LLM
memory = ConversationBufferWindowMemory(k=1)

In [24]:
memory.save_context({"input": "Hi"},
                    {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})


In [25]:
memory.load_memory_variables({})

{'history': 'Human: Not much, just hanging\nAI: Cool'}
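
The windowing behaviour above can be sketched with a fixed-length `collections.deque`. `SimpleWindowMemory` is a hypothetical toy class for illustration; it throws old exchanges away when saving, which may differ from how LangChain handles the window internally, but the history it returns has the same shape.

```python
from collections import deque


class SimpleWindowMemory:
    """Toy windowed buffer: remembers only the last k human/AI exchanges."""

    def __init__(self, k: int = 1) -> None:
        # A deque with maxlen=k silently discards the oldest exchange when full.
        self.exchanges: deque = deque(maxlen=k)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self, _inputs: dict) -> dict:
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


window = SimpleWindowMemory(k=1)
window.save_context({"input": "Hi"}, {"output": "What's up"})
window.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(window.load_memory_variables({}))
```

With `k=1` the first exchange is gone as soon as the second is saved, which is why anything said more than `k` exchanges ago cannot be recalled.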

Now we also want to test the conversation when programmed this way.

In [26]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [27]:
conversation.predict(input="Hi, my name is James")

"Hi James! It's great to meet you! I'm here to chat about anything you'd like. What’s on your mind today?"

In [28]:
conversation.predict(input="What is 9+10?")

"9 + 10 equals 19! If you have any more math questions or anything else you'd like to discuss, feel free to ask!"

In [29]:
conversation.predict(input="What is my name?")

'I’m not sure what your name is since I don’t have that information. But I’d love to know it if you’d like to share! What else can I help you with today?'

As you can see, the main downside is that the memory has dropped the name, which is annoying if we want long conversations that go back to topics we discussed earlier. However, this is very useful if we need to cut down on tokens, as can be seen below!

In [30]:
count_tokens(conversation, "How are you today?")

Spent a total of 144 tokens


'I’m doing great, thanks for asking! I’m always here and ready to chat. How about you? How’s your day going?'

## Conversation Token Buffer Memory

*In this section we will go over LangChain's conversation token buffer memory.*

Let's go again with conversation token buffer memory.

In [31]:
from langchain.memory import ConversationTokenBufferMemory

Now we set the maximum number of tokens the memory may hold. This is super useful for capping cost, but note that a request can still use more than this limit, because the query itself (and the prompt template) also consumes tokens.

In [32]:
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=70)

Here we are forcing 'fixed' exchanges into our memory. With a limit of 70 tokens all three exchanges fit, but if you run this yourself with a lower limit you will see the oldest parts of the memory being dropped.

In [33]:
memory.save_context({"input": "AI is what?!"},
                    {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"},
                    {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, 
                    {"output": "Charming!"})

In [34]:
memory.load_memory_variables({})

{'history': 'Human: AI is what?!\nAI: Amazing!\nHuman: Backpropagation is what?\nAI: Beautiful!\nHuman: Chatbots are what?\nAI: Charming!'}
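
To see the pruning without spending any tokens, here is a pure-Python sketch of a token-limited buffer. `SimpleTokenBufferMemory` and `rough_token_count` are hypothetical names for this illustration; the word-count "tokenizer" is an assumption, so the limits below are not comparable with real token limits such as the 70 used above.

```python
def rough_token_count(text: str) -> int:
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


class SimpleTokenBufferMemory:
    """Toy token buffer: drops the oldest messages once a budget is exceeded."""

    def __init__(self, max_token_limit: int) -> None:
        self.max_token_limit = max_token_limit
        self.messages: list[str] = []

    def _size(self) -> int:
        return sum(rough_token_count(m) for m in self.messages)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        # Remove messages from the front until the buffer fits the budget again.
        while self.messages and self._size() > self.max_token_limit:
            self.messages.pop(0)

    def load_memory_variables(self, _inputs: dict) -> dict:
        return {"history": "\n".join(self.messages)}


def fill(memory: SimpleTokenBufferMemory) -> SimpleTokenBufferMemory:
    """Save the same three exchanges used in the cell above."""
    memory.save_context({"input": "AI is what?!"}, {"output": "Amazing!"})
    memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
    memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})
    return memory


roomy = fill(SimpleTokenBufferMemory(max_token_limit=70))
tight = fill(SimpleTokenBufferMemory(max_token_limit=6))
print(roomy.load_memory_variables({}))  # all three exchanges fit
print(tight.load_memory_variables({}))  # only the newest exchange fits
```

Unlike the window memory, the cut-off here depends on how long the messages are rather than on how many there are.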

In [35]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [36]:
conversation.predict(input="Dandilions are what?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: AI is what?!
AI: Amazing!
Human: Backpropagation is what?
AI: Beautiful!
Human: Chatbots are what?
AI: Charming!
Human: Dandilions are what?
AI:

> Finished chain.


'Dandelions are delightful! These bright yellow flowers are often seen as weeds, but they have a lot of charm and usefulness. Did you know that dandelions are edible? Every part of the plant can be consumed, from the leaves in salads to the roots that can be roasted and brewed as a coffee substitute. Plus, they are great for pollinators like bees! Their fluffy seed heads are also a favorite for children to blow and make wishes. What do you think about dandelions?'

## Conversation Summary Buffer Memory

*In this section we will go over LangChain's conversation summary buffer memory.*

One last time, this time with conversation summary buffer memory.

In [37]:
from langchain.memory import ConversationSummaryBufferMemory

Here we are feeding in a schedule that the AI will use to inform us of anything we need to know.

In [38]:
# create a long string
schedule = "There is a meeting at 8am with James. \
You will need your cursor AI course prepared. \
9am-12pm have time to do more with Llama and Langchain \
At Noon, lunch at the italian resturant with a customer who is driving \
from over an hour away to meet you to understand the latest in AI. \
Be sure to bring your laptop to show the latest LLM demo."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What is on the schedule today?"},
                    {"output": schedule})

In [39]:
memory.load_memory_variables({})

{'history': 'System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies positively.\nHuman: What is on the schedule today?\nAI: There is a meeting at 8am with James. You will need your cursor AI course prepared. 9am-12pm have time to do more with Llama and Langchain At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.'}

In [40]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

As you can see from the previous snippets, once the token limit is exceeded the memory summarises the older parts of the conversation rather than keeping the exact details (unless it can afford to do so). This is especially useful for a general conversation, where the topic is needed to understand the question being asked. However, it is not good at preserving the exact details mentioned.
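
A minimal sketch of this hybrid behaviour follows: recent messages are kept verbatim while they fit the budget, and anything pushed out is folded into a running summary shown as a `System:` line. `SimpleSummaryBufferMemory`, `toy_summarizer`, and `rough_token_count` are hypothetical names for this illustration. The summariser merely clips each message to four words so the example runs offline, where the real memory would call the LLM, and word counts stand in for real tokens.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def toy_summarizer(summary: str, pruned: list[str]) -> str:
    """Stand-in for the LLM summary call (assumption): clip each message to four words."""
    clipped = [" ".join(m.split()[:4]) + "..." for m in pruned]
    return " ".join(filter(None, [summary, *clipped]))


class SimpleSummaryBufferMemory:
    """Toy hybrid memory: recent messages verbatim, older ones as a summary."""

    def __init__(self, summarizer: Callable[[str, list[str]], str], max_token_limit: int) -> None:
        self.summarizer = summarizer
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.messages: list[str] = []

    def _size(self) -> int:
        return sum(rough_token_count(m) for m in self.messages)

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.messages.append(f"Human: {inputs['input']}")
        self.messages.append(f"AI: {outputs['output']}")
        pruned: list[str] = []
        # Move the oldest messages out of the verbatim buffer until it fits.
        while self.messages and self._size() > self.max_token_limit:
            pruned.append(self.messages.pop(0))
        if pruned:
            # Fold whatever was pushed out into the running summary.
            self.summary = self.summarizer(self.summary, pruned)

    def load_memory_variables(self, _inputs: dict) -> dict:
        lines = ([f"System: {self.summary}"] if self.summary else []) + self.messages
        return {"history": "\n".join(lines)}


hybrid = SimpleSummaryBufferMemory(toy_summarizer, max_token_limit=7)
hybrid.save_context({"input": "Hello"}, {"output": "What's up"})
hybrid.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})
print(hybrid.load_memory_variables({}))
```

The first exchange no longer fits once the second is saved, so it moves into the summary while the latest exchange stays word for word.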

In [41]:
conversation.predict(input="What would be a good demo to show?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies positively.
Human: What is on the schedule today?
AI: There is a meeting at 8am with James. You will need your cursor AI course prepared. 9am-12pm have time to do more with Llama and Langchain At Noon, lunch at the italian resturant with a customer who is driving from over an hour away to meet you to understand the latest in AI. Be sure to bring your laptop to show the latest LLM demo.
Human: What would be a good demo to show?
AI:

> Finished chain.


"A great demo to show would be a real-time text generation using a large language model (LLM). You could set up a scenario where you input a prompt related to a common business problem, like generating a marketing email or summarizing a report. This would showcase the model's ability to understand context and produce coherent, relevant text.\n\nAnother option could be demonstrating a chatbot interaction, where you simulate a conversation with a customer service bot. You could highlight how the LLM can handle various queries and provide helpful responses, which would be particularly relevant for your customer.\n\nIf you have access to any visual tools, you could also show how LLMs can be integrated with applications, like creating a simple web app that uses the model to generate content based on user input. This would illustrate the practical applications of LLMs in real-world scenarios."

In [42]:
memory.load_memory_variables({})

{'history': "System: The human greets the AI, and the AI responds casually. The human mentions they are just hanging out, to which the AI replies positively. The human then inquires about the day's schedule, and the AI outlines a series of meetings and tasks, including a meeting with James, preparation for a cursor AI course, and a lunch meeting with a customer to discuss the latest in AI. The AI suggests bringing a laptop to demonstrate a large language model (LLM). When the human asks for a good demo to show, the AI recommends real-time text generation for business scenarios, a chatbot interaction to showcase customer service capabilities, and the integration of LLMs with applications to illustrate practical uses."}