# Understanding Memory in LLMs

In the previous Notebooks, we successfully explored how OpenAI models can enhance the results from Azure AI Search queries. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT

from dotenv import load_dotenv
load_dotenv()

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the GPT models in Azure OpenAI have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [4]:
COMPLETION_TOKENS = 1500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that gives thorough responses to users."),
    ("user", "{input}")
])

In [6]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or punishments. Here are some common use cases for reinforcement learning:

1. Game playing: Reinforcement learning has been successfully applied to various games, such as chess, Go, and video games. Agents can learn to play games at a high level by exploring different strategies and refining their decision-making abilities.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, walking, or even flying drones. By interacting with the environment, robots can learn to optimize their actions to achieve desired goals.

3. Autonomous vehicles: Reinforcement learning can be used to train self-driving cars to make decisions in real-time, such as lane changing, merging, and navigating through complex traffic scenarios. Agents can learn to optimize their driving behavior based on safety, efficiency, and passenger comfort.

4. Resource management: Reinforcement learning can be applied to optimize the allocation of resources in various domains, such as energy grids, transportation systems, or supply chains. Agents can learn to make decisions that maximize efficiency, minimize costs, or reduce waste.

5. Personalized recommendation systems: Reinforcement learning can be used to build recommendation systems that learn from user feedback. By exploring different recommendations and observing user responses, the system can learn to make personalized recommendations that maximize user satisfaction.

6. Healthcare: Reinforcement learning can be applied to healthcare settings, such as optimizing treatment plans for chronic diseases, personalized drug dosage recommendations, or adaptive clinical trial designs. Agents can learn to make decisions that maximize patient outcomes while considering individual characteristics and medical constraints.

These are just a few examples, and the potential use cases for reinforcement learning are vast and diverse. The key is to identify domains where decision-making is crucial and where the agent can interact with an environment to learn and improve its behavior over time.

In [8]:
#Now let's ask a follow up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

I'm sorry, but as an AI language model, I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality. Therefore, I don't have knowledge of your prior question or any other personal data. My primary function is to provide information and answer questions to the best of my knowledge and abilities. If there's anything specific you'd like assistance with, feel free to ask!

As you can see, it doesn't remember what it just responded. Sometimes it answers based only on the system prompt, or seemingly at random. This is proof that the LLM does NOT have memory, and that we need to supply the memory as a conversation history inside the prompt, like this:

In [9]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [10]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [11]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was "Tell me some use cases for reinforcement learning".

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
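
The same idea extends to any number of turns. The sketch below is a minimal illustration in plain Python, not the notebook's chain: `fake_llm` is a hypothetical stand-in for the real model call (it only inspects the `Human:` lines it can see in the prompt), and `render_history`, `build_prompt` and `ask` are hypothetical helper names introduced here for illustration.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (human question, AI response)
PREFIX = "Human: "


def render_history(turns: List[Turn]) -> str:
    """Render past turns in the same Human/AI layout used by hist_prompt."""
    return "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)


def build_prompt(turns: List[Turn], question: str) -> str:
    """Put the rendered history in front of the new question."""
    history = render_history(turns)
    return f"{history}\nHuman: {question}\nAI:".lstrip("\n")


def ask(model: Callable[[str], str], turns: List[Turn], question: str) -> str:
    """Call the model with the full history, then record the new turn."""
    answer = model(build_prompt(turns, question))
    turns.append((question, answer))
    return answer


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model (assumption, not an API).

    It "remembers" only what is present in the prompt it receives.
    """
    seen = [line for line in prompt.splitlines() if line.startswith(PREFIX)]
    earlier = seen[:-1]  # everything before the current question
    if not earlier:
        return "I see no prior question."
    prior_question = earlier[-1][len(PREFIX):]
    return f'Your prior question was "{prior_question}".'


turns: List[Turn] = []
first = ask(fake_llm, turns, "Tell me some use cases for reinforcement learning")
second = ask(fake_llm, turns, "What was my prior question?")
print(first)
print(second)
```

The stand-in model is stateless, just like the real one: the second answer is only possible because `ask` sends the accumulated turns along with the new question.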

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

From the LangChain website:

> A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.
>
> 1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
> 2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.
    
So this process adds delays to the response, but it is a necessary delay :)

![image](https://python.langchain.com/assets/images/memory_diagram-0627c68230aa438f9b5419064d63cbbc.png)
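
The READ and WRITE steps described above can be sketched in a few lines of plain Python. This is only an illustration of the ordering, under stated assumptions: `ListMemory`, `run_with_memory` and `core_logic` are hypothetical names, and `core_logic` stands in for the prompt-plus-LLM step rather than calling a real model.

```python
from typing import Callable, Dict, List, Tuple


class ListMemory:
    """Hypothetical minimal memory: an ordered list of (role, text) messages."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def read(self) -> str:
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

    def write(self, user_input: str, output: str) -> None:
        self.messages.append(("Human", user_input))
        self.messages.append(("AI", output))


def run_with_memory(
    core: Callable[[Dict[str, str]], str], memory: ListMemory, user_input: str
) -> str:
    # 1. READ: after receiving the input, before the core logic runs.
    inputs = {"history": memory.read(), "question": user_input}
    # 2. Core logic (in the notebook this would be prompt | llm).
    output = core(inputs)
    # 3. WRITE: after the core logic, before returning the answer.
    memory.write(user_input, output)
    return output


def core_logic(inputs: Dict[str, str]) -> str:
    """Stand-in for the model: reports how much history it was given."""
    seen = len(inputs["history"].splitlines())
    return f"Echo: {inputs['question']} (history lines seen: {seen})"


memory = ListMemory()
out1 = run_with_memory(core_logic, memory, "first question")
out2 = run_with_memory(core_logic, memory, "second question")
print(out1)  # Echo: first question (history lines seen: 0)
print(out2)  # Echo: second question (history lines seen: 2)
```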

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books"
indexes = [index1_name, index2_name, index3_name]

In [13]:
# Initialize our custom retriever 
retriever = CustomAzureSearchRetriever(indexes=indexes, topK=10, reranker_threshold=1)

If you look closely at prompts.py, you will see an optional variable in the `DOCSEARCH_PROMPT` called `history`. Now is the time to use it. It is basically a placeholder where we will inject the conversation into the prompt, so the LLM is aware of it before it answers.

**Now let's add memory to it:**

In [14]:
store = {} # Our first memory will be a dictionary in memory

# We have to define a custom function that takes a session_id and looks somewhere
# (in this case in a dictionary in memory) for the conversation
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
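
To make the lookup-or-create pattern concrete, here is a small sketch in plain Python showing that each session id gets its own history and that repeated calls return the same object. `SimpleHistory`, `demo_store` and `get_demo_history` are hypothetical names used only for this illustration, so they do not overwrite the notebook's `store` or `get_session_history`; `SimpleHistory` is a stand-in, not the `ChatMessageHistory` API.

```python
from typing import Dict, List, Tuple


class SimpleHistory:
    """Hypothetical stand-in for a chat history: just a list of messages."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))


demo_store: Dict[str, SimpleHistory] = {}


def get_demo_history(session_id: str) -> SimpleHistory:
    # Same lookup-or-create pattern as get_session_history.
    if session_id not in demo_store:
        demo_store[session_id] = SimpleHistory()
    return demo_store[session_id]


get_demo_history("abc123").add("human", "Tell me some use cases for reinforcement learning")
get_demo_history("abc123").add("ai", "Game playing, robotics, ...")
get_demo_history("xyz789").add("human", "Hello")

print(len(get_demo_history("abc123").messages))  # 2
print(len(get_demo_history("xyz789").messages))  # 1
```

Because the dictionary lives in the Python process, everything in it disappears when the kernel or app restarts.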


In [15]:
# We use our original chain with the retriever but removing the StrOutputParser
chain = (
    {
        "context": itemgetter("question") | retriever, 
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | DOCSEARCH_PROMPT
    | llm
)

## Then we pass the above chain to another chain that adds memory to it

output_parser = StrOutputParser()

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
) | output_parser

In [16]:
# This is where we configure the session id
config={"configurable": {"session_id": "abc123"}}

Notice below that the call only passes `question`. The `history` variable is filled in by `RunnableWithMessageHistory` (via `history_messages_key`), and it holds the chat history within the prompt.

In [17]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has been applied to various domains and has shown promising results in solving complex problems. Here are some notable use cases for reinforcement learning:

1. **Game Playing**: RL has been successfully applied to various games, such as chess, Go, and poker. For example, AlphaGo, developed by DeepMind, used RL to defeat world champion Go players. RL algorithms can learn optimal strategies through trial and error, resulting in superhuman performance in game playing.

2. **Robotics**: RL has been used to train robots to perform complex tasks, such as grasping objects, walking, and flying. RL enables robots to learn from their interactions with the environment and improve their performance over time. This has applications in industrial automation, healthcare, and autonomous vehicles.

3. **Recommendation Systems**: RL can be used to personalize recommendations for users in various domains, such as e-commerce, entertainment, and advertising. By learning from user feedback and interactions, RL algorithms can optimize recommendations to maximize user satisfaction and engagement.

4. **Resource Management**: RL can be applied to optimize resource allocation and scheduling in various industries, such as transportation, logistics, and energy. RL algorithms can learn to make decisions in dynamic environments to maximize efficiency and minimize costs.

5. **Finance and Trading**: RL has been used in financial markets to develop trading strategies and make investment decisions. RL algorithms can learn to exploit patterns in market data and adapt to changing market conditions, leading to improved trading performance.

6. **Healthcare**: RL has the potential to optimize treatment plans and decision-making in healthcare. It can be used to personalize treatment strategies for patients, optimize scheduling in hospitals, and improve disease diagnosis and prediction.

7. **Autonomous Agents**: RL is crucial in developing autonomous agents that can learn to navigate and interact with their environment. This has applications in autonomous vehicles, drones, and virtual assistants.

These are just a few examples of the wide range of use cases for reinforcement learning. RL continues to be an active area of research and development, with potential applications in many other domains.

In [18]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was: "Tell me some use cases for reinforcement learning."

In [19]:
# Remembers
printmd(chain_with_history.invoke({"question": "Thank you! Good bye"},config=config))

You're welcome! If you have any more questions in the future, feel free to ask. Goodbye and take care!

## Using CosmosDB as persistent memory

In the previous cells we added local RAM memory to our chatbot. However, it is not persistent: it gets deleted once the app user's session is terminated. We therefore need a database to persist each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations in the future.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called `CosmosDBChatMessageHistory`.

In [20]:
# Create the function to retrieve the conversation

def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos
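
If you want to see what "persistent" buys you without provisioning CosmosDB, the sketch below uses SQLite from the Python standard library as a stand-in. It is an assumption-laden illustration, not the `CosmosDBChatMessageHistory` API: `SqliteChatHistory` and its methods are hypothetical names. It shows two things: the messages survive when a new object is created (as after an app restart), and they are keyed by both `user_id` and `session_id`.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple


class SqliteChatHistory:
    """Hypothetical persistent chat history backed by a SQLite file."""

    def __init__(self, db_path: str, session_id: str, user_id: str) -> None:
        self.db_path = db_path
        self.session_id = session_id
        self.user_id = user_id
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
            )

    def add_message(self, role: str, content: str) -> None:
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT INTO messages (user_id, session_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (self.user_id, self.session_id, role, content),
            )

    def messages(self) -> List[Tuple[str, str]]:
        with closing(sqlite3.connect(self.db_path)) as conn:
            rows = conn.execute(
                "SELECT role, content FROM messages "
                "WHERE user_id = ? AND session_id = ? ORDER BY id",
                (self.user_id, self.session_id),
            ).fetchall()
        return [(role, content) for role, content in rows]


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "chat.db")

    first = SqliteChatHistory(db_path, session_id="session1", user_id="user1")
    first.add_message("human", "Tell me some use cases for reinforcement learning")
    first.add_message("ai", "Game playing, robotics, ...")

    # A brand-new object simulates the app restarting and reconnecting.
    reloaded = SqliteChatHistory(db_path, session_id="session1", user_id="user1")
    other_user = SqliteChatHistory(db_path, session_id="session1", user_id="user2")

    reloaded_messages = reloaded.messages()
    other_messages = other_user.messages()

print(len(reloaded_messages))  # 2
print(len(other_messages))     # 0
```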


In [21]:
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        ),
    ],
) | output_parser
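
Why do the `id` values in `history_factory_config` have to be `user_id` and `session_id`? Because they are matched to the parameter names of `get_session_history`. The sketch below is a simplified illustration of that idea in plain Python, not LangChain's actual implementation: `resolve_history` and `demo_factory` are hypothetical names, and the error handling is an assumption made for the example.

```python
import inspect
from typing import Any, Callable, Dict, List


def resolve_history(
    factory: Callable[..., Any], field_ids: List[str], config: Dict[str, Any]
) -> Any:
    """Call `factory` with the values found under config["configurable"]."""
    configurable = config.get("configurable", {})

    missing = [field_id for field_id in field_ids if field_id not in configurable]
    if missing:
        raise ValueError(f"Missing keys in config['configurable']: {missing}")

    # The declared field ids must line up with the factory's parameter names.
    parameters = set(inspect.signature(factory).parameters)
    if set(field_ids) != parameters:
        raise ValueError(
            f"Expected factory parameters {sorted(field_ids)}, got {sorted(parameters)}"
        )

    return factory(**{field_id: configurable[field_id] for field_id in field_ids})


def demo_factory(session_id: str, user_id: str) -> str:
    """Stand-in for get_session_history; returns a label instead of a history."""
    return f"history for {user_id}/{session_id}"


demo_config = {"configurable": {"session_id": "session155", "user_id": "user185"}}
resolved = resolve_history(demo_factory, ["user_id", "session_id"], demo_config)
print(resolved)  # history for user185/session155
```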

In [22]:
# This is where we configure the session id and user id
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}

In [23]:
config

{'configurable': {'session_id': 'session155', 'user_id': 'user185'}}

In [24]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has been applied to various domains and has shown promising results. Here are some use cases for reinforcement learning:

1. **Game Playing**: RL has been successfully applied to game playing, where agents learn to make optimal decisions in complex game environments. For example, AlphaGo, developed by DeepMind, used RL to defeat world champion Go players.

2. **Robotics**: RL has been used to train robots to perform complex tasks and navigate real-world environments. Robots can learn to manipulate objects, walk, and perform other physical actions through RL algorithms.

3. **Autonomous Vehicles**: RL can be used to train autonomous vehicles to make decisions in dynamic and uncertain environments. Agents can learn to navigate traffic, make lane changes, and respond to unexpected situations.

4. **Recommendation Systems**: RL can be used in personalized recommendation systems to learn user preferences and make recommendations. Agents can learn to optimize the selection of items, such as movies, music, or products, based on user feedback.

5. **Resource Management**: RL can be applied to optimize resource allocation and management in various domains. For example, in energy management, RL can be used to optimize the scheduling of power generation and storage systems.

6. **Healthcare**: RL can be used in healthcare for personalized treatment planning, drug dosage optimization, and disease management. Agents can learn to make treatment decisions based on patient data and optimize patient outcomes.

7. **Finance**: RL can be applied to financial trading, portfolio management, and risk assessment. Agents can learn to make trading decisions based on market data and optimize investment strategies.

8. **Supply Chain Management**: RL can be used to optimize inventory management, pricing, and logistics in supply chain operations. Agents can learn to make decisions that minimize costs and maximize efficiency.

9. **Control Systems**: RL can be used in control systems to optimize the control policies of complex systems. For example, RL can be applied to optimize the control of power grids, manufacturing processes, or autonomous drones.

These are just a few examples of the diverse applications of reinforcement learning. RL has the potential to revolutionize various industries by enabling intelligent decision-making and optimization in complex and dynamic environments.

In [25]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was: "Tell me some use cases for reinforcement learning."

In [26]:
# Remembers
printmd(chain_with_history.invoke(
    {"question": "Can you tell me a one line summary of our conversation?"},
    config=config))

In our conversation, we discussed various use cases for reinforcement learning, including game playing, robotics, autonomous vehicles, recommendation systems, resource management, healthcare, finance, supply chain management, and control systems.

In [27]:
printmd(chain_with_history.invoke(
    {"question": "Thank you very much!"},
    config=config))

You're welcome! If you have any more questions, feel free to ask.

In [28]:
printmd(chain_with_history.invoke(
    {"question": "I do have one more question, why did you give me a one line summary?"},
    config=config))

I provided a one-line summary as a concise overview of the main topic and points discussed in our conversation. It serves as a quick reference or summary of the key information covered.

In [29]:
printmd(chain_with_history.invoke(
    {"question": "why not 2?"},
    config=config))

I apologize for not providing a two-line summary. I can certainly provide a two-line summary if you prefer. Here it is:

In our conversation, we discussed various use cases for reinforcement learning, including game playing, robotics, autonomous vehicles, recommendation systems, resource management, healthcare, finance, supply chain management, and control systems. Additionally, we touched upon the importance of a concise summary for quick reference.

#### Let's check our Azure CosmosDB to see the whole conversation


![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input. This doesn't seem efficient, but regardless, we are very close to finishing our first RAG "talk to your data" bot.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models (1106 or newer), we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it sometimes struggles to follow them accurately (it is not very reliable). Additionally, its limited "memory" (token context) can make sustained conversations challenging. Its responses are also simple rather than deep.

**GPT-4-Turbo** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills, consistently adheres to instructions, and gives better answers. It has extended memory retention (a larger context size) for instructions, and it excels at following them. Its responses are deep and thorough.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

In the next notebook (6), we are going to build our first RAG bot. To do this, we will introduce the concept of Agents.