# Understanding Memory in LLMs

In the previous Notebooks, we successfully explored how OpenAI models can enhance the results from Azure AI Search queries. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List
import openai
from azure.identity import ManagedIdentityCredential

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [2]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [3]:
# Step 1: Specify the client ID of the User Managed Identity
user_managed_identity_client_id = "d30cba06-04c1-4065-a91d-8b7ce3b07b78"  # Replace with your User Managed Identity client ID

# Step 2: Fetch the access token using ManagedIdentityCredential and the client ID of the user-managed identity
credential = ManagedIdentityCredential(client_id=user_managed_identity_client_id)
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# Step 3: Set the access token in the OpenAI API
openai.api_key = token.token
openai.api_type = "azure"
openai.api_base = "https://azuremlopenai.openai.azure.com/"  # Replace with your OpenAI resource's base URL
openai.api_version = "2023-06-01-preview"  # Use the correct API version

In [4]:
COMPLETION_TOKENS = 1000
# Create the Azure OpenAI chat model instance
llm = AzureChatOpenAI(
    openai_api_key=token.token, azure_endpoint=openai.api_base,
    openai_api_version=openai.api_version,
    deployment_name=os.environ["GPT4o_DEPLOYMENT_NAME"],
    temperature=0.5, max_tokens=COMPLETION_TOKENS,
)

In [5]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that gives thorough responses to users."),
    ("user", "{input}")
])

In [6]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize some notion of cumulative reward. Here are several use cases across various domains:

### 1. **Robotics**
- **Autonomous Navigation**: RL is used to train robots to navigate complex environments, avoid obstacles, and find optimal paths.
- **Manipulation Tasks**: Robots can learn to perform tasks such as picking and placing objects, assembly, and other intricate manipulations.

### 2. **Gaming**
- **Game Playing**: RL has been famously used in training agents to play games like Go, Chess, and video games (e.g., AlphaGo, AlphaZero, and agents playing Atari games).
- **Game Design**: RL can assist in designing game levels or AI opponents that adapt to the player's skill level.

### 3. **Finance**
- **Algorithmic Trading**: RL is used to develop trading strategies that adapt to market conditions and optimize returns.
- **Portfolio Management**: Agents can learn to allocate assets in a portfolio to maximize returns while managing risk.

### 4. **Healthcare**
- **Treatment Strategies**: RL can help in developing personalized treatment plans for patients by learning from historical data and patient responses.
- **Drug Discovery**: RL can assist in optimizing the process of drug discovery, including the design and testing of new compounds.

### 5. **Transportation**
- **Autonomous Vehicles**: RL is used to train self-driving cars to navigate roads, make decisions, and interact safely with other vehicles and pedestrians.
- **Traffic Management**: RL can optimize traffic light control to reduce congestion and improve traffic flow.

### 6. **Industrial Automation**
- **Process Optimization**: RL can optimize manufacturing processes, energy management, and resource allocation to improve efficiency and reduce costs.
- **Predictive Maintenance**: RL can predict equipment failures and schedule maintenance activities to minimize downtime.

### 7. **Natural Language Processing**
- **Conversational Agents**: RL can be used to train chatbots and virtual assistants to handle conversations more effectively by learning from interactions.
- **Machine Translation**: RL can improve the quality of machine translation by optimizing the translation process based on feedback.

### 8. **Energy**
- **Smart Grid Management**: RL can optimize the distribution of electricity in smart grids to balance supply and demand efficiently.
- **Renewable Energy**: RL can help in optimizing the operation of renewable energy sources like wind and solar power to maximize output and efficiency.

### 9. **Marketing**
- **Personalized Recommendations**: RL can be used to develop recommendation systems that adapt to user preferences over time.
- **Dynamic Pricing**: RL can optimize pricing strategies in real-time to maximize revenue and customer satisfaction.

### 10. **Supply Chain Management**
- **Inventory Management**: RL can optimize inventory levels to reduce costs and meet demand effectively.
- **Logistics and Routing**: RL can improve the efficiency of logistics operations, including routing of delivery vehicles and warehouse management.

### 11. **Education**
- **Personalized Learning**: RL can create adaptive learning systems that tailor educational content to individual student needs and learning paces.
- **Tutoring Systems**: RL can enhance intelligent tutoring systems to provide more effective and personalized feedback to students.

### 12. **Space Exploration**
- **Planetary Rovers**: RL can be used to train autonomous rovers to navigate and perform tasks on other planets.
- **Satellite Management**: RL can optimize the operations of satellites, including orbit adjustments and resource management.

Reinforcement learning is a powerful tool that can be applied to a wide range of problems where decision-making and optimization are critical. The versatility and adaptability of RL make it suitable for many real-world applications.

In [7]:
#Now let's ask a follow up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

I'm sorry, but I don't have access to previous interactions or any prior questions you may have asked. How can I assist you today?

As you can see, it doesn't remember what it just answered. Sometimes it responds based only on the system prompt, and sometimes it responds seemingly at random. This shows that the LLM does NOT have memory, and that we need to supply the memory ourselves by including the conversation history in the prompt, like this:

In [8]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [10]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was: "Tell me some use cases for reinforcement learning."

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
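The same idea can be reduced to a few lines of plain Python. The sketch below is an illustration only: `fake_llm` is a hypothetical stand-in for a real model call (it simply reports the first human question it can see in the prompt), and `ChatSession` is an invented helper name, not part of LangChain.

```python
from typing import Callable, List, Tuple


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports the first human line in the prompt."""
    human_lines = [line for line in prompt.splitlines() if line.startswith("Human: ")]
    first = human_lines[0][len("Human: "):]
    return f"The first question I can see is: {first}"


class ChatSession:
    """Keeps the conversation state and replays it in every prompt."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (question, answer) pairs

    def build_prompt(self, question: str) -> str:
        # Earlier turns go first, then the new question.
        history = "".join(f"Human: {q}\nAI: {a}\n" for q, a in self.turns)
        return f"{history}Human: {question}\nAI:"

    def ask(self, question: str) -> str:
        answer = self.llm(self.build_prompt(question))
        self.turns.append((question, answer))  # the state lives outside the model
        return answer


session = ChatSession(fake_llm)
first_answer = session.ask("Tell me some use cases for reinforcement learning")
second_answer = session.ask("What was my prior question?")
print(second_answer)
```

Because the second prompt carries the first turn, the stand-in model can "see" the earlier question; called without history, it would only see the follow-up itself.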

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

From the LangChain website:

A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.

So this process adds delays to the response, but it is a necessary delay :)

![image](./images/memory_diagram.png)
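The READ and WRITE steps described above can be sketched without any framework. This is a minimal illustration under stated assumptions: `memory`, `core_logic`, and `run_with_memory` are hypothetical names, and `core_logic` is a stand-in that only reports how many history lines it received instead of calling a model.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory store: session_id -> list of "role: text" lines.
memory: Dict[str, List[str]] = {}


def core_logic(inputs: Dict[str, str]) -> str:
    """Stand-in for the chain's core logic; reports how much history it received."""
    seen = len(inputs["history"].splitlines())
    return f"answer to '{inputs['question']}' (history lines seen: {seen})"


def run_with_memory(
    session_id: str,
    question: str,
    logic: Callable[[Dict[str, str]], str] = core_logic,
) -> str:
    # READ: after receiving the user input, before the core logic runs.
    past = memory.setdefault(session_id, [])
    inputs = {"question": question, "history": "\n".join(past)}

    answer = logic(inputs)

    # WRITE: after the core logic, before returning the answer.
    past.append(f"Human: {question}")
    past.append(f"AI: {answer}")
    return answer


first = run_with_memory("abc123", "Tell me some use cases for reinforcement learning")
second = run_with_memory("abc123", "What was my prior question?")
other = run_with_memory("xyz789", "What was my prior question?")
print(first, second, other, sep="\n")
```

The second call in session `abc123` receives the two lines written by the first call, while the unrelated session `xyz789` starts with an empty history.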

In [11]:
index1_name = "srch-index-files-yk"
index2_name = "srch-index-csv-yk"
#index3_name = "srch-index-books-yk"
indexes = [index1_name, index2_name]#, index3_name]

In [12]:
# Initialize our custom retriever 
retriever = CustomAzureSearchRetriever(indexes=indexes, topK=10, reranker_threshold=1)

If you look closely at `prompts.py`, you will see that `DOCSEARCH_PROMPT` has an optional variable called `history`. Now is the time to use it. It is a placeholder where we inject the conversation into the prompt, so that the LLM is aware of it before it answers.

**Now let's add memory to it:**

In [13]:
store = {} # Our first memory will be a dictionary in memory

# We have to define a custom function that takes a session_id and looks somewhere
# (in this case in a dictionary in memory) for the conversation
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


In [14]:
# We use our original chain with the retriever but removing the StrOutputParser
chain = (
    {
        "context": itemgetter("question") | retriever, 
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | DOCSEARCH_PROMPT
    | llm
)

## Then we pass the above chain to another chain that adds memory to it

output_parser = StrOutputParser()

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
) | output_parser

In [15]:
# This is where we configure the session id
config={"configurable": {"session_id": "abc123"}}

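Notice below that we only pass the `question` in the call. `RunnableWithMessageHistory` looks up the conversation for the configured `session_id` and fills the `history` variable, which holds the chat history within the prompt.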

In [16]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

printing response
<Response [200]>
printing response
<Response [200]>


Reinforcement learning (RL) has a wide array of applications across various fields, leveraging its ability to learn optimal actions through trial and error. Here are some notable use cases:

1. **Robotics**: RL is extensively used in robotics for tasks such as robotic manipulation, autonomous navigation, and robotic control. It enables robots to learn complex tasks in dynamic environments by receiving feedback from their actions and adjusting accordingly.

2. **Game Playing**: One of the most famous applications of RL is in playing games. Algorithms like AlphaGo have demonstrated RL's capability by defeating human champions in complex games like Go. RL is also used in other games like chess, StarCraft, and various video games to develop intelligent agents that can learn and master the game.

3. **Autonomous Vehicles**: RL is crucial in developing self-driving cars. It helps in decision-making processes such as lane changing, speed control, and collision avoidance by learning from the environment and optimizing driving strategies.

4. **Healthcare**: In healthcare, RL can be used for personalized treatment planning, optimizing drug dosages, and managing chronic diseases. It can learn from patient data to provide tailored treatment recommendations that improve patient outcomes.

5. **Finance**: RL is applied in algorithmic trading, portfolio management, and financial forecasting. It helps in making trading decisions by learning from historical data and market trends to maximize returns and minimize risks.

6. **Natural Language Processing (NLP)**: RL is used in NLP for tasks like dialogue management in conversational agents, machine translation, and text summarization. It helps in optimizing responses and improving the interaction quality by learning from user feedback.

7. **Industrial Automation**: In industrial settings, RL is used for optimizing manufacturing processes, predictive maintenance, and energy management. It helps in improving efficiency, reducing downtime, and cutting operational costs by learning optimal strategies from the operational data.

8. **Marketing**: RL can optimize marketing strategies by learning from consumer behavior data. It helps in personalized advertising, dynamic pricing, and recommendation systems to enhance customer engagement and increase sales.

These use cases illustrate the versatility and potential of reinforcement learning in solving complex problems and optimizing performance across various domains.

In [17]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

printing response
<Response [200]>
printing response
<Response [200]>


Your prior question was: "Tell me some use cases for reinforcement learning."

In [18]:
# Remembers
printmd(chain_with_history.invoke({"question": "Thank you! Good bye"},config=config))

printing response
<Response [200]>
printing response
<Response [200]>


You're welcome! If you have any more questions in the future, feel free to ask. Goodbye!

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) history to our chatbot. However, it is not persistent: it is deleted once the app user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also in case we want to provide recommendations in the future.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class `CosmosDBChatMessageHistory`.
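To make the difference between in-memory and persistent history concrete, here is a small sketch that uses SQLite from the Python standard library as a stand-in for a database. It is not the Cosmos DB implementation: `SqliteChatHistory`, its table layout, and the temporary file path are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple


class SqliteChatHistory:
    """Illustrative database-backed history keyed by session_id and user_id."""

    def __init__(self, db_path: str, session_id: str, user_id: str) -> None:
        self.db_path = db_path
        self.session_id = session_id
        self.user_id = user_id
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
            )

    def add_message(self, role: str, content: str) -> None:
        # The inner `conn` context commits the insert; `closing` closes the connection.
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT INTO messages (session_id, user_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (self.session_id, self.user_id, role, content),
            )

    def messages(self) -> List[Tuple[str, str]]:
        with closing(sqlite3.connect(self.db_path)) as conn:
            return conn.execute(
                "SELECT role, content FROM messages "
                "WHERE session_id = ? AND user_id = ? ORDER BY id",
                (self.session_id, self.user_id),
            ).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

writer = SqliteChatHistory(db_path, session_id="session247", user_id="user913")
writer.add_message("human", "Tell me some use cases for reinforcement learning")
writer.add_message("ai", "Robotics, gaming, finance, ...")
del writer  # simulate the app session ending

# A brand-new object (think: a new process) still sees the conversation.
reader = SqliteChatHistory(db_path, session_id="session247", user_id="user913")
restored = reader.messages()
print(restored)
```

Unlike the dictionary used earlier, the history here survives after the object that wrote it is gone, which is the property we want from Cosmos DB.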

In [19]:
credential = ManagedIdentityCredential(client_id="d30cba06-04c1-4065-a91d-8b7ce3b07b78")

In [20]:
# Create the function to retrieve the conversation

def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        credential=credential,
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos


In [21]:
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        ),
    ],
) | output_parser

In [22]:
# This is where we configure the session id and user id
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}

In [23]:
config

{'configurable': {'session_id': 'session247', 'user_id': 'user913'}}

In [24]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))
# A Cosmos DB data-plane role assignment is required for the managed identity (PowerShell + Azure CLI).
# Note: role definition 00000000-0000-0000-0000-000000000002 is the "Cosmos DB Built-in Data Contributor"
# role (read/write), despite the variable name used below.
# $resourceGroupName="ai-bootcamp"
# $accountName="cosmosdb-account-jed5nzg3k2jp6"
# $readOnlyRoleDefinitionId="00000000-0000-0000-0000-000000000002"
# $principalId="ce7ea55d-acf9-4107-a891-ca04f4d417b3"
# az cosmosdb sql role assignment create --account-name $accountName --resource-group $resourceGroupName --scope "/" --principal-id $principalId --role-definition-id $readOnlyRoleDefinitionId



printing response
<Response [200]>
printing response
<Response [200]>


Reinforcement learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative rewards. Here are some notable use cases for reinforcement learning:

1. **Robotics:**
   - RL is extensively used in robotics for tasks such as navigation, manipulation, and control. Robots can learn to perform complex tasks by interacting with their environment and receiving feedback. For example, RL can enable a robotic arm to learn how to grasp objects of different shapes and sizes.

2. **Autonomous Vehicles:**
   - RL algorithms are employed to train self-driving cars to make decisions in real-time, such as lane keeping, obstacle avoidance, and path planning. The car learns to navigate through different driving scenarios by receiving rewards for safe and efficient driving behaviors.

3. **Healthcare:**
   - In healthcare, RL can be used for personalized treatment plans. For example, it can optimize the dosing of medications for patients with chronic diseases by learning from patient responses to different dosages over time.

4. **Finance:**
   - RL is used in finance for portfolio management, algorithmic trading, and risk management. It helps in making investment decisions by learning to balance the trade-off between risk and return based on historical market data.

5. **Gaming:**
   - RL has achieved significant success in gaming, where it is used to train AI agents to play games at superhuman levels. Notable examples include AlphaGo, which defeated human champions in the game of Go, and OpenAI's Dota 2 bot, which competes with professional players.

6. **Industrial Automation:**
   - RL is applied in industrial automation for optimizing manufacturing processes, such as controlling the temperature and pressure in chemical plants, or scheduling and routing in logistics and supply chain management.

7. **Natural Language Processing (NLP):**
   - In NLP, RL is used for tasks like dialogue systems and chatbots. It helps in training models to generate more human-like and contextually appropriate responses by learning from interactions with users.

8. **Energy Management:**
   - RL can optimize energy consumption in smart grids and buildings by learning to adjust heating, cooling, and lighting systems based on occupancy patterns and weather forecasts.

These use cases highlight the versatility of reinforcement learning across various domains, where it helps in making intelligent decisions by learning from interactions with the environment.

In [25]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

printing response
<Response [200]>
printing response
<Response [200]>


Your prior question was: "Tell me some use cases for reinforcement learning."

In [26]:
# Remembers
printmd(chain_with_history.invoke(
    {"question": "Can you tell me a one line summary of our conversation?"},
    config=config))

printing response
<Response [200]>
printing response
<Response [200]>


You asked for use cases of reinforcement learning, and I provided examples across various domains including robotics, autonomous vehicles, healthcare, finance, gaming, industrial automation, natural language processing, and energy management.

In [27]:
try:
    printmd(chain_with_history.invoke(
    {"question": "Thank you very much!"},
    config=config))
except Exception as e:
    print(e)

printing response
<Response [200]>
printing response
<Response [200]>


You're welcome! If you have any more questions or need further assistance, feel free to ask. Have a great day!

In [28]:
printmd(chain_with_history.invoke(
    {"question": "I do have one more question, why did you give me a one line summary?"},
    config=config))

printing response
<Response [200]>
printing response
<Response [200]>


You asked for a one-line summary of our conversation, so I provided it to concisely encapsulate the main points we discussed. If you have any other questions or need more detailed information, feel free to ask!

In [29]:
printmd(chain_with_history.invoke(
    {"question": "why not 2?"},
    config=config))

printing response
<Response [200]>
printing response
<Response [200]>


I provided a one-line summary because that's what you specifically requested. If you would like a more detailed summary or additional information, I'm happy to provide that as well. Just let me know what you need!

#### Let's check our Azure CosmosDB to see the whole conversation


![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to hold a conversation. However, memory is not a built-in feature of the LLM; we must provide it to the LLM as context alongside each question.

We added persistent memory using CosmosDB.

Notice also that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input. This is not efficient, but we are nevertheless very close to finishing our first RAG "talk to your data" bot.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models (1106 or newer), we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can give it concise instructions, but it sometimes struggles to follow them accurately, so it is not very reliable. Additionally, its limited "memory" (token context) can make sustained conversations challenging. Its responses also tend to be simple rather than deep.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It has stronger reasoning skills, adheres to instructions more consistently, and gives better answers. Its larger context size lets it retain more of the instructions and the conversation, and its responses are deep and thorough.
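Since the conversation history is sent as part of every prompt, a limited token context eventually forces us to drop older turns. The sketch below shows one simple way to do that, keeping only the most recent messages that fit a budget. It rests on loud assumptions: `approx_tokens` counts whitespace-separated words as a crude proxy for tokens (real tokenizers count differently), and `trim_history` is a hypothetical helper, not a LangChain function.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Crude proxy: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def trim_history(messages: List[str], budget: int) -> List[str]:
    """Keep the most recent messages whose combined size fits within the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = approx_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    "Human: Tell me some use cases for reinforcement learning",
    "AI: Robotics, gaming, finance and healthcare are common examples",
    "Human: What was my prior question?",
]

trimmed = trim_history(history, budget=15)
print(trimmed)
```

With a budget of 15 words, the oldest message is dropped, so a model given `trimmed` could no longer answer questions about it. A larger context window simply postpones that point.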


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

In the next notebook (6), we are going to build our first RAG bot. To do this, we will introduce the concept of Agents.