# Raw RAG 07: Total Recall (Memory)

## Introduction

Memory is a cornerstone of intelligence, whether we're talking about human cognition or artificial systems. In the realm of Retrieval-Augmented Generation (RAG), implementing effective memory mechanisms can significantly enhance the quality and coherence of AI-generated responses. This notebook explores a fundamental approach to incorporating memory into your RAG system.

### What We'll Cover

In this notebook, we'll demonstrate a straightforward method to implement memory in Python. Our approach builds upon concepts introduced in earlier levels, particularly the summarization techniques from Level 06. Here's what you can expect:

1. **Basic Memory Process**: We'll implement a simple memory system that stores previous interactions in a conversation history and passes that history back to the model as context for future responses.

2. **Similarities to Summarization**: You'll see how this memory method relates to the summarization techniques we've explored before, creating a bridge between different RAG components.

3. **Foundation for Advanced Techniques**: While we're starting with a basic implementation, this serves as a stepping stone towards more sophisticated memory systems.
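The code later in this notebook resends the full conversation history on every call. One way to connect it to the summarization techniques mentioned above is to condense older turns into a single summary message and keep only the most recent turns verbatim. The sketch below is an illustration under assumptions, not the notebook's implementation: `compact_history`, `naive_summarizer`, and the `keep_recent` parameter are hypothetical names, and the placeholder summarizer only truncates text where a real system would likely call an LLM.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def naive_summarizer(messages: List[Message]) -> str:
    """Placeholder summarizer (assumption): keeps the first 60 characters of each message.

    In a real system this would typically be an LLM summarization call.
    """
    return " | ".join(f"{m['role']}: {m['content'][:60]}" for m in messages)


def compact_history(
    history: List[Message],
    keep_recent: int = 2,
    summarize: Callable[[List[Message]], str] = naive_summarizer,
) -> List[Message]:
    """Replace all but the last `keep_recent` turns with one summary message.

    Assumes history[0] is the system message, as in initialize_conversation().
    """
    system, turns = history[:1], history[1:]
    if len(turns) <= keep_recent:
        return list(history)  # Nothing old enough to summarize

    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return system + [summary] + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
]
compacted = compact_history(history, keep_recent=2)
for message in compacted:
    print(message)
```

Passing the summarizer as a parameter keeps the compaction logic testable offline and lets you swap in a model-backed summarizer later.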

### Important Considerations

As you work through this notebook, keep the following points in mind:

- **Starting Point**: This implementation represents a basic memory process. It's an excellent foundation, but there's room for significant enhancement.

- **Future Improvements**: We'll briefly touch on the potential for embedded search in long-term memory, which can greatly expand the capabilities of your RAG system.

- **Enhancing Accuracy**: By incorporating metadata and other contextual information, you can further improve the accuracy and relevance of retrieved memories.

- **Context Window Limitations**: Our current approach is designed with typical context window constraints in mind. As language models evolve and context windows expand, our strategies for memory management may shift dramatically.
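To make the context window constraint concrete, here is a minimal sketch of one possible trimming strategy: drop the oldest turns until the history fits a token budget. The names `estimate_tokens` and `trim_to_budget` are hypothetical, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer; a production system would count tokens with the model's own tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Very rough estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(history: List[Message], max_tokens: int) -> List[Message]:
    """Drop the oldest non-system messages until the estimate fits the budget.

    Assumes history[0] is the system message, which is always kept.
    """
    system, turns = history[:1], list(history[1:])
    total = sum(estimate_tokens(m["content"]) for m in system + turns)

    while turns and total > max_tokens:
        dropped = turns.pop(0)  # Oldest turn goes first
        total -= estimate_tokens(dropped["content"])

    return system + turns


history = [{"role": "system", "content": "sys"}] + [
    {"role": "user", "content": f"{i}" * 40} for i in range(3)
]
trimmed = trim_to_budget(history, max_tokens=25)
print(len(history), "->", len(trimmed))
```

Dropping the oldest turns is the simplest policy; it loses early context entirely, which is why summarizing or retrieving older turns is often preferred.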

### Looking Ahead

While this notebook focuses on a fundamental memory implementation, the field of AI is rapidly evolving. We may soon see developments that allow for much larger context windows at lower costs, potentially revolutionizing how we approach memory in RAG systems. Until then, mastering these basic techniques will provide a solid foundation for building more advanced and efficient AI systems.

Let's dive in and explore how we can give our RAG system a memory, enhancing its ability to maintain context and generate more coherent, contextually relevant responses!

In [1]:
%pip install openai python-dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Load the environment variables from the .env file

from dotenv import load_dotenv
import os

# Specify the path to your .env file if it's not in the same directory
dotenv_path = ".env"
load_dotenv(dotenv_path=dotenv_path)

True

In [3]:
from openai import OpenAI

client = OpenAI()

In [4]:
# Basic conversation memory for Chat Completions API calls:
# keep every turn in a list and resend the whole list with each request

def initialize_conversation():
    return [
        {
            "role": "system",
            "content": "You are a helpful assistant. You are being provided with the conversation history for context. Each new interaction builds upon this history.",
        }
    ]


def add_to_history(history, role, content):
    history.append({"role": role, "content": content})
    return history


def get_response(history, user_input, model="gpt-4-turbo"):
    # Add user input to the conversation history
    history = add_to_history(history, "user", user_input)

    # Make the API call
    response = client.chat.completions.create(model=model, messages=history)

    # Extract the assistant's reply
    assistant_reply = response.choices[0].message.content

    # Add the assistant's reply to the conversation history
    history = add_to_history(history, "assistant", assistant_reply)

    return history, assistant_reply


def display_conversation(history):
    for message in history[1:]:  # Skip the system message
        print(f"{message['role'].capitalize()}: {message['content']}")


In [5]:
def chat_with_questions(questions):
    conversation_history = initialize_conversation()

    for i, question in enumerate(questions, 1):
        print(f"Question {i}: {question}")
        conversation_history, response = get_response(conversation_history, question)
        print(f"Answer {i}: {response}\n")

    return conversation_history


questions = [
    "Are you familiar with the novel The Dead by James Joyce?",
    "Please give me a 500 words summary of the novel.",
    "What does the main character realize at the end of the story?",
]

final_history = chat_with_questions(questions)

print("Full Conversation History:")
display_conversation(final_history)

Question 1: Are you familiar with the novel The Dead by James Joyce?
Answer 1: Yes, I am familiar with "The Dead." It's actually a novella and the final story in James Joyce's collection titled *Dubliners*, which was published in 1914. "The Dead" is considered one of Joyce's most famous works of short fiction, noted for its intricate depiction of social life and the intricacies of human feelings and relationships. The story takes place in early January in Dublin and centers around Gabriel Conroy, who attends a festive gathering with his wife Gretta. The evening culminates in a poignant revelation that leads Gabriel to deeper reflections on his life, love, and the inevitability of death. Would you like to dive into some specific aspects of the story?

Question 2: Please give me a 500 words summary of the novel.
Answer 2: "The Dead," the longest and final story in James Joyce's *Dubliners*, masterfully encapsulates themes of lost opportunities and the subtle but profound revelations of e

Work in progress: an embedding-based long-term memory retrieval system.
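As a rough illustration of what such a long-term memory could look like, the sketch below stores past snippets with metadata and retrieves the most similar ones for a query. Everything here is an assumption for illustration: `MemoryStore`, `toy_embed`, and the `where` filter are hypothetical names, and the word-count "embedding" is only a stand-in so the example runs offline. A real system would swap in an embedding model and likely a vector store.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def toy_embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryStore:
    records: List[dict] = field(default_factory=list)

    def add(self, text: str, **metadata: str) -> None:
        """Store a snippet with its vector and arbitrary metadata."""
        self.records.append(
            {"text": text, "vector": toy_embed(text), "metadata": metadata}
        )

    def search(
        self, query: str, top_k: int = 2, where: Optional[Dict[str, str]] = None
    ) -> List[str]:
        """Return the top_k most similar snippets, optionally filtered by metadata."""
        filters = where or {}
        candidates = [
            r for r in self.records
            if all(r["metadata"].get(k) == v for k, v in filters.items())
        ]
        query_vector = toy_embed(query)
        ranked = sorted(
            candidates, key=lambda r: cosine(query_vector, r["vector"]), reverse=True
        )
        return [r["text"] for r in ranked[:top_k]]


store = MemoryStore()
store.add("The Dead is the final story in Dubliners", topic="joyce")
store.add("Gabriel Conroy attends a festive gathering with his wife Gretta", topic="joyce")
store.add("The user asked for a 500 word summary", topic="request")

print(store.search("who attends the gathering with Gretta", top_k=1))
print(store.search("summary", top_k=5, where={"topic": "request"}))
```

Retrieved snippets would then be added to the prompt in place of the full history, and the metadata filter shows one way to narrow retrieval to a relevant subset of memories.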

## Conclusion: The Power of Memory in RAG Systems

As the conversation above shows, passing the conversation history with each request lets the model keep context across turns. The follow-up questions never name the story or its author, yet the model can still resolve what "the novel" and "the main character" refer to because the earlier turns are included in the prompt.

Key takeaways from this notebook:

1. **Context Retention**: Memory allows the system to maintain important information across multiple interactions.
2. **Improved Coherence**: Responses become more coherent and contextually appropriate over the course of a conversation.
3. **Efficiency**: With memory, users no longer need to restate context in every prompt, which streamlines the interaction. Note that resending the full history increases the number of tokens sent per request as the conversation grows.
4. **Foundation for Advanced Techniques**: This basic implementation serves as a stepping stone towards more sophisticated memory management in RAG systems.

While this notebook provides a solid foundation for implementing memory in RAG systems, it's important to note that this is an evolving field with immense potential for further development. Future iterations may explore:

- Long-term memory storage and retrieval techniques
- Integration of metadata for more nuanced context understanding
- Adaptive memory management based on conversation relevance
- Optimization for expanding context windows in newer language models

As we continue to refine and expand upon these concepts, the capabilities of our RAG systems will undoubtedly grow, leading to even more powerful and intuitive AI-assisted interactions.

Stay tuned for future updates and enhancements to this notebook as we delve deeper into the exciting world of memory-augmented RAG systems!