# Introduction to Memory in Semantic Kernel

In AI applications, memory is crucial for creating contextual, personalized experiences. Semantic Kernel provides powerful memory management capabilities that allow your AI applications to:

- Remember facts and knowledge over time
- Find information based on meaning rather than exact matches
- Use previous context in ongoing conversations
- Implement Retrieval-Augmented Generation (RAG) patterns



This notebook explores how to implement and use memory capabilities in Semantic Kernel applications. 


Let's visualize how memory fits into the Semantic Kernel architecture:

In [None]:
%pip install -U "semantic-kernel[azure]==1.27.2" python-dotenv==1.0.1 mermaid-py==0.7.1 --quiet

In [None]:
import mermaid as md
from mermaid.graph import Graph

In [None]:
sequence = Graph(
    "Memory-architecture",
    """
graph TD
    A[Application] --> B[Kernel]
    B --> C[AI Models]
    B --> D[Memory System]
    B --> E[Plugins]
    D --> F[Short-term Memory]
    D --> G[Long-term Memory]
    G --> H[Vector Embeddings]
    G --> I[Memory Store]
    I --> J[Volatile Store]
    I --> K[Persistent Store]
    style D fill:#f9d5e5,stroke:#333,stroke-width:2px
""",
)
md.Mermaid(sequence)


## Memory
In SK, **memory** refers to the storage and recall of information the AI has learned or been provided. There are two primary forms of memory:

### Semantic Memory (Long-term)
- This is usually an **external vector store** that holds **embeddings of text**, allowing the AI to store facts or documents and later retrieve them by **semantic similarity**.
- SK provides **Memory Connectors** to various vector databases (like **Azure AI Search (formerly Azure Cognitive Search), Pinecone, Qdrant**, etc.) via a common interface.
- By using a memory store, you can implement the **retrieval** part of **RAG**: store chunks of knowledge and fetch relevant pieces at query time.
- We’ll see how to add and use such memory in our chatbot.

### Conversation History (Short-term Memory)
- SK also manages the **immediate dialogue context** with a **Chat History object** for multi-turn conversations.
- This ensures the AI remembers prior user queries and its own responses, maintaining context across turns.
- We will leverage this to keep the conversation coherent.

In [None]:
from semantic_kernel import __version__

print(__version__)

In [None]:
from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import (
    AzureChatCompletion,
)
from semantic_kernel.connectors.ai.open_ai.services.azure_text_embedding import (
    AzureTextEmbedding,
)
from semantic_kernel.core_plugins.text_memory_plugin import TextMemoryPlugin
from semantic_kernel.kernel import Kernel
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
from semantic_kernel.memory.volatile_memory_store import VolatileMemoryStore
import os

from dotenv import load_dotenv

load_dotenv("../.env", override=True)

In [None]:
embedding_deployment_name = os.getenv(
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-ada-002"
)
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")

# Create the embedding service
embedding_service = AzureTextEmbedding(
    endpoint=base_url, deployment_name=embedding_deployment_name, api_key=api_key
)

This cell creates an embedding service that connects to Azure OpenAI. The service converts text into vector embeddings, which are numerical representations that capture semantic meaning. The environment variables should be set in your `.env` file.

In [None]:
memory = SemanticTextMemory(
    storage=VolatileMemoryStore(), embeddings_generator=embedding_service
)

Here we initialize our semantic memory system with:

- A VolatileMemoryStore - an in-memory vector database (data will be lost when your session ends)
- The embedding service we created earlier, which will generate vector embeddings for text


In [None]:
collection_id = "generic"


async def populate_memory(memory: SemanticTextMemory) -> None:
    # Add some documents to the semantic memory
    await memory.save_information(
        collection=collection_id, id="info1", text="Your budget for 2024 is $100,000"
    )
    await memory.save_information(
        collection=collection_id, id="info2", text="Your savings from 2023 are $50,000"
    )
    await memory.save_information(
        collection=collection_id, id="info3", text="Your investments are $80,000"
    )


await populate_memory(memory)

This function adds information to our memory store. Each memory item consists of:

- `collection`: A namespace for organizing related memories (like a database table)
- `id`: A unique identifier for retrieving specific memories
- `text`: The actual information to store


When we save information, Semantic Memory:

1. Generates an embedding vector for the text
2. Stores both the text and its vector in the memory store
3. Associates it with the given ID and collection
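
The three steps above can be sketched with a toy in-memory store. This is an illustration only, not how SK implements it: `toy_embed` is a hypothetical stand-in for the embedding service (it just counts words into hashed buckets), and `ToyMemoryStore` is an assumed name for a nested dictionary keyed by collection and ID.

```python
import zlib
from dataclasses import dataclass, field


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words counts."""
    vector = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vector[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vector


@dataclass
class ToyMemoryStore:
    """Assumed toy layout: {collection: {id: (text, embedding)}}."""

    collections: dict = field(default_factory=dict)

    def save_information(self, collection: str, id: str, text: str) -> None:
        embedding = toy_embed(text)  # 1. generate an embedding for the text
        records = self.collections.setdefault(collection, {})
        records[id] = (text, embedding)  # 2. and 3. keep text and vector under the ID


store = ToyMemoryStore()
store.save_information("generic", "info1", "Your budget for 2024 is $100,000")
text, embedding = store.collections["generic"]["info1"]
print(text, len(embedding))
```

A real embedding model produces dense vectors with hundreds or thousands of dimensions, but the bookkeeping (collection, ID, text, vector) follows the same shape.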

In [None]:
async def search_memory_examples(memory: SemanticTextMemory) -> None:
    questions = [
        "What is my budget for 2024?",
        "What are my savings from 2023?",
        "What are my investments?",
    ]

    for question in questions:
        print(f"Question: {question}")
        result = await memory.search(collection_id, question)
        print(f"Answer: {result[0].text}\n")

In [None]:
await search_memory_examples(memory)

---
### How does semantic search work?

1. We provide a natural language query (e.g., "What is my budget for 2024?")
2. The memory system:
   - Converts the query to a vector embedding
   - Compares this vector against stored embeddings using cosine similarity
   - Returns the closest matching results
   
The search works even if the query doesn't exactly match the stored text, as it finds semantically similar content.
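
To make the comparison step concrete, here is a minimal sketch of ranking stored items by cosine similarity. The three-dimensional vectors are hand-made placeholders, not real embeddings, and `cosine_similarity` is a plain-Python helper written for this illustration rather than an SK function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-made 3-d vectors standing in for real embeddings (illustrative values only).
stored = {
    "Your budget for 2024 is $100,000": [0.9, 0.1, 0.0],
    "Your savings from 2023 are $50,000": [0.1, 0.9, 0.1],
    "Your investments are $80,000": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "What is my budget for 2024?"

# Rank stored texts from most to least similar to the query vector.
ranked = sorted(
    stored.items(),
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)
best_text, best_vector = ranked[0]
print(best_text)
```

Because the score depends only on the direction of the vectors, a query and a stored text can match closely without sharing exact wording, provided the embedding model places them near each other.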

---

### Exercise: Adding and Retrieving Custom Memories

Try adding your own information to the memory and retrieving it with semantic search.

1. Create a new collection called "personal"
2. Add at least three facts about a fictional person
3. Search for those facts using natural language queries

<details>
  <summary>Click to see solution</summary>
  
```python
# Create a new collection
personal_collection = "personal"

# Add information to memory
async def add_personal_info(memory):
    await memory.save_information(collection=personal_collection, id="fact1", text="John was born in Seattle in 1980")
    await memory.save_information(collection=personal_collection, id="fact2", text="John graduated from University of Washington in 2002")
    await memory.save_information(collection=personal_collection, id="fact3", text="John has two children named Alex and Sam")

await add_personal_info(memory)

# Search for information
questions = [
    "Where was John born?",
    "When did John graduate college?",
    "Does John have kids?"
]

for question in questions:
    print(f"Question: {question}")
    result = await memory.search(personal_collection, question)
    print(f"Answer: {result[0].text}\n")
```

</details>

In [None]:
# Your code goes here

# Create a new collection

# Add information to memory

# Search for information


## Kernel Setup

The code below creates a new `Kernel` instance and adds both:

1. A chat completion service for generating responses
2. The embedding service we created earlier for vector operations

This configuration allows the kernel to generate text and work with vector embeddings in memory operations.

In [None]:
from semantic_kernel.kernel import Kernel
import os
from semantic_kernel.connectors.ai.open_ai.services.azure_chat_completion import (
    AzureChatCompletion,
)
from dotenv import load_dotenv

load_dotenv("../.env", override=True)

kernel = Kernel()

deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
api_key = os.getenv("AZURE_OPENAI_API_KEY")
base_url = os.getenv("AZURE_OPENAI_ENDPOINT")

chat_completion = AzureChatCompletion(
    endpoint=base_url,
    deployment_name=deployment_name,
    api_key=api_key,
    service_id="chat",
)

kernel.add_service(chat_completion)

# we also add the embedding service to the kernel
kernel.add_service(embedding_service)

In [None]:
from semantic_kernel.core_plugins.text_memory_plugin import TextMemoryPlugin
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
from semantic_kernel.memory.volatile_memory_store import VolatileMemoryStore

In [None]:
memory = SemanticTextMemory(
    storage=VolatileMemoryStore(), embeddings_generator=embedding_service
)
kernel.add_plugin(TextMemoryPlugin(memory), "TextMemoryPlugin")

Here we:
1. Create a `SemanticTextMemory` object with our in-memory store and embedding service
2. Add the `TextMemoryPlugin` to the kernel, which provides memory-related functions

The `TextMemoryPlugin` exposes memory operations to the kernel, allowing:
- Semantic search through the `recall` function
- Saving new information during conversations
- Integration of memory capabilities into AI responses


----

Now we set up a chat function that incorporates memory:

1. We define a prompt template that:
   - Gives the AI a role and instructions
   - Uses the `{{recall '...'}}` syntax to search memory for relevant information
   - Includes the user's request via `{{$request}}`

2. We create a kernel function from this template

The `{{recall 'query'}}` syntax tells Semantic Kernel to:
1. Search the memory for information relevant to the query
2. Insert the retrieved information into the prompt
3. Let the AI use this information in its response

This creates a chatbot that can reference previously stored financial information.
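
The substitution that `{{recall '...'}}` performs can be pictured with a toy renderer. This is a simplified sketch, not SK's template engine: `fake_recall`, `FAKE_MEMORY`, and `toy_render` are hypothetical names, and the lookup table replaces the real semantic search.

```python
import re

# Hypothetical lookup table standing in for a real semantic search.
FAKE_MEMORY = {
    "budget by year": "Your budget for 2024 is $100,000",
    "investments": "Your investments are $80,000",
}


def fake_recall(query: str) -> str:
    """Stand-in for the plugin's recall function; returns '' when nothing matches."""
    return FAKE_MEMORY.get(query, "")


def toy_render(template: str, variables: dict[str, str]) -> str:
    # Replace each {{recall '...'}} call with the recalled text.
    rendered = re.sub(
        r"\{\{\s*recall\s+'([^']*)'\s*\}\}",
        lambda match: fake_recall(match.group(1)),
        template,
    )
    # Replace each {{$name}} variable with its value.
    return re.sub(
        r"\{\{\s*\$(\w+)\s*\}\}",
        lambda match: variables.get(match.group(1), ""),
        rendered,
    )


template = "Known facts:\n- {{recall 'budget by year'}}\n\n{{$request}}"
prompt_text = toy_render(template, {"request": "What is my budget for 2024?"})
print(prompt_text)
```

The model only ever sees the rendered text, so the recalled facts reach it as ordinary prompt content placed ahead of the user's request.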

In [None]:
from semantic_kernel.functions import KernelFunction
from semantic_kernel.prompt_template import PromptTemplateConfig


# Example questions that each recall query in the prompt is meant to support:
# - {{recall 'budget by year'}} What is my budget for 2024?
# - {{recall 'savings from previous year'}} What are my savings from 2023?
# - {{recall 'investments'}} What are my investments?
async def setup_chat_with_memory(
    kernel: Kernel,
    service_id: str,
) -> KernelFunction:
    prompt = """
    ChatBot can have a conversation with you about any topic.
    It can give explicit instructions or say 'I don't know' if
    it does not have an answer.

    Information about me, from previous conversations:
    - {{recall 'budget by year'}}
    - {{recall 'savings from previous year'}}
    - {{recall 'investments'}}

    {{$request}}
    """.strip()

    prompt_template_config = PromptTemplateConfig(
        template=prompt,
        execution_settings={
            service_id: kernel.get_service(
                service_id
            ).get_prompt_execution_settings_class()(service_id=service_id)
        },
    )

    return kernel.add_function(
        function_name="chat_with_memory",
        plugin_name="chat",
        prompt_template_config=prompt_template_config,
    )

In [None]:
print("Populating memory...")
await populate_memory(memory)

print("Asking questions... (manually)")
await search_memory_examples(memory)

print("Setting up a chat (with memory!)")
chat_func = await setup_chat_with_memory(kernel, "chat")

print("Begin chatting (type 'exit' to exit):\n")
print(
    "Welcome to the chat bot!\
    \n  Type 'exit' to exit.\
    \n  Try asking a question about your finances (e.g. \"talk to me about my finances\")."
)


async def chat(user_input: str):
    print(f"User: {user_input}")
    answer = await kernel.invoke(chat_func, request=user_input)
    print(f"ChatBot:> {answer}")

In [None]:
await chat("What is my budget for 2024?")

In [None]:
await chat("talk to me about my finances")

In [None]:
kernel.remove_all_services()


kernel.add_service(chat_completion)

# we also add the embedding service to the kernel
kernel.add_service(embedding_service)

## Retrieval-Augmented Generation (RAG) with Self-Critique

This section demonstrates a powerful pattern combining memory retrieval with response evaluation:

1. **RAG Prompt**: This prompt template:
   - Retrieves information from memory relevant to the user's question
   - Provides this context to the AI
   - Uses the context to generate an informed answer

2. **Self-Critique**: This second prompt evaluates the quality of RAG responses:
   - Takes the original question, retrieved context, and generated answer
   - Classifies the answer as "Grounded", "Ungrounded", or "Unclear"
   - Helps ensure responses are properly using retrieved information

This pattern creates more reliable AI responses by:
- Providing relevant facts from memory
- Checking if responses properly use this information
- Identifying when responses make claims beyond available information
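
The critique prompt asks for a one-word label, but a model may still reply with extra words or punctuation. A minimal sketch of normalizing such a reply follows, assuming a hypothetical helper named `parse_verdict` that is not part of SK. Treating anything unrecognized as "Unclear" is a design assumption made for this example.

```python
def parse_verdict(raw: str) -> str:
    """Map a free-text model reply to Grounded, Ungrounded, or Unclear."""
    text = raw.strip().lower()
    # Check "ungrounded" first because it contains the substring "grounded".
    if "ungrounded" in text:
        return "Ungrounded"
    if "grounded" in text:
        return "Grounded"
    # Assumption: any reply without a recognizable label counts as Unclear.
    return "Unclear"


for reply in ["Grounded", " The answer is ungrounded.\n", "I cannot tell"]:
    print(repr(reply), "->", parse_verdict(reply))
```

Normalizing the label this way makes it easier to branch on the result, for example to retry retrieval when an answer comes back as "Ungrounded".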

In [None]:
import asyncio
import os
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureTextEmbedding,
)
from semantic_kernel.connectors.memory.azure_cognitive_search import (
    AzureCognitiveSearchMemoryStore,
)
from semantic_kernel.connectors.memory.azure_cognitive_search.azure_ai_search_settings import (
    AzureAISearchSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.core_plugins import TextMemoryPlugin
from semantic_kernel.memory import SemanticTextMemory


COLLECTION_NAME = "generic"


async def populate_memory(memory: SemanticTextMemory) -> None:
    # Add some documents to the ACS semantic memory
    await memory.save_information(COLLECTION_NAME, id="info1", text="My name is Andrea")
    await memory.save_information(
        COLLECTION_NAME, id="info2", text="I currently work as a tour guide"
    )
    await memory.save_information(
        COLLECTION_NAME, id="info3", text="I've been living in Seattle since 2005"
    )
    await memory.save_information(
        COLLECTION_NAME,
        id="info4",
        text="I visited France and Italy five times since 2015",
    )
    await memory.save_information(
        COLLECTION_NAME, id="info5", text="My family is from New York"
    )


azure_ai_search_settings = AzureAISearchSettings.create(
    endpoint=os.getenv("AZURE_AI_SEARCH_ENDPOINT"),
    api_key=os.getenv("AZURE_AI_SEARCH_API_KEY"),
)
vector_size = 1536


acs_connector = AzureCognitiveSearchMemoryStore(
    vector_size=vector_size,
    search_endpoint=azure_ai_search_settings.endpoint,
    admin_key=azure_ai_search_settings.api_key,
)

memory = SemanticTextMemory(
    storage=acs_connector, embeddings_generator=embedding_service
)
kernel.add_plugin(TextMemoryPlugin(memory), "TextMemoryPlugin")

print("Populating memory...")
await populate_memory(memory)

In [None]:
sk_prompt_rag = """
Assistant can have a conversation with you about any topic.

Here is some background information about the user that you should use to answer the question below:
{{ recall $user_input }}
User: {{$user_input}}
Assistant: """.strip()

user_input = "Do I live in Seattle?"
print(f"Question: {user_input}")
req_settings = kernel.get_prompt_execution_settings_from_service_id(service_id="chat")
chat_func = kernel.add_function(
    function_name="rag",
    plugin_name="RagPlugin",
    prompt=sk_prompt_rag,
    prompt_execution_settings=req_settings,
)

chat_history = ChatHistory()
chat_history.add_user_message(user_input)

answer = await kernel.invoke(
    chat_func,
    user_input=user_input,
    chat_history=chat_history,
)
chat_history.add_assistant_message(str(answer))
print(f"Answer: {str(answer).strip()}")

In [None]:
sk_prompt_rag_sc = """
You will get a question, background information to be used with that question and an answer that was given.
You have to answer Grounded or Ungrounded or Unclear.
Grounded if the answer is based on the background information and clearly answers the question.
Ungrounded if the answer could be true but is not based on the background information.
Unclear if the answer does not answer the question at all.
Question: {{$user_input}}
Background: {{ recall $user_input }}
Answer: {{ $input }}
Remember, just answer Grounded or Ungrounded or Unclear: """.strip()


self_critique_func = kernel.add_function(
    function_name="self_critique_rag",
    plugin_name="RagPlugin",
    prompt=sk_prompt_rag_sc,
    prompt_execution_settings=req_settings,
)


# Critique the real RAG answer. The original question drives both the
# "Question" slot and the memory recall; the answer goes in "input".
rag_answer = str(answer).strip()
print(f"Answer: {rag_answer}")
check = await kernel.invoke(
    self_critique_func,
    user_input=user_input,
    input=rag_answer,
    chat_history=chat_history,
)
print(f"The answer was {str(check).strip()}")

print("-" * 50)
print("   Let's pretend the answer was wrong...")
wrong_answer = "Yes, you live in New York City."
print(f"Answer: {wrong_answer}")
check = await kernel.invoke(
    self_critique_func,
    user_input=user_input,
    input=wrong_answer,
    chat_history=chat_history,
)
print(f"The answer was {str(check).strip()}")

print("-" * 50)
print("   Let's pretend the answer is not related...")
unrelated_answer = "Yes, the earth is not flat."
print(f"Answer: {unrelated_answer}")
check = await kernel.invoke(
    self_critique_func,
    user_input=user_input,
    input=unrelated_answer,
    chat_history=chat_history,
)
print(f"The answer was {str(check).strip()}")

In [None]:
kernel.remove_all_services()


kernel.add_service(chat_completion)

# we also add the embedding service to the kernel
kernel.add_service(embedding_service)

kernel.add_plugin(TextMemoryPlugin(memory), "TextMemoryPlugin")

### Exercise: Build a Fact-Checking System

Create a system that:
1. Retrieves information from memory about a topic
2. Generates a response based on the retrieved information
3. Evaluates whether the response is factual based on the retrieved information

<details>
  <summary>Click to see solution</summary>
  
```python
# 1. Add some factual information to memory
facts_collection = "facts"
async def add_facts(memory):
    # Use keyword arguments so the ID and the text cannot be swapped by position.
    await memory.save_information(facts_collection, id="earth", text="Earth is the third planet from the Sun and orbits at an average distance of 93 million miles.")
    await memory.save_information(facts_collection, id="moon", text="The Moon is Earth's only natural satellite and orbits at an average distance of 238,855 miles.")
    await memory.save_information(facts_collection, id="mars", text="Mars is the fourth planet from the Sun and is often called the 'Red Planet' due to its reddish appearance.")

await add_facts(memory)

# 2. Create the fact retrieval prompt
fact_prompt = """
You are a scientific information system that provides accurate facts. You are not allowed to make up information or provide opinions.
You will be given a question, and you need to provide the most relevant information from your database.

Here is information relevant to the question:
{{ recall $question collection='facts' }}

Question: {{$question}}
Answer:
""".strip()

# 3. Create the fact checker prompt
checker_prompt = """
You are a fact-checker evaluating if answers are supported by provided information.

INFORMATION: {{ recall $question collection='facts' }}
QUESTION: {{$question}}
ANSWER: {{$answer}}

Evaluate if the answer is:
- ACCURATE: Fully supported by the information and directly answers the question
- PARTIALLY ACCURATE: Some statements are supported but others go beyond the information
- INACCURATE: Contains claims contrary to or unsupported by the information

Your assessment:
""".strip()

# 4. Create the functions
req_settings = kernel.get_prompt_execution_settings_from_service_id(service_id="chat")
fact_func = kernel.add_function(
    function_name="get_fact",
    plugin_name="FactSystem",
    prompt=fact_prompt,
    prompt_execution_settings=req_settings
)

checker_func = kernel.add_function(
    function_name="check_fact",
    plugin_name="FactSystem",
    prompt=checker_prompt,
    prompt_execution_settings=req_settings
)

# 5. Test the system
async def check_fact(question):
    # Get the fact
    answer = await kernel.invoke(fact_func, question=question)
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
    
    # Check the fact
    assessment = await kernel.invoke(checker_func, question=question, answer=str(answer))
    print(f"Assessment: {assessment}")
    print("-" * 50)

# Test with questions
await check_fact("What planet is Earth in our solar system?")
await check_fact("How far is the Moon from Earth?")
```

</details>

In [None]:
## Solution goes here

# 1. Add some factual information to memory

# 2. Create the fact retrieval prompt

# 3. Create the fact checker prompt

# 4. Create the functions

# 5. Test the system


## Using Persistent Memory with Azure AI Search

For production applications, you'll want to use a persistent memory store rather than the in-memory `VolatileMemoryStore`. Azure AI Search (formerly Cognitive Search) provides a powerful, scalable vector database for this purpose.

The RAG example earlier in this notebook already uses `AzureCognitiveSearchMemoryStore` for this. It shows how to:
1. Connect to Azure AI Search
2. Use it as a persistent memory store
3. Add information that will persist beyond the current session

Key differences from the in-memory approach:
- Information persists across application restarts
- Supports much larger datasets (millions of entries)
- Provides additional filtering and hybrid search capabilities
- Requires valid Azure credentials and resources

---

In summary, the **Semantic Kernel Intro** section covers the following key points:
- An overview of SK and its importance in bridging AI and application code.
- Core components like the Kernel, AI services, plugins, context, planner, and memory.
- How functions (semantic and native) are organized into plugins.
- The power of automatic function calling for multi-step AI reasoning.
- The role of filters in ensuring secure and validated execution.
- Detailed memory management for integrating external knowledge into AI workflows.
