# Memory Management - Part 2 - Memory

- ✅ Initialize MemoryService and integrate with your agent
- ✅ Transfer session data to memory storage
- ✅ Search and retrieve memories
- ✅ Automate memory storage and retrieval
- ✅ Understand memory consolidation (conceptual overview)


## Session vs Memory

> **Session = Short-term memory** (single conversation)

> **Memory = Long-term knowledge** (across multiple conversations)

Memory provides capabilities that Sessions alone cannot:

| Capability | What It Means | Example |
|------------|---------------|---------|
| **Cross-Conversation Recall** | Access information from any past conversation | "What preferences has this user mentioned across all chats?" |
| **Intelligent Extraction** | LLM-powered consolidation extracts key facts | Stores "allergic to peanuts" instead of 50 raw messages |
| **Semantic Search** | Meaning-based retrieval, not just keyword matching | Query "preferred hue" matches "favorite color is blue" |
| **Persistent Storage** | Survives application restarts | Build knowledge that grows over time |

**Example:** Imagine talking to a personal assistant:
- 🗣️ **Session**: They remember what you said 10 minutes ago in THIS conversation
- 🧠 **Memory**: They remember your preferences from conversations LAST WEEK


Note: 
- This notebook uses `InMemoryMemoryService` for learning - it performs keyword matching and doesn't persist data. 
- For production applications, use **Vertex AI Memory Bank** (covered in Day 5), which provides LLM-powered consolidation and semantic search with persistent cloud storage.
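
To make the difference between keyword matching and semantic search concrete, here is a minimal sketch of word-overlap matching. It is an illustration only and not necessarily how `InMemoryMemoryService` is implemented; `keyword_match` and the sample memories are hypothetical.

```python
import re


def keyword_match(query: str, memory_text: str) -> bool:
    """Return True if the query and the memory share at least one word."""
    query_words = set(re.findall(r"[a-z0-9']+", query.lower()))
    memory_words = set(re.findall(r"[a-z0-9']+", memory_text.lower()))
    return bool(query_words & memory_words)


memories = ["My favorite color is blue", "My dog is a labradoodle"]

# Shares the words "favorite" and "color" with the first memory, so it matches.
color_hits = [m for m in memories if keyword_match("favorite color", m)]

# Same meaning, but no shared words: pure keyword matching finds nothing.
hue_hits = [m for m in memories if keyword_match("preferred hue", m)]

print(color_hits)
print(hue_hits)
```

A semantic search backend would be expected to return the color memory for both queries, because it compares meaning rather than surface words.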


## SessionService, MemoryService, Agent & Runner

- session_service: per-conversation working state (messages, scratch, cached tool results)
- memory_service: curated cross-conversation memories (selective, policy-gated)
- agent: policy/brain; given state, emits next action(s) (respond, ask, tool call, plan)
- runner: orchestrator; enforces budgets/policy, executes actions/tools, updates session, proposes/commits memory

**One specific example**
- Runner: "Here's user input + context. What do you want to do?"
- Agent:  "Call tool X with args Y"
- Runner: [executes tool X, gets result]
- Runner: "Tool returned Z. Now what?"
- Agent:  "Respond to user with this message"
- Runner: [sends response, updates session/memory]
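
The exchange above can be sketched as a small loop. This is a toy model of the runner/agent division of labor, not ADK's actual `Runner`; `Action`, `ToySession`, `toy_agent`, and `toy_runner` are all hypothetical names, and the "tool" is a hard-coded stub.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str  # "tool_call" or "respond"
    payload: str


@dataclass
class ToySession:
    events: list[str] = field(default_factory=list)


def toy_agent(session: ToySession) -> Action:
    """Decide the next action from the session so far (stand-in for an LLM)."""
    if not any(e.startswith("tool_result:") for e in session.events):
        return Action("tool_call", "lookup_weather")
    return Action("respond", "It is sunny today.")


def toy_runner(user_input: str, session: ToySession, max_steps: int = 5) -> str:
    """Ask the agent, execute its action, and record the result in the session."""
    session.events.append(f"user:{user_input}")
    for _ in range(max_steps):  # the runner enforces the step budget
        action = toy_agent(session)
        if action.kind == "tool_call":
            result = "sunny"  # pretend the tool was executed
            session.events.append(f"tool_result:{action.payload}={result}")
        else:
            session.events.append(f"model:{action.payload}")
            return action.payload
    raise RuntimeError("step budget exhausted")


session = ToySession()
reply = toy_runner("What's the weather?", session)
print(reply)
print(session.events)
```

The agent only decides; the runner is the one that executes tools and mutates the session, which is why the services are attached to the runner rather than to the agent.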

---
# 1. Setup

In [1]:
import os
from dotenv import load_dotenv

# load_dotenv() does not raise when the key is missing, so check for it explicitly
load_dotenv()
if os.getenv("GOOGLE_API_KEY"):
    print("✅ Gemini API key setup complete.")
else:
    print(
        "🔑 Authentication Error: 'GOOGLE_API_KEY' not found. Please add it to your .env file or Kaggle secrets."
    )

from google.adk.agents import LlmAgent
from google.adk.models.google_llm import Gemini
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService
from google.adk.tools import load_memory, preload_memory
from google.genai import types

print("✅ ADK components imported successfully.")


retry_config = types.HttpRetryOptions(
    attempts=5,  # Maximum retry attempts
    exp_base=7,  # Delay multiplier
    initial_delay=1,
    http_status_codes=[429, 500, 503, 504],  # Retry on these HTTP errors
)
print("✅ Specified retry_config.")

✅ Gemini API key setup complete.
✅ ADK components imported successfully.
✅ Specified retry_config.
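
To get a feel for what `exp_base=7` and `initial_delay=1` imply, here is a rough sketch of the nominal wait before each retry. It assumes plain exponential backoff (`initial_delay * exp_base**retry_index`), assumes the first of the 5 attempts runs immediately, and ignores any jitter or maximum-delay cap the library may apply; `backoff_delays` is a hypothetical helper, not part of the SDK.

```python
def backoff_delays(attempts: int, initial_delay: float, exp_base: float) -> list[float]:
    """Nominal delay before each retry under plain exponential backoff."""
    # The first attempt runs immediately, so there are attempts - 1 retries.
    return [initial_delay * exp_base**i for i in range(attempts - 1)]


delays = backoff_delays(attempts=5, initial_delay=1, exp_base=7)
print(delays)       # [1, 7, 49, 343]
print(sum(delays))  # 400
```

Under these assumptions the waits grow quickly, so a large `exp_base` mostly makes sense when a delay cap is in place.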


### Helper functions

In [2]:
async def run_session(
    runner_instance: Runner,
    session_service: InMemorySessionService, 
    user_id: str,
    user_queries: list[str] | str, 
    session_id: str = "default"
):
    """Helper function to run queries in a session and display responses."""
    print(f"\n### Session: {session_id}")

    app_name = runner_instance.app_name

    # Create or retrieve session
    try:
        session = await session_service.create_session(
            app_name=app_name,  user_id=user_id, session_id=session_id
        )
    except Exception:  # creation failed (e.g. session already exists), so fetch it instead
        session = await session_service.get_session(
            app_name=app_name, user_id=user_id, session_id=session_id
        )

    # Convert single query to list
    if isinstance(user_queries, str):
        user_queries = [user_queries]

    # Process each query
    for query in user_queries:
        print(f"\nUser > {query}")
        query_content = types.Content(role="user", parts=[types.Part(text=query)])

        # Stream agent response
        async for event in runner_instance.run_async(
            user_id=user_id, session_id=session.id, new_message=query_content
        ):
            if event.is_final_response() and event.content and event.content.parts:
                text = event.content.parts[0].text
                if text and text != "None":
                    print(f"Model: > {text}")


print("✅ Helper functions defined.")

✅ Helper functions defined.


---
# Xing's Summary

In [3]:
# tools 
# ------------------------------------------------------------------
async def auto_save_to_memory(callback_context):
    """Automatically save session to memory after each agent turn."""
    await callback_context._invocation_context.memory_service.add_session_to_memory(
        callback_context._invocation_context.session
    )


print("✅ Tool - callback created.")


# key components
# ------------------------------------------------------------------
USER_ID = 'p_li_mom'
APP_NAME = 'p_li_with_auto_load_save_memory'

# 1. Session Service
session_service = InMemorySessionService()  # Handles conversations

# 2. Memory Service
memory_service = InMemoryMemoryService()

# 3. Agent 
user_agent = LlmAgent(
    model=Gemini(model="gemini-2.5-flash-lite", retry_options=retry_config),
    name="MemoryDemoAgent",
    instruction="Answer user questions in simple words. You may need to use the load_memory tool first to check for relevant context about the user.",
    tools = [load_memory], # or preload_memory (load_ is reactive up to agent's judgement while preload is proactive)
    after_agent_callback=auto_save_to_memory,  # Optional, and adding this allows for auto memory saving after each call to agent
    
)

# 4. Runner
runner = Runner(
    agent=user_agent,
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service, 
)

print("✅ Core components created.")

✅ Tool - callback created.
✅ Core components created.


In [4]:
common_kargs = {'runner_instance': runner,
                'session_service': session_service,
                'user_id': USER_ID,
                }

await run_session(
    user_queries="my dog's name is pumpkin li. it is a labradoodle. it is 4",
    session_id = "session_ONE",
    **common_kargs
)

# manually load session to memory
# session = await session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id="session_ONE")
# await memory_service.add_session_to_memory(session)

await run_session(
    user_queries = "tell me more about my dog",
    session_id = "session_TWO", # !!Different from one
    **common_kargs
)


### Session: session_ONE

User > my dog's name is pumpkin li. it is a labradoodle. it is 4
Model: > Thanks for sharing! Pumpkin Li sounds like a lovely labradoodle. Is there anything I can help you with regarding Pumpkin Li?

### Session: session_TWO

User > tell me more about my dog
Model: > Pumpkin Li is a labradoodle and is 4 years old.


In [5]:
# more detailed api usage to save and search memory
search_response = await memory_service.search_memory(
    app_name=APP_NAME, user_id=USER_ID, query="dog"
)

print("🔍 Search Results:")
print(f"  Found {len(search_response.memories)} relevant memories")
print()

for memory in search_response.memories:
    if memory.content and memory.content.parts:
        text = memory.content.parts[0].text[:80]
        print(f"  [{memory.author}]: {text}...")

🔍 Search Results:
  Found 2 relevant memories

  [user]: my dog's name is pumpkin li. it is a labradoodle. it is 4...
  [user]: tell me more about my dog...


---
# 2. <span style="color:blue">Manual</span> Memory Workflow

Three steps to integrate Memory into your Agents:

1. **Initialize** → Create a `MemoryService` and provide it to your agent via the `Runner`
2. **<span style="color:blue">Ingest (MANUAL)</span>** → Transfer session data to memory using `add_session_to_memory()`
3. **Retrieve** → Add `load_memory` or `preload_memory` to your agent.
    - For manual retrieval, use `memory_service.search_memory()`

**`load_memory` (Reactive)**
- Pros: Efficiency (saves tokens) by letting the agent decide when to search memory
- Cons: Agent might forget to search

**`preload_memory` (Proactive)**
- Pros: Guaranteed context by automatically searching before every turn, making memory always available to the agent
- Cons: Less efficient, since it searches even when not needed
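
The trade-off can be illustrated with a toy store that counts how often it is searched. This is a sketch, not ADK code: `ToyMemory`, `proactive_turn`, and `reactive_turn` are hypothetical, and the reactive decision rule (look for the word "my") is a crude stand-in for the LLM's judgement.

```python
class ToyMemory:
    """Stand-in memory store that counts how often it is searched."""

    def __init__(self, facts: list[str]):
        self.facts = facts
        self.search_count = 0

    def search(self, query: str) -> list[str]:
        self.search_count += 1
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]


def proactive_turn(memory: ToyMemory, query: str) -> list[str]:
    """preload-style: always search before answering."""
    return memory.search(query)


def reactive_turn(memory: ToyMemory, query: str) -> list[str]:
    """load-style: search only when the 'agent' decides it needs to."""
    if "my" in query.lower().split():
        return memory.search(query)
    return []


queries = ["what is 2 + 2", "what is my dog called"]

proactive = ToyMemory(["my dog is called Pumpkin"])
reactive = ToyMemory(["my dog is called Pumpkin"])
for q in queries:
    proactive_turn(proactive, q)
    reactive_turn(reactive, q)

print(proactive.search_count)  # 2: searched on every turn
print(reactive.search_count)   # 1: searched only when it seemed relevant
```

The reactive variant saves a search here, but only because its decision rule happened to fire on the right query; a worse rule would skip the search and lose the context.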


## Initialize MemoryService + Add a Memory Tool to Agent for Retrieval

‼️ Adding `memory_service` to the `Runner` makes memory *available* to the agent. To use it, you must explicitly:

1. **Ingest data** using `add_session_to_memory()` 
2. **Enable retrieval** by giving your agent memory tools (`load_memory` or `preload_memory`)

In [6]:
# Define constants used throughout the notebook
APP_NAME = "MemoryDemoApp"
USER_ID = "demo_user"

# initialize memory service 
memory_service = (
    InMemoryMemoryService()
)  # ADK's built-in Memory Service for development and testing

# Create Session Service
session_service = InMemorySessionService()  # Handles conversations

# Create agent
user_agent = LlmAgent(
    model=Gemini(model="gemini-2.5-flash-lite", retry_options=retry_config),
    name="MemoryDemoAgent",
    instruction="Answer user questions in simple words. Use any preloaded memories of past conversations when they are relevant.",
    tools=[
        preload_memory  # load_memory might fail to retrieve if the agent decides to skip the search
    ],
)

# Create runner with BOTH services
runner = Runner(
    agent=user_agent,
    app_name=APP_NAME,
    session_service=session_service,
    memory_service=memory_service,  # Memory service is now available!
)

---
## 2.2 Ingest Session Data into Memory (after running a session)

In [7]:
# User tells the agent about their dog
common_kargs = {'runner_instance': runner,
                'session_service': session_service,
                'user_id': USER_ID,
                }
                
await run_session(
    **common_kargs,
    user_queries="My dog's name is pumpkin li. she is the best.",
    session_id="conversation-01",  
)

# verify the conversation was captured in the session. You should see the session events containing both the user's prompt and the model's response.
session = await session_service.get_session(
    app_name=APP_NAME, user_id=USER_ID, session_id="conversation-01"
)

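# see what's in the session
print("\n📝 Session contains:")
for event in session.events:
    text = (
        event.content.parts[0].text[:60]
        if event.content and event.content.parts and event.content.parts[0].text
        else "(empty)"
    )
    # fall back to the event author when an event carries no content
    role = event.content.role if event.content else event.author
    print(f"  {role}: {text}...")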


### Session: conversation-01

User > My dog's name is pumpkin li. she is the best.
Model: > That's a great name for a dog! Pumpkin Li sounds like a very special pup. Is there anything I can help you with regarding Pumpkin Li today?

📝 Session contains:
  user: My dog's name is pumpkin li. she is the best....
  model: That's a great name for a dog! Pumpkin Li sounds like a very...


Call `add_session_to_memory()` and pass the session object. This ingests the conversation into the memory store, making it available for future searches.

In [8]:
# ingest to memory
await memory_service.add_session_to_memory(session)

# check whether the agent can retrieve it from memory in a new session
await run_session(**common_kargs, user_queries="describe my dog", session_id = "conversation-02")


### Session: conversation-02

User > describe my dog
Model: > You told me your dog's name is Pumpkin Li. You also mentioned that she is the best!


### Manual Memory Search

The `search_memory()` method takes a text query and returns a `SearchMemoryResponse` with matching memories.

In [9]:
# alternative retrieval: explicitly through memory_service
search_response = await memory_service.search_memory(
    app_name=APP_NAME, user_id=USER_ID, query="What's my dogs name"
)

print("🔍 Search Results:")
print(f"  Found {len(search_response.memories)} relevant memories")
print()

for memory in search_response.memories:
    if memory.content and memory.content.parts:
        text = memory.content.parts[0].text[:80]
        print(f"  [{memory.author}]: {text}...")

🔍 Search Results:
  Found 2 relevant memories

  [user]: My dog's name is pumpkin li. she is the best....
  [MemoryDemoAgent]: That's a great name for a dog! Pumpkin Li sounds like a very special pup. Is the...


---
# 3. Automatic Memory Workflow
Previous: we **manually** called `add_session_to_memory()` to transfer data to long-term storage.

Now: use callbacks to automate that step.

### Callbacks

Callbacks are **Python functions** you define and attach to agents - ADK automatically calls them at specific stages, acting like checkpoints during the agent's execution flow.


**Available callback types:**

- `before_agent_callback` → Runs before agent starts processing a request
- `after_agent_callback` → Runs after agent completes its turn
- `before_tool_callback` / `after_tool_callback` → Around tool invocations
- `before_model_callback` / `after_model_callback` → Around LLM calls
- `on_model_error_callback` → When errors occur

**Common use cases:**

- Logging and observability (track what the agent does)
- Automatic data persistence (like saving to memory)
- Custom validation or filtering
- Performance monitoring
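
The checkpoint idea can be sketched without ADK. The toy agent below calls optional hooks before and after its main step, and the hooks implement the logging use case. `ToyAgent` and the dict-based context are hypothetical simplifications; ADK passes its own context object to real callbacks.

```python
from typing import Callable, Optional

Callback = Callable[[dict], None]


class ToyAgent:
    """Minimal agent that calls optional hooks around its main step."""

    def __init__(
        self,
        before_agent_callback: Optional[Callback] = None,
        after_agent_callback: Optional[Callback] = None,
    ):
        self.before_agent_callback = before_agent_callback
        self.after_agent_callback = after_agent_callback

    def run(self, query: str) -> str:
        context = {"query": query}
        if self.before_agent_callback:
            self.before_agent_callback(context)
        context["response"] = f"echo: {query}"  # stand-in for the LLM call
        if self.after_agent_callback:
            self.after_agent_callback(context)
        return context["response"]


log: list[str] = []


def log_before(context: dict) -> None:
    log.append(f"before: {context['query']}")


def log_after(context: dict) -> None:
    log.append(f"after: {context['response']}")


agent = ToyAgent(before_agent_callback=log_before, after_agent_callback=log_after)
response = agent.run("hello")
print(response)
print(log)
```

The agent's core logic never mentions logging; the behavior is attached from the outside, which is what makes callbacks a good fit for cross-cutting concerns such as persistence.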

### Automatic Memory Storage with Callbacks

For automatic memory storage, use `after_agent_callback`.

How:

- `callback_context`: ADK automatically passes it to callback functions. It provides access to the Memory Service and other runtime components.
- With `callback_context`, the function can access the Memory Service and call `add_session_to_memory()` to persist the conversation automatically.

In [10]:
async def auto_save_to_memory(callback_context):
    """Automatically save session to memory after each agent turn."""
    await callback_context._invocation_context.memory_service.add_session_to_memory(
        callback_context._invocation_context.session
    )

auto_memory_agent = LlmAgent(
    model=Gemini(model="gemini-2.5-flash-lite", retry_options=retry_config),
    name="AutoMemoryAgent",
    instruction="Answer user questions.",
    tools=[preload_memory],
    after_agent_callback=auto_save_to_memory,  # Saves after each turn!
)

auto_runner = Runner(
    agent=auto_memory_agent,  # Use the agent with callback + preload_memory
    app_name=APP_NAME,
    session_service=session_service,  # Same services from the previous section
    memory_service=memory_service,
)

In [11]:
# run to test automatic memory saving across two sessions
common_kargs = {"runner_instance": auto_runner,
               "session_service": session_service,
               "user_id":USER_ID}

await run_session(
    **common_kargs,
    user_queries="I gifted a new toy to my nephew on his 1st birthday!",
    session_id="auto-save-test",
)

await run_session(
    **common_kargs,
    user_queries= "What did I gift my nephew?",
    session_id="auto-save-test-2",
)


### Session: auto-save-test

User > I gifted a new toy to my nephew on his 1st birthday!
Model: > That's wonderful! A 1st birthday is such a special milestone. I hope your nephew enjoys his new toy!

### Session: auto-save-test-2

User > What did I gift my nephew?
Model: > You gifted your nephew a new toy for his 1st birthday.


### How often should you save Sessions to Memory?

**Options:**

| Timing | Implementation | Best For |
|--------|----------------|----------|
| **After every turn** | `after_agent_callback` | Real-time memory updates |
| **End of conversation** | Manual call when session ends | Batch processing, reduce API calls |
| **Periodic intervals** | Timer-based background job | Long-running conversations |
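
As a rough way to compare the options, the sketch below counts memory writes for one conversation under each policy. It is a simplification: `count_saves` is a hypothetical helper, and the periodic policy is modeled as "every N turns" rather than a real timer.

```python
def count_saves(num_turns: int, policy: str, every_n: int = 5) -> int:
    """Count memory writes for a conversation of num_turns turns under a policy."""
    if policy == "every_turn":
        return num_turns
    if policy == "end_of_conversation":
        return 1 if num_turns > 0 else 0
    if policy == "periodic":
        # Save after every `every_n` turns, plus one final save for leftover turns.
        full, leftover = divmod(num_turns, every_n)
        return full + (1 if leftover else 0)
    raise ValueError(f"unknown policy: {policy}")


for policy in ("every_turn", "end_of_conversation", "periodic"):
    print(policy, count_saves(num_turns=12, policy=policy))
```

For a 12-turn conversation this gives 12, 1, and 3 writes respectively. The fewer the writes, the longer recent turns stay invisible to other sessions.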

---
# 4. Memory Consolidation (notes only)
- Extracts key information from Session data
- Provided by managed memory services such as Vertex AI Memory Bank

**Problem**

Storing every message doesn't scale: a 50-message session can mean roughly 10,000 tokens that the agent must process on every search. We need **consolidation**.

**Solution**

Extract the important facts and discard conversational noise.

| Before | After |
|--------|-------|
| 4 messages: "My favorite color is BlueGreen..." / "Great!" / "Thanks!" / "You're welcome!" | 1 fact: "Favorite color: BlueGreen" |

Less storage, faster retrieval, more accurate answers.

**How**

Raw session → LLM extracts key facts → stores concise memories → merges with existing (deduplication)

e.g.
- Input: "I'm allergic to peanuts. I can't eat anything with nuts."
- Output: `{ allergy: "peanuts, tree nuts", severity: "avoid completely" }`
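
The final merge step of that pipeline can be sketched as follows. The LLM extraction itself is not shown; the `extracted` dict stands in for its output, and `merge_facts` is a hypothetical helper rather than anything a managed memory service exposes.

```python
def merge_facts(
    existing: dict[str, set[str]], extracted: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Merge newly extracted facts into existing memories, dropping duplicates."""
    merged = {key: set(values) for key, values in existing.items()}
    for key, values in extracted.items():
        merged.setdefault(key, set()).update(values)
    return merged


# Facts assumed to come from an LLM extraction step (not shown here).
existing = {"allergy": {"peanuts"}}
extracted = {"allergy": {"peanuts", "tree nuts"}, "favorite color": {"BlueGreen"}}

memory = merge_facts(existing, extracted)
print(sorted(memory["allergy"]))
print(sorted(memory["favorite color"]))
```

Using sets makes the repeated "peanuts" fact collapse into one entry. A real consolidation service would also need to resolve contradictions and paraphrases, which exact-match deduplication cannot do.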

**Next Steps**

Managed services (like VertexAiMemoryBankService) handle consolidation automatically: same API, smarter storage.