# 109 LangGraph: Conversational Memory (Manual Management)

**Workshop**: LangGraph 109
**Duration**: ~45 minutes
**Difficulty**: Intermediate

## Learning Objectives

By completing this notebook, you will:
- Use both `HumanMessage` and `AIMessage` types for conversation tracking
- Implement conversation memory using manual history management
- Understand the `Union` type for handling multiple message types
- Build a stateful conversation loop that remembers context
- Learn about conversation persistence (saving to files/databases)
- Discover the cost implications of growing conversation history
- Implement conversation history trimming strategies

## Prerequisites

- **Knowledge**: Completed notebook 108 (First LLM Integration)
- **Understanding**: Union types from notebook 101, graph patterns from 103-107
- **Setup**: Anthropic API key configured

## What You'll Build

We're fixing the memory problem from notebook 108! This time, our bot will:
- Remember previous messages in the conversation
- Track both human questions AND AI responses
- Maintain context across multiple turns

**Example:**
```
You: My firewall is fw-prod-01
AI: Got it! How can I help with fw-prod-01?

You: What's my firewall's hostname?
AI: Your firewall's hostname is fw-prod-01.
```

**Graph Structure:** (Same as before, but with memory!)
```
START → process_query → END
```

## Table of Contents

1. [Introduction](#1-introduction)
2. [Setup and New Imports](#2-setup-and-new-imports)
3. [Understanding AIMessage and Union Types](#3-understanding-aimessage-and-union-types)
4. [Building the Memory-Enabled State](#4-building-the-memory-enabled-state)
5. [Creating the Conversation Node](#5-creating-the-conversation-node)
6. [Implementing the Conversation Loop](#6-implementing-the-conversation-loop)
7. [Testing with Memory](#7-testing-with-memory)
8. [Problem 1: Conversation Persistence](#8-problem-1-conversation-persistence)
9. [Problem 2: Growing Token Costs](#9-problem-2-growing-token-costs)
10. [Summary](#10-summary)

---

## 1. Introduction

Welcome back! In notebook 108, we built a simple AI bot that had one critical flaw: **it couldn't remember anything**.

### The Problem We're Solving

Remember this from last time?

```
You: My firewall hostname is fw-prod-01
AI: Thanks for letting me know!

You: What is my firewall hostname?
AI: I don't have information about your firewall hostname.
```

**Why?** Each query was independent - we never stored the conversation history.

### The Solution: Conversation Memory

In this notebook, we'll implement **manual conversation memory** by:
1. Storing both `HumanMessage` and `AIMessage` objects
2. Maintaining a conversation history list
3. Sending the entire conversation to the LLM each time
4. Updating the history with each exchange

### Why Manual First?

Before we learn the advanced `Annotated[list, add_messages]` reducer pattern (coming in notebook 110), we'll build memory manually. This helps you understand:
- How conversation history actually works
- Why reducers are so helpful (you'll appreciate them more!)
- The cost implications of growing histories

### Real-World Use Cases

With memory, our PAN-OS bot can handle:
- Multi-turn troubleshooting: "My firewall is dropping traffic" → "Check logs" → "What should I look for?"
- Context-aware recommendations: "I'm on 10.1" → "Should I upgrade?" → "What's the path to 11.0?"
- Configuration assistance: "I need NAT" → "For what source?" → "How do I configure it?"

Let's build it!

---

## 2. Setup and New Imports

### What's New?

Compared to notebook 108, we're adding:
1. **AIMessage**: To represent messages from the AI
2. **Union**: Type annotation for handling multiple message types

### Quick Union Refresher

Remember from notebook 101? `Union` lets a variable accept multiple types:

```python
Union[HumanMessage, AIMessage]  # Can be EITHER type
```

This is perfect for conversation history where we have both human and AI messages!

### Note About AI Agentic Libraries

**Important insight:** You could build AI agents with pure Python functions - you don't technically need LangChain or LangGraph!

However, I recommend using these libraries because:
- **LangGraph**: Great balance of control vs. convenience
- **Reduces boilerplate**: Handles a lot of tedious code for you
- **Battle-tested**: Robust implementations of common patterns
- **Flexibility**: More control than alternatives like CrewAI or Autogen

Think of LangGraph as the sweet spot between "total control" (pure Python) and "total convenience" (high-level frameworks).

In [None]:
# Core typing imports
from typing import TypedDict, List, Union

# LangChain message types - NOW INCLUDING AIMessage!
from langchain_core.messages import HumanMessage, AIMessage

# LangChain LLM integration - Using Anthropic Claude
from langchain_anthropic import ChatAnthropic

# LangGraph core
from langgraph.graph import StateGraph, START, END

# Visualization
from IPython.display import Image, display

# Environment variable loading
from dotenv import load_dotenv

print("✅ All imports successful!")
print("\n🆕 What's new in this notebook:")
print("  - AIMessage: Represents messages FROM the AI to users")
print("  - Union: Allows storing BOTH HumanMessage AND AIMessage")
print("\n💡 Union[HumanMessage, AIMessage] means:")
print("   'This can be either a HumanMessage OR an AIMessage'")

In [None]:
# Load environment variables
load_dotenv()

print("✅ Environment loaded!")

---

## 3. Understanding AIMessage and Union Types

### Message Types Recap

LangChain provides different message types for different speakers:

| Type | Purpose | Example |
|------|---------|----------|
| `HumanMessage` | User to AI | "What's the upgrade path?" |
| `AIMessage` | AI to User | "You should go 10.1 → 10.2 → 11.0" |
| `SystemMessage` | Instructions | "You are a PAN-OS expert" |
| `ToolMessage` | Tool results | "API returned: success" |

### Why AIMessage?

In notebook 108, we only tracked `HumanMessage` - the user's questions. But for conversation memory, we need to track BOTH sides:
- What the user asked (HumanMessage)
- What the AI responded (AIMessage)

### Using Union for Multiple Types

We could create two separate lists:

```python
# ❌ Naive approach - separate lists
class BadState(TypedDict):
    human_messages: List[HumanMessage]
    ai_messages: List[AIMessage]
```

But this is messy! How do we know which AI message responds to which human message?

**Better approach** - single list with Union:

```python
# ✅ Better - single list with both types
class GoodState(TypedDict):
    messages: List[Union[HumanMessage, AIMessage]]
```

Now messages stay in order: Human, AI, Human, AI, Human, AI...

Let's see it in action!

In [None]:
# Create both message types
human_msg = HumanMessage(content="What is the upgrade path from PAN-OS 10.1 to 10.2?")
ai_msg = AIMessage(content="The upgrade path is: 10.1.0 → 10.1.latest → 10.2.0 → 10.2.latest")

print("HumanMessage:")
print(f"  Type: {type(human_msg).__name__}")
print(f"  Content: {human_msg.content}")
print()
print("AIMessage:")
print(f"  Type: {type(ai_msg).__name__}")
print(f"  Content: {ai_msg.content}")
print()

# Store them together in a list
conversation = [human_msg, ai_msg]
print("Conversation history:")
for i, msg in enumerate(conversation, 1):
    speaker = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"  {i}. [{speaker}] {msg.content}")

print("\n💡 Notice: Both types coexist in the same list!")
print("   This preserves conversation order perfectly.")

---

## 4. Building the Memory-Enabled State

### State Comparison

**Notebook 108 (No Memory):**
```python
class AgentState(TypedDict):
    messages: List[HumanMessage]  # Only human messages
```

**Notebook 109 (With Memory):**
```python
class AgentState(TypedDict):
    messages: List[Union[HumanMessage, AIMessage]]  # BOTH types!
```

### Key Insight

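`Union` documents, for readers and static type checkers, that this list can contain a mix of `HumanMessage` and `AIMessage` objects. Python itself does not enforce the hint at runtime.

This change lets the state hold both sides of the conversation, which the memory logic in the following sections relies on.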
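
Because type hints are not checked at runtime, it can help to see what a `TypedDict` annotation does and does not do. The sketch below is a minimal illustration that runs without any packages: `FakeHuman`, `FakeAI`, and `DemoState` are hypothetical stand-ins, not the LangChain classes or this notebook's `AgentState`.

```python
from typing import List, TypedDict, Union


class FakeHuman:  # hypothetical stand-in for HumanMessage
    def __init__(self, content: str) -> None:
        self.content = content


class FakeAI:  # hypothetical stand-in for AIMessage
    def __init__(self, content: str) -> None:
        self.content = content


class DemoState(TypedDict):
    messages: List[Union[FakeHuman, FakeAI]]


# The annotation is recorded on the class and can be inspected...
annotations = DemoState.__annotations__
print(annotations["messages"])

# ...but a TypedDict instance is a plain dict, so nothing rejects a bad value
# at runtime. A static type checker would flag the string below.
state: DemoState = {"messages": [FakeHuman("hi"), "not a message"]}
is_plain_dict = type(state) is dict
all_valid = all(isinstance(m, (FakeHuman, FakeAI)) for m in state["messages"])
print(is_plain_dict, all_valid)
```

If you want runtime guarantees, an explicit `isinstance` check like the one above is one simple option.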

In [None]:
class AgentState(TypedDict):
    """State for our memory-enabled PAN-OS AI bot."""
    messages: List[Union[HumanMessage, AIMessage]]  # Can store BOTH message types

print("✅ AgentState defined with memory support!")
print("\nState structure:")
print("  - messages: List[Union[HumanMessage, AIMessage]]")
print("\n💡 This state can now track full conversations:")
print("   [HumanMessage, AIMessage, HumanMessage, AIMessage, ...]")
print("\n📝 NOTE: Modern Python 3.10+ Syntax Alternative:")
print("   messages: list[HumanMessage | AIMessage]  # Equivalent, more concise!")
print("\n   Both are valid:")
print("   ✓ List[Union[HumanMessage, AIMessage]]  # Python 3.7+ (more compatible)")
print("   ✓ list[HumanMessage | AIMessage]        # Python 3.10+ (more modern)")
print("\n   This notebook uses the older syntax for broader compatibility.")

In [None]:
# Initialize the LLM - Using Claude
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0)

print("✅ LLM initialized!")
print("   Model: Claude 3.5 Sonnet")
print("   Provider: Anthropic")

---

## 5. Creating the Conversation Node

### The Key Difference from Notebook 108

**Notebook 108:**
```python
def process_query(state: AgentState) -> AgentState:
    response = llm.invoke(state["messages"])
    print(response.content)
    return state  # ❌ Doesn't save AI response
```

**Notebook 109:**
```python
def process_query(state: AgentState) -> AgentState:
    response = llm.invoke(state["messages"])
    state["messages"].append(AIMessage(content=response.content))  # ⭐ SAVES!
    print(response.content)
    return state
```

### What's Happening Here?

1. **Invoke LLM**: Send all messages (human + AI history) to the model
2. **Get Response**: LLM returns an AIMessage
3. **⭐ Append to State**: Add the AI response to the messages list
4. **Return State**: Updated state now includes the AI response

### Why This Works

By appending `AIMessage(content=response.content)` to the state, we're building a conversation history:

```
Turn 1: [HumanMessage("My firewall is fw-prod-01")]
        [HumanMessage("My firewall is fw-prod-01"), AIMessage("Got it!")]

Turn 2: [HumanMessage("My firewall is fw-prod-01"), AIMessage("Got it!"), HumanMessage("What's my hostname?")]
        [HumanMessage("My firewall is fw-prod-01"), AIMessage("Got it!"), HumanMessage("What's my hostname?"), AIMessage("It's fw-prod-01")]
```

The LLM sees the ENTIRE conversation each time, so it can reference previous context!
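
To make that growth concrete without calling a real model, here is a small sketch that uses plain tuples instead of LangChain messages. `fake_llm` is a hypothetical stand-in for `llm.invoke` that only records how many messages it receives on each turn.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); stand-in for HumanMessage/AIMessage

seen_per_turn: List[int] = []


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stand-in for llm.invoke(): records input size, returns a canned reply."""
    seen_per_turn.append(len(messages))
    return f"reply #{len(seen_per_turn)}"


history: List[Message] = []
for question in ["My firewall is fw-prod-01", "What's my hostname?", "Thanks!"]:
    history.append(("human", question))  # add the new user turn
    reply = fake_llm(history)            # the model sees everything so far
    history.append(("ai", reply))        # save the reply for the next turn

print(seen_per_turn)  # messages sent to the model on each turn
print(len(history))   # total messages stored after three turns
```

Each turn sends two more messages than the one before it, so the input keeps growing as the conversation gets longer. That is the source of the token-cost concern discussed later in this notebook.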

Let's implement it:

In [None]:
def process_query(state: AgentState) -> AgentState:
    """
    Process the conversation with memory support.
    
    This node:
    1. Receives all messages (human + AI history)
    2. Invokes the LLM with the full conversation
    3. Appends the AI response to the state
    4. Returns the updated state
    
    Args:
        state: Current agent state with message history
    
    Returns:
        Updated state with AI response appended
    
    Note on State Mutation:
        This function uses state["messages"].append() to mutate state directly.
        An alternative functional approach would be:
            return {"messages": state["messages"] + [AIMessage(...)]}
        
        We use mutation here for clarity and simplicity when teaching manual 
        memory management. In notebook 110, you'll learn about reducers which
        make this choice less important - the reducer handles merging automatically!
    """
    # Get the full conversation history
    messages = state["messages"]
    
    # Invoke LLM with all messages
    response = llm.invoke(messages)
    
    # ⭐ KEY DIFFERENCE: Append AI response to state
    state["messages"].append(AIMessage(content=response.content))
    
    # Show the response
    print(f"\n🤖 AI: {response.content}\n")
    
    return state

print("✅ Conversation node defined with memory!")
print("\n💡 Key insight:")
print("   state['messages'].append(AIMessage(...)) enables memory")
print("   Each turn adds to the growing conversation history")
print("\n📝 State Mutation vs. Returning Updates:")
print("   This function mutates state directly for pedagogical clarity.")
print("   Alternative: return {'messages': state['messages'] + [AIMessage(...)]}")
print("   Both work! Reducers (notebook 110) make this choice less critical.")

In [None]:
# Build the graph (same structure as 108, different behavior!)
graph = StateGraph(AgentState)

# Add the conversation node
graph.add_node("process_query", process_query)

# Define the flow
graph.add_edge(START, "process_query")
graph.add_edge("process_query", END)

# Compile the graph
agent = graph.compile()

print("✅ Conversational agent graph compiled!")
print("\nGraph structure:")
print("  START → process_query → END")
print("\n💡 Same structure as notebook 108, but now with memory!")

---

## 5.1 Production Pattern: Error Handling

While our focus is on memory management, production LLM applications need robust error handling. Let's add a version of `process_query` with proper exception handling.

### Common LLM API Errors

| Error Type | Cause | Handling Strategy |
|------------|-------|-------------------|
| `ConnectionError` | Network issues | Retry with backoff |
| `TimeoutError` | API timeout | Retry or use shorter context |
| HTTP 401 | Invalid API key | Check credentials |
| HTTP 429 | Rate limit exceeded | Implement retry logic |
| HTTP 500 | Server error | Retry with backoff |

### Why This Matters

In production Strata Cloud Manager (SCM) automation:
- Network interruptions shouldn't crash workflows
- Rate limits need graceful handling
- Users need clear error messages
- Failed LLM calls shouldn't lose conversation state
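
The "retry with backoff" strategy from the table can be sketched with the standard library alone. This is a minimal illustration, not the approach from notebook 107: `retry_with_backoff` and `flaky` are hypothetical names, and the exception types to retry on are an assumption you should adjust to whatever your client library actually raises.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call() until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))  # add a little jitter
    raise AssertionError("unreachable")


# Demo with a simulated failure; sleep is replaced so the example runs instantly.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


waits: List[float] = []
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, attempts["count"], len(waits))
```

In this notebook the wrapped call could be something like `lambda: llm.invoke(messages)`, with `retry_on` set to the exception classes you have confirmed your provider raises for transient failures.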

In [None]:
def process_query_with_error_handling(state: AgentState) -> AgentState:
    """
    Production-ready conversation node with comprehensive error handling.
    
    This demonstrates how to handle common LLM API failures gracefully
    while preserving conversation state and providing clear user feedback.
    
    Args:
        state: Current agent state with message history
    
    Returns:
        Updated state (with AI response on success, or error message on failure)
    """
    messages = state["messages"]
    
    try:
        # Attempt LLM invocation
        response = llm.invoke(messages)
        
        # Success - append AI response
        state["messages"].append(AIMessage(content=response.content))
        print(f"\n🤖 AI: {response.content}\n")
        
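    except ConnectionError as e:
        # Network connectivity issues (Python's built-in ConnectionError).
        # Note: provider SDKs often raise their own exception classes instead
        # (the Anthropic SDK, for example, defines APIConnectionError); those
        # are handled by the generic `except Exception` branch below.
        error_msg = f"❌ Connection Error: {str(e)}"
        print(f"\n{error_msg}")
        print("   → Check your internet connection and try again")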
        
        # Optionally append error to conversation for context
        state["messages"].append(AIMessage(
            content="[Connection error - unable to process request]"
        ))
        
    except TimeoutError as e:
        # API timeout (request took too long)
        error_msg = f"❌ Timeout Error: {str(e)}"
        print(f"\n{error_msg}")
        print("   → The LLM didn't respond in time")
        print("   → Try reducing conversation history or use trimming")
        
        state["messages"].append(AIMessage(
            content="[Timeout error - request took too long]"
        ))
        
    except Exception as e:
        # Catch-all for other API errors
        error_message = str(e)
        
        # Check for specific HTTP error codes in the message
        if "401" in error_message or "unauthorized" in error_message.lower():
            print("\n❌ Authentication Error:")
            print("   → Check your ANTHROPIC_API_KEY in .env file")
            print("   → Verify the API key is valid and active")
            
            state["messages"].append(AIMessage(
                content="[Authentication error - check API credentials]"
            ))
            
        elif "429" in error_message or "rate limit" in error_message.lower():
            print("\n❌ Rate Limit Error:")
            print("   → You've exceeded the API rate limit")
            print("   → Consider implementing retry logic with exponential backoff")
            print("   → Reference notebook 107 for retry patterns")
            
            state["messages"].append(AIMessage(
                content="[Rate limit exceeded - implement retry logic]"
            ))
            
        elif "500" in error_message or "502" in error_message or "503" in error_message:
            print("\n❌ Server Error:")
            print(f"   → API server error: {error_message}")
            print("   → This is temporary - retry after a brief delay")
            
            state["messages"].append(AIMessage(
                content="[Server error - retry after delay]"
            ))
            
        else:
            # Unknown error
            print(f"\n❌ Unexpected Error: {error_message}")
            print("   → Review the error message and check API documentation")
            
            state["messages"].append(AIMessage(
                content=f"[Error: {error_message[:100]}]"
            ))
    
    return state

print("✅ Production error handling pattern defined!")
print("\n💡 This pattern handles:")
print("   • ConnectionError - Network issues")
print("   • TimeoutError - API timeouts")
print("   • HTTP 401 - Authentication failures")
print("   • HTTP 429 - Rate limit exceeded")
print("   • HTTP 500/502/503 - Server errors")
print("\n⭐ Key insight:")
print("   Even when LLM calls fail, we preserve conversation state")
print("   and append error messages for debugging context.")

In [None]:
# Visualize the graph
display(Image(agent.get_graph().draw_mermaid_png()))

print("\n📊 Graph visualization above shows:")
print("  - Single node: process_query")
print("  - Linear flow: START → process_query → END")
print("  - Memory happens INSIDE the node via state mutation")

---

## 6. Implementing the Conversation Loop

### The Challenge: Synchronizing History

We have a problem to solve: **how do we maintain conversation history across multiple invocations?**

**The issue:**
- Each time we call `agent.invoke()`, we pass initial state
- But we need to PERSIST the conversation between calls
- The graph doesn't automatically remember between invocations (yet!)

**The solution:** Maintain an external `conversation_history` variable and synchronize it after each turn.

### The Pattern

```python
# External history variable
conversation_history = []

# Turn 1
conversation_history.append(HumanMessage(content="Turn 1 query"))
result = agent.invoke({"messages": conversation_history})
conversation_history = result["messages"]  # ⭐ Sync!

# Turn 2
conversation_history.append(HumanMessage(content="Turn 2 query"))
result = agent.invoke({"messages": conversation_history})
conversation_history = result["messages"]  # ⭐ Sync again!
```

### Why This Works

1. **Before invoke**: We add the new human message to our history
2. **During invoke**: The agent processes ALL messages and appends AI response
3. **After invoke**: We update our history with the full conversation (including AI response)
4. **Next turn**: We start with the complete history

This manual synchronization ensures context is preserved across invocations.
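
One way to keep the four steps together is to wrap them in a small class instead of relying on a module-level variable. The sketch below is only an illustration of the pattern: `ChatSession` and `stub_agent` are hypothetical names, tuples stand in for LangChain messages, and `stub_agent` imitates `agent.invoke` by returning a new list with a reply added.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content); stand-in for LangChain message objects
State = Dict[str, List[Message]]


def stub_agent(state: State) -> State:
    """Hypothetical stand-in for agent.invoke(): returns a NEW list with a reply added."""
    reply = ("ai", f"echo: {state['messages'][-1][1]}")
    return {"messages": state["messages"] + [reply]}


class ChatSession:
    """Keeps the add -> invoke -> sync steps in one place so none can be skipped."""

    def __init__(self, invoke: Callable[[State], State]) -> None:
        self._invoke = invoke
        self.history: List[Message] = []

    def send(self, text: str) -> str:
        self.history.append(("human", text))               # 1. add the user turn
        result = self._invoke({"messages": self.history})  # 2. run the agent
        self.history = result["messages"]                  # 3. sync with the result
        return self.history[-1][1]                         # 4. return the latest reply


session = ChatSession(stub_agent)
first = session.send("My firewall is fw-prod-01")
second = session.send("What is my hostname?")
print(len(session.history))
```

Each session owns its own history, so two conversations can run side by side without sharing a global list.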

Let's implement it!

In [None]:
# Initialize conversation history
conversation_history = []

def chat(user_message: str):
    """
    Handle a single conversation turn.
    
    Args:
        user_message: The user's input
    
    Returns:
        The AI's response content
    """
    global conversation_history
    
    # 1. Add human message to history
    conversation_history.append(HumanMessage(content=user_message))
    
    print(f"👤 You: {user_message}")
    
    # 2. Invoke agent with full history
    result = agent.invoke({"messages": conversation_history})
    
    # 3. ⭐ CRITICAL: Sync history with result
    conversation_history = result["messages"]
    
    # 4. Return the last AI message
    return conversation_history[-1].content

print("✅ Conversation loop function defined!")
print("\n💡 Usage:")
print('   chat("My firewall is fw-prod-01")')
print('   chat("What is my firewall hostname?")')
print("\n⭐ Key insight:")
print("   conversation_history = result['messages'] synchronizes state")

---

### ⚠️ Challenges of Manual Memory Management

Before we test our implementation, let's be explicit about the **pain points** of this manual approach:

**1. Verbose and Repetitive**
```python
# This pattern must be repeated for EVERY chat interaction:
conversation_history.append(HumanMessage(content=user_message))
result = agent.invoke({"messages": conversation_history})
conversation_history = result["messages"]  # Don't forget this!
```

**2. Error-Prone**
```python
# Forget ONE sync and you lose context:
conversation_history.append(HumanMessage(content="Question"))
result = agent.invoke({"messages": conversation_history})
# ❌ OOPS! Forgot to sync - next turn won't have AI response!
```

**3. Manual State Synchronization**
```python
# You're responsible for keeping external state in sync:
state["messages"].append(AIMessage(...))  # Inside node
conversation_history = result["messages"]  # Outside node
# Must happen EVERY time, no exceptions!
```

**4. No Automatic Merging**
```python
# With reducers (notebook 110), the node returns only the new message
# and the reducer merges it into the existing list:
# return {"messages": [msg]}  # ✅ Automatic merge!

# Without reducers (this notebook), you must:
# - Manually append human messages
# - Manually sync after invoke
# - Manually handle state updates
```

**5. Boilerplate Everywhere**
Every conversation function needs the same pattern:
- Add human message
- Invoke agent
- Sync history
- Extract response

**6. Token Cost Awareness Required**
You must manually implement trimming, summarization, or windowing to control costs.
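
As a rough idea of what windowing could look like, the sketch below keeps only the most recent messages and makes sure the window starts on a human turn. `trim_history` is a hypothetical helper, tuples stand in for LangChain messages, and the window size is an arbitrary assumption.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); stand-in for HumanMessage/AIMessage


def trim_history(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Return at most the last max_messages messages, starting on a human turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = history[-max_messages:]  # slicing copies; the original list is untouched
    # Drop a leading AI reply whose question fell outside the window.
    while recent and recent[0][0] != "human":
        recent = recent[1:]
    return recent


# Build a five-turn conversation (10 messages), then keep a window of 5.
history: List[Message] = []
for i in range(1, 6):
    history += [("human", f"question {i}"), ("ai", f"answer {i}")]

trimmed = trim_history(history, max_messages=5)
print(len(history), len(trimmed), trimmed[0])
```

The trade-off is that anything outside the window is forgotten, including facts such as a hostname given in the first turn, so a window is only one of several possible strategies.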

### Why Learn This Manual Approach?

If it's so painful, why teach it? **Because understanding manual management:**

✅ Shows you HOW conversation memory works under the hood  
✅ Helps you debug when automatic solutions fail  
✅ Makes you appreciate the value of reducers (notebook 110)  
✅ Gives you full control when you need custom behavior  

**The good news:** Notebook 110 introduces the `add_messages` reducer, which handles the message merging for you so that most of this boilerplate goes away.

---

### 🚨 What Happens When Manual Sync Fails?

Let's demonstrate the **error-prone nature** of manual memory management by intentionally forgetting to sync the conversation history.

In [None]:
def chat_broken(user_message: str):
    """
    ❌ BROKEN: Chat function that FORGETS to sync history.
    
    This demonstrates what happens when you forget the critical sync step.
    """
    global conversation_history
    
    # 1. Add human message
    conversation_history.append(HumanMessage(content=user_message))
    print(f"👤 You: {user_message}")
    
    # 2. Invoke agent
    result = agent.invoke({"messages": conversation_history})
    
    # 3. ❌ BUG: Forgot to sync history!
    # conversation_history = result["messages"]  # <-- MISSING!
    
    # The AI's response is lost from conversation_history!
    return result["messages"][-1].content

# Reset conversation
conversation_history = []

print("="*70)
print("DEMONSTRATION: Manual Sync Failure")
print("="*70)
print("\n⚠️  Using BROKEN chat function that forgets to sync history\n")

# Turn 1
print("─"*70)
print("TURN 1: Provide firewall hostname")
print("─"*70)
chat_broken("My firewall hostname is fw-datacenter-01")
print(f"📊 History length after turn 1: {len(conversation_history)} messages")
print(f"   Contents: {[type(m).__name__ for m in conversation_history]}")

# Turn 2
print("\n" + "─"*70)
print("TURN 2: Ask about the hostname (should remember it!)")
print("─"*70)
chat_broken("What is my firewall hostname?")
print(f"📊 History length after turn 2: {len(conversation_history)} messages")
print(f"   Contents: {[type(m).__name__ for m in conversation_history]}")

print("\n" + "="*70)
print("🚨 PROBLEM IDENTIFIED!")
print("="*70)
print("\n❌ What went wrong:")
print("   • History only has HumanMessages - no AIMessages!")
print("   • The bot's responses were NEVER saved to history")
print("   • Each turn is isolated - no memory of AI responses")
print("   • The bot CAN'T reference its own previous answers")
print("\n💡 Expected history: [HumanMessage, AIMessage, HumanMessage, AIMessage]")
print(f"   Actual history:   {[type(m).__name__ for m in conversation_history]}")
print("\n⚠️  This is why the sync step is CRITICAL:")
print("   conversation_history = result['messages']  # Don't forget!")
print("\n✅ The correct chat() function (defined earlier) handles this properly.")

---

## 7. Testing with Memory

### The Moment of Truth

Now let's test if our bot can remember! We'll have a multi-turn conversation about PAN-OS upgrades.

**Test scenario:**
1. Tell the bot our current version
2. Ask about upgrade paths (should remember the version!)
3. Ask follow-up questions

### What to Watch For

- **Turn 1**: Bot acknowledges information
- **Turn 2**: Bot references Turn 1 information (MEMORY!)
- **Turn 3**: Bot maintains full context

Let's try it:

In [None]:
# Turn 1: Provide context
print("="*60)
print("TURN 1: Establishing context")
print("="*60)
chat("My firewall is running PAN-OS 10.1.0 and the hostname is fw-prod-01")

print("\n" + "="*60)
print("TURN 2: Testing memory - asking about upgrade path")
print("="*60)
chat("What's the recommended upgrade path for my firewall?")

print("\n" + "="*60)
print("TURN 3: Testing continued memory")
print("="*60)
chat("What was my firewall's hostname again?")

In [None]:
# Inspect the conversation history
print("\n" + "="*60)
print("CONVERSATION HISTORY DEBUG")
print("="*60)
print(f"\nTotal messages: {len(conversation_history)}")
print("\nFull conversation:")
for i, msg in enumerate(conversation_history, 1):
    speaker = "👤 Human" if isinstance(msg, HumanMessage) else "🤖 AI"
    content_preview = msg.content[:80] + "..." if len(msg.content) > 80 else msg.content
    print(f"\n{i}. {speaker}")
    print(f"   {content_preview}")

print("\n" + "="*60)
print("✅ SUCCESS! The bot remembers context across turns!")
print("="*60)
print("\n💡 Key observations:")
print("   - Turn 2: Bot referenced PAN-OS 10.1.0 from Turn 1")
print("   - Turn 3: Bot recalled fw-prod-01 from Turn 1")
print("   - History grows: Human → AI → Human → AI → Human → AI")
print("\n⭐ This is REAL conversation memory!")

---

## 7.5 Practical Example: SCM NAT Policy Wizard

### Real-World Use Case

Let's build a practical **multi-turn wizard** for creating a NAT policy in Strata Cloud Manager. This demonstrates how conversation memory enables complex configuration workflows.

**Scenario**: A network engineer needs to create a NAT policy but doesn't have all details upfront. The wizard collects information across multiple turns:

1. **Turn 1**: Identify the need (NAT policy for web server)
2. **Turn 2**: Collect zone information
3. **Turn 3**: Gather address details  
4. **Turn 4**: Summarize and confirm

This mirrors real troubleshooting and configuration workflows where context builds over multiple interactions.

In [None]:
# Reset conversation for clean demo
conversation_history = []

print("="*70)
print("PRACTICAL EXAMPLE: SCM NAT Policy Configuration Wizard")
print("="*70)
print("\n🎯 Goal: Create a NAT policy through multi-turn conversation")
print("💡 Watch how the bot remembers context from each turn!\n")

# Turn 1: Initial request
print("\n" + "─"*70)
print("TURN 1: Engineer states the requirement")
print("─"*70)
chat("I need to create a NAT policy for my web server that needs to be accessible from the internet")

# Turn 2: Provide zone details
print("\n" + "─"*70)
print("TURN 2: Provide zone information")
print("─"*70)
chat("The source zone is 'untrust' for internet traffic, and destination zone is 'dmz' where the web server lives")

# Turn 3: Provide address details
print("\n" + "─"*70)
print("TURN 3: Provide addressing information")
print("─"*70)
chat("The web server's internal IP is 10.50.100.10 and it should be NATted to public IP 203.0.113.50 on port 443")

# Turn 4: Request summary
print("\n" + "─"*70)
print("TURN 4: Request configuration summary")
print("─"*70)
chat("Can you summarize the complete NAT policy configuration we just defined?")

print("\n" + "="*70)
print("✅ WIZARD COMPLETE!")
print("="*70)
print("\n💡 Key Observations:")
print("   • Turn 1: Bot understood the general requirement")
print("   • Turn 2: Bot remembered it was about NAT and added zone context")
print("   • Turn 3: Bot retained zones AND added address details")
print("   • Turn 4: Bot recalled ALL information to create complete summary")
print("\n⭐ This is the power of conversation memory for complex workflows!")

---

## 7.6 Another Practical Example: Address Object Creation Wizard

Let's build another real-world wizard - this time for creating address objects in SCM. This demonstrates how memory enables incremental data collection.

In [None]:
# Reset for new wizard
conversation_history = []

print("="*70)
print("PRACTICAL EXAMPLE: SCM Address Object Creation Wizard")
print("="*70)
print("\n🎯 Goal: Create address objects with incremental information gathering")
print("💡 Demonstrating multi-turn data collection workflow\n")

# Turn 1: Start the workflow
print("\n" + "─"*70)
print("TURN 1: Initiate address object creation")
print("─"*70)
chat("I need to create address objects for my new branch office network")

# Turn 2: Specify network details
print("\n" + "─"*70)
print("TURN 2: Provide network information")
print("─"*70)
chat("The branch office uses 10.20.0.0/16 network space")

# Turn 3: Add specific subnets
print("\n" + "─"*70)
print("TURN 3: Define subnet breakdown")
print("─"*70)
chat("We need separate objects for users (10.20.10.0/24), servers (10.20.20.0/24), and guest wifi (10.20.30.0/24)")

# Turn 4: Add naming convention
print("\n" + "─"*70)
print("TURN 4: Specify naming convention")
print("─"*70)
chat("Use the naming pattern: branch-office-seattle-<subnet-type>")

# Turn 5: Request configuration summary
print("\n" + "─"*70)
print("TURN 5: Generate complete configuration")
print("─"*70)
chat("Can you list all the address objects we need to create with their names and IP ranges?")

print("\n" + "="*70)
print("✅ ADDRESS OBJECT WIZARD COMPLETE!")
print("="*70)
print("\n📊 Conversation Statistics:")
print(f"   • Total turns: {len([m for m in conversation_history if isinstance(m, HumanMessage)])}")
print(f"   • Total messages: {len(conversation_history)}")
print("   • Information gathered across: 5 separate interactions")
print("\n💡 Notice how the bot:")
print("   • Remembered the branch office context from turn 1")
print("   • Retained the network space from turn 2")
print("   • Recalled all three subnets from turn 3")
print("   • Applied the naming convention from turn 4")
print("   • Synthesized everything into a complete config in turn 5")
print("\n⭐ This incremental data collection is impossible without memory!")

---

## 7.7 Third Practical Example: Security Rule Configuration Wizard

Let's add one more real-world wizard - this time for creating security rules in SCM. This demonstrates how conversation memory handles complex multi-parameter configurations.

In [None]:
# Reset for security rule wizard
conversation_history = []

print("="*70)
print("PRACTICAL EXAMPLE: SCM Security Rule Configuration Wizard")
print("="*70)
print("\n🎯 Goal: Create security rule with incremental parameter gathering")
print("💡 Security rules have many parameters - perfect for multi-turn wizards\n")

# Turn 1: Initial requirement
print("\n" + "─"*70)
print("TURN 1: State the requirement")
print("─"*70)
chat("I need to create a security rule to allow HTTPS access to our web servers in the DMZ from the internet")

# Turn 2: Specify zones
print("\n" + "─"*70)
print("TURN 2: Define security zones")
print("─"*70)
chat("The traffic will go from 'untrust' zone (source) to 'dmz' zone (destination)")

# Turn 3: Define source/destination
print("\n" + "─"*70)
print("TURN 3: Specify source and destination details")
print("─"*70)
chat("Source should be 'any' since it's from the internet, and destination should be address group 'web-servers-dmz'")

# Turn 4: Application and service
print("\n" + "─"*70)
print("TURN 4: Application and service details")
print("─"*70)
chat("For application use 'ssl' and 'web-browsing', service should be 'application-default'")

# Turn 5: Security profile and action
print("\n" + "─"*70)
print("TURN 5: Security settings")
print("─"*70)
chat("Set action to 'allow', attach 'strict-security' profile group, and enable logging at session end")

# Turn 6: Request complete configuration
print("\n" + "─"*70)
print("TURN 6: Generate complete security rule configuration")
print("─"*70)
chat("Can you provide a complete summary of the security rule we just designed, including all parameters and security best practices we should consider?")

print("\n" + "="*70)
print("✅ SECURITY RULE WIZARD COMPLETE!")
print("="*70)
print("\n📊 Conversation Statistics:")
print(f"   • Total turns: {len([m for m in conversation_history if isinstance(m, HumanMessage)])}")
print(f"   • Total messages: {len(conversation_history)}")
print(f"   • Rule parameters gathered: 10+ across 6 interactions")
print("\n💡 Notice the complexity:")
print("   • Turn 1: Identified requirement (HTTPS to DMZ)")
print("   • Turn 2: Remembered requirement + added zones")
print("   • Turn 3: Retained zones + added source/destination")
print("   • Turn 4: Kept all context + added application/service")
print("   • Turn 5: Preserved all parameters + added security settings")
print("   • Turn 6: Synthesized complete rule with best practices")
print("\n⭐ Security rules have 15+ parameters - memory makes complex configs manageable!")
print("\n🔐 This pattern applies to:")
print("   • Security policies (shown here)")
print("   • NAT rules (Section 7.5)")
print("   • QoS policies")
print("   • VPN configurations")
print("   • Any complex multi-parameter SCM objects")

---

## 8. Problem 1: Conversation Persistence

### The Issue

Our conversation memory works... **but only while the notebook is running!**

**What happens when:**
- The notebook kernel restarts?
- The user closes their session?
- The application crashes?

**Answer:** All conversation history is LOST because it's stored in a Python variable in memory.

### Real-World Scenarios

In production, you need to persist conversations:

1. **Customer Support Bot**: User comes back tomorrow and expects the bot to remember previous issues
2. **Network Operations Assistant**: Shift handoffs require preserving troubleshooting context
3. **Configuration Wizard**: Multi-session workflows need to resume where they left off

### Solution Approaches

**Option 1: JSON File Storage**
```python
import json
from langchain_core.messages import messages_from_dict, messages_to_dict

# Save conversation (messages_to_dict stores each message's type with its data)
with open('conversation.json', 'w') as f:
    json.dump(messages_to_dict(conversation_history), f)

# Load conversation (rebuilds HumanMessage/AIMessage objects from the saved types)
with open('conversation.json', 'r') as f:
    conversation_history = messages_from_dict(json.load(f))
```

**Option 2: Database Storage (PostgreSQL, MongoDB)**
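```python
# Sketch using pymongo; assumes `db` is an already-connected database handle
from datetime import datetime, timezone
from langchain_core.messages import messages_to_dict

db.conversations.insert_one({
    'session_id': 'user-123',
    'messages': messages_to_dict(conversation_history),  # plain dicts, not message objects
    'timestamp': datetime.now(timezone.utc)
})
```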
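
As a concrete stand-in for the database option, here is a minimal sketch that uses Python's built-in `sqlite3` instead of PostgreSQL or MongoDB, so it runs without a server. The table layout, the `open_store`/`save_message`/`load_session` names, and the choice to store each message as a `(role, content)` row are assumptions made for illustration; a production schema would likely differ.

```python
import sqlite3
from datetime import datetime, timezone

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the message store. ':memory:' is for demos only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    return conn

def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one message ('human' or 'ai') to a session's history."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content, created_at) VALUES (?, ?, ?, ?)",
        (session_id, role, content, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def load_session(conn: sqlite3.Connection, session_id: str) -> list:
    """Return (role, content) pairs in the order they were saved."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()

conn = open_store()
save_message(conn, "user-123", "human", "My firewall is fw-prod-01")
save_message(conn, "user-123", "ai", "Got it! How can I help with fw-prod-01?")
print(load_session(conn, "user-123"))
```

Each `(role, content)` pair can be converted back into a `HumanMessage` or `AIMessage` the same way the JSON demo later in this section does.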

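**Option 3: LangGraph Checkpointing** ⭐ (Coming in notebook 110!)
```python
from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
agent = graph.compile(checkpointer=checkpointer)
# State is saved automatically after each step, keyed by a thread_id.
# Note: MemorySaver keeps checkpoints in process memory only; surviving a
# restart needs a database-backed checkpointer instead.
```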

### Why We Haven't Implemented This Yet

For this notebook, we're focusing on **understanding manual memory management**. Persistence adds complexity, and we want you to grasp the fundamentals first.

**Next notebook (110):** We'll introduce **reducers**, which merge messages into state automatically, and **tools**!

### Quick Demo: Saving to JSON

In [None]:
import json

def save_conversation(filename: str = "conversation.json"):
    """Save conversation history to JSON file."""
    # Convert messages to dict format
    messages_data = []
    for msg in conversation_history:
        messages_data.append({
            'type': 'human' if isinstance(msg, HumanMessage) else 'ai',
            'content': msg.content
        })
    
    with open(filename, 'w') as f:
        json.dump(messages_data, f, indent=2)
    
    print(f"✅ Saved {len(messages_data)} messages to {filename}")

def load_conversation(filename: str = "conversation.json"):
    """Load conversation history from JSON file."""
    global conversation_history
    
    with open(filename, 'r') as f:
        messages_data = json.load(f)
    
    # Convert back to message objects
    conversation_history = []
    for msg_data in messages_data:
        if msg_data['type'] == 'human':
            conversation_history.append(HumanMessage(content=msg_data['content']))
        else:
            conversation_history.append(AIMessage(content=msg_data['content']))
    
    print(f"✅ Loaded {len(conversation_history)} messages from {filename}")

# Demo: Save current conversation
save_conversation()

print("\n💡 Now if the kernel restarts, you can:")
print('   load_conversation()')
print("   # Resume conversation with full history!")
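
`save_conversation` rewrites the whole file and only runs when it is called, so a crash between saves loses every turn since the last save. One possible alternative, sketched here under the assumption that one JSON object per line (JSON Lines) is an acceptable format, appends each message as soon as it exists. The function names and the file name are hypothetical.

```python
import json
import os
import tempfile

def append_message(path: str, role: str, content: str) -> None:
    """Append a single message as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"type": role, "content": content}) + "\n")

def read_messages(path: str) -> list:
    """Read every message back in order; a missing file means an empty history."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "conversation.jsonl")
    append_message(log_path, "human", "My firewall is fw-prod-01")
    append_message(log_path, "ai", "Got it!")
    restored = read_messages(log_path)
    print(restored)
```

The trade-off is that the file is no longer a single JSON document, so it has to be read line by line.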

---

## 9. Problem 2: Growing Token Costs

### The Issue

Every time we invoke the LLM, we send the **ENTIRE conversation history**. This works great for memory, but creates a problem:

**The tokens sent on each turn grow linearly, so the cumulative cost of a conversation grows quadratically!**

### Cost Illustration

Assume each message averages 50 tokens:

| Turn | Messages Sent | Input Tokens (This Turn) | Cost Multiplier |
|------|---------------|--------------------------|-----------------|
| 1    | 1             | 50                       | 1x              |
| 2    | 3             | 150                      | 3x              |
| 3    | 5             | 250                      | 5x              |
| 4    | 7             | 350                      | 7x              |
| 10   | 19            | 950                      | 19x             |
| 50   | 99            | 4,950                    | 99x             |

**After 50 turns**: Each query costs 99x as much as turn 1!
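
The table follows a simple pattern: turn n re-sends n human messages plus the n - 1 earlier AI replies, so 2n - 1 messages in total. A minimal sketch of that arithmetic, assuming the same flat 50 tokens per message (the function names are illustrative and not part of the notebook):

```python
TOKENS_PER_MESSAGE = 50  # same simplifying assumption as the table

def messages_sent_on_turn(turn: int) -> int:
    """Turn n sends n human messages plus the n - 1 earlier AI replies."""
    return 2 * turn - 1

def input_tokens_on_turn(turn: int) -> int:
    """Input tokens for a single turn when the full history is re-sent."""
    return messages_sent_on_turn(turn) * TOKENS_PER_MESSAGE

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens billed across turns 1..n."""
    return sum(input_tokens_on_turn(t) for t in range(1, turns + 1))

for n in (1, 10, 50):
    print(f"Turn {n}: {input_tokens_on_turn(n):,} tokens this turn, "
          f"{cumulative_input_tokens(n):,} tokens so far")
```

The per-turn figure grows linearly (2n - 1 messages), while the running total grows with n squared, because 1 + 3 + 5 + ... + (2n - 1) = n².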

### Real-World Impact

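**Example pricing (GPT-4):**
- Input: $0.03 / 1K tokens
- Turn 1: 50 tokens = $0.0015
- Turn 50: 4,950 tokens ≈ $0.15 (99x turn 1)
- One 50-turn conversation: 125,000 input tokens in total ≈ $3.75
- 1000 users × 50 turns each ≈ **$3,750** in input tokens alone 😱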

### Why This Happens

```python
# Turn 1: Send 1 message
llm.invoke([HumanMessage("Hello")])  # 1 message

# Turn 2: Send 3 messages (H, A, H)
llm.invoke([HumanMessage("Hello"), 
           AIMessage("Hi!"), 
           HumanMessage("How are you?")])  # 3 messages

# Turn 3: Send 5 messages (H, A, H, A, H)
llm.invoke([...all previous messages...])  # 5 messages
```

Each turn re-sends ALL previous messages!

### Solution Strategies

**1. Message Window Trimming** (Keep last N messages)
```python
MAX_HISTORY = 10
recent_messages = conversation_history[-MAX_HISTORY:]
result = agent.invoke({"messages": recent_messages})
```
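
A plain slice can begin on an AI reply, which leaves the model looking at an answer without its question, and some chat APIs expect the first message to come from the user. The sketch below is one way to handle that, assuming each message exposes a `type` attribute of `'human'` or `'ai'` as LangChain messages do. `Msg` is a stand-in class so the example runs without an API key, and `trim_window` is a hypothetical helper name.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    """Stand-in for HumanMessage/AIMessage (illustration only)."""
    type: str      # 'human' or 'ai'
    content: str

def trim_window(messages: list, max_history: int = 10) -> list:
    """Keep at most max_history recent messages, starting on a human message."""
    if max_history <= 0:
        return []
    window = messages[-max_history:]
    # Drop leading non-human messages so the window opens with a question
    while window and window[0].type != "human":
        window = window[1:]
    return window

# Build 4 full turns plus a new question: H A H A H A H A H
history = []
for i in range(1, 5):
    history += [Msg("human", f"question {i}"), Msg("ai", f"answer {i}")]
history.append(Msg("human", "question 5"))

window = trim_window(history, max_history=4)
print([m.content for m in window])
```

With `max_history=4` the raw slice would start at `answer 3`; the helper drops it, so the window sent to the model is `question 4`, `answer 4`, `question 5`.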

**2. Summarization** (Compress old messages)
```python
# Once history exceeds 20 messages, replace the oldest 20 with a summary
# (summarize_conversation is a placeholder, e.g. an extra LLM call)
if len(conversation_history) > 20:
    summary = summarize_conversation(conversation_history[:20])
    conversation_history = [HumanMessage(content=f"Summary: {summary}")] + conversation_history[20:]
```
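
The `summarize_conversation` helper is left undefined in the snippet above. One way to structure the surrounding logic is sketched here with a pluggable `summarize` callable, so the compaction step can be tried without an LLM call; in practice that callable would wrap a model invocation. All names are hypothetical, and messages are plain `(role, content)` tuples to keep the example self-contained.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content); stand-in for LangChain message objects

def compact_history(
    history: list[Message],
    summarize: Callable[[list[Message]], str],
    max_messages: int = 20,
    keep_recent: int = 6,
) -> list[Message]:
    """Replace older messages with one summary once history exceeds max_messages."""
    if len(history) <= max_messages:
        return history  # nothing to do yet
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    summary = summarize(old)
    return [("human", f"Summary of earlier conversation: {summary}")] + recent

def fake_summarize(messages: list[Message]) -> str:
    """Placeholder for an LLM call; just reports how much was compressed."""
    return f"{len(messages)} earlier messages omitted"

history = [("human" if i % 2 == 0 else "ai", f"message {i}") for i in range(24)]
compacted = compact_history(history, fake_summarize)
print(len(compacted), compacted[0])
```

Keeping the most recent messages verbatim, rather than summarizing everything, preserves the exact wording of the current exchange while still bounding the history size.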

**3. Smart Truncation** (Keep system messages + recent)
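```python
from langchain_core.messages import SystemMessage

# Split first so a system message inside the recent window is not duplicated
system_messages = [msg for msg in conversation_history if isinstance(msg, SystemMessage)]
other_messages = [msg for msg in conversation_history if not isinstance(msg, SystemMessage)]
trimmed = system_messages + other_messages[-10:]
```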

**4. Token-Based Trimming** (Stay under budget)
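```python
import tiktoken

def trim_to_token_limit(messages, max_tokens=4000):
    """Keep the most recent messages whose combined tokens fit the budget."""
    encoder = tiktoken.encoding_for_model("gpt-4")
    kept, total = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        tokens = len(encoder.encode(msg.content))
        if total + tokens > max_tokens:
            break
        kept.append(msg)
        total += tokens
    return list(reversed(kept))  # restore chronological order
```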

### Trade-offs

| Strategy | Pros | Cons |
|----------|------|------|
| Window Trimming | Simple, predictable | Loses old context |
| Summarization | Preserves key info | Adds LLM calls |
| Smart Truncation | Keeps important messages | Complex logic |
| Token-Based | Exact cost control | Requires token counting |

### Real Token Counting with tiktoken

Let's move beyond theoretical costs and actually count tokens in our conversations using `tiktoken`, OpenAI's tokenizer library. Claude models use a different tokenizer, so for them these counts are only a rough estimate.

In [None]:
try:
    import tiktoken
    tiktoken_available = True
except ImportError:
    tiktoken_available = False
    print("⚠️  tiktoken not installed - run: pip install tiktoken")
    print("   Proceeding with character-based estimation...")

def count_tokens(messages: list, model: str = "gpt-4") -> dict:
    """
    Count tokens in a message list.
    
    Args:
        messages: List of HumanMessage/AIMessage objects
        model: Model name for tokenizer (defaults to gpt-4)
    
    Returns:
        Dictionary with token counts and statistics
    """
    if tiktoken_available:
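        # Use actual tokenizer (fall back to a general-purpose encoding
        # if tiktoken does not recognize the model name)
        try:
            encoder = tiktoken.encoding_for_model(model)
        except KeyError:
            encoder = tiktoken.get_encoding("cl100k_base")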
        
        total_tokens = 0
        message_tokens = []
        
        for msg in messages:
            tokens = len(encoder.encode(msg.content))
            message_tokens.append(tokens)
            total_tokens += tokens
        
        return {
            "total_tokens": total_tokens,
            "message_tokens": message_tokens,
            "avg_tokens_per_message": total_tokens / len(messages) if messages else 0,
            "method": "tiktoken (exact for OpenAI models, approximate for others)"
        }
    else:
        # Fallback to character estimation
        # Rule of thumb: ~4 characters per token for English text
        total_chars = sum(len(msg.content) for msg in messages)
        estimated_tokens = total_chars // 4
        
        return {
            "total_tokens": estimated_tokens,
            "message_tokens": [len(msg.content) // 4 for msg in messages],
            "avg_tokens_per_message": estimated_tokens / len(messages) if messages else 0,
            "method": "character estimation (~4 chars/token)"
        }

def analyze_conversation_costs(messages: list, model: str = "gpt-4") -> None:
    """
    Analyze and display conversation token costs.
    
    Args:
        messages: Conversation history
        model: Model name for cost calculation
    """
    stats = count_tokens(messages, model)
    
    # Cost per 1K input tokens (example rate - check current pricing!)
    cost_per_1k_input = 0.03  # GPT-4 input pricing
    
    # Input cost of sending the whole history once, i.e. what the next LLM call
    # would cost. Output tokens are billed separately and are not included.
    total_cost = (stats["total_tokens"] / 1000) * cost_per_1k_input
    avg_tokens = stats["avg_tokens_per_message"]
    
    print("="*70)
    print("TOKEN ANALYSIS")
    print("="*70)
    print(f"\n📊 Statistics:")
    print(f"   Total messages: {len(messages)}")
    print(f"   Total tokens: {stats['total_tokens']:,}")
    print(f"   Avg tokens/message: {stats['avg_tokens_per_message']:.1f}")
    print(f"   Method: {stats['method']}")
    
    print(f"\n💰 Cost Estimate (at ${cost_per_1k_input}/1K input tokens):")
    print(f"   Sending this full history once: ${total_cost:.4f}")
    
    # Project the input cost of a single later turn: turn n re-sends 2n - 1 messages
    print(f"\n📈 Projected input cost per turn (full history re-sent):")
    for n in (10, 50, 100):
        turn_cost = (2 * n - 1) * avg_tokens / 1000 * cost_per_1k_input
        print(f"   Turn {n}: ${turn_cost:.4f}")
    
    print(f"\n⚠️  Token growth:")
    print(f"   Turn 1: ~{stats['message_tokens'][0] if stats['message_tokens'] else 0} tokens")
    print(f"   Turn {len(messages) // 2}: ~{stats['total_tokens']:,} tokens")
    print(f"   Growth rate: {(stats['total_tokens'] / stats['message_tokens'][0]):.1f}x" if stats['message_tokens'] and stats['message_tokens'][0] > 0 else "   Growth rate: N/A")

# Demo with our current conversation history
if len(conversation_history) > 0:
    analyze_conversation_costs(conversation_history)
    
    print("\n" + "="*70)
    print("💡 KEY INSIGHTS")
    print("="*70)
    print("\n1. Token counts grow LINEARLY with conversation length")
    print("   Each turn adds both human + AI message tokens")
    print("\n2. Cumulative costs grow QUADRATICALLY if you send full history each time")
    print("   Turn 1: 1 message, Turn 2: 3 messages, Turn 3: 5 messages...")
    print("\n3. Token trimming controls costs:")
    print("   Keep last 10 messages: ~constant cost per turn")
    print("   Send full history: costs increase every turn")
    print("\n4. For long conversations:")
    print("   • Use trimming (last N messages)")
    print("   • Use summarization (compress old context)")
    print("   • Use checkpointing with smart retrieval")
else:
    print("No conversation history to analyze. Run one of the wizard examples first!")

### Conversation Analytics

Beyond token counting, let's analyze conversation patterns, message lengths, and interaction statistics. This helps understand user behavior and optimize bot performance.

In [None]:
from collections import Counter
import re

def analyze_conversation(messages: list) -> dict:
    """
    Comprehensive conversation analytics.
    
    Args:
        messages: List of conversation messages
    
    Returns:
        Dictionary with analytics metrics
    """
    if not messages:
        return {"error": "No messages to analyze"}
    
    # Separate human and AI messages
    human_messages = [m for m in messages if isinstance(m, HumanMessage)]
    ai_messages = [m for m in messages if isinstance(m, AIMessage)]
    
    # Message length analysis
    human_lengths = [len(m.content) for m in human_messages]
    ai_lengths = [len(m.content) for m in ai_messages]
    
    # Word count analysis
    human_words = [len(m.content.split()) for m in human_messages]
    ai_words = [len(m.content.split()) for m in ai_messages]
    
    # Keyword extraction (simple approach)
    all_text = " ".join([m.content for m in messages])
    # Remove common words and extract technical terms
    words = re.findall(r'\b[A-Za-z]{4,}\b', all_text.lower())
    common_words = {'this', 'that', 'with', 'from', 'have', 'what', 'when', 
                    'where', 'about', 'your', 'should', 'would', 'could'}
    keywords = [w for w in words if w not in common_words]
    keyword_counts = Counter(keywords).most_common(10)
    
    return {
        "total_turns": len(human_messages),
        "total_messages": len(messages),
        "human_messages": len(human_messages),
        "ai_messages": len(ai_messages),
        
        "avg_human_length": sum(human_lengths) / len(human_lengths) if human_lengths else 0,
        "avg_ai_length": sum(ai_lengths) / len(ai_lengths) if ai_lengths else 0,
        "max_human_length": max(human_lengths) if human_lengths else 0,
        "max_ai_length": max(ai_lengths) if ai_lengths else 0,
        
        "avg_human_words": sum(human_words) / len(human_words) if human_words else 0,
        "avg_ai_words": sum(ai_words) / len(ai_words) if ai_words else 0,
        
        "top_keywords": keyword_counts,
        
        "conversation_ratio": len(ai_messages) / len(human_messages) if human_messages else 0
    }

def display_conversation_analytics(messages: list) -> None:
    """
    Display formatted conversation analytics.
    
    Args:
        messages: Conversation history to analyze
    """
    analytics = analyze_conversation(messages)
    
    if "error" in analytics:
        print(f"❌ {analytics['error']}")
        return
    
    print("="*70)
    print("CONVERSATION ANALYTICS REPORT")
    print("="*70)
    
    print(f"\n📊 Overview:")
    print(f"   Total turns: {analytics['total_turns']}")
    print(f"   Total messages: {analytics['total_messages']}")
    print(f"   Human messages: {analytics['human_messages']}")
    print(f"   AI messages: {analytics['ai_messages']}")
    print(f"   AI/Human ratio: {analytics['conversation_ratio']:.2f}")
    
    print(f"\n📝 Message Length (characters):")
    print(f"   Avg human message: {analytics['avg_human_length']:.0f}")
    print(f"   Avg AI message: {analytics['avg_ai_length']:.0f}")
    print(f"   Max human message: {analytics['max_human_length']}")
    print(f"   Max AI message: {analytics['max_ai_length']}")
    print(f"   AI response verbosity: {(analytics['avg_ai_length'] / analytics['avg_human_length']):.2f}x human input" if analytics['avg_human_length'] > 0 else "   AI response verbosity: N/A")
    
    print(f"\n💬 Word Count:")
    print(f"   Avg human words/message: {analytics['avg_human_words']:.1f}")
    print(f"   Avg AI words/message: {analytics['avg_ai_words']:.1f}")
    
    print(f"\n🔑 Top Keywords:")
    for i, (keyword, count) in enumerate(analytics['top_keywords'][:5], 1):
        print(f"   {i}. '{keyword}': {count} occurrences")
    
    print(f"\n💡 Insights:")
    
    # Provide insights based on metrics
    if analytics['avg_ai_length'] > analytics['avg_human_length'] * 3:
        print("   ⚠️  AI responses are very verbose (3x+ human input)")
        print("      Consider instructing the AI to be more concise")
    
    if analytics['total_turns'] > 20:
        print(f"   ⚠️  Long conversation ({analytics['total_turns']} turns)")
        print("      Consider implementing trimming or summarization")
    
    if analytics['avg_human_words'] < 5:
        print("   💭 Users are sending very short messages")
        print("      Bot may need to ask clarifying questions")
    
    # Check for SCM-specific keywords
    scm_keywords = {'firewall', 'rule', 'security', 'address', 'policy', 'zone', 'panos'}
    found_scm = [kw for kw, _ in analytics['top_keywords'] if kw in scm_keywords]
    if found_scm:
        print(f"   🔐 SCM-focused conversation (keywords: {', '.join(found_scm)})")
    
    print("\n" + "="*70)

# Demo with current conversation
if len(conversation_history) > 0:
    display_conversation_analytics(conversation_history)
    
    print("\n💡 Use Cases for Analytics:")
    print("   • Optimize bot verbosity based on AI/human length ratios")
    print("   • Identify when to trigger trimming (turn count thresholds)")
    print("   • Detect conversation topics from keywords")
    print("   • Monitor user engagement (message length trends)")
    print("   • A/B test different bot personalities (compare analytics)")
else:
    print("No conversation history to analyze. Run one of the wizard examples first!")

### Demo: Chat with Window Trimming

In [None]:
def chat_with_trimming(user_message: str, max_history: int = 10):
    """
    Chat with automatic history trimming.
    
    Args:
        user_message: The user's input
        max_history: Maximum number of recent messages sent to the LLM (the full history is still kept)
    
    Returns:
        The AI's response content
    """
    global conversation_history
    
    # 1. Add human message
    conversation_history.append(HumanMessage(content=user_message))
    
    # 2. ⭐ Trim to last N messages
    trimmed_history = conversation_history[-max_history:]
    
    print(f"👤 You: {user_message}")
    print(f"📊 History: {len(conversation_history)} total, using last {len(trimmed_history)}")
    
    # 3. Invoke with trimmed history
    result = agent.invoke({"messages": trimmed_history})
    
    # 4. Important: Update full history (not trimmed)
    conversation_history.append(AIMessage(content=result["messages"][-1].content))
    
    return result["messages"][-1].content

print("✅ Trimming-enabled chat function defined!")
print("\n💡 Usage:")
print('   chat_with_trimming("Question", max_history=10)')
print("\n⭐ Benefits:")
print("   - Keeps full history for persistence")
print("   - Only sends recent messages to LLM")
print("   - Predictable token costs")
print("\n⚠️  Trade-off:")
print("   - LLM can't see messages beyond the window")

In [None]:
# Demo: Simulate a long conversation
print("="*60)
print("SIMULATING LONG CONVERSATION WITH TRIMMING")
print("="*60)

# Reset conversation
conversation_history = []

# Simulate 15 turns (trimming kicks in once history exceeds max_history=10)
for i in range(1, 16):
    print(f"\n--- Turn {i} ---")
    chat_with_trimming(f"Turn {i}: Tell me about PAN-OS feature #{i}", max_history=10)

print("\n" + "="*60)
print("FINAL ANALYSIS")
print("="*60)
# On the last turn the history held every message except the final AI reply
sent_untrimmed = len(conversation_history) - 1
sent_trimmed = min(sent_untrimmed, 10)
print(f"Total messages in history: {len(conversation_history)}")
print(f"Messages sent to LLM on last turn: {sent_trimmed} (max_history limit)")
print(f"\n💡 Without trimming, last turn would have sent {sent_untrimmed} messages!")
print(f"   Message savings on that turn: {((sent_untrimmed - sent_trimmed) / sent_untrimmed * 100):.1f}%")
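
The effect of trimming can also be estimated without making any LLM calls. A rough sketch, counting messages rather than tokens (so it assumes messages are of similar size); `messages_sent` is a hypothetical helper:

```python
from typing import Optional

def messages_sent(turn: int, max_history: Optional[int] = None) -> int:
    """Messages sent to the LLM on a given turn (1-based)."""
    full = 2 * turn - 1  # all earlier exchanges plus the new question
    return full if max_history is None else min(full, max_history)

TURNS = 15
without_trim = sum(messages_sent(t) for t in range(1, TURNS + 1))
with_trim = sum(messages_sent(t, max_history=10) for t in range(1, TURNS + 1))

print(f"Without trimming: {without_trim} messages sent in total")
print(f"With max_history=10: {with_trim} messages sent in total")
print(f"Reduction: {(1 - with_trim / without_trim) * 100:.1f}%")
```

For 15 turns this gives 225 messages without trimming versus 125 with a window of 10, and the gap keeps widening as the conversation gets longer.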

---

## 10. Summary

Congratulations! You've built a conversational AI agent with memory. Let's recap what you've learned:

### What We Covered

1. **AIMessage and Union Types** - Tracking both sides of the conversation
   - `Union[HumanMessage, AIMessage]` allows mixed message types
   - Modern Python 3.10+ syntax: `list[HumanMessage | AIMessage]`
   - Single list preserves conversation order
   - Essential for conversation memory

2. **Memory-Enabled State** - Extending state to support conversation
   - Changed from `List[HumanMessage]` to `List[Union[HumanMessage, AIMessage]]`
   - Single type change enables full conversation tracking
   - State can now hold complete dialogue history

3. **Conversation Node with Memory** - Implementing the memory mechanism
   - `state["messages"].append(AIMessage(...))` saves AI responses
   - Each invoke builds on previous context
   - LLM sees full conversation history

4. **Manual History Management** - Synchronizing state across invocations
   - External `conversation_history` variable
   - `conversation_history = result["messages"]` synchronizes state
   - Pattern: add human message → invoke → sync history
   - **Critical:** Forgetting sync breaks memory!

5. **Real-World SCM Wizards** - Practical multi-turn workflows
   - NAT policy configuration across 4 turns
   - Address object creation with incremental data collection
   - Security rule configuration across 6 turns
   - Demonstrates value of memory for complex operations
   - Shows how context builds across interactions

6. **Challenges of Manual Management** - Understanding the pain points
   - Verbose and repetitive synchronization code
   - Error-prone (forget sync = lose context)
   - Manual state management is tedious
   - No automatic message merging
   - Boilerplate in every conversation function

7. **Conversation Persistence** - Saving conversations beyond runtime
   - JSON file storage for simple persistence
   - Database options for production
   - Checkpointing as the ultimate solution (next notebook!)

8. **Token Cost Management** - Controlling growing costs
   - Per-turn input cost grows linearly with conversation length; cumulative cost grows quadratically
   - Window trimming keeps recent context
   - Trade-off: cost savings vs. context loss

### Why Manual Memory Management Matters

Understanding this manual approach is crucial because:

- **Fundamentals**: You now understand HOW conversation memory actually works
- **Debugging**: When things go wrong, you can trace the message flow
- **Appreciation**: You'll appreciate automated solutions (reducers, checkpointing) more
- **Control**: You know when to use manual vs. automatic approaches

### The Problems We Identified

1. ❌ **Manual synchronization is tedious**: `conversation_history = result["messages"]` every time
2. ❌ **Error-prone**: Forget ONE sync and lose context completely
3. ❌ **No automatic persistence**: Need custom code to save conversations
4. ❌ **Token costs grow unbounded**: Need manual trimming logic
5. ❌ **Boilerplate everywhere**: Lots of repetitive code
6. ❌ **State management burden**: Developer responsible for all synchronization

---

## 🚀 Next: The Better Way with Reducers and Tools

In **notebook 110**, we'll tackle most of these problems with LangGraph's powerful features:

### 1. Automatic Message Merging with Reducers

**Manual approach (this notebook):**
```python
class AgentState(TypedDict):
    messages: List[Union[HumanMessage, AIMessage]]

# Must manually sync every time:
conversation_history.append(HumanMessage(content=msg))
result = agent.invoke({"messages": conversation_history})
conversation_history = result["messages"]  # Don't forget!
```

**Automatic approach (notebook 110):**
```python
from langgraph.graph.message import add_messages
from typing import Annotated

class AgentState(TypedDict):
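    messages: Annotated[list, add_messages]  # ⭐ Auto-merge!

# Nodes return only their new messages; the reducer appends them to state:
result = agent.invoke({"messages": [HumanMessage(content=msg)]})
# No manual append inside the node. Note that remembering earlier invoke()
# calls still requires a checkpointer (or passing the prior messages in).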
```
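
To make the idea of a reducer concrete: it is just a function that combines the existing state value with a node's update. The sketch below is a simplified stand-in written in plain Python, not LangGraph's actual `add_messages` implementation (which, among other things, also matches messages by ID). `append_messages` and `apply_update` are hypothetical names.

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: return a new list with the update appended."""
    return existing + new

def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into state, using a reducer where one is registered."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value  # no reducer: the update overwrites the old value
    return merged

state = {"messages": ["human: What's my firewall's hostname?"]}
node_output = {"messages": ["ai: Your firewall's hostname is fw-prod-01."]}

state = apply_update(state, node_output, reducers={"messages": append_messages})
print(state["messages"])
```

Without a registered reducer the node's return value simply overwrites `messages`, which is why a node in the manual approach has to hand back the complete message list itself.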

### 2. Tool Calling - Agents That Take Actions

**Problem:** Our current bot can only TALK about configurations - it can't actually DO anything.

**Solution (notebook 110):**
```python
from langchain_core.tools import tool

@tool
def get_firewall_version(hostname: str) -> str:
    """Get the PAN-OS version for a firewall."""
    # Could call SCM API here!
    return "10.1.0"

@tool
def create_address_object(name: str, ip: str, folder: str) -> str:
    """Create an address object in SCM."""
    # Could use pan-scm-sdk here!
    return f"Created {name} with IP {ip}"

# Agent can now:
# - Decide which tool to use
# - Call tools with correct parameters
# - Reason about tool results
# - Take actual configuration actions
```

### 3. ReAct Pattern - Reasoning and Acting

**Notebook 109:** Linear execution (START → process → END)

**Notebook 110:** Intelligent loops:
```
1. REASON: "User wants firewall version, I should use get_firewall_version tool"
2. ACT: Call get_firewall_version("fw-prod-01")
3. OBSERVE: "Version is 10.1.0"
4. REASON: "Now I can answer the user's question"
5. RESPOND: "Your firewall fw-prod-01 is running PAN-OS 10.1.0"
```
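
The loop above can be sketched in plain Python. This toy version replaces the LLM with a hard-coded `decide` function and a fake tool, purely to show the control flow; none of these names come from LangGraph, and the version string is the placeholder value from the tool example earlier in this section.

```python
def get_firewall_version(hostname: str) -> str:
    """Fake tool: a real one might query an API."""
    return "10.1.0"

TOOLS = {"get_firewall_version": get_firewall_version}

def decide(question: str, observations: list) -> dict:
    """Stand-in for the LLM's reasoning step (hard-coded for illustration)."""
    if not observations:
        # REASON: no data yet, so request a tool call
        return {"action": "get_firewall_version", "args": {"hostname": "fw-prod-01"}}
    # REASON: an observation is available, so answer
    return {"answer": f"Your firewall fw-prod-01 is running PAN-OS {observations[-1]}"}

def react_loop(question: str, max_steps: int = 5) -> str:
    """Alternate between deciding and acting until an answer is produced."""
    observations = []
    for _ in range(max_steps):
        step = decide(question, observations)
        if "answer" in step:
            return step["answer"]                        # RESPOND
        result = TOOLS[step["action"]](**step["args"])   # ACT
        observations.append(result)                      # OBSERVE
    return "Stopped: step limit reached"

print(react_loop("What version is fw-prod-01 running?"))
```

The `max_steps` guard is a design choice worth keeping in any real loop, since a model that keeps requesting tools would otherwise never terminate.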

### 4. What You'll Build Next

In notebook 110, you'll create a PAN-OS agent that can:

✅ **Remember conversations** (with automatic reducers)  
✅ **Call real tools** (check versions, create objects, modify configs)  
✅ **Reason about actions** (ReAct pattern)  
✅ **Handle multi-step tasks** (plan → act → observe → respond)  
✅ **No manual sync required** (reducers handle everything)  

### Key Differences Summary

| Feature | Notebook 109 (Manual) | Notebook 110 (Automatic) |
|---------|----------------------|--------------------------|
| **Memory** | Manual sync required | Automatic with reducers |
| **State Merging** | `history = result["messages"]` | `Annotated[list, add_messages]` |
| **Actions** | Can only talk | Can use tools |
| **Pattern** | Linear (START→END) | ReAct (Reason→Act→Observe) |
| **Code** | Lots of boilerplate | Clean and concise |
| **Error Risk** | High (forget sync) | Low (automatic) |

---

### Key Takeaways

✅ Conversation memory requires tracking BOTH human and AI messages  
✅ Manual memory means appending AI responses to state  
✅ History synchronization is critical across invocations  
✅ Forgetting to sync breaks memory completely  
✅ Real-world wizards (NAT, address objects, security rules) show practical value  
✅ Persistence and cost management are real production concerns  
✅ LangGraph provides better solutions (reducers + tools)  

### Practice Exercises

**Want more practice?** Try these exercises:

1. **Implement summarization trimming**: Instead of window trimming, use an LLM to summarize old messages
2. **Add conversation export**: Create a function to export conversations to markdown format
3. **Build a security policy wizard**: Multi-turn wizard for creating security rules
4. **Implement conversation branching**: Save/load different conversation threads
5. **Add conversation analytics**: Track average message length, turn count, common topics
6. **Create conversation search**: Find past conversations by keyword

### Ready for Notebook 110?

You now have a **deep understanding** of how conversation memory works at the fundamental level. This knowledge will make notebook 110's automatic features much more meaningful - you'll understand what they're doing under the hood.

**In notebook 110**, you'll learn:
- ✨ The `add_messages` reducer for automatic message merging
- ✨ Creating and using tools with `@tool` decorator
- ✨ The ReAct (Reasoning and Acting) agent pattern
- ✨ Building agents that can take real actions with SCM
- ✨ No more manual synchronization - ever!

**Remember:** You don't need to memorize all of this! The important thing is understanding that:
- Conversation memory requires tracking message history
- This can be done manually (tedious but instructive)
- LangGraph's reducers automate the tedious parts
- Understanding the manual approach helps you debug and customize

Great work! You're now ready for ReAct agents with automatic memory and tools! 🚀

---

**Continue to:** [110 LangGraph: ReAct Agents with Tools](110_react_agents_with_tools.ipynb)