# Conversation Memory in LangChain

## 🧠 Why Is Memory Needed?

### **The Problem with LLMs**
Large Language Models are fundamentally **stateless**: they do not "remember" anything about previous conversations. Each LLM call is an independent interaction:

```
User: "My name is Nam"
LLM:  "Hello Nam! Nice to meet you."

User: "Do you remember my name?"
LLM:  "Sorry, I don't have any information about your name." ❌
```

### **Why Does Memory Matter?**

#### **1. 🤝 Natural Conversations**
- People expect chatbots to "remember" context
- Conversations flow naturally when memory is available
- Users do not have to repeat information

#### **2. 🎯 Personalized Experience**
- Remember user preferences
- Adapt responses based on conversation history
- Build rapport over time

#### **3. 🔄 Multi-turn Tasks**
- Complex tasks require multiple exchanges
- Context from previous turns informs current response
- Enable iterative problem-solving

#### **4. 📚 Context Continuity**
- Maintain thread of conversation
- Reference previous topics naturally
- Build on established context

### **Memory Solution**
LangChain Memory components address this problem by:
- **Store** conversation history
- **Retrieve** relevant context
- **Format** context for the LLM
- **Manage** memory size and performance
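
The four responsibilities above can be sketched in plain Python. `SimpleMemory` below is a hypothetical stand-in written for illustration, not a LangChain class, and its `max_turns` cap is an assumed policy. It only shows where storing, retrieving, formatting, and size management fit.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SimpleMemory:
    """Hypothetical stand-in for a memory component (not a LangChain class)."""
    max_turns: Optional[int] = None  # None means "keep everything"
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def store(self, user_text: str, ai_text: str) -> None:
        # Store: append the finished exchange.
        self.turns.append((user_text, ai_text))
        # Manage: drop the oldest exchanges once the cap is exceeded.
        if self.max_turns is not None and len(self.turns) > self.max_turns:
            del self.turns[: len(self.turns) - self.max_turns]

    def retrieve(self) -> List[Tuple[str, str]]:
        # Retrieve: in this sketch, simply everything that is still stored.
        return list(self.turns)

    def format(self) -> str:
        # Format: flatten the exchanges into text that can go into a prompt.
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.retrieve())


memory = SimpleMemory(max_turns=2)
memory.store("My name is Nam", "Hello Nam!")
memory.store("I like Python", "Great choice.")
memory.store("Do you remember my name?", "Yes, you are Nam.")

# The stored history is prepended to the new question before calling the model.
prompt = memory.format() + "\nHuman: What do I like?\nAI:"
print(prompt)
```

With `max_turns=2` the first exchange has already been dropped by the time the prompt is built, which is the same trade-off the window memory makes later in this notebook.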

## Setup and Dependencies

In [None]:
# Import the required libraries
import os
from dotenv import load_dotenv

# LangChain Core
from langchain.chains import ConversationChain
from langchain.memory import (
    ConversationBufferMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryMemory,
    ConversationSummaryBufferMemory
)
from langchain_core.prompts import PromptTemplate

# LangChain Anthropic
from langchain_anthropic import ChatAnthropic

# Utilities
import time
from datetime import datetime

# Load environment variables
load_dotenv()

print("✅ Dependencies imported successfully")

In [None]:
# Initialize ChatAnthropic
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.7,  # Slightly creative for natural conversation
    anthropic_api_key=os.getenv("ANTHROPIC_API_KEY")
)

# Test LLM
test_response = llm.invoke("Hello! How are you today?")
print("✅ ChatAnthropic initialized")
print(f"Test response: {test_response.content[:100]}...")

## 1. No Memory - Baseline Comparison

In [None]:
# Baseline WITHOUT memory: call the LLM directly.
# Note: ConversationChain attaches a ConversationBufferMemory by default,
# so a chain built without a `memory` argument would still remember history.
print("🚫 Direct LLM Calls WITHOUT Memory")
print("=" * 50)

# Each call is independent, so nothing carries over between them
print("\n👤 User: My name is Alice and I am learning about AI")
response1 = llm.invoke("My name is Alice and I am learning about AI").content
print(f"🤖 Assistant: {response1}")

print("\n" + "-" * 50)

print("\n👤 User: Do you remember my name?")
response2 = llm.invoke("Do you remember my name?").content
print(f"🤖 Assistant: {response2}")

print("\n💭 Observation: The second call has no access to the first one!")

## 2. ConversationBufferMemory - Store All History

In [None]:
# Create a ConversationBufferMemory
buffer_memory = ConversationBufferMemory(
    return_messages=True,  # Return as message objects
    memory_key="history"   # Prompt variable that receives the history
)

print("📚 ConversationBufferMemory created")
print(f"Memory type: {type(buffer_memory)}")
print(f"Memory key: {buffer_memory.memory_key}")

# Inspect empty memory
print(f"\nEmpty memory variables: {buffer_memory.load_memory_variables({})}")

In [None]:
# Create a conversation chain WITH buffer memory
buffer_chain = ConversationChain(
    llm=llm,
    memory=buffer_memory,
    verbose=True
)

print("🧠 Conversation Chain WITH ConversationBufferMemory")
print("=" * 60)

# Start conversation
print("\n👤 User: Hello! My name is Bob and I am a Python developer")
response1 = buffer_chain.predict(input="Hello! My name is Bob and I am a Python developer")
print(f"🤖 Assistant: {response1}")

In [None]:
# Continue conversation
print("\n" + "-" * 60)
print("\n👤 User: I am learning LangChain to build chatbots")
response2 = buffer_chain.predict(input="I am learning LangChain to build chatbots")
print(f"🤖 Assistant: {response2}")

In [None]:
# Test memory recall
print("\n" + "-" * 60)
print("\n👤 User: Do you remember my name and my job?")
response3 = buffer_chain.predict(input="Do you remember my name and my job?")
print(f"🤖 Assistant: {response3}")

print("\n✅ Observation: Chain now remembers previous conversation!")

In [None]:
# Inspect memory contents
print("\n🔍 Memory Contents Inspection:")
print("=" * 40)

memory_vars = buffer_memory.load_memory_variables({})
print(f"Memory variables keys: {list(memory_vars.keys())}")

# Show conversation history
history = memory_vars['history']
print(f"\nConversation History ({len(history)} messages):")
for i, message in enumerate(history):
    role = "👤 Human" if message.type == "human" else "🤖 AI"
    content = message.content[:100] + "..." if len(message.content) > 100 else message.content
    print(f"{i+1}. {role}: {content}")

## 3. ConversationBufferWindowMemory - Limited History

In [None]:
# Create a ConversationBufferWindowMemory with a fixed window size
window_memory = ConversationBufferWindowMemory(
    k=2,  # Keep only last 2 exchanges (4 messages: human + ai + human + ai)
    return_messages=True,
    memory_key="history"
)

print("🪟 ConversationBufferWindowMemory created")
print(f"Window size (k): {window_memory.k}")
print(f"This keeps last {window_memory.k} conversation exchanges")
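
The windowing behaviour can be sketched without LangChain. The snippet below is a simplified model that assumes each exchange adds exactly two messages (one human, one AI). The helper `simulate_window` is hypothetical and uses a bounded `deque` to mimic keeping only the last `k` exchanges.

```python
from collections import deque
from typing import List, Tuple


def simulate_window(k: int, num_exchanges: int) -> Tuple[List[int], List[str]]:
    """Return the window size after each exchange and the final window contents."""
    window = deque(maxlen=2 * k)  # k exchanges = 2 * k messages
    sizes = []
    for i in range(1, num_exchanges + 1):
        window.append(f"human {i}")
        window.append(f"ai {i}")
        sizes.append(len(window))  # what a later lookup would see
    return sizes, list(window)


sizes, final_window = simulate_window(k=2, num_exchanges=5)
print(sizes)         # [2, 4, 4, 4, 4]
print(final_window)  # ['human 4', 'ai 4', 'human 5', 'ai 5']
```

After the window fills, every new exchange pushes out the oldest one, so facts from the first exchanges are no longer part of the prompt.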

In [None]:
# Create a conversation chain with window memory
window_chain = ConversationChain(
    llm=llm,
    memory=window_memory,
    verbose=True
)

print("🪟 Conversation Chain WITH ConversationBufferWindowMemory (k=2)")
print("=" * 70)

# Run a longer conversation to test the window behavior
conversations = [
    "My name is Charlie and I am 25 years old",
    "I work at a fintech company in Ho Chi Minh City",
    "My hobbies are playing guitar and reading books",
    "I am currently learning machine learning",
    "Do you remember my name and age?"
]

responses = []
for i, user_input in enumerate(conversations, 1):
    print(f"\n--- Exchange {i} ---")
    print(f"👤 User: {user_input}")
    
    response = window_chain.predict(input=user_input)
    responses.append(response)
    print(f"🤖 Assistant: {response}")
    
    # Show current memory window
    current_memory = window_memory.load_memory_variables({})['history']
    print(f"📊 Current memory window size: {len(current_memory)} messages")

In [None]:
# Analyze window memory behavior
print("\n🔍 Window Memory Analysis:")
print("=" * 40)

final_memory = window_memory.load_memory_variables({})['history']
print(f"Final memory contains {len(final_memory)} messages:")

for i, message in enumerate(final_memory):
    role = "👤 Human" if message.type == "human" else "🤖 AI"
    content = message.content[:80] + "..." if len(message.content) > 80 else message.content
    print(f"{i+1}. {role}: {content}")

print("\n💭 Observation: Window memory has forgotten early information (name, age)")
print("   but remembers recent context (machine learning)")

## 4. Comparing Buffer vs Window Memory

In [None]:
# Helper to test memory recall
def test_memory_recall(chain, memory_type, test_questions,
                       keywords=("david", "30", "google", "tennis")):
    """Ask each question and check the answer for facts from the setup conversation"""
    print(f"\n🧪 Testing {memory_type} Memory Recall")
    print("=" * 50)
    
    results = []
    for question in test_questions:
        print(f"\n❓ Question: {question}")
        try:
            response = chain.predict(input=question)
            print(f"💬 Response: {response[:150]}...")
            
            # Simple heuristic: the answer counts as recalled if it mentions any
            # fact from the setup conversation (David, 30, Google, tennis).
            # Every question is itself saved to memory, so a window memory
            # keeps sliding while the test runs.
            recalled = any(keyword in response.lower() for keyword in keywords)
            results.append(recalled)
            print(f"📝 Recall detected: {'✅' if recalled else '❌'}")
            
        except Exception as e:
            print(f"❌ Error: {e}")
            results.append(False)
    
    success_rate = sum(results) / len(results) * 100 if results else 0.0
    print(f"\n📊 Recall Success Rate: {success_rate:.1f}%")
    return results

# Test questions
test_questions = [
    "What is my name?",
    "How old am I?",
    "Where do I work?",
    "What are my hobbies?"
]

In [None]:
# Reset memories and test both
# Create fresh memories
fresh_buffer_memory = ConversationBufferMemory(return_messages=True)
fresh_window_memory = ConversationBufferWindowMemory(k=2, return_messages=True)

fresh_buffer_chain = ConversationChain(llm=llm, memory=fresh_buffer_memory, verbose=False)
fresh_window_chain = ConversationChain(llm=llm, memory=fresh_window_memory, verbose=False)

# Setup conversations for both chains
setup_conversations = [
    "My name is David and I am 30 years old",
    "I am a software engineer at Google",
    "I like playing tennis and traveling",
    "Recently I have become interested in blockchain"
]

print("🔄 Setting up conversations for comparison...")

# Feed same conversation to both chains
for conv in setup_conversations:
    fresh_buffer_chain.predict(input=conv)
    fresh_window_chain.predict(input=conv)

print("✅ Setup complete")

In [None]:
# Test both memory types
recall_questions = [
    "What is my name?",
    "How old am I?",
    "Which company do I work for?",
    "What are my hobbies?"
]

# Test buffer memory
buffer_results = test_memory_recall(fresh_buffer_chain, "Buffer", recall_questions)

# Test window memory  
window_results = test_memory_recall(fresh_window_chain, "Window", recall_questions)

# Comparison summary
print("\n📊 Memory Comparison Summary:")
print("=" * 40)
print(f"Buffer Memory Success: {sum(buffer_results)}/{len(buffer_results)} ({sum(buffer_results)/len(buffer_results)*100:.1f}%)")
print(f"Window Memory Success: {sum(window_results)}/{len(window_results)} ({sum(window_results)/len(window_results)*100:.1f}%)")

buffer_memory_size = len(fresh_buffer_memory.load_memory_variables({})['history'])
window_memory_size = len(fresh_window_memory.load_memory_variables({})['history'])

print(f"\nMemory Usage:")
print(f"Buffer Memory: {buffer_memory_size} messages stored")
print(f"Window Memory: {window_memory_size} messages stored")

## 5. Memory Management and Performance

In [None]:
# Analyze memory growth and performance
def analyze_memory_performance(memory_type, num_exchanges=10):
    """Analyze memory growth and performance"""
    print(f"\n📈 Memory Performance Analysis: {memory_type}")
    print("=" * 50)
    
    if memory_type == "buffer":
        test_memory = ConversationBufferMemory(return_messages=True)
    else:
        test_memory = ConversationBufferWindowMemory(k=3, return_messages=True)
    
    test_chain = ConversationChain(llm=llm, memory=test_memory, verbose=False)
    
    memory_sizes = []
    execution_times = []
    
    for i in range(num_exchanges):
        # Generate test conversation
        test_input = f"This is message number {i+1}. I'm talking about topic {i+1}."
        
        # Measure wall-clock time (dominated by the LLM API call)
        start_time = time.perf_counter()
        response = test_chain.predict(input=test_input)
        execution_time = time.perf_counter() - start_time
        
        # Measure memory size
        current_memory = test_memory.load_memory_variables({})['history']
        memory_size = len(current_memory)
        
        memory_sizes.append(memory_size)
        execution_times.append(execution_time)
        
        if i % 3 == 0:  # Print every 3rd exchange
            print(f"Exchange {i+1:2d}: Memory={memory_size:2d} messages, Time={execution_time:.2f}s")
    
    # Summary statistics
    avg_time = sum(execution_times) / len(execution_times)
    final_memory_size = memory_sizes[-1]
    
    print(f"\n📊 Summary:")
    print(f"   Final memory size: {final_memory_size} messages")
    print(f"   Average execution time: {avg_time:.2f}s")
    print(f"   Memory growth pattern: {memory_sizes[0]} → {final_memory_size}")
    
    return memory_sizes, execution_times

# Test both memory types
buffer_sizes, buffer_times = analyze_memory_performance("buffer", 8)
window_sizes, window_times = analyze_memory_performance("window", 8)

In [None]:
# Performance comparison
print("\n⚖️ Performance Comparison:")
print("=" * 40)

print(f"Buffer Memory:")
print(f"   Average time: {sum(buffer_times)/len(buffer_times):.3f}s")
print(f"   Memory growth: {buffer_sizes[0]} → {buffer_sizes[-1]} messages")
print(f"   Final memory size: {buffer_sizes[-1]} messages")

print(f"\nWindow Memory (k=3):")
print(f"   Average time: {sum(window_times)/len(window_times):.3f}s")
print(f"   Memory pattern: {window_sizes}")
print(f"   Stable memory size: {window_sizes[-1]} messages")

print("\n💡 Insights:")
print("   📈 Buffer memory grows linearly with conversation length")
print("   🔄 Window memory maintains constant size after initial fill")
print("   ⚡ Window memory has predictable performance characteristics")
print("   💰 Window memory is more cost-effective for long conversations")
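
The cost difference can be worked out with simple arithmetic. Assuming every turn resends whatever history the memory currently holds, the sketch below counts how many stored messages are sent to the model over a whole conversation. The helper name `history_messages_sent` is hypothetical, and message counts are only a rough proxy for tokens.

```python
from typing import Optional


def history_messages_sent(num_exchanges: int, k: Optional[int] = None) -> int:
    """Total history messages resent across a conversation.

    k=None models a full buffer; an integer k models a window of k exchanges.
    """
    total = 0
    for turn in range(num_exchanges):
        stored = 2 * turn  # messages accumulated before this turn
        if k is not None:
            stored = min(stored, 2 * k)  # the window caps what is resent
        total += stored
    return total


buffer_total = history_messages_sent(8)       # 0 + 2 + ... + 14
window_total = history_messages_sent(8, k=3)  # 0 + 2 + 4 + 6 * 5
print(f"Buffer: {buffer_total} history messages sent")  # Buffer: 56 ...
print(f"Window: {window_total} history messages sent")  # Window: 36 ...
```

Per turn the buffer grows linearly, so the cumulative amount resent grows roughly quadratically with conversation length, while the window total grows only linearly once the window is full.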

## 6. Practical Use Cases and Recommendations

In [None]:
# Use case demonstrations
def demonstrate_use_cases():
    print("🎯 Memory Type Recommendations")
    print("=" * 40)
    
    use_cases = {
        "ConversationBufferMemory": {
            "best_for": [
                "📝 Short conversations (< 10 exchanges)",
                "📚 Educational tutoring sessions", 
                "🔍 Detailed analysis tasks",
                "📋 Interview or survey applications",
                "🎯 When full context is critical"
            ],
            "considerations": [
                "⚠️ Memory grows linearly",
                "💰 Higher token costs over time",
                "🐌 Slower responses with long history",
                "🧠 Full recall of everything said, until the context window is exceeded"
            ]
        },
        "ConversationBufferWindowMemory": {
            "best_for": [
                "💬 Long-running chatbots",
                "🎮 Gaming companions",
                "🛒 E-commerce assistants",
                "📱 Mobile app assistants", 
                "🔄 Ongoing customer support"
            ],
            "considerations": [
                "✅ Predictable memory usage",
                "💰 Bounded history cost (at most k exchanges per prompt)",
                "⚡ Consistent performance",
                "😔 Forgets old information"
            ]
        }
    }
    
    for memory_type, info in use_cases.items():
        print(f"\n🧠 {memory_type}:")
        print(f"   Best for:")
        for use_case in info["best_for"]:
            print(f"     {use_case}")
        print(f"   Considerations:")
        for consideration in info["considerations"]:
            print(f"     {consideration}")

demonstrate_use_cases()

In [None]:
# Decision framework
def memory_selection_guide():
    print("\n🤔 Memory Selection Decision Tree:")
    print("=" * 40)
    
    decision_tree = """
    📝 Conversation Length?
    ├── Short (< 10 exchanges)
    │   └── 🧠 ConversationBufferMemory
    │       ✅ Full context preserved
    │       ✅ Best accuracy
    │
    └── Long (≥ 10 exchanges)
        ├── 💰 Cost Sensitive?
        │   ├── Yes → 🪟 ConversationBufferWindowMemory
        │   │   ✅ Predictable costs
        │   │   ✅ Stable performance
        │   │
        │   └── No → 📚 ConversationSummaryMemory
        │       ✅ Compressed history
        │       ✅ Key information retained
        │
        └── 🎯 Context Importance?
            ├── Critical → 🧠 ConversationBufferMemory
            │   ⚠️ Watch token limits
            │
            └── Moderate → 🪟 ConversationBufferWindowMemory
                ⚙️ Tune window size (k=3-7)
    """
    
    print(decision_tree)

memory_selection_guide()
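
The decision tree can also be expressed as a small function. This is one possible reading of the tree, not a rule from LangChain: the function name `choose_memory`, the 10-exchange threshold taken from the diagram, and the order in which the two questions are checked (context importance first, then cost) are all assumptions made for this sketch.

```python
def choose_memory(num_exchanges: int,
                  cost_sensitive: bool = False,
                  context_critical: bool = False) -> str:
    """Return the name of a memory class following the tree above (assumed ordering)."""
    if num_exchanges < 10:
        # Short conversations: keep everything.
        return "ConversationBufferMemory"
    if context_critical:
        # Full context matters more than cost; watch token limits.
        return "ConversationBufferMemory"
    if cost_sensitive:
        # Bounded history keeps prompt size predictable.
        return "ConversationBufferWindowMemory"
    # Long conversation, cost not a concern: compress instead of dropping.
    return "ConversationSummaryMemory"


for args in [(5, False, False), (50, True, False), (50, False, False), (50, False, True)]:
    print(args, "->", choose_memory(*args))
```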

## 7. Advanced Memory Techniques

In [None]:
# Custom memory with filtering
class FilteredConversationMemory(ConversationBufferMemory):
    """Custom memory with content filtering"""
    
    # LangChain memory classes are pydantic models, so extra attributes must be
    # declared as fields; assigning an undeclared attribute in __init__ raises an error.
    filter_keywords: list = []
    
    def save_context(self, inputs, outputs):
        """Save the exchange unless it contains a filtered keyword"""
        # Check if we should filter this exchange
        input_text = inputs.get('input', '').lower()
        output_text = outputs.get('response', '').lower()
        
        # Skip saving if either side contains a filter keyword.
        # The current turn is still sent to the LLM; only storage is skipped.
        if any(keyword in input_text or keyword in output_text 
               for keyword in self.filter_keywords):
            print("🚫 Filtered out exchange containing sensitive keywords")
            return
        
        # Save normally if no filters triggered
        super().save_context(inputs, outputs)

# Test filtered memory
filtered_memory = FilteredConversationMemory(
    filter_keywords=['password', 'secret', 'confidential'],
    return_messages=True
)

filtered_chain = ConversationChain(
    llm=llm,
    memory=filtered_memory,
    verbose=False
)

print("🔒 Testing Filtered Memory:")
print("=" * 30)

# Test normal conversation
print("\n👤 User: My name is Emma")
response1 = filtered_chain.predict(input="My name is Emma")
print(f"🤖 Assistant: {response1[:80]}...")

# Test filtered content
print("\n👤 User: This is my password: 123456")
response2 = filtered_chain.predict(input="This is my password: 123456")
print(f"🤖 Assistant: {response2[:80]}...")

# Check memory contents
memory_contents = filtered_memory.load_memory_variables({})['history']
print(f"\n📊 Memory contains {len(memory_contents)} messages (filtered content excluded)")
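
Dropping a whole exchange also discards any harmless context in it. As an alternative technique, not what the cell above does, the sensitive value can be masked before the text is stored. The sketch below is plain Python with a hypothetical helper `build_redactor`. The pattern "keyword, optional words, colon, value" is an assumption that fits the demo message and would need hardening for real PII.

```python
import re
from typing import Callable, Iterable


def build_redactor(keywords: Iterable[str]) -> Callable[[str], str]:
    """Return a function that masks the value following '<keyword> ...: <value>'."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(
        rf"\b({alternatives})\b([^:\n]*):\s*(\S+)",
        re.IGNORECASE,
    )

    def redact(text: str) -> str:
        # Keep the keyword and any words before the colon; replace only the value.
        return pattern.sub(r"\1\2: [REDACTED]", text)

    return redact


redact = build_redactor(["password", "secret", "confidential"])
print(redact("This is my password: 123456"))  # This is my password: [REDACTED]
print(redact("My name is Emma"))              # My name is Emma
```

A function like this could be applied to the input and output text inside `save_context` before calling the parent implementation.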

In [None]:
# Memory with priority system
from langchain_core.messages import SystemMessage

class PriorityConversationMemory(ConversationBufferWindowMemory):
    """Memory with priority-based retention"""
    
    # Declared as pydantic fields: LangChain memory classes are pydantic models,
    # so attributes cannot be created ad hoc inside __init__.
    priority_keywords: list = []
    priority_messages: list = []  # High-priority exchanges, kept outside the window
    
    def save_context(self, inputs, outputs):
        """Save context with priority handling"""
        input_text = inputs.get('input', '').lower()
        
        # Check if this is high priority
        is_priority = any(keyword in input_text for keyword in self.priority_keywords)
        
        if is_priority:
            # Store in priority memory
            priority_entry = {
                'input': inputs.get('input', ''),
                'output': outputs.get('response', ''),
                'timestamp': datetime.now().isoformat()
            }
            self.priority_messages.append(priority_entry)
            print("⭐ High priority message saved!")
        
        # Save normally as well
        super().save_context(inputs, outputs)
    
    def load_memory_variables(self, inputs):
        """Load the window memory and prepend the priority notes"""
        # Get regular window memory
        memory_vars = super().load_memory_variables(inputs)
        
        # Add priority context if exists
        if self.priority_messages:
            priority_context = "Important previous information:\n"
            for msg in self.priority_messages[-3:]:  # Last 3 priority messages
                priority_context += f"- {msg['input'][:50]}\n"
            
            # Minimal integration for the demo: put the notes in front of the
            # history. A production setup would use a dedicated prompt variable.
            history = memory_vars[self.memory_key]
            if isinstance(history, list):  # return_messages=True
                memory_vars[self.memory_key] = [SystemMessage(content=priority_context)] + history
            else:
                memory_vars[self.memory_key] = priority_context + history
        
        return memory_vars

# Test priority memory
priority_memory = PriorityConversationMemory(
    k=2,
    priority_keywords=['important', 'remember', 'critical'],
    return_messages=True
)

priority_chain = ConversationChain(
    llm=llm,
    memory=priority_memory,
    verbose=False
)

print("\n⭐ Testing Priority Memory:")
print("=" * 30)

test_messages = [
    "I like eating pizza",
    "Important: I am allergic to peanuts",
    "The weather is nice today",
    "Remember: The meeting is at 3pm",
    "I am reading a book"
]

for msg in test_messages:
    print(f"\n👤 User: {msg}")
    response = priority_chain.predict(input=msg)

print(f"\n📊 Priority messages stored: {len(priority_memory.priority_messages)}")
for i, priority_msg in enumerate(priority_memory.priority_messages):
    print(f"   {i+1}. {priority_msg['input']}")

## 8. Production Considerations

In [None]:
# Production memory management
def production_memory_tips():
    print("🏭 Production Memory Management")
    print("=" * 40)
    
    tips = {
        "🔒 Security": [
            "Never store sensitive information in memory",
            "Implement content filtering for PII",
            "Regular memory cleanup for compliance",
            "Encrypt memory storage if persistent"
        ],
        "💰 Cost Optimization": [
            "Monitor token usage with memory size",
            "Use window memory for long conversations", 
            "Implement conversation summarization",
            "Set reasonable memory limits"
        ],
        "⚡ Performance": [
            "Cache frequent memory operations",
            "Async memory operations when possible",
            "Batch memory updates",
            "Monitor memory access patterns"
        ],
        "🔧 Scalability": [
            "Use external storage for persistence",
            "Implement memory sharding for users",
            "Stateless memory management",
            "Load balancing considerations"
        ],
        "📊 Monitoring": [
            "Track memory size growth",
            "Monitor conversation quality",
            "Alert on memory anomalies",
            "User satisfaction metrics"
        ]
    }
    
    for category, tip_list in tips.items():
        print(f"\n{category}:")
        for tip in tip_list:
            print(f"   • {tip}")

production_memory_tips()

In [None]:
# Example production configuration
def create_production_memory_config(user_id, conversation_type="general"):
    """Create production-ready memory configuration"""
    
    configs = {
        "customer_support": {
            "memory_type": "window",
            "window_size": 5,
            "max_conversation_length": 50,
            "filter_pii": True
        },
        "educational": {
            "memory_type": "buffer", 
            "max_conversation_length": 20,
            "summarize_after": 15,
            "filter_pii": True
        },
        "general": {
            "memory_type": "window",
            "window_size": 3,
            "max_conversation_length": 30,
            "filter_pii": True
        }
    }
    
    config = configs.get(conversation_type, configs["general"])
    config["user_id"] = user_id
    config["created_at"] = datetime.now().isoformat()
    
    return config

# Example configurations
print("🏭 Production Memory Configurations:")
print("=" * 40)

examples = [
    ("user123", "customer_support"),
    ("student456", "educational"),
    ("visitor789", "general")
]

for user_id, conv_type in examples:
    config = create_production_memory_config(user_id, conv_type)
    print(f"\n👤 {user_id} ({conv_type}):")
    for key, value in config.items():
        if key != 'created_at':
            print(f"   {key}: {value}")
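
The `summarize_after` setting in the educational configuration points at summary-style memory, which this notebook names but does not demonstrate. The sketch below shows the idea in plain Python: once the history passes a threshold, the oldest messages are folded into a running summary. `SummarizingHistory` and `naive_summarizer` are hypothetical names, and the summarizer is a placeholder. A real system would ask an LLM to write the summary.

```python
from typing import Callable, List


def naive_summarizer(previous_summary: str, messages: List[str]) -> str:
    """Placeholder: joins truncated messages instead of calling an LLM."""
    joined = " | ".join(m[:30] for m in messages)
    return (previous_summary + " | " + joined).strip(" |")


class SummarizingHistory:
    """Keeps recent messages verbatim and older ones as a running summary."""

    def __init__(self, summarize_after: int,
                 summarizer: Callable[[str, List[str]], str] = naive_summarizer):
        self.summarize_after = summarize_after
        self.summarizer = summarizer
        self.summary = ""
        self.messages: List[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)
        if len(self.messages) > self.summarize_after:
            # Fold the oldest half of the messages into the summary.
            cut = len(self.messages) // 2
            old, self.messages = self.messages[:cut], self.messages[cut:]
            self.summary = self.summarizer(self.summary, old)

    def context(self) -> str:
        parts = [f"Summary so far: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.messages)


history = SummarizingHistory(summarize_after=4)
for i in range(1, 7):
    history.add(f"m{i}")
print(history.context())
```

Compared with a window, nothing is dropped outright, but the quality of what survives depends entirely on the summarizer, and each summarization step would cost an extra model call.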

## 9. Summary and Best Practices

In [None]:
# Final summary
def memory_best_practices_summary():
    print("📚 Conversation Memory: Key Takeaways")
    print("=" * 50)
    
    summary = {
        "🧠 Memory Types": {
            "ConversationBufferMemory": "Stores all history; full recall within the context window",
            "ConversationBufferWindowMemory": "Fixed-size window, recent focus",
            "ConversationSummaryMemory": "Compressed history, key points",
            "Custom Memory": "Tailored for specific use cases"
        },
        "✅ When to Use What": {
            "Short conversations": "ConversationBufferMemory",
            "Long conversations": "ConversationBufferWindowMemory", 
            "Cost-sensitive apps": "ConversationBufferWindowMemory",
            "Critical context": "ConversationBufferMemory"
        },
        "⚡ Performance Tips": [
            "Monitor memory size growth",
            "Set appropriate window sizes (k=3-7)",
            "Use async operations when possible",
            "Implement memory cleanup routines"
        ],
        "🔒 Security Considerations": [
            "Filter sensitive information",
            "Implement data retention policies", 
            "Encrypt stored conversations",
            "Regular security audits"
        ],
        "💰 Cost Management": [
            "Window memory for predictable costs",
            "Summary memory for compression",
            "Monitor token usage patterns",
            "Set conversation length limits"
        ]
    }
    
    for section, content in summary.items():
        print(f"\n{section}:")
        if isinstance(content, dict):
            for key, value in content.items():
                print(f"   • {key}: {value}")
        elif isinstance(content, list):
            for item in content:
                print(f"   • {item}")

memory_best_practices_summary()

print("\n🎯 Next Steps:")
print("   1. Experiment with different memory types")
print("   2. Build custom memory solutions")
print("   3. Integrate memory with production systems")
print("   4. Monitor and optimize memory performance")
print("   5. Explore advanced memory patterns")

print("\n🎉 Conversation Memory Tutorial Complete!")