# Live Chat Testing Notebook

This notebook provides a live chat interface for real-time testing of the Human-in-the-Loop AI system.
A human user can chat with the chatbot and see real-time quality and frustration assessments.

## Features:
- Interactive chat window for real human-chatbot conversations
- Real-time quality scoring after each chatbot response
- Real-time frustration detection after each user message
- Live display of agent assessments and scores
- Export complete conversation with all scoring data
- Suited to demos, validation, and real-world testing

## Getting Started:
1. Run cells in order from top to bottom
2. Configure agents as needed
3. Use the chat interface to have a real conversation
4. Watch the scores update in real time below the chat window after each interaction
5. Export results when finished

## Use Cases:
- **Stakeholder Demos**: Show live AI quality and frustration detection
- **System Validation**: Test with real human interactions rather than simulated ones
- **Configuration Tuning**: See the impact of settings changes immediately
- **Training Data**: Generate high-quality human-validated conversation examples

In [None]:
# Import required libraries
import json
import os
import sys
from datetime import datetime, timedelta
from pathlib import Path
import uuid
import asyncio
from typing import Dict, List, Any

import ipywidgets as widgets
import pandas as pd
from ruamel.yaml import YAML
from IPython.display import display, HTML, clear_output

# Set the working directory to the root of the project
os.chdir('/workspace')

# Add workspace to path for imports
sys.path.insert(0, '/workspace')

# Import our system components
from src.nodes.chatbot_agent import ChatbotAgentNode
from src.nodes.quality_agent import QualityAgentNode
from src.nodes.frustration_agent import FrustrationAgentNode
from src.core.config.agent_config_manager import AgentConfigManager
from src.core.context_manager import ContextManager
from src.integrations.llm_providers import LLMProviderFactory
from src.interfaces.core.state_schema import HybridSystemState
from src.interfaces.core.context import ContextEntry

print("✅ All libraries imported successfully!")
print("Ready to start live chat testing with real-time agent scoring.")

## Step 1: Initialize AI Agents

Load and initialize the three AI agents for live testing:
- **Chatbot Agent**: Generates responses to user queries
- **Quality Agent**: Scores chatbot response quality in real-time
- **Frustration Agent**: Detects user frustration levels
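
The configuration summary printed by the cell below falls back to default thresholds when none are configured (quality: adequate at 7.0 and adjustment at 5.0; frustration: high at 6.0 and critical at 8.0). As a rough illustration of how such thresholds could partition a 0-10 score, here is a minimal sketch. The helper names and band labels are hypothetical, and the agents' actual decision logic is not shown in this notebook and may differ.

```python
from typing import Mapping

# Fallback values mirror the defaults used in the configuration summary.
DEFAULT_QUALITY_THRESHOLDS = {"adequate_score": 7.0, "adjustment_score": 5.0}
DEFAULT_FRUSTRATION_THRESHOLDS = {"high": 6.0, "critical": 8.0}


def quality_band(score: float,
                 thresholds: Mapping[str, float] = DEFAULT_QUALITY_THRESHOLDS) -> str:
    """Hypothetical helper: place a 0-10 quality score in a band."""
    if score >= thresholds.get("adequate_score", 7.0):
        return "adequate"
    if score <= thresholds.get("adjustment_score", 5.0):
        return "at_or_below_adjustment"
    return "between_thresholds"


def frustration_band(score: float,
                     thresholds: Mapping[str, float] = DEFAULT_FRUSTRATION_THRESHOLDS) -> str:
    """Hypothetical helper: place a 0-10 frustration score in a band."""
    if score >= thresholds.get("critical", 8.0):
        return "critical"
    if score >= thresholds.get("high", 6.0):
        return "high"
    return "below_high"
```

Checking the critical threshold before the high one matters: a score of 8.5 satisfies both comparisons, and the stricter band should win.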

In [None]:
# Initialize all three agents
chatbot_agent = None
quality_agent = None
frustration_agent = None
context_provider = None

def initialize_agents():
    """Initialize all agents for live chat testing"""
    global chatbot_agent, quality_agent, frustration_agent, context_provider
    
    try:
        print("🔧 Initializing agents for live chat testing...")
        
        # Initialize shared context provider (in-memory for demo)
        context_provider = ContextManager(db_path=":memory:")
        print("✅ Context provider initialized")
        
        # Initialize Chatbot Agent
        config_manager = AgentConfigManager('/workspace/config')
        chatbot_agent = ChatbotAgentNode(
            config_manager=config_manager,
            context_provider=context_provider
        )
        print(f"✅ Chatbot Agent initialized: {chatbot_agent.llm_provider.model_name if chatbot_agent.llm_provider else 'Unknown model'}")
        
        # Initialize Quality Agent
        quality_config_manager = AgentConfigManager(
            config_dir='/workspace/config',
            agent_name='quality_agent'
        )
        quality_agent = QualityAgentNode(
            config_manager=quality_config_manager,
            context_provider=context_provider
        )
        print(f"✅ Quality Agent initialized: {quality_agent.llm_provider.model_name if quality_agent.llm_provider else 'Unknown model'}")
        
        # Initialize Frustration Agent
        frustration_config_manager = AgentConfigManager(
            config_dir='/workspace/config',
            agent_name='frustration_agent'
        )
        frustration_agent = FrustrationAgentNode(
            config_manager=frustration_config_manager,
            context_provider=context_provider
        )
        print(f"✅ Frustration Agent initialized: {frustration_agent.llm_provider.model_name if frustration_agent.llm_provider else 'Unknown model'}")
        
        return True
        
    except Exception as e:
        print(f"❌ Error initializing agents: {e}")
        import traceback
        traceback.print_exc()
        return False

# Initialize agents
if initialize_agents():
    print("\n🚀 All agents ready for live chat testing!")
    print("\n📋 Agent Configuration Summary:")
    if chatbot_agent and chatbot_agent.agent_config:
        print(f"  Chatbot: {chatbot_agent.agent_config.agent_info['name']} v{chatbot_agent.agent_config.agent_info['version']}")
    if quality_agent and quality_agent.agent_config:
        quality_thresholds = quality_agent.agent_config.settings.get('quality_thresholds', {})
        print(f"  Quality Agent: Adequate ≥{quality_thresholds.get('adequate_score', 7.0)}, Adjustment ≤{quality_thresholds.get('adjustment_score', 5.0)}")
    if frustration_agent and frustration_agent.agent_config:
        frustration_thresholds = frustration_agent.agent_config.settings.get('frustration_thresholds', {})
        print(f"  Frustration Agent: High ≥{frustration_thresholds.get('high', 6.0)}, Critical ≥{frustration_thresholds.get('critical', 8.0)}")
else:
    print("\n❌ Agent initialization failed. Please check configuration and try again.")

## Step 2: Configure Chat Session

Set up the chat session parameters and customer profile for testing.
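
The cell below keeps the session settings in a plain dictionary. If stricter validation is wanted, the same fields could be held in a small dataclass that rejects values outside the dropdown options. This is only a sketch: `ChatSessionConfig` is a hypothetical name and is not part of the project code.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime

# Allowed values copied from the dropdown options used in this notebook.
CUSTOMER_TYPES = ("normal", "frustrated", "polite", "technical", "non_technical")
COMPLEXITY_LEVELS = ("simple", "medium", "complex")


@dataclass
class ChatSessionConfig:
    """Hypothetical container for the per-session settings."""
    customer_type: str = "normal"
    complexity: str = "medium"
    session_id: str = field(
        default_factory=lambda: f"live_chat_{datetime.now():%Y%m%d_%H%M%S}"
    )
    user_id: str = field(default_factory=lambda: f"user_{uuid.uuid4().hex[:8]}")

    def __post_init__(self) -> None:
        # Fail early on values the selectors could never produce.
        if self.customer_type not in CUSTOMER_TYPES:
            raise ValueError(f"Unknown customer type: {self.customer_type!r}")
        if self.complexity not in COMPLEXITY_LEVELS:
            raise ValueError(f"Unknown complexity: {self.complexity!r}")


config = ChatSessionConfig(customer_type="technical")
```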

In [None]:
# Chat session configuration
chat_config = {
    'session_id': f"live_chat_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
    'user_id': f"user_{uuid.uuid4().hex[:8]}",
    'customer_type': 'normal',  # Updated by the Customer Type dropdown below
    'complexity': 'medium'      # Updated by the Query Complexity dropdown below
}

# Conversation tracking
conversation_history = []
turn_number = 0
live_assessments = []

print(f"🆔 Chat Session Configuration:")
print(f"  Session ID: {chat_config['session_id']}")
print(f"  User ID: {chat_config['user_id']}")
print(f"  Initial Customer Type: {chat_config['customer_type']}")
print(f"  Initial Complexity: {chat_config['complexity']}")

# Customer profile options
print(f"\n👥 Customer Profile Options:")
customer_type_selector = widgets.Dropdown(
    options=['normal', 'frustrated', 'polite', 'technical', 'non_technical'],
    value='normal',
    description='Customer Type:'
)

complexity_selector = widgets.Dropdown(
    options=['simple', 'medium', 'complex'],
    value='medium',
    description='Query Complexity:'
)

def update_chat_config(change):
    """Update chat configuration when selectors change"""
    chat_config['customer_type'] = customer_type_selector.value
    chat_config['complexity'] = complexity_selector.value
    print(f"Updated: Customer Type = {chat_config['customer_type']}, Complexity = {chat_config['complexity']}")

customer_type_selector.observe(update_chat_config, names='value')
complexity_selector.observe(update_chat_config, names='value')

display(widgets.HBox([customer_type_selector, complexity_selector]))

print(f"\n💡 Tips:")
print(f"  • Customer Type affects chatbot response style")
print(f"  • Query Complexity influences routing decisions")
print(f"  • These can be changed during the conversation")
print(f"  • Agents will also detect these automatically")

## Step 3: Live Chat Interface

Interactive chat interface with real-time AI agent scoring.
Type your messages and see live quality and frustration assessments.
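
Each message goes through the same three steps in the cell below: frustration is assessed on the user message first, the chatbot then responds, and quality is scored on that response. The sketch below shows this ordering with stub callables standing in for the real agents. `run_turn` and `rolling_average` are hypothetical names, and the stub scores are invented purely for illustration.

```python
from typing import Any, Callable, Dict, List


def run_turn(
    user_message: str,
    turn_number: int,
    assess_frustration: Callable[[str], Dict[str, Any]],
    get_response: Callable[[str], str],
    assess_quality: Callable[[str, str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Run one chat turn in the order used by this notebook."""
    frustration = assess_frustration(user_message)    # before the bot replies
    response = get_response(user_message)
    quality = assess_quality(user_message, response)  # after the bot replies
    return {
        "turn_number": turn_number,
        "user_message": user_message,
        "chatbot_response": response,
        "quality": quality,
        "frustration": frustration,
    }


def rolling_average(turns: List[Dict[str, Any]], key: str, window: int = 5) -> float:
    """Average the 'score' field of the last `window` turns (0.0 if empty)."""
    recent = turns[-window:]
    if not recent:
        return 0.0
    return sum(turn[key]["score"] for turn in recent) / len(recent)


# Stub agents: a real run would call the chatbot, quality, and frustration agents.
history: List[Dict[str, Any]] = []
for text in ["Hello", "This is still broken!!"]:
    history.append(
        run_turn(
            text,
            turn_number=len(history) + 1,
            assess_frustration=lambda m: {"score": 8.0 if "!" in m else 1.0},
            get_response=lambda m: f"Echo: {m}",
            assess_quality=lambda m, r: {"score": 7.0},
        )
    )
```

Passing the turn number in explicitly keeps all three assessments of a message on the same turn.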

In [None]:
# Create chat interface
chat_output = widgets.Output(layout={'border': '1px solid #ccc', 'height': '400px', 'overflow': 'auto'})
message_input = widgets.Text(
    placeholder="Type your message here and press Enter...",
    layout={'width': '70%'}
)
send_button = widgets.Button(
    description="Send",
    button_style='primary',
    layout={'width': '10%'}
)
clear_button = widgets.Button(
    description="Clear Chat",
    button_style='warning',
    layout={'width': '15%'}
)

# Score display area
score_output = widgets.Output(layout={'border': '1px solid #ddd', 'height': '200px', 'overflow': 'auto'})

def create_state_for_chat(query: str, ai_response: str = None) -> HybridSystemState:
    """Create HybridSystemState for chat interaction"""
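    global turn_number
    # Derive the turn from the number of completed turns. This function is called
    # three times per message (frustration, chatbot, quality), so incrementing here
    # would advance the counter by three for every user message.
    turn_number = len(conversation_history) + 1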
    
    return HybridSystemState({
        "query_id": f"live_chat_{chat_config['session_id']}_{turn_number}",
        "user_id": chat_config['user_id'],
        "session_id": chat_config['session_id'],
        "timestamp": datetime.now(),
        "query": query,
        "ai_response": ai_response,
        "customer_type": chat_config['customer_type'],
        "complexity": chat_config['complexity'],
        "turn_number": turn_number,
        "conversation_metadata": {
            "live_chat": True,
            "total_turns": turn_number,
            "session_start": chat_config['session_id']
        }
    })

def get_chatbot_response(user_message: str) -> tuple:
    """Get chatbot response and return (response, confidence, metadata)"""
    try:
        state = create_state_for_chat(user_message)
        result_state = chatbot_agent(state)
        
        response = result_state.get('ai_response', 'Sorry, I encountered an error.')
        confidence = result_state.get('confidence', 0.5)
        metadata = result_state.get('response_metadata', {})
        
        return response, confidence, metadata
        
    except Exception as e:
        return f"Error generating response: {str(e)}", 0.0, {'error': str(e)}

def assess_quality(user_message: str, chatbot_response: str) -> dict:
    """Assess response quality in real-time"""
    try:
        state = create_state_for_chat(user_message, chatbot_response)
        result_state = quality_agent(state)
        
        quality_assessment = result_state.get('quality_assessment', {})
        
        return {
            'score': quality_assessment.get('overall_score', 0.0),
            'decision': quality_assessment.get('decision', 'unknown'),
            'confidence': quality_assessment.get('confidence', 0.0),
            'reasoning': quality_assessment.get('reasoning', 'No reasoning available'),
            'adjustment_needed': quality_assessment.get('adjustment_needed', False),
            'adjusted_response': quality_assessment.get('adjusted_response')
        }
        
    except Exception as e:
        return {
            'score': 0.0,
            'decision': 'error',
            'confidence': 0.0,
            'reasoning': f'Quality assessment error: {str(e)}',
            'adjustment_needed': False
        }

def assess_frustration(user_message: str) -> dict:
    """Assess user frustration in real-time"""
    try:
        # Add message to context history for pattern analysis
        context_entry = ContextEntry(
            entry_id=f"{chat_config['session_id']}_{turn_number}_query",
            user_id=chat_config['user_id'],
            session_id=chat_config['session_id'],
            timestamp=datetime.now(),
            entry_type="query",
            content=user_message,
            metadata={'turn_number': turn_number}
        )
        context_provider.save_context_entry(context_entry)
        
        state = create_state_for_chat(user_message)
        result_state = frustration_agent(state)
        
        frustration_analysis = result_state.get('frustration_analysis', {})
        
        return {
            'score': frustration_analysis.get('overall_score', 0.0),
            'level': frustration_analysis.get('overall_level', 'low'),
            'confidence': frustration_analysis.get('confidence', 0.0),
            'intervention_needed': result_state.get('frustration_intervention_needed', False),
            'contributing_factors': frustration_analysis.get('contributing_factors', []),
            'escalation_trend': frustration_analysis.get('history_analysis', {}).get('escalation_trend', 'stable')
        }
        
    except Exception as e:
        return {
            'score': 0.0,
            'level': 'error',
            'confidence': 0.0,
            'intervention_needed': False,
            'contributing_factors': [f'Frustration analysis error: {str(e)}'],
            'escalation_trend': 'unknown'
        }

def format_score_display(user_message: str, chatbot_response: str, quality_result: dict, frustration_result: dict) -> str:
    """Format the real-time scoring display"""
    html = f"""
    <div style='background: #f8f9fa; border-left: 4px solid #007bff; padding: 10px; margin: 5px 0;'>
        <strong>Turn {turn_number} - Real-time Assessment</strong><br>
        
        <div style='margin-top: 8px;'>
            <span style='background: #e7f3ff; padding: 2px 6px; border-radius: 3px; margin-right: 10px;'>
                🤖 <strong>Quality:</strong> {quality_result['score']:.1f}/10.0 
                ({quality_result['decision']} - {quality_result['confidence']:.2f} confidence)
            </span>
            
            <span style='background: #fff2e7; padding: 2px 6px; border-radius: 3px;'>
                😤 <strong>Frustration:</strong> {frustration_result['score']:.1f}/10.0 
                ({frustration_result['level']} - {frustration_result['confidence']:.2f} confidence)
            </span>
        </div>
        
        <div style='margin-top: 5px; font-size: 0.9em; color: #666;'>
            <strong>Quality:</strong> {quality_result['reasoning'][:100]}{'...' if len(quality_result['reasoning']) > 100 else ''}<br>
    """
    
    if frustration_result['contributing_factors']:
        factors = ', '.join(frustration_result['contributing_factors'][:3])
        html += f"<strong>Frustration factors:</strong> {factors}<br>"
    
    if quality_result['adjustment_needed'] and quality_result.get('adjusted_response'):
        html += f"<strong>Suggested improvement:</strong> {quality_result['adjusted_response'][:100]}{'...' if len(quality_result['adjusted_response']) > 100 else ''}<br>"
    
    if frustration_result['intervention_needed']:
        html += "<span style='color: red; font-weight: bold;'>⚠️ Human intervention recommended</span><br>"
    
    html += "</div></div>"
    
    return html

def process_message():
    """Process user message and generate response with real-time scoring"""
    user_message = message_input.value.strip()
    if not user_message:
        return
    
    with chat_output:
        # Display user message
        print(f"👤 You: {user_message}")
        
        # Assess frustration BEFORE chatbot responds (real-world scenario)
        frustration_result = assess_frustration(user_message)
        
        # Get chatbot response
        chatbot_response, confidence, metadata = get_chatbot_response(user_message)
        
        # Display chatbot response
        print(f"🤖 Chatbot: {chatbot_response}")
        
        # Assess response quality AFTER chatbot responds
        quality_result = assess_quality(user_message, chatbot_response)
        
        print("\n" + "="*60)
    
    # Display real-time scores
    with score_output:
        clear_output(wait=False)
        display(HTML(format_score_display(user_message, chatbot_response, quality_result, frustration_result)))
        
        # Show previous turns' scores for context
        if len(live_assessments) > 0:
            print(f"\n📊 Conversation Summary ({len(live_assessments)} previous turns):")
            avg_quality = sum(a['quality']['score'] for a in live_assessments[-5:]) / min(len(live_assessments), 5)
            avg_frustration = sum(a['frustration']['score'] for a in live_assessments[-5:]) / min(len(live_assessments), 5)
            interventions = sum(1 for a in live_assessments if a['frustration']['intervention_needed'])
            
            print(f"  Average Quality (last 5): {avg_quality:.1f}/10.0")
            print(f"  Average Frustration (last 5): {avg_frustration:.1f}/10.0")
            print(f"  Total Interventions Recommended: {interventions}")
    
    # Store turn data
    turn_data = {
        'turn_number': turn_number,
        'timestamp': datetime.now().isoformat(),
        'user_message': user_message,
        'chatbot_response': chatbot_response,
        'chatbot_confidence': confidence,
        'chatbot_metadata': metadata,
        'quality': quality_result,
        'frustration': frustration_result,
        'customer_type': chat_config['customer_type'],
        'complexity': chat_config['complexity']
    }
    
    conversation_history.append(turn_data)
    live_assessments.append(turn_data)
    
    # Clear input
    message_input.value = ""

def clear_chat():
    """Clear the chat display but keep conversation data"""
    with chat_output:
        clear_output(wait=False)
        print("Chat display cleared. (Conversation data preserved for export)")
    
    with score_output:
        clear_output(wait=False)
        print("Ready for new messages...")

# Event handlers
def on_send_click(b):
    process_message()

def on_enter_key(text):
    process_message()

def on_clear_click(b):
    clear_chat()

send_button.on_click(on_send_click)
message_input.on_submit(on_enter_key)
clear_button.on_click(on_clear_click)

# Display interface
print("💬 Live Chat Interface with Real-time AI Assessment")
print("Type a message below and press Enter or click Send.")
print("Watch the scores update in real-time after each interaction!\n")

input_box = widgets.HBox([message_input, send_button, clear_button])
display(input_box)

print("\n📱 Chat Window:")
display(chat_output)

print("\n📊 Real-time Agent Assessments:")
display(score_output)

# Initialize with welcome message
with chat_output:
    print(f"🚀 Live Chat Session Started: {chat_config['session_id']}")
    print(f"💡 Try asking questions, expressing frustration, or testing different scenarios.")
    print(f"📊 Quality and frustration scores will appear below after each interaction.")
    print("\n" + "="*60)

with score_output:
    print("Ready for your first message! Scores will appear here after each turn.")

## Step 4: Conversation Analysis Dashboard

Real-time analysis of the ongoing conversation with trends and insights.
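
The dashboard compares the two most recent scores to choose a trend indicator and reports how long the session has been running. A minimal sketch of both calculations follows. `trend_arrow` and `format_duration` are hypothetical helper names. Note that a `timedelta` cannot be formatted with a float format spec such as `:.0f`, so the duration is converted to seconds first.

```python
from datetime import timedelta
from typing import Sequence


def trend_arrow(scores: Sequence[float],
                up: str = "↗️", down: str = "↘️", flat: str = "➡️") -> str:
    """Compare the last two scores and return the matching indicator."""
    if len(scores) < 2 or scores[-1] == scores[-2]:
        return flat
    return up if scores[-1] > scores[-2] else down


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as minutes and seconds."""
    total_seconds = int(delta.total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds:02d}s"
```

For frustration the same comparison applies with different symbols, for example `trend_arrow(scores, up="⚠️", down="✅")`, since a rising frustration score is the bad direction.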

In [None]:
# Conversation analysis dashboard
analysis_output = widgets.Output()

def update_analysis():
    """Update conversation analysis in real-time"""
    if not conversation_history:
        with analysis_output:
            clear_output(wait=False)
            print("No conversation data yet. Start chatting to see analysis!")
        return
    
    with analysis_output:
        clear_output(wait=False)
        
        df = pd.DataFrame(conversation_history)
        
        print(f"📊 Live Conversation Analysis Dashboard")
        print(f"Session: {chat_config['session_id']}")
        print(f"Total Turns: {len(conversation_history)}")
        # timedelta does not accept a float format spec, so convert to seconds first
        duration = datetime.now() - datetime.fromisoformat(conversation_history[0]['timestamp'])
        print(f"Duration: {duration.total_seconds():.0f} seconds")
        print("\n" + "="*50)
        
        # Quality trends
        quality_scores = [turn['quality']['score'] for turn in conversation_history]
        avg_quality = sum(quality_scores) / len(quality_scores)
        quality_trend = "↗️" if len(quality_scores) > 1 and quality_scores[-1] > quality_scores[-2] else "↘️" if len(quality_scores) > 1 and quality_scores[-1] < quality_scores[-2] else "➡️"
        
        print(f"🎯 QUALITY METRICS:")
        print(f"  Average Quality: {avg_quality:.1f}/10.0 {quality_trend}")
        print(f"  Latest Quality: {quality_scores[-1]:.1f}/10.0")
        
        quality_decisions = [turn['quality']['decision'] for turn in conversation_history]
        decision_counts = pd.Series(quality_decisions).value_counts()
        for decision, count in decision_counts.items():
            print(f"  {decision.replace('_', ' ').title()}: {count}/{len(conversation_history)} ({count/len(conversation_history)*100:.0f}%)")
        
        # Frustration trends
        frustration_scores = [turn['frustration']['score'] for turn in conversation_history]
        avg_frustration = sum(frustration_scores) / len(frustration_scores)
        frustration_trend = "⚠️" if len(frustration_scores) > 1 and frustration_scores[-1] > frustration_scores[-2] else "✅" if len(frustration_scores) > 1 and frustration_scores[-1] < frustration_scores[-2] else "➡️"
        
        print(f"\n😤 FRUSTRATION METRICS:")
        print(f"  Average Frustration: {avg_frustration:.1f}/10.0 {frustration_trend}")
        print(f"  Latest Frustration: {frustration_scores[-1]:.1f}/10.0")
        
        frustration_levels = [turn['frustration']['level'] for turn in conversation_history]
        level_counts = pd.Series(frustration_levels).value_counts()
        for level, count in level_counts.items():
            print(f"  {level.replace('_', ' ').title()}: {count}/{len(conversation_history)} ({count/len(conversation_history)*100:.0f}%)")
        
        # Intervention recommendations
        interventions = sum(1 for turn in conversation_history if turn['frustration']['intervention_needed'])
        if interventions > 0:
            print(f"\n🚨 INTERVENTION ALERTS:")
            print(f"  Human intervention recommended: {interventions} times")
            print(f"  Intervention rate: {interventions/len(conversation_history)*100:.0f}%")
        
        # Recent turn details
        print(f"\n📝 RECENT TURNS:")
        for turn in conversation_history[-3:]:
            print(f"  Turn {turn['turn_number']}: Quality {turn['quality']['score']:.1f}, Frustration {turn['frustration']['score']:.1f} ({turn['frustration']['level']})")
            if turn['quality']['adjustment_needed']:
                print(f"    ⚠️ Quality improvement suggested")
            if turn['frustration']['intervention_needed']:
                print(f"    🚨 Human intervention recommended")

# Auto-refresh button
refresh_button = widgets.Button(description="Refresh Analysis", button_style='info')
refresh_button.on_click(lambda b: update_analysis())

print("📊 Live Conversation Analysis")
print("Click 'Refresh Analysis' to see updated metrics during your conversation.")
display(refresh_button)
display(analysis_output)

# Initial update
update_analysis()

## Step 5: Export Conversation Results

Export the complete conversation with all AI assessments for analysis or integration.
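
The export writes summary statistics alongside the raw turns. One detail worth getting right is that every value in the summary should be a plain Python type, so that `json` can serialize counts as numbers without a fallback converter. The sketch below uses only the standard library. `summarize_quality` and `export_json` are hypothetical names, and the sample turns are invented for illustration.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List


def summarize_quality(turns: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a JSON-safe summary of the quality scores and decisions."""
    scores = [turn["quality"]["score"] for turn in turns]
    decisions = Counter(turn["quality"]["decision"] for turn in turns)
    return {
        "average_score": sum(scores) / len(scores),
        "score_range": [min(scores), max(scores)],
        "decision_distribution": dict(decisions),  # plain str -> int mapping
    }


def export_json(data: Dict[str, Any], directory: Path, name: str) -> Path:
    """Write `data` as indented JSON and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


sample_turns = [
    {"quality": {"score": 8.0, "decision": "adequate"}},
    {"quality": {"score": 4.0, "decision": "needs_adjustment"}},
    {"quality": {"score": 9.0, "decision": "adequate"}},
]
summary = summarize_quality(sample_turns)
```

`Counter` produces built-in integers, so no `default=` fallback is needed when dumping the summary.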

In [None]:
# Export conversation results
def export_live_chat_results():
    """Export complete conversation with real-time assessments"""
    if not conversation_history:
        print("❌ No conversation data to export. Start chatting first!")
        return None
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    output_dir = Path('live_chat_exports')
    output_dir.mkdir(exist_ok=True)
    
    # Prepare export data
    export_data = {
        'metadata': {
            'export_timestamp': datetime.now().isoformat(),
            'export_type': 'live_chat_session',
            'session_id': chat_config['session_id'],
            'user_id': chat_config['user_id'],
            'total_turns': len(conversation_history),
            'session_duration_minutes': ((datetime.now() - datetime.fromisoformat(conversation_history[0]['timestamp'])).total_seconds() / 60) if conversation_history else 0,
            'customer_profile': {
                'final_customer_type': chat_config['customer_type'],
                'final_complexity': chat_config['complexity']
            },
            'agent_versions': {
                'chatbot_agent': chatbot_agent.agent_config.agent_info['version'] if chatbot_agent.agent_config else 'unknown',
                'quality_agent': quality_agent.agent_config.agent_info['version'] if quality_agent.agent_config else 'unknown',
                'frustration_agent': frustration_agent.agent_config.agent_info['version'] if frustration_agent.agent_config else 'unknown'
            }
        },
        'conversation_history': conversation_history,
        'summary_statistics': {
            'quality_metrics': {
                'average_score': sum(turn['quality']['score'] for turn in conversation_history) / len(conversation_history),
                'score_range': [min(turn['quality']['score'] for turn in conversation_history), max(turn['quality']['score'] for turn in conversation_history)],
                'decision_distribution': pd.Series([turn['quality']['decision'] for turn in conversation_history]).value_counts().to_dict(),
                'adjustments_needed': sum(1 for turn in conversation_history if turn['quality']['adjustment_needed'])
            },
            'frustration_metrics': {
                'average_score': sum(turn['frustration']['score'] for turn in conversation_history) / len(conversation_history),
                'score_range': [min(turn['frustration']['score'] for turn in conversation_history), max(turn['frustration']['score'] for turn in conversation_history)],
                'level_distribution': pd.Series([turn['frustration']['level'] for turn in conversation_history]).value_counts().to_dict(),
                'interventions_recommended': sum(1 for turn in conversation_history if turn['frustration']['intervention_needed']),
                'escalation_trends': pd.Series([turn['frustration']['escalation_trend'] for turn in conversation_history]).value_counts().to_dict()
            },
            'conversation_flow': {
                'quality_trend': 'improving' if len(conversation_history) > 1 and conversation_history[-1]['quality']['score'] > conversation_history[0]['quality']['score'] else 'declining' if len(conversation_history) > 1 and conversation_history[-1]['quality']['score'] < conversation_history[0]['quality']['score'] else 'stable',
                'frustration_trend': 'escalating' if len(conversation_history) > 1 and conversation_history[-1]['frustration']['score'] > conversation_history[0]['frustration']['score'] else 'de-escalating' if len(conversation_history) > 1 and conversation_history[-1]['frustration']['score'] < conversation_history[0]['frustration']['score'] else 'stable'
            }
        }
    }
    
    # Export main results
    results_filename = f'live_chat_results_{timestamp}.json'
    results_path = output_dir / results_filename
    
    with open(results_path, 'w') as f:
        json.dump(export_data, f, indent=2, default=str)
    
    # Export CSV summary
    csv_data = []
    for turn in conversation_history:
        csv_data.append({
            'turn_number': turn['turn_number'],
            'timestamp': turn['timestamp'],
            'user_message': turn['user_message'],
            'chatbot_response': turn['chatbot_response'],
            'quality_score': turn['quality']['score'],
            'quality_decision': turn['quality']['decision'],
            'quality_confidence': turn['quality']['confidence'],
            'frustration_score': turn['frustration']['score'],
            'frustration_level': turn['frustration']['level'],
            'frustration_confidence': turn['frustration']['confidence'],
            'intervention_needed': turn['frustration']['intervention_needed'],
            'customer_type': turn['customer_type'],
            'complexity': turn['complexity']
        })
    
    csv_filename = f'live_chat_summary_{timestamp}.csv'
    csv_path = output_dir / csv_filename
    pd.DataFrame(csv_data).to_csv(csv_path, index=False)
    
    # Generate session report
    report_filename = f'live_chat_report_{timestamp}.txt'
    report_path = output_dir / report_filename
    
    with open(report_path, 'w') as f:
        f.write("LIVE CHAT SESSION REPORT\n")
        f.write("=" * 50 + "\n\n")
        f.write(f"Session ID: {chat_config['session_id']}\n")
        f.write(f"Export Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Total Turns: {len(conversation_history)}\n")
        f.write(f"Session Duration: {export_data['metadata']['session_duration_minutes']:.1f} minutes\n\n")
        
        f.write("QUALITY ASSESSMENT SUMMARY:\n")
        quality_stats = export_data['summary_statistics']['quality_metrics']
        f.write(f"  Average Quality Score: {quality_stats['average_score']:.2f}/10.0\n")
        f.write(f"  Quality Range: {quality_stats['score_range'][0]:.1f} - {quality_stats['score_range'][1]:.1f}\n")
        f.write(f"  Adjustments Needed: {quality_stats['adjustments_needed']}/{len(conversation_history)} ({quality_stats['adjustments_needed']/len(conversation_history)*100:.1f}%)\n\n")
        
        f.write("FRUSTRATION ANALYSIS SUMMARY:\n")
        frustration_stats = export_data['summary_statistics']['frustration_metrics']
        f.write(f"  Average Frustration Score: {frustration_stats['average_score']:.2f}/10.0\n")
        f.write(f"  Frustration Range: {frustration_stats['score_range'][0]:.1f} - {frustration_stats['score_range'][1]:.1f}\n")
        f.write(f"  Interventions Recommended: {frustration_stats['interventions_recommended']}/{len(conversation_history)} ({frustration_stats['interventions_recommended']/len(conversation_history)*100:.1f}%)\n\n")
        
        f.write("CONVERSATION FLOW:\n")
        flow = export_data['summary_statistics']['conversation_flow']
        f.write(f"  Quality Trend: {flow['quality_trend']}\n")
        f.write(f"  Frustration Trend: {flow['frustration_trend']}\n\n")
        
        f.write("TURN-BY-TURN SUMMARY:\n")
        for turn in conversation_history:
            f.write(f"  Turn {turn['turn_number']}: Q={turn['quality']['score']:.1f} F={turn['frustration']['score']:.1f} ({turn['frustration']['level']})\n")
    
    print(f"✅ Live chat session exported successfully!")
    print(f"\n📁 Export Files:")
    print(f"  Complete Data: {results_path}")
    print(f"  CSV Summary: {csv_path}")
    print(f"  Session Report: {report_path}")
    
    print(f"\n📊 Session Summary:")
    print(f"  Total turns: {len(conversation_history)}")
    print(f"  Average quality: {export_data['summary_statistics']['quality_metrics']['average_score']:.1f}/10.0")
    print(f"  Average frustration: {export_data['summary_statistics']['frustration_metrics']['average_score']:.1f}/10.0")
    print(f"  Interventions needed: {export_data['summary_statistics']['frustration_metrics']['interventions_recommended']}")
    
    return {
        'results_file': str(results_path),
        'csv_file': str(csv_path),
        'report_file': str(report_path)
    }

# Export button
export_button = widgets.Button(
    description="Export Conversation",
    button_style='success',
    icon='download'
)

export_output = widgets.Output()

def on_export_click(b):
    with export_output:
        clear_output(wait=False)
        export_live_chat_results()

export_button.on_click(on_export_click)

print("💾 Export Live Chat Results")
print("Click to export your conversation with all AI assessments and scoring data.")
display(export_button)
display(export_output)

## Step 6: Session Summary and Insights

Final summary of the live chat session with insights and recommendations.
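
The trend analysis in the cell below splits the score list into an early half and a late half and compares their means, reporting an improvement when the late mean exceeds the early mean by more than 0.5. A minimal sketch of that comparison follows. `half_split_trend` is a hypothetical name, and the "declining" and "stable" labels are assumptions that mirror the improving case.

```python
from typing import Sequence


def half_split_trend(scores: Sequence[float], margin: float = 0.5) -> str:
    """Compare the mean of the first half of the scores with the second half."""
    if len(scores) <= 2:
        return "insufficient_data"  # matches the notebook's len(...) > 2 guard
    mid = len(scores) // 2
    early = sum(scores[:mid]) / mid
    late = sum(scores[mid:]) / (len(scores) - mid)
    if late > early + margin:
        return "improving"
    if late < early - margin:  # assumed symmetric rule
        return "declining"
    return "stable"
```

With an odd number of turns the extra score lands in the late half, so `[6.0, 6.0, 9.0]` compares 6.0 against 7.5.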

In [None]:
# Session summary and insights
def generate_session_insights():
    """Generate insights and recommendations from the live chat session"""
    if not conversation_history:
        print("❌ No conversation data available. Start chatting to generate insights!")
        return
    
    print(f"🎯 LIVE CHAT SESSION INSIGHTS")
    print(f"Session: {chat_config['session_id']}")
    print("=" * 60)
    
    # Calculate key metrics
    quality_scores = [turn['quality']['score'] for turn in conversation_history]
    frustration_scores = [turn['frustration']['score'] for turn in conversation_history]
    
    avg_quality = sum(quality_scores) / len(quality_scores)
    avg_frustration = sum(frustration_scores) / len(frustration_scores)
    
    interventions = sum(1 for turn in conversation_history if turn['frustration']['intervention_needed'])
    adjustments = sum(1 for turn in conversation_history if turn['quality']['adjustment_needed'])
    
    print(f"\n📊 SESSION METRICS:")
    print(f"  Total Conversation Turns: {len(conversation_history)}")
    print(f"  Average Response Quality: {avg_quality:.1f}/10.0")
    print(f"  Average User Frustration: {avg_frustration:.1f}/10.0")
    print(f"  Quality Adjustments Suggested: {adjustments} ({adjustments/len(conversation_history)*100:.0f}%)")
    print(f"  Human Interventions Recommended: {interventions} ({interventions/len(conversation_history)*100:.0f}%)")
    
    # Performance assessment
    print(f"\n🏆 PERFORMANCE ASSESSMENT:")
    
    if avg_quality >= 8.0:
        print(f"  Quality: ✅ EXCELLENT - Chatbot performing very well")
    elif avg_quality >= 7.0:
        print(f"  Quality: ✅ GOOD - Chatbot meeting quality standards")
    elif avg_quality >= 6.0:
        print(f"  Quality: ⚠️ ADEQUATE - Some improvement opportunities")
    else:
        print(f"  Quality: ❌ NEEDS IMPROVEMENT - Quality below acceptable threshold")
    
    if avg_frustration <= 2.0:
        print(f"  Frustration: ✅ EXCELLENT - Customer remained calm and satisfied")
    elif avg_frustration <= 4.0:
        print(f"  Frustration: ✅ GOOD - Minor frustration levels")
    elif avg_frustration <= 6.0:
        print(f"  Frustration: ⚠️ MODERATE - Some customer frustration detected")
    else:
        print(f"  Frustration: ❌ HIGH - Significant customer frustration present")
    
    # Trend analysis
    print(f"\n📈 TREND ANALYSIS:")
    if len(quality_scores) > 2:
        # Split the session in half; the later half takes the extra turn when the count is odd.
        # Both score lists have one entry per turn, so they share the same split point.
        half = len(quality_scores) // 2
        early_quality = sum(quality_scores[:half]) / half
        late_quality = sum(quality_scores[half:]) / (len(quality_scores) - half)
        
        if late_quality > early_quality + 0.5:
            print(f"  Quality Trend: ↗️ IMPROVING ({early_quality:.1f} → {late_quality:.1f})")
        elif late_quality < early_quality - 0.5:
            print(f"  Quality Trend: ↘️ DECLINING ({early_quality:.1f} → {late_quality:.1f})")
        else:
            print(f"  Quality Trend: ➡️ STABLE ({avg_quality:.1f} average)")
        
        early_frustration = sum(frustration_scores[:half]) / half
        late_frustration = sum(frustration_scores[half:]) / (len(frustration_scores) - half)
        
        if late_frustration > early_frustration + 0.5:
            print(f"  Frustration Trend: ⚠️ ESCALATING ({early_frustration:.1f} → {late_frustration:.1f})")
        elif late_frustration < early_frustration - 0.5:
            print(f"  Frustration Trend: ✅ DE-ESCALATING ({early_frustration:.1f} → {late_frustration:.1f})")
        else:
            print(f"  Frustration Trend: ➡️ STABLE ({avg_frustration:.1f} average)")
    else:
        # Without this branch the section header would print with nothing under it
        print("  Not enough turns for trend analysis (at least 3 required)")
    
    # Recommendations
    print(f"\n💡 RECOMMENDATIONS:")
    
    if avg_quality < 7.0:
        print(f"  🔧 Consider adjusting chatbot prompts or model settings to improve response quality")
    
    if adjustments > len(conversation_history) * 0.3:
        print(f"  📝 High rate of quality adjustments suggests need for prompt engineering")
    
    if avg_frustration > 5.0:
        print(f"  😤 High frustration levels indicate need for better frustration handling")
    
    if interventions > 0:
        print(f"  🚨 {interventions} intervention(s) recommended - ensure human agents are available")
    
    if len(conversation_history) > 8:
        print(f"  📞 Long conversation ({len(conversation_history)} turns) - consider proactive escalation")
    
    # Success factors
    print(f"\n🌟 SESSION HIGHLIGHTS:")
    
    best_quality_turn = max(conversation_history, key=lambda x: x['quality']['score'])
    print(f"  Best Quality Response: Turn {best_quality_turn['turn_number']} ({best_quality_turn['quality']['score']:.1f}/10.0)")
    
    lowest_frustration_turn = min(conversation_history, key=lambda x: x['frustration']['score'])
    print(f"  Lowest Frustration: Turn {lowest_frustration_turn['turn_number']} ({lowest_frustration_turn['frustration']['score']:.1f}/10.0)")
    
    if interventions == 0:
        print(f"  🎉 No human interventions needed - AI handled conversation successfully")
    
    if avg_quality >= 8.0 and avg_frustration <= 3.0:
        print(f"  🏆 OUTSTANDING SESSION - High quality responses with low customer frustration")
    
    print(f"\n" + "="*60)
    print(f"✅ Session analysis complete! Export data for detailed review.")

# Generate insights button
insights_button = widgets.Button(
    description="Generate Insights",
    button_style='info',
    icon='chart-line'
)

insights_output = widgets.Output()

def on_insights_click(b):
    with insights_output:
        clear_output(wait=False)
        generate_session_insights()

insights_button.on_click(on_insights_click)

print("🎯 Session Insights and Recommendations")
print("Click to generate detailed insights about your live chat session.")
display(insights_button)
display(insights_output)

---

## 🚀 You're Ready to Test!

### Quick Start Guide:
1. **Initialize Agents** (Step 1) - Run once to set up all AI agents
2. **Configure Session** (Step 2) - Set customer type and complexity
3. **Start Chatting** (Step 3) - Use the chat interface like a real customer
4. **Watch Scores** - See real-time quality and frustration assessments
5. **Analyze Results** (Step 4) - View conversation trends and metrics
6. **Export Data** (Step 5) - Save complete results for analysis
7. **Generate Insights** (Step 6) - Get recommendations and summary

### Perfect for:
- **Stakeholder Demos**: Show live AI quality control and frustration detection
- **System Validation**: Test with real human interactions vs. simulated customers
- **Configuration Tuning**: See immediate impact of agent settings changes
- **Training Data**: Generate high-quality human-validated conversation examples
- **Performance Testing**: Validate AI agent accuracy with real-world scenarios

### Tips for Effective Testing:
- Try different conversation styles (frustrated, polite, technical, etc.)
- Test edge cases and challenging scenarios
- Experiment with different customer types and complexity levels
- Watch how scores change in real time as the conversation evolves
- Export results to compare different configuration settings
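
As a rough sketch of the last tip, two exported sessions could be compared on their summary statistics. This assumes the results file is JSON and contains the `summary_statistics` structure that the export step reads when printing its summary (`quality_metrics.average_score`, `frustration_metrics.average_score`, and `frustration_metrics.interventions_recommended`). The function names `load_summary` and `compare_sessions` and the file paths are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_summary(results_file: str) -> Dict[str, Any]:
    """Read one exported results file and return its summary statistics."""
    data = json.loads(Path(results_file).read_text(encoding="utf-8"))
    return data["summary_statistics"]


def compare_sessions(baseline_file: str, candidate_file: str) -> Dict[str, float]:
    """Return candidate-minus-baseline deltas for the headline metrics."""
    base = load_summary(baseline_file)
    cand = load_summary(candidate_file)
    return {
        "quality_delta": (
            cand["quality_metrics"]["average_score"]
            - base["quality_metrics"]["average_score"]
        ),
        "frustration_delta": (
            cand["frustration_metrics"]["average_score"]
            - base["frustration_metrics"]["average_score"]
        ),
        "interventions_delta": (
            cand["frustration_metrics"]["interventions_recommended"]
            - base["frustration_metrics"]["interventions_recommended"]
        ),
    }


# Hypothetical paths; in practice use the 'results_file' value that
# export_live_chat_results() returns for each session.
# deltas = compare_sessions("session_a_results.json", "session_b_results.json")
# print(deltas)
```

A positive `quality_delta` together with a negative `frustration_delta` would suggest the candidate configuration performed better, although a single pair of conversations is too small a sample to draw firm conclusions.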

**Happy testing! 🎉**