# Chatbot Agent Testing Notebook

This notebook provides a user-friendly interface for testing the Chatbot Agent with pre-generated questions.
It's designed for users with little programming experience.

## Features:
- Load and edit agent configuration settings
- Load test questions from JSON files
- Process questions through the Chatbot Agent
- Review and analyze results
- Export results with timestamps

## Getting Started:
1. Run cells in order from top to bottom
2. Edit configuration values as needed
3. Load test questions from file (generated using `question_generator.ipynb`)
4. Review questions before processing
5. Run the agent and review results

## Question Generation:
Use the separate `question_generator.ipynb` notebook to create test questions, then load them here.

In [1]:
# Import required libraries
import json
import os
import sys
from datetime import datetime
from pathlib import Path

import ipywidgets as widgets
import pandas as pd
from ruamel.yaml import YAML  # round-trip YAML parser that preserves comments
from IPython.display import display, HTML, clear_output

# Set the working directory to the root of the project
os.chdir('/workspace')

# Add workspace to path for imports (this helps with relative imports)
sys.path.insert(0, '/workspace')

# Import our system components
from src.nodes.chatbot_agent import ChatbotAgentNode
from src.core.config.agent_config_manager import AgentConfigManager
from src.integrations.llm_providers import LLMProviderFactory

print("✅ All libraries imported successfully!")
print("Ready to start testing the Chatbot Agent.")

✅ All libraries imported successfully!
Ready to start testing the Chatbot Agent.


## Step 1: Load Configuration Settings

The following cell loads the current configuration for the Chatbot Agent.
You can edit these values to customize the agent's behavior.

In [2]:
# Load configuration from files and create temporary editable copies
config_base_path = Path('/workspace/config')
agent_config_path = config_base_path / 'agents' / 'chatbot_agent'
temp_config_dir = Path('/tmp/chatbot_agent_configs')

def load_and_create_temp_configs():
    """Load all configuration files and create temporary editable copies with comments preserved"""
    configs = {}
    
    # Create YAML instance for comment preservation
    yaml = YAML()
    yaml.preserve_quotes = True
    yaml.default_flow_style = False
    
    # Create temp directory
    temp_config_dir.mkdir(exist_ok=True)
    
    # Load original files for parsing (to access data)
    with open(agent_config_path / 'config.yaml', 'r') as f:
        configs['agent'] = yaml.load(f)
    
    with open(agent_config_path / 'prompts.yaml', 'r') as f:
        configs['prompts'] = yaml.load(f)
    
    with open(agent_config_path / 'models.yaml', 'r') as f:
        configs['models'] = yaml.load(f)
    
    # Load shared models for reference
    with open(config_base_path / 'shared' / 'models.yaml', 'r') as f:
        configs['shared_models'] = yaml.load(f)
    
    # Create temp file paths
    temp_agent_path = temp_config_dir / 'config.yaml'
    temp_prompts_path = temp_config_dir / 'prompts.yaml'
    temp_models_path = temp_config_dir / 'models.yaml'
    
    # Copy original files to temp directory to preserve comments and formatting
    import shutil
    shutil.copy2(agent_config_path / 'config.yaml', temp_agent_path)
    shutil.copy2(agent_config_path / 'prompts.yaml', temp_prompts_path)  
    shutil.copy2(agent_config_path / 'models.yaml', temp_models_path)
    
    return configs, {
        'agent_config': temp_agent_path,
        'prompts_config': temp_prompts_path,
        'models_config': temp_models_path
    }

# Load configurations and create temp files
configs, temp_file_paths = load_and_create_temp_configs()

print("📁 Configuration files loaded and temporary copies created with comments preserved!")
print(f"Agent name: {configs['agent']['agent']['name']}")
print(f"Agent version: {configs['agent']['agent']['version']}")

# Get preferred model from models config (not agent config)
preferred_model = "Unknown"
if 'primary_model' in configs['models']:
    preferred_model = configs['models']['primary_model']
elif 'preferred' in configs['models']:
    preferred_model = configs['models']['preferred']

print(f"Preferred model: {preferred_model}")
print(f"\n💾 Temporary config files created at:")
for config_type, path in temp_file_paths.items():
    print(f"  {config_type}: {path}")
print(f"\n💡 These temp files retain original comments and can be edited directly in Step 2.")

📁 Configuration files loaded and temporary copies created with comments preserved!
Agent name: chatbot_agent
Agent version: 1.0.0
Preferred model: local_general_standard

💾 Temporary config files created at:
  agent_config: /tmp/chatbot_agent_configs/config.yaml
  prompts_config: /tmp/chatbot_agent_configs/prompts.yaml
  models_config: /tmp/chatbot_agent_configs/models.yaml

💡 These temp files retain original comments and can be edited directly in Step 2.


## Step 2: Editable Configuration Settings

Edit the YAML below to customize how the Chatbot Agent behaves.
Each editor maps directly to one configuration file. Changes are saved to temporary copies, leave the original files unchanged, and can be exported later.

In [3]:
# Display and edit configuration files in separate text areas

def load_config_file_contents():
    """Load current config file contents from temp files"""
    with open(temp_file_paths['agent_config'], 'r') as f:
        agent_config_content = f.read()
    with open(temp_file_paths['prompts_config'], 'r') as f:
        prompts_config_content = f.read()
    with open(temp_file_paths['models_config'], 'r') as f:
        models_config_content = f.read()
    
    return agent_config_content, prompts_config_content, models_config_content

# Load current config file contents
agent_config_content, prompts_config_content, models_config_content = load_config_file_contents()

print("⚙️ Configuration File Editor")
print("Edit the YAML configuration files below and use the Save buttons to apply changes.")
print("Changes are saved to temporary files and will be used in Step 5.\n")

# Create text areas for each config file
print("📄 1. Agent Configuration (config.yaml)")
print("Contains: agent settings, behavior, escalation thresholds")

agent_config_editor = widgets.Textarea(
    value=agent_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='250px'),
    style={'description_width': '0px'}
)

def save_agent_config(button):
    """Save agent config changes with comments preserved"""
    try:
        # Create YAML instance for validation
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        yaml.load(agent_config_editor.value)
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['agent_config'], 'w') as f:
            f.write(agent_config_editor.value)
        
        print("✅ Agent config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in agent config: {e}")

agent_save_btn = widgets.Button(description="Save Agent Config", button_style='success')
agent_save_btn.on_click(save_agent_config)

display(agent_config_editor)
display(agent_save_btn)

print("\n📄 2. Prompts Configuration (prompts.yaml)")
print("Contains: system prompts, response guidelines, communication style")

prompts_config_editor = widgets.Textarea(
    value=prompts_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='250px'),
    style={'description_width': '0px'}
)

def save_prompts_config(button):
    """Save prompts config changes with comments preserved"""
    try:
        # Create YAML instance for validation
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        yaml.load(prompts_config_editor.value)
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['prompts_config'], 'w') as f:
            f.write(prompts_config_editor.value)
        
        print("✅ Prompts config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in prompts config: {e}")

prompts_save_btn = widgets.Button(description="Save Prompts Config", button_style='success')
prompts_save_btn.on_click(save_prompts_config)

display(prompts_config_editor)
display(prompts_save_btn)

print("\n📄 3. Models Configuration (models.yaml)")
print("Contains: preferred model, fallback models, model-specific settings")

# Show available model aliases from shared models config
def display_available_models():
    """Display available model aliases and their actual models"""
    try:
        shared_models = configs['shared_models']
        
        # Extract model aliases and models sections
        model_aliases = shared_models.get('model_aliases', {})
        models = shared_models.get('models', {})
        
        if not model_aliases:
            print("❌ No model aliases found in shared configuration")
            return
            
        print("\n🔍 Available Model Aliases:")
        print("Use these aliases in your models configuration below:\n")
        
        # Group by provider for better organization
        providers = {}
        for alias, actual_model_name in model_aliases.items():
            # Get model details from models section
            model_details = models.get(actual_model_name, {})
            provider = model_details.get('type', 'unknown')
            description = model_details.get('description', '')
            
            if provider not in providers:
                providers[provider] = []
            providers[provider].append({
                'alias': alias,
                'model_name': actual_model_name,
                'description': description
            })
        
        # Display by provider
        for provider, provider_models in providers.items():
            print(f"📡 {provider.upper()} Provider:")
            for model in provider_models:
                desc = f" - {model['description']}" if model['description'] else ""
                print(f"  • {model['alias']} → {model['model_name']}{desc}")
            print()
        
        # Show current configuration
        try:
            # Use ruamel.yaml for parsing
            yaml = YAML()
            yaml.preserve_quotes = True
            current_models_config = yaml.load(models_config_content)
            current_preferred = current_models_config.get('primary_model', current_models_config.get('preferred', 'unknown'))
            current_fallback = current_models_config.get('fallback', [])
        except Exception:
            # Fall back to placeholders if the models config cannot be parsed
            current_preferred = 'unknown'
            current_fallback = []
        
        print(f"📋 Current Models Configuration:")
        print(f"  Preferred: {current_preferred}")
        print(f"  Fallback: {current_fallback}")
        print()
        
        print("💡 Tips for editing:")
        print("  • Change 'primary_model' or 'preferred' to any alias from the list above")
        print("  • Add/remove aliases in the 'fallback' list")
        print("  • Aliases are case-sensitive")
        print("  • Invalid aliases will cause errors during processing")
        print()
        
    except Exception as e:
        print(f"❌ Error loading available models: {e}")
        print("Continuing with models configuration editor...\n")

# Display available models before showing the editor
display_available_models()

models_config_editor = widgets.Textarea(
    value=models_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='200px'),
    style={'description_width': '0px'}
)

def save_models_config(button):
    """Save models config changes with comments preserved"""
    try:
        # Create YAML instance for validation
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        parsed_config = yaml.load(models_config_editor.value)
        
        # Additional validation for model aliases
        if isinstance(parsed_config, dict):
            preferred = parsed_config.get('primary_model') or parsed_config.get('preferred')
            fallback = parsed_config.get('fallback') or []  # treat a missing or empty 'fallback' key as no fallbacks
            
            # Get available aliases
            model_aliases = configs['shared_models'].get('model_aliases', {})
            
            # Check if preferred model exists
            if preferred and preferred not in model_aliases:
                print(f"⚠️ Warning: Preferred model '{preferred}' not found in available model aliases")
            
            # Check fallback models
            for fb_model in fallback:
                if fb_model not in model_aliases:
                    print(f"⚠️ Warning: Fallback model '{fb_model}' not found in available model aliases")
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['models_config'], 'w') as f:
            f.write(models_config_editor.value)
        
        print("✅ Models config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in models config: {e}")

models_save_btn = widgets.Button(description="Save Models Config", button_style='success')
models_save_btn.on_click(save_models_config)

display(models_config_editor)
display(models_save_btn)

# Save All button for convenience
def save_all_configs(button):
    """Save all config changes at once"""
    save_agent_config(None)
    save_prompts_config(None)
    save_models_config(None)

print("\n💾 Save All Changes")
save_all_btn = widgets.Button(description="Save All Configs", button_style='info')
save_all_btn.on_click(save_all_configs)
display(save_all_btn)

print(f"\n💾 Temp config files location:")
for config_type, path in temp_file_paths.items():
    print(f"  {config_type}: {path}")

print("\n💡 Tips:")
print("  • Edit YAML directly in the text areas above")
print("  • Use Save buttons to apply changes to temp files")
print("  • YAML syntax is validated before saving")
print("  • Model aliases are validated against available models")
print("  • Changes will be used in Step 5 when processing questions")
print("  • Original config files remain unchanged")
print("  • Comments and formatting are preserved during editing!")

⚙️ Configuration File Editor
Edit the YAML configuration files below and use the Save buttons to apply changes.
Changes are saved to temporary files and will be used in Step 5.

📄 1. Agent Configuration (config.yaml)
Contains: agent settings, behavior, escalation thresholds


Textarea(value='# Chatbot Agent Configuration\n# This agent provides primary customer service responses and ha…

Button(button_style='success', description='Save Agent Config', style=ButtonStyle())


📄 2. Prompts Configuration (prompts.yaml)
Contains: system prompts, response guidelines, communication style


Textarea(value='system: |\n  You are a professional customer service chatbot dedicated to providing exceptiona…

Button(button_style='success', description='Save Prompts Config', style=ButtonStyle())


📄 3. Models Configuration (models.yaml)
Contains: preferred model, fallback models, model-specific settings

🔍 Available Model Aliases:
Use these aliases in your models configuration below:

📡 ANTHROPIC Provider:
  • anthropic_general_budget → claude-3-5-haiku-20241022 - Anthropic Claude 3.5 Haiku - fast and efficient
  • anthropic_general_standard → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_reasoning_premium → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_coding_premium → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_flagship → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning

📡 OPENAI Provider:
  • openai_general_standard → gpt-4 - OpenAI GPT-4 - highest quality, requires API key
  • openai_general_budget → gpt-3.5-turbo - OpenAI GPT-3.5 Turbo - fast a

Textarea(value='primary_model: "local_general_standard"\n\nmodel_preferences:\n  general_queries:\n    primary…

Button(button_style='success', description='Save Models Config', style=ButtonStyle())


💾 Save All Changes


Button(button_style='info', description='Save All Configs', style=ButtonStyle())


💾 Temp config files location:
  agent_config: /tmp/chatbot_agent_configs/config.yaml
  prompts_config: /tmp/chatbot_agent_configs/prompts.yaml
  models_config: /tmp/chatbot_agent_configs/models.yaml

💡 Tips:
  • Edit YAML directly in the text areas above
  • Use Save buttons to apply changes to temp files
  • YAML syntax is validated before saving
  • Model aliases are validated against available models
  • Changes will be used in Step 5 when processing questions
  • Original config files remain unchanged
  • Comments and formatting are preserved during editing!


## Step 2.5: Understanding Chatbot Internal Logic

Now that you've seen the configuration settings, let's understand how the chatbot processes customer queries and how your configuration changes affect its behavior.

### How the Chatbot Agent Works Internally

The Chatbot Agent handles each customer interaction in three phases:

#### 1. **Initial Assessment Phase**
- **Query Analysis**: The chatbot first analyzes the customer's question to understand:
  - Intent and urgency level
  - Technical complexity 
  - Emotional tone (frustration, confusion, etc.)
  - Required information categories
- **Context Integration**: Retrieves previous conversation history and relevant customer data
- **Confidence Scoring**: Assigns a confidence score (0.0-1.0) based on how well it understands the query

#### 2. **Response Generation Phase**
- **System Prompt Application**: Uses the `system` prompt from `prompts.yaml` to establish personality and behavior
- **Model Selection**: Chooses the appropriate AI model based on query complexity:
  - Simple queries → Uses `primary_model` from `models.yaml`
  - Complex technical queries → May escalate to premium models if configured
  - Reasoning-heavy queries → Uses models optimized for reasoning
- **Response Crafting**: Generates response following the guidelines in `prompts.yaml`

#### 3. **Quality Control Phase**
- **Escalation Threshold Check**: Compares confidence score against thresholds in `config.yaml`:
  - If confidence < `escalation.confidence_threshold` → Triggers escalation
  - If query complexity > agent capabilities → Routes to human specialist
- **Response Validation**: Ensures response meets quality standards
- **Safety Checks**: Verifies no inappropriate or harmful content
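
The threshold check in the quality control phase can be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the helper name `should_escalate` and the example numbers are assumptions made for this sketch. Only the rule itself (a confidence score below `escalation.confidence_threshold` triggers escalation) comes from the description above.

```python
def should_escalate(confidence: float, confidence_threshold: float) -> bool:
    """Hypothetical helper mirroring the rule above; not the agent's actual code."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    # Escalate only when the score falls strictly below the threshold
    return confidence < confidence_threshold


# The same confidence score leads to different outcomes under different thresholds
print(should_escalate(0.65, confidence_threshold=0.7))  # True
print(should_escalate(0.65, confidence_threshold=0.5))  # False
```

Raising the threshold makes the same query more likely to be escalated, which is why the threshold is the main lever for how cautious the agent is.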

### How Configuration Settings Affect Behavior

#### **Temperature Setting** (`config.yaml: settings.temperature`)
- **Low (0.0-0.3)**: More deterministic, consistent responses
  - *Best for*: Policy explanations, factual information
  - *Trade-off*: May sound robotic or repetitive
- **Medium (0.4-0.7)**: Balanced creativity and consistency  
  - *Best for*: General customer service, explanations
  - *Trade-off*: Good balance of personality and accuracy
- **High (0.8-1.0)**: More creative, varied responses
  - *Best for*: Empathetic responses, complex problem-solving
  - *Trade-off*: May be less predictable or occasionally off-topic
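
As a quick sanity check before saving a new temperature, the bands above can be expressed as a small helper. This is only a sketch: `temperature_band` is a hypothetical name, and assigning values that fall between the listed ranges (for example 0.35) to the next band up is an assumption, since the ranges above leave those values unspecified.

```python
def temperature_band(temperature: float) -> str:
    """Map a temperature to the low/medium/high bands described above.

    Illustrative only: values between the listed ranges (e.g. 0.35) are
    assigned to the next band up, which is an assumption of this sketch.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature is expected to be between 0.0 and 1.0")
    if temperature <= 0.3:
        return "low"
    if temperature <= 0.7:
        return "medium"
    return "high"


for value in (0.2, 0.5, 0.9):
    print(f"temperature={value} -> {temperature_band(value)}")
```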

#### **System Prompt** (`prompts.yaml: system`)
This is the chatbot's "personality and instructions." Key components:
- **Role Definition**: "You are a professional customer service chatbot..."
- **Communication Style**: Friendly, helpful, professional tone guidelines
- **Response Structure**: How to organize information in responses
- **Escalation Guidance**: When and how to refer customers to human agents
- **Service Standards**: Quality expectations and customer satisfaction goals

*Example Impact*: Adding "Be concise and direct" to the system prompt typically makes responses shorter and more to the point.

#### **Model Selection** (`models.yaml`)
- **Primary Model**: The default model used for most queries
  - `anthropic_general_standard`: Balanced performance, good for most queries
  - `local_general_standard`: Faster local processing, may be less sophisticated
  - `openai_general_standard`: High-quality responses, requires API access
- **Fallback Models**: Used when primary model is unavailable
- **Model Preferences by Query Type**: Different models for different scenarios
  - `general_queries.primary`: Standard customer service questions
  - `complex_queries.primary`: Technical or complicated issues  
  - `escalation_queries.primary`: When preparing to transfer to human
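
One way to picture how these settings could combine is shown below. The keys `primary_model`, `model_preferences`, and `fallback` appear in this notebook's `models.yaml` handling, but the resolution order (query-type preference, then primary model, then fallbacks), the helper name `pick_model`, and the example values are assumptions for illustration; the real agent may resolve models differently.

```python
from typing import Optional


def pick_model(models_config: dict, query_type: str, available: set) -> Optional[str]:
    """Choose a model alias for a query type (illustrative resolution order)."""
    preferences = models_config.get("model_preferences") or {}
    candidates = []
    # 1. Query-type-specific preference, e.g. model_preferences.complex_queries.primary
    specific = (preferences.get(query_type) or {}).get("primary")
    if specific:
        candidates.append(specific)
    # 2. The default primary model
    if models_config.get("primary_model"):
        candidates.append(models_config["primary_model"])
    # 3. Fallback aliases, in the order they are listed
    candidates.extend(models_config.get("fallback") or [])
    # Return the first candidate that is actually available
    for alias in candidates:
        if alias in available:
            return alias
    return None


# Example values are assumptions, not the contents of the real models.yaml
example_config = {
    "primary_model": "local_general_standard",
    "model_preferences": {
        "general_queries": {"primary": "local_general_standard"},
        "complex_queries": {"primary": "anthropic_reasoning_premium"},
    },
    "fallback": ["anthropic_general_budget"],
}

available_aliases = {"local_general_standard", "anthropic_general_budget"}
print(pick_model(example_config, "complex_queries", available_aliases))
```

In this example the preferred model for complex queries is not available, so the sketch falls through to the primary model.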

#### **Escalation Thresholds** (`config.yaml: escalation`)
- **Confidence Threshold**: Minimum confidence to attempt answering
  - Lower values → More willing to try answering difficult questions
  - Higher values → More conservative, escalates uncertain queries faster
- **Max Attempts**: How many clarification questions to ask before escalating
- **Priority Routing**: Which human departments different query types are routed to

#### **Behavior Settings** (`config.yaml: behavior`)
- **Response Length Preferences**: Target length for different response types
- **Follow-up Strategy**: How proactive to be in asking clarifying questions
- **Empathy Level**: How much emotional intelligence to apply
- **Technical Depth**: How detailed to get with technical explanations

### Real-World Impact Examples

#### Scenario: Customer asks "My claim was denied, what do I do?"

**With Conservative Settings** (high escalation threshold, low temperature):
- High confidence threshold → May immediately escalate to human agent
- Low temperature → Gives standard, policy-based response
- Result: Quick escalation, consistent but potentially impersonal

**With Balanced Settings** (medium thresholds, medium temperature):
- Medium confidence → Asks clarifying questions first
- Balanced temperature → Provides empathetic but accurate response
- Result: Attempts to help while showing understanding

**With Aggressive Settings** (low escalation threshold, high temperature):
- Low confidence threshold → Attempts detailed explanation
- High temperature → Creative, personalized response approach
- Result: More comprehensive help but potentially longer interaction
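
To make the comparison concrete, the sketch below runs one batch of confidence scores through three presets and counts how many queries each would escalate. All numbers (the thresholds, temperatures, and sample scores) are invented for illustration and are not taken from the real configuration; `PRESETS` and `escalation_count` are hypothetical names.

```python
# Hypothetical presets: values chosen only to illustrate the three scenarios
PRESETS = {
    "conservative": {"confidence_threshold": 0.8, "temperature": 0.2},
    "balanced": {"confidence_threshold": 0.6, "temperature": 0.5},
    "aggressive": {"confidence_threshold": 0.4, "temperature": 0.9},
}


def escalation_count(confidences, confidence_threshold):
    """Count the queries whose confidence falls below the threshold."""
    return sum(1 for score in confidences if score < confidence_threshold)


# Made-up confidence scores for five queries
sample_confidences = [0.35, 0.55, 0.65, 0.75, 0.9]

for name, preset in PRESETS.items():
    escalated = escalation_count(sample_confidences, preset["confidence_threshold"])
    print(f"{name}: {escalated} of {len(sample_confidences)} queries escalated")
```

With these sample values the conservative preset escalates four of five queries, the balanced preset two, and the aggressive preset one.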

### Configuration Optimization Tips

1. **For High-Volume, Simple Queries**:
   - Use faster models (`local_general_standard`)
   - Lower temperature for consistency
   - Lower escalation thresholds to reduce human load (fewer queries fall below the threshold)

2. **For Complex Technical Support**:
   - Use reasoning-optimized models (`anthropic_reasoning_premium`)
   - Medium-high temperature for creative problem-solving
   - Higher escalation thresholds to ensure accuracy (uncertain answers go to a human sooner)

3. **For Frustrated Customers**:
   - Emphasize empathy in system prompts
   - Medium temperature for personalized responses
   - Faster escalation to human agents when emotions are high

4. **For Cost Optimization**:
   - Prefer local models over API-based models
   - Use budget models for simple queries
   - Set appropriate escalation thresholds to balance cost and quality

### Monitoring and Adjustment

The chatbot's performance can be monitored through:
- **Confidence Scores**: Track how certain the chatbot is about responses
- **Escalation Rates**: Monitor how often queries are passed to humans
- **Customer Satisfaction**: Measure response quality and helpfulness
- **Response Times**: Balance quality with speed requirements
- **Conversation Length**: Optimize for efficiency while maintaining quality
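
A few of these metrics can be computed from a list of per-question results with plain Python. The sketch below assumes each result is a dict with `confidence`, `escalated`, and `response_time_s` keys; those field names and the sample values are hypothetical, so adapt them to whatever the agent actually returns.

```python
from statistics import mean


def summarize_results(results):
    """Compute simple monitoring metrics from a list of result dicts (assumed fields)."""
    total = len(results)
    if total == 0:
        return {"total": 0, "escalation_rate": None,
                "avg_confidence": None, "avg_response_time_s": None}
    return {
        "total": total,
        # Share of queries that were handed to a human
        "escalation_rate": sum(1 for r in results if r["escalated"]) / total,
        "avg_confidence": mean(r["confidence"] for r in results),
        "avg_response_time_s": mean(r["response_time_s"] for r in results),
    }


# Made-up results for four questions
sample_results = [
    {"confidence": 0.9, "escalated": False, "response_time_s": 1.2},
    {"confidence": 0.4, "escalated": True, "response_time_s": 2.0},
    {"confidence": 0.8, "escalated": False, "response_time_s": 1.6},
    {"confidence": 0.5, "escalated": True, "response_time_s": 2.4},
]

summary = summarize_results(sample_results)
print(f"Escalation rate: {summary['escalation_rate']:.0%}")
print(f"Average confidence: {summary['avg_confidence']:.2f}")
print(f"Average response time: {summary['avg_response_time_s']:.2f}s")
```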

*Understanding these internals helps you make informed configuration changes that improve customer experience while managing operational costs.*

## Step 3: Load Test Questions

Load test questions from a JSON file. Use `question_generator.ipynb` to create new questions.
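
For reference, the sketch below builds both accepted layouts with the standard `json` module. The field names match what the loader in this step reads (`id`, `question`, `customer_type`, `complexity`, plus the `metadata` keys); the question texts and metadata values are only illustrative. Note that JSON requires double-quoted strings.

```python
import json

# Illustrative questions; the texts are made up for this example
sample_questions = [
    {"id": 1, "question": "What is a deductible?",
     "customer_type": "confused", "complexity": "simple"},
    {"id": 2, "question": "How do I file a claim?",
     "customer_type": "normal", "complexity": "medium"},
]

# Questions-only format: a plain JSON list
questions_only = json.dumps(sample_questions, indent=2)

# Full export format: metadata plus the same list under "questions"
full_export = json.dumps(
    {
        "metadata": {
            "generation_model": "anthropic_general_standard",
            "generation_timestamp": "20250718_1744",
            "question_count": len(sample_questions),
        },
        "questions": sample_questions,
    },
    indent=2,
)

print(full_export)
```

Saving either string to a `.json` file produces a file that the upload widget below should accept.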

In [4]:
# File upload widget for loading questions
file_upload = widgets.FileUpload(
    accept='.json',
    multiple=False,
    description='Upload questions file:'
)

# Instructions for file format
print("📝 Load Test Questions")
print("\n💡 How to get test questions:")
print("1. Use question_generator.ipynb to create test questions")
print("2. Upload the generated JSON file below")
print("3. Supported formats: Full export or questions-only JSON")
print("\n📄 Expected JSON format:")
print('- Full export: {"metadata": {...}, "questions": [...]}')
print('- Questions only: [{"id": 1, "question": "...", "customer_type": "...", "complexity": "..."}]')
print("\n📁 Upload your questions file:")
display(file_upload)

📝 Load Test Questions

💡 How to get test questions:
1. Use question_generator.ipynb to create test questions
2. Upload the generated JSON file below
3. Supported formats: Full export or questions-only JSON

📄 Expected JSON format:
- Full export: {"metadata": {...}, "questions": [...]}
- Questions only: [{"id": 1, "question": "...", "customer_type": "...", "complexity": "..."}]

📁 Upload your questions file:


FileUpload(value=(), accept='.json', description='Upload questions file:')

In [5]:
# Load test questions from uploaded file
test_questions = []

def load_questions_from_file(file_content, filename):
    """Load questions from uploaded JSON file"""
    try:
        # Handle different content types
        if isinstance(file_content, memoryview):
            # Convert memoryview to bytes
            content_bytes = file_content.tobytes()
        elif hasattr(file_content, 'decode'):
            # Already bytes
            content_bytes = file_content
        else:
            # Convert to bytes if it's a string or other type
            content_bytes = str(file_content).encode('utf-8')
        
        # Decode to string and parse JSON
        data = json.loads(content_bytes.decode('utf-8'))
        
        # Handle different JSON formats
        if isinstance(data, dict):
            # Full export format with metadata
            if 'questions' in data:
                questions = data['questions']
                metadata = data.get('metadata', {})
                print(f"📄 Loaded file with metadata:")
                print(f"  Generation model: {metadata.get('generation_model', 'unknown')}")
                print(f"  Generation timestamp: {metadata.get('generation_timestamp', 'unknown')}")
                print(f"  Question count: {metadata.get('question_count', len(questions))}")
                return questions
            else:
                # Single question object
                return [data]
        elif isinstance(data, list):
            # Questions-only format
            return data
        else:
            print(f"❌ Unexpected data format: {type(data)}")
            return []
            
    except json.JSONDecodeError as e:
        print(f"❌ JSON parsing error: {e}")
        return []
    except Exception as e:
        print(f"❌ Error loading file: {e}")
        print(f"   File content type: {type(file_content)}")
        return []

def validate_questions(questions):
    """Validate and normalize question format"""
    validated_questions = []
    
    for i, q in enumerate(questions):
        if isinstance(q, dict):
            # Ensure required fields exist
            validated_q = {
                "id": q.get("id", i + 1),
                "question": q.get("question", f"Question {i + 1}"),
                "customer_type": q.get("customer_type", "normal"),
                "complexity": q.get("complexity", "medium")
            }
            validated_questions.append(validated_q)
        else:
            # Convert string to dict if needed
            validated_q = {
                "id": i + 1,
                "question": str(q),
                "customer_type": "normal",
                "complexity": "medium"
            }
            validated_questions.append(validated_q)
    
    return validated_questions

# Process uploaded file
if file_upload.value:
    uploaded_file = None
    filename = None
    file_content = None
    
    # Handle different file upload widget formats
    if isinstance(file_upload.value, tuple) and len(file_upload.value) > 0:
        print("📋 Using tuple format")
        uploaded_file = file_upload.value[0]
        filename = uploaded_file['name']
        file_content = uploaded_file['content']
        print(f"🔍 File content type: {type(file_content)}")
    elif isinstance(file_upload.value, dict) and len(file_upload.value) > 0:
        print("📋 Using dict format")
        uploaded_file = list(file_upload.value.values())[0]
        filename = uploaded_file['metadata']['name']
        file_content = uploaded_file['content']
        print(f"🔍 File content type: {type(file_content)}")
    else:
        print(f"❌ Unable to read uploaded file format")
        print(f"   Type: {type(file_upload.value)}")
        print(f"   Length: {len(file_upload.value) if hasattr(file_upload.value, '__len__') else 'No length'}")
        print(f"   Content: {file_upload.value}")
    
    if uploaded_file and filename and file_content is not None:
        print(f"📁 Loading questions from: {filename}")
        
        raw_questions = load_questions_from_file(file_content, filename)
        
        if raw_questions:
            test_questions = validate_questions(raw_questions)
            print(f"✅ Loaded {len(test_questions)} test questions")
            
            # Display first few questions as preview
            print("\n📋 Preview of loaded questions:")
            for i, q in enumerate(test_questions[:3]):
                print(f"  {i+1}. {q['question']} [{q['customer_type']}]")
            if len(test_questions) > 3:
                print(f"  ... and {len(test_questions) - 3} more questions")
                
            # Show distribution
            df_preview = pd.DataFrame(test_questions)
            print("\n📊 Question Distribution:")
            print(f"  Customer types: {dict(df_preview['customer_type'].value_counts())}")
            print(f"  Complexities: {dict(df_preview['complexity'].value_counts())}")
            
        else:
            print("❌ No questions loaded from file")
    else:
        print("❌ Error accessing uploaded file")
        print(f"   uploaded_file: {uploaded_file is not None}")
        print(f"   filename: {filename}")
        print(f"   file_content: {file_content is not None}")
else:
    print("⚠️ Please upload a JSON file with test questions.")
    print("💡 Use question_generator.ipynb to create test questions first.")

📋 Using tuple format
🔍 File content type: <class 'memoryview'>
📁 Loading questions from: test_questions_20250718_1744.json
📄 Loaded file with metadata:
  Generation model: anthropic_general_standard
  Generation timestamp: 20250718_1744
  Question count: 20
✅ Loaded 20 test questions

📋 Preview of loaded questions:
  1. Why did my premium increase by $200? This is ridiculous - I haven't had any claims! [frustrated]
  2. I just had a car accident. What do I need to do right now? [urgent]
  3. Can someone explain what a deductible is? I keep seeing this word but don't understand. [confused]
  ... and 17 more questions

📊 Question Distribution:
  Customer types: {'frustrated': np.int64(5), 'urgent': np.int64(5), 'confused': np.int64(5), 'normal': np.int64(5)}
  Complexities: {'medium': np.int64(7), 'simple': np.int64(7), 'complex': np.int64(6)}


## Step 4: Review and Edit Questions

Review the loaded questions and make any edits before processing.

In [6]:
# Create interactive question editor
if test_questions:
    print("📝 Question Editor - You can modify questions before processing")
    print("Edit the questions in the table below, then run the next cell to process them.\n")
    
    # Convert to DataFrame for easy editing
    df = pd.DataFrame(test_questions)
    
    # Ensure required columns exist
    if 'id' not in df.columns:
        df['id'] = range(1, len(df) + 1)
    if 'customer_type' not in df.columns:
        df['customer_type'] = 'normal'
    if 'complexity' not in df.columns:
        df['complexity'] = 'medium'
    
    # Display the questions as a read-only table (edits go through the JSON box below)
    print("Current questions (you can edit the JSON below if needed):")
    display(df)
    
    # Show JSON for manual editing if needed
    questions_json = widgets.Textarea(
        value=json.dumps(test_questions, indent=2),
        description="Questions JSON:",
        layout=widgets.Layout(width='100%', height='200px')
    )
    
    print("\nAdvanced: Edit questions as JSON (optional):")
    display(questions_json)
    
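    def update_questions_from_json():
        """Replace test_questions with the contents of the JSON editor."""
        global test_questions
        try:
            parsed = json.loads(questions_json.value)
        except json.JSONDecodeError as e:
            print(f"❌ Error parsing JSON: {e}")
            return
        # Only accept a list of questions; anything else would break later cells
        if not isinstance(parsed, list):
            print("❌ Expected a JSON list of questions")
            return
        test_questions = parsed
        print("✅ Questions updated from JSON editor")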
    
    # Button to update from JSON
    update_btn = widgets.Button(description="Update from JSON")
    update_btn.on_click(lambda b: update_questions_from_json())
    display(update_btn)
    
else:
    print("⚠️ No questions loaded. Please upload a questions file in the previous step.")

📝 Question Editor - You can modify questions before processing
Edit the questions in the table below, then run the next cell to process them.

Current questions (you can edit the JSON below if needed):


| | id | question | customer_type | complexity |
|---|---|---|---|---|
| 0 | 1 | Why did my premium increase by $200? This is r... | frustrated | medium |
| 1 | 2 | I just had a car accident. What do I need to d... | urgent | simple |
| 2 | 3 | Can someone explain what a deductible is? I ke... | confused | simple |
| 3 | 4 | How do I add my teenage daughter to my auto po... | normal | simple |
| 4 | 5 | I've been a customer for 15 years and you deni... | frustrated | complex |
| 5 | 6 | What's the difference between comprehensive an... | confused | medium |
| 6 | 7 | My house just flooded - I need emergency assis... | urgent | complex |
| 7 | 8 | Can I get proof of insurance sent to my phone?... | urgent | simple |
| 8 | 9 | How do I update my billing information? | normal | simple |
| 9 | 10 | What exactly is an 'act of God' and why isn't ... | confused | medium |



Advanced: Edit questions as JSON (optional):



## Step 5: Process Questions Through Chatbot Agent

Run the questions through the Chatbot Agent and collect results.
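
Each reply is then sorted into "escalate", "needs more input", or "complete" by matching phrases and checking the confidence score. The following is a condensed sketch of that decision order, not the notebook's full analysis: `classify_response` is a hypothetical name, the phrase lists are shortened, and the 0.6 cutoff is taken from the cell that follows.

```python
# Abbreviated phrase lists for illustration only
ESCALATION_PHRASES = ("let me transfer you to", "please contact our", "speak with a specialist")
INPUT_PHRASES = ("could you provide", "i need more information", "which policy")


def classify_response(response: str, confidence: float, min_confidence: float = 0.6) -> str:
    """Return 'escalate', 'needs_input', or 'complete' for one chatbot reply."""
    text = response.lower()
    # Escalation takes priority: an explicit hand-off or a low confidence score
    if confidence < min_confidence or any(p in text for p in ESCALATION_PHRASES):
        return "escalate"
    # Otherwise, a request for details or any question mark means more input is needed
    if "?" in text or any(p in text for p in INPUT_PHRASES):
        return "needs_input"
    return "complete"


print(classify_response("Let me transfer you to billing. What is your policy number?", 0.9))
print(classify_response("Could you provide your policy number", 0.9))
print(classify_response("Your deductible is $500.", 0.9))
```

Substring matching is cheap and transparent, but it cannot tell a rhetorical question from a genuine request for information, so the flags are best treated as a first-pass signal.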

In [7]:
# Process questions through the Chatbot Agent
if not test_questions:
    print("⚠️ No questions to process. Please load or generate questions first.")
else:
    print(f"🤖 Processing {len(test_questions)} questions through Chatbot Agent...")
    
    # Get model preference from the current temp config
    yaml = YAML()  # YAML was imported in the first cell
    try:
        with open(temp_file_paths['models_config'], 'r') as f:
            models_config = yaml.load(f)
        agent_preferred_model = models_config.get('primary_model', models_config.get('preferred', 'unknown'))
    except Exception as e:
        # Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell
        print(f"⚠️ Could not read models config ({e}); model reported as 'unknown'")
        agent_preferred_model = 'unknown'

    # Get temperature from agent config
    try:
        with open(temp_file_paths['agent_config'], 'r') as f:
            agent_config = yaml.load(f)
        agent_temperature = agent_config.get('settings', {}).get('temperature', 0.5)
    except Exception as e:
        print(f"⚠️ Could not read agent config ({e}); using temperature 0.5")
        agent_temperature = 0.5
    
    print(f"Using model: {agent_preferred_model}")
    print(f"Temperature: {agent_temperature}")
    print("\n" + "="*50)
    
    def analyze_chatbot_response(response: str, confidence: float) -> dict:
        """
        Analyze chatbot response to determine escalation and input needs
        """
        response_lower = response.lower()
        
        # Escalation indicators - phrases that suggest referring to human/department
        escalation_phrases = [
            "let me transfer you to",
            "i'll connect you with",
            "please contact our",
            "speak with a specialist",
            "escalate this to",
            "connect you with the",
            "transfer to our",
            "speak with someone from",
            "you'll need to contact",
            "reach out to our",
            "i recommend speaking with",
            "forward this to"
        ]
        
        # More input indicators - phrases requesting additional information
        input_request_phrases = [
            "could you provide",
            "can you tell me more about",
            "what is your",
            "to better assist you",
            "i need more information",
            "could you clarify",
            "can you specify",
            "what type of",
            "please let me know",
            "can you share",
            "what specific",
            "which policy"
        ]
        
        # Question indicators (asking user for info to continue conversation)
        question_patterns = [
            "could you",
            "can you",
            "would you mind",
            "please provide",
            "do you have",
            "what is",
            "which",
            "when did",
            "?"
        ]
        
        # Check for escalation intent
        escalation_detected = any(phrase in response_lower for phrase in escalation_phrases)
        
        # Check for information requests
        input_needed = any(phrase in response_lower for phrase in input_request_phrases)
        
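        # Check if the response asks the user questions to continue the conversation.
        # Note: question_patterns contains "?", so the any(...) check below always passes
        # when question_count > 0; the result is equivalent to question_count > 0.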
        question_count = response.count('?')
        is_asking_questions = question_count > 0 and any(pattern in response_lower for pattern in question_patterns)
        
        # Confidence-based escalation (confidence below 0.6 = likely needs human help)
        low_confidence_escalation = confidence < 0.6
        
        # Final determination with priority logic
        needs_escalation = escalation_detected or low_confidence_escalation
        # Only flag needs_more_input if NOT escalating (escalation takes priority)
        needs_more_input = (input_needed or is_asking_questions) and not needs_escalation
        
        return {
            'needs_escalation': needs_escalation,
            'needs_more_input': needs_more_input,
            'escalation_reason': 'explicit_transfer' if escalation_detected else 'low_confidence' if low_confidence_escalation else None,
            'input_reason': 'information_request' if input_needed else 'clarifying_questions' if is_asking_questions else None,
            'question_count': question_count,
            'confidence_score': confidence,
            'analysis_details': {
                'escalation_detected': escalation_detected,
                'input_needed': input_needed,
                'is_asking_questions': is_asking_questions,
                'low_confidence': low_confidence_escalation
            }
        }
    
    # Create a simple mock context provider for testing
    class MockContextProvider:
        """Simple mock context provider for testing"""
        
        def get_context_summary(self, user_id: str, session_id: str) -> dict:
            """Return a simple mock context summary"""
            return {
                'entries_count': 0,
                'type_breakdown': {},
                'recent_queries': [],
                'escalation_count': 0,
                'last_activity': None
            }
        
        def save_context_entry(self, entry) -> bool:
            """Mock save method"""
            return True
        
        def get_recent_context(self, user_id: str, session_id: str, limit: int = 10) -> list:
            """Mock recent context method"""
            return []
    
    # Initialize the Chatbot Agent
    try:
        # Create config manager with the correct config directory
        config_manager = AgentConfigManager(config_dir='/workspace/config')
        
        # Create a mock context provider for testing
        context_provider = MockContextProvider()
        
        # Initialize Chatbot Agent
        chatbot_agent = ChatbotAgentNode(config_manager, context_provider)
        
        print("✅ Chatbot Agent initialized successfully")
        
    except Exception as e:
        print(f"❌ Error initializing Chatbot Agent: {e}")
        print("Continuing with mock responses for demonstration...")
        chatbot_agent = None
    
    # Process each question
    results = []
    
    for i, question_data in enumerate(test_questions):
        question_id = question_data.get('id', i + 1)
        question_text = question_data.get('question', '')
        customer_type = question_data.get('customer_type', 'normal')
        complexity = question_data.get('complexity', 'medium')
        
        print(f"\n🔄 Processing question {question_id}: {question_text[:60]}...")
        
        try:
            if chatbot_agent:
                # Create state for the Chatbot Agent (datetime is imported in the first cell)
                state = {
                    'query': question_text,
                    'user_id': 'test_user',
                    'session_id': f'test_session_{i}',
                    'query_id': f'query_{question_id}',
                    'timestamp': datetime.now().isoformat(),
                    'messages': []
                }
                
                # Process through agent using __call__ method
                response_state = chatbot_agent(state)
                
                answer = response_state.get('ai_response', 'No response generated')
                confidence = response_state.get('initial_assessment', {}).get('confidence', 0.8)
                
                # Use improved flag analysis
                flag_analysis = analyze_chatbot_response(answer, confidence)
                needs_escalation = flag_analysis['needs_escalation']
                needs_more_input = flag_analysis['needs_more_input']
                
            else:
                # Mock processing for demonstration
                import random
                answer = f"Thank you for your question about {question_text[:30]}... I'd be happy to help you with that. [This is a mock response for demonstration]"
                confidence = random.uniform(0.6, 0.95)
                
                # Use improved flag analysis even for mock responses
                flag_analysis = analyze_chatbot_response(answer, confidence)
                needs_escalation = flag_analysis['needs_escalation']
                needs_more_input = flag_analysis['needs_more_input']
            
            # Create result entry with enhanced analysis
            result = {
                'id': question_id,
                'original_question': question_text,
                'customer_type': customer_type,
                'complexity': complexity,
                'ai_answer': answer,
                'confidence_score': confidence,
                'needs_escalation': needs_escalation,
                'needs_more_input': needs_more_input,
                'escalation_reason': flag_analysis.get('escalation_reason'),
                'input_reason': flag_analysis.get('input_reason'),
                'question_count': flag_analysis.get('question_count', 0),
                'analysis_details': flag_analysis.get('analysis_details', {}),
                'processing_time': datetime.now().isoformat(),
                'model_used': agent_preferred_model,
                'temperature': agent_temperature
            }
            
            results.append(result)
            
            # Show progress with enhanced status
            if needs_more_input:
                reason = flag_analysis.get('input_reason', 'unknown')
                status = f"🔄 Needs more input ({reason})"
            elif needs_escalation:
                reason = flag_analysis.get('escalation_reason', 'unknown')
                status = f"⚠️ Escalation needed ({reason})"
            else:
                status = "✅ Complete"
            
            print(f"   {status} (confidence: {confidence:.2f})")
            
        except Exception as e:
            print(f"   ❌ Error processing question: {e}")
            # Add error result
            result = {
                'id': question_id,
                'original_question': question_text,
                'customer_type': customer_type,
                'complexity': complexity,
                'ai_answer': f"Error processing question: {e}",
                'confidence_score': 0.0,
                'needs_escalation': True,
                'needs_more_input': False,
                'escalation_reason': 'processing_error',
                'input_reason': None,
                'question_count': 0,
                'analysis_details': {'error': True},
                'processing_time': datetime.now().isoformat(),
                'model_used': agent_preferred_model,
                'temperature': agent_temperature,
                'error': str(e)
            }
            results.append(result)
    
    print("\n" + "="*50)
    print(f"✅ Processing complete! Processed {len(results)} questions.")
    
    # Enhanced summary statistics
    total_questions = len(results)
    needs_more_input_count = sum(1 for r in results if r['needs_more_input'])
    needs_escalation_count = sum(1 for r in results if r['needs_escalation'])
    avg_confidence = sum(r['confidence_score'] for r in results) / total_questions if total_questions > 0 else 0
    
    # Breakdown by escalation reason
    escalation_reasons = {}
    input_reasons = {}
    for r in results:
        if r['needs_escalation'] and r['escalation_reason']:
            escalation_reasons[r['escalation_reason']] = escalation_reasons.get(r['escalation_reason'], 0) + 1
        if r['needs_more_input'] and r['input_reason']:
            input_reasons[r['input_reason']] = input_reasons.get(r['input_reason'], 0) + 1
    
    print("\n📊 Enhanced Summary:")
    print(f"  Total questions: {total_questions}")
    print(f"  Need more input: {needs_more_input_count} ({needs_more_input_count/total_questions*100:.1f}%)")
    if input_reasons:
        print(f"    Reasons: {input_reasons}")
    print(f"  Need escalation: {needs_escalation_count} ({needs_escalation_count/total_questions*100:.1f}%)")
    if escalation_reasons:
        print(f"    Reasons: {escalation_reasons}")
    print(f"  Average confidence: {avg_confidence:.2f}")

🤖 Processing 20 questions through Chatbot Agent...
Using model: local_general_standard
Temperature: 0.7

✅ 12:49:34.894 [INFO    ] factory         | Attempting to create provider: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 12:49:34.898 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 12:49:38.920 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
✅ 12:49:38.921 [INFO    ] factory         | Successfully created provider: local_general_standard
✅ 12:49:38.922 [INFO    ] chatbot_agent   | Chatbot Agent LLM provider initialized | operation=initialize_llm_provider model_name=unknown
✅ Chatbot Agent initialized successfully

🔄 Processing question 1: Why did my premium increase by $200? This is ridiculous - I ...
✅ 12:49:59.322 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   🔄 Needs more input (clarifying_questions) (confidence: 0.90)

🔄 Processing question 2: I just had a car accident. What do I need to do right now?...
✅ 12:50:04.994 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ✅ Complete (confidence: 0.85)

🔄 Processing question 3: Can someone explain what a deductible is? I keep seei

## Step 6: Complete Conversations with Customer AI

Continue conversations between customer AI and chatbot AI until natural resolution or escalation.
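
On each turn the simulated customer chooses one of four actions, checked in a fixed order: accept the answer, demand escalation, give up at the turn limit, or ask a follow-up. The following is a minimal sketch of that stopping rule under stated assumptions: `next_customer_action` and its parameter names are hypothetical, `threshold` stands for the effective satisfaction threshold after any complexity adjustment, and the 0.6 escalation cutoff is the value used in the cell that follows.

```python
def next_customer_action(
    satisfaction: float,
    turn: int,
    threshold: float,
    patience: int,
    max_turns: int,
    escalation_cutoff: float = 0.6,
) -> str:
    """Decide what the simulated customer does after a chatbot reply."""
    if satisfaction >= threshold:
        return "satisfied"  # the answer was good enough; conversation ends
    if turn >= patience and satisfaction < escalation_cutoff:
        return "escalate"  # out of patience and still unhappy
    if turn >= max_turns:
        return "give_up"  # turn budget exhausted
    return "follow_up"  # keep the conversation going


print(next_customer_action(0.75, turn=1, threshold=0.7, patience=4, max_turns=5))
print(next_customer_action(0.30, turn=2, threshold=0.8, patience=2, max_turns=5))
print(next_customer_action(0.65, turn=5, threshold=0.8, patience=2, max_turns=5))
print(next_customer_action(0.65, turn=1, threshold=0.8, patience=2, max_turns=5))
```

Checking satisfaction first means a good answer ends the conversation even on the last allowed turn, which matches the order of the checks in the simulator.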

In [9]:
# Complete conversations with customer AI simulation
if 'results' in locals() and results:
    import random  # used for the template-based customer responses
    
    def create_customer_ai_simulator(customer_type, complexity, model_name='anthropic_general_budget'):
        """Create a customer AI simulator based on customer profile"""
        
        # Define customer personas
        customer_personas = {
            'frustrated': {
                'style': 'Impatient, demanding, may use caps or exclamation points. Wants quick resolution.',
                'follow_up_tendency': 'high',
                'satisfaction_threshold': 0.8,
                'escalation_patience': 2  # Will demand escalation after 2 responses if not satisfied
            },
            'urgent': {
                'style': 'Time-sensitive, focused on immediate action. Professional but hurried.',
                'follow_up_tendency': 'high', 
                'satisfaction_threshold': 0.75,
                'escalation_patience': 3
            },
            'confused': {
                'style': 'Asks many clarifying questions, needs simple explanations. Polite but persistent.',
                'follow_up_tendency': 'very_high',
                'satisfaction_threshold': 0.7,
                'escalation_patience': 4
            },
            'normal': {
                'style': 'Professional, patient, reasonable expectations.',
                'follow_up_tendency': 'medium',
                'satisfaction_threshold': 0.65,
                'escalation_patience': 3
            }
        }
        
        # Define complexity behaviors
        complexity_behaviors = {
            'simple': {
                'satisfaction_boost': 0.1,  # Easier to satisfy
                'max_turns': 3,
                'question_depth': 'surface-level'
            },
            'medium': {
                'satisfaction_boost': 0.0,
                'max_turns': 5,
                'question_depth': 'moderate detail'
            },
            'complex': {
                'satisfaction_boost': -0.1,  # Harder to satisfy
                'max_turns': 7,
                'question_depth': 'detailed technical'
            }
        }
        
        persona = customer_personas.get(customer_type, customer_personas['normal'])
        behavior = complexity_behaviors.get(complexity, complexity_behaviors['medium'])
        
        class CustomerAISimulator:
            def __init__(self, customer_type, complexity):
                self.customer_type = customer_type  # Store customer type as instance variable
                self.complexity = complexity  # Store complexity as instance variable
                self.persona = persona
                self.behavior = behavior
                self.turn_count = 0
                self.satisfaction_level = 0.0
                self.conversation_history = []
                
                # Create LLM for customer simulation
                try:
                    provider_factory = LLMProviderFactory()
                    self.llm = provider_factory.create_provider(model_name)
                except Exception as e:
                    print(f"Warning: Could not create customer AI simulator LLM: {e}")
                    self.llm = None
            
            def generate_response(self, chatbot_response, original_question):
                """Generate customer response to chatbot"""
                self.turn_count += 1
                
                # Analyze satisfaction with chatbot response
                satisfaction_score = self._analyze_satisfaction(chatbot_response)
                self.satisfaction_level = satisfaction_score
                
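                # Check if conversation should end.
                # A positive satisfaction_boost means "easier to satisfy", so it must
                # lower the bar: subtract it from the persona's threshold.
                is_satisfied = satisfaction_score >= (self.persona['satisfaction_threshold'] - self.behavior['satisfaction_boost'])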
                is_escalating = self.turn_count >= self.persona['escalation_patience'] and satisfaction_score < 0.6
                is_max_turns = self.turn_count >= self.behavior['max_turns']
                
                if is_satisfied:
                    return self._generate_satisfaction_response()
                elif is_escalating:
                    return self._generate_escalation_request()
                elif is_max_turns:
                    return self._generate_final_response()
                else:
                    return self._generate_follow_up_question(chatbot_response, original_question)
            
            def _analyze_satisfaction(self, response):
                """Analyze how satisfied the customer would be with the response"""
                response_lower = response.lower()
                
                # Positive indicators
                positive_score = 0
                if any(phrase in response_lower for phrase in ['specifically', "here's how", 'i can help', 'let me explain']):
                    positive_score += 0.2
                if any(phrase in response_lower for phrase in ['immediately', 'right away', 'quickly']):
                    positive_score += 0.15
                if len(response) > 100:  # Detailed response
                    positive_score += 0.1
                
                # Negative indicators  
                negative_score = 0
                if any(phrase in response_lower for phrase in ['i need more information', 'could you provide', 'what type']):
                    negative_score += 0.3  # Asking for more info is frustrating
                if any(phrase in response_lower for phrase in ['unfortunately', 'however', 'but']):
                    negative_score += 0.1
                if response.count('?') > 2:  # Too many questions back
                    negative_score += 0.2
                
                # Base satisfaction varies by customer type - use self.customer_type
                base_satisfaction = {
                    'frustrated': 0.3,
                    'urgent': 0.4, 
                    'confused': 0.5,
                    'normal': 0.6
                }.get(self.customer_type, 0.5)
                
                return max(0.0, min(1.0, base_satisfaction + positive_score - negative_score))
            
            def _generate_satisfaction_response(self):
                """Generate a satisfied customer response"""
                satisfied_responses = {
                    'frustrated': [
                        "Okay, that makes sense. Thanks for clearing that up.",
                        "Finally! Thank you for the explanation.",
                        "Alright, I understand now. That helps."
                    ],
                    'urgent': [
                        "Perfect, that's exactly what I needed to know. Thank you!",
                        "Great, I'll do that right away. Thanks for the quick help!",
                        "Excellent, that answers my question. Much appreciated."
                    ],
                    'confused': [
                        "Oh I see! That makes much more sense now. Thank you for explaining it so clearly.",
                        "Thank you! That explanation really helped me understand.",
                        "Perfect! Now I get it. I really appreciate your patience."
                    ],
                    'normal': [
                        "Thank you for the helpful information. That resolves my question.",
                        "Great, that's exactly what I needed to know. Thanks!",
                        "Perfect, I understand now. Thank you for your assistance."
                    ]
                }
                
                return random.choice(satisfied_responses.get(self.customer_type, satisfied_responses['normal']))
            
            def _generate_escalation_request(self):
                """Generate escalation request"""
                escalation_responses = {
                    'frustrated': [
                        "This isn't working. I need to speak to someone who can actually help me!",
                        "I'm getting nowhere with this. Transfer me to a supervisor NOW!",
                        "Enough! Get me a human who knows what they're doing!"
                    ],
                    'urgent': [
                        "I need this resolved immediately. Can you transfer me to someone who can handle this urgently?",
                        "Time is critical here. I need to speak with a specialist right away.",
                        "This is urgent - please connect me with someone who can resolve this now."
                    ],
                    'confused': [
                        "I'm still really confused. Could you please connect me with someone who can walk me through this step by step?",
                        "I don't think I'm understanding this correctly. Can I speak with someone who can explain this more simply?",
                        "I'm getting more confused. Could you transfer me to someone who specializes in helping customers like me?"
                    ],
                    'normal': [
                        "I think I need to speak with a specialist about this. Could you please transfer me?",
                        "This seems like it might require human expertise. Can you connect me with the right department?",
                        "I'd like to speak with someone who can provide more detailed assistance."
                    ]
                }
                
                return random.choice(escalation_responses.get(self.customer_type, escalation_responses['normal']))
            
            def _generate_final_response(self):
                """Generate final response when max turns reached"""
                final_responses = {
                    'frustrated': "Look, I've been going in circles here. Just transfer me to someone else.",
                    'urgent': "I've spent too much time on this already. I need to speak with a human.",
                    'confused': "I'm still not clear on this. I think I need to talk to someone in person.",
                    'normal': "I think this might be beyond what we can resolve here. Could you transfer me to the appropriate department?"
                }
                
                return final_responses.get(self.customer_type, final_responses['normal'])
            
            def _generate_follow_up_question(self, chatbot_response, original_question):
                """Generate intelligent follow-up question using AI if available"""
                
                if self.llm:
                    # Use AI to generate contextual follow-up
                    prompt = f'''You are a {self.customer_type} customer with a {self.complexity} question about insurance. 
                    
Your personality: {self.persona['style']}
Question complexity: {self.behavior['question_depth']}
Turn {self.turn_count} of conversation.

Original question: {original_question}
Chatbot's response: {chatbot_response}

Generate a follow-up response that a {self.customer_type} customer would realistically ask. Be specific to the chatbot's response and maintain the personality. Make it {self.behavior['question_depth']} in nature.

Respond as the customer would (2-3 sentences max):'''
                    
                    try:
                        response = self.llm.generate_response(prompt)
                        
                        if response and len(response.strip()) > 10:
                            return response.strip()
                    except Exception as e:
                        print(f"Warning: AI follow-up generation failed: {e}")
                
                # Fallback to template-based responses
                return self._generate_template_follow_up(chatbot_response, original_question)
            
            def _generate_template_follow_up(self, chatbot_response, original_question):
                """Generate template-based follow-up questions"""
                
                # Analyze what the chatbot asked for
                response_lower = chatbot_response.lower()
                
                if 'policy number' in response_lower:
                    return f"My policy number is POL-{random.randint(100000, 999999)}. Now what?"
                elif 'claim number' in response_lower:
                    return f"It's claim #{random.randint(10000, 99999)}. What's the next step?"
                elif 'type of coverage' in response_lower or 'which coverage' in response_lower:
                    coverage_types = ['comprehensive', 'collision', 'liability', 'homeowners', 'auto']
                    return f"I have {random.choice(coverage_types)} coverage. Does that help?"
                elif 'when did' in response_lower or 'what date' in response_lower:
                    return "This happened yesterday around 3 PM. What do I do now?"
                elif 'how much' in response_lower or 'what amount' in response_lower:
                    return f"It's about ${random.randint(500, 5000)} in damage. What's next?"
                elif 'documents' in response_lower or 'paperwork' in response_lower:
                    return "I have photos and a police report. How do I submit them?"
                else:
                    # Generic follow-ups based on customer type
                    generic_followups = {
                        'frustrated': "That doesn't really answer my question. Can you be more specific?",
                        'urgent': "Okay, but what do I do RIGHT NOW? This is time-sensitive!",
                        'confused': "I'm still not sure I understand. Can you explain it differently?",
                        'normal': "Could you provide more specific steps on what I should do next?"
                    }
                    return generic_followups.get(self.customer_type, generic_followups['normal'])
        
        return CustomerAISimulator(customer_type, complexity)
    
    def conduct_full_conversation(question_data, max_conversation_turns=8):
        """Conduct a full conversation between customer AI and chatbot until resolution"""
        
        question_id = question_data.get('id', 1)
        question_text = question_data.get('question', '')
        customer_type = question_data.get('customer_type', 'normal')
        complexity = question_data.get('complexity', 'medium')
        
        # Get the model from questions file or use default
        model_to_use = question_data.get('model', agent_preferred_model)
        
        print(f"🗣️ Starting full conversation for question {question_id}")
        print(f"   Customer: {customer_type}, Complexity: {complexity}, Model: {model_to_use}")
        
        # Create customer AI simulator
        try:
            customer_ai = create_customer_ai_simulator(customer_type, complexity, model_to_use)
        except Exception as e:
            print(f"   ❌ Error creating customer AI: {e}")
            return {
                'id': question_id,
                'original_question': question_text,
                'customer_type': customer_type,
                'complexity': complexity,
                'error': f"Failed to create customer AI: {e}",
                'conversation_complete': False,
                'final_outcome': 'error'
            }
        
        # Initialize conversation
        conversation_history = []
        current_query = question_text
        turn_number = 1
        
        # Conversation loop
        while turn_number <= max_conversation_turns:
            print(f"   Turn {turn_number}: Processing...")
            
            try:
                if chatbot_agent:
                    # Create state for chatbot
                    state = {
                        'query': current_query,
                        'user_id': 'test_user',
                        'session_id': f'conversation_session_{question_id}',
                        'query_id': f'query_{question_id}_turn_{turn_number}',
                        'timestamp': datetime.now().isoformat(),
                        'messages': []  # Start with empty messages for each turn
                    }
                    
                    # Get chatbot response
                    response_state = chatbot_agent(state)
                    chatbot_response = response_state.get('ai_response', 'No response generated')
                    confidence = response_state.get('initial_assessment', {}).get('confidence', 0.8)
                    
                else:
                    # Mock chatbot response
                    chatbot_response = f"Thank you for your question. Let me help you with that... [Mock response turn {turn_number}]"
                    confidence = 0.8
                
                # Add to conversation history
                conversation_turn = {
                    'turn_number': turn_number,
                    'customer_query': current_query,
                    'chatbot_response': chatbot_response,
                    'confidence': confidence,
                    'timestamp': datetime.now().isoformat()
                }
                conversation_history.append(conversation_turn)
                
                # Analyze chatbot response for escalation
                response_analysis = analyze_chatbot_response(chatbot_response, confidence)
                
                # Check if chatbot is escalating
                if response_analysis['needs_escalation']:
                    print(f"   🔄 Chatbot escalating: {response_analysis['escalation_reason']}")
                    conversation_turn['chatbot_action'] = 'escalation'
                    conversation_turn['escalation_reason'] = response_analysis['escalation_reason']
                    break
                
                # Generate customer response using AI simulator
                try:
                    customer_response = customer_ai.generate_response(chatbot_response, question_text)
                except Exception as e:
                    print(f"   ❌ Error generating customer response: {e}")
                    customer_response = "I'm having trouble understanding this. Can you help me differently?"
                
                # Add customer response to turn
                conversation_turn['customer_response'] = customer_response
                conversation_turn['customer_satisfaction'] = customer_ai.satisfaction_level
                
                # Check if customer is satisfied (conversation ends)
                if any(phrase in customer_response.lower() for phrase in ['thank you', 'that helps', 'perfect', 'great', 'excellent', 'makes sense']):
                    if customer_ai.satisfaction_level >= customer_ai.persona['satisfaction_threshold']:
                        print("   ✅ Customer satisfied!")
                        conversation_turn['customer_action'] = 'satisfied'
                        break
                
                # Check if customer is requesting escalation
                if any(phrase in customer_response.lower() for phrase in ['transfer', 'supervisor', 'specialist', 'human', 'someone else']):
                    print("   ⬆️ Customer requesting escalation")
                    conversation_turn['customer_action'] = 'escalation_request'
                    break
                
                # Prepare for next turn
                current_query = customer_response
                turn_number += 1
                
            except Exception as e:
                print(f"   ❌ Error in turn {turn_number}: {e}")
                conversation_turn = {
                    'turn_number': turn_number,
                    'customer_query': current_query,
                    'error': str(e),
                    'timestamp': datetime.now().isoformat()
                }
                conversation_history.append(conversation_turn)
                break
        
        # Determine final outcome
        final_turn = conversation_history[-1] if conversation_history else {}
        final_outcome = final_turn.get('customer_action', final_turn.get('chatbot_action', 'conversation_incomplete'))
        
        conversation_result = {
            'id': question_id,
            'original_question': question_text,
            'customer_type': customer_type,
            'complexity': complexity,
            'model_used': model_to_use,
            'conversation_history': conversation_history,
            'total_turns': len(conversation_history),
            'final_outcome': final_outcome,
            'customer_satisfaction': customer_ai.satisfaction_level if customer_ai else 0.0,
            'processing_time': datetime.now().isoformat(),
            'conversation_complete': final_outcome in ['satisfied', 'escalation', 'escalation_request']
        }
        
        print(f"   📋 Conversation complete: {final_outcome} in {len(conversation_history)} turns")
        return conversation_result
    
    # Process conversations for all questions
    print(f"🤖 Starting full conversations for {len(test_questions)} questions...")
    print("This will simulate realistic customer-chatbot interactions until resolution.\n")
    
    conversation_results = []
    
    for i, question_data in enumerate(test_questions, start=1):
        # Fall back to the loop position so a missing 'id' cannot raise a KeyError
        question_id = question_data.get('id', i)
        try:
            conversation_result = conduct_full_conversation(question_data)
            conversation_results.append(conversation_result)
            
            # Show progress (early-error results carry no 'total_turns', so use .get)
            outcome = conversation_result['final_outcome']
            turns = conversation_result.get('total_turns', 0)
            satisfaction = conversation_result.get('customer_satisfaction', 0)
            
            print(f"Question {question_id}: {outcome} ({turns} turns, satisfaction: {satisfaction:.2f})")
            
        except Exception as e:
            print(f"❌ Error processing conversation {question_id}: {e}")
            # Add error result
            error_result = {
                'id': question_id,
                'original_question': question_data.get('question', ''),
                'customer_type': question_data.get('customer_type', 'normal'),
                'complexity': question_data.get('complexity', 'medium'),
                'error': str(e),
                'conversation_complete': False,
                'final_outcome': 'error'
            }
            conversation_results.append(error_result)
    
    print("\n✅ All conversations completed!")
    
    # Update results with conversation data
    results = conversation_results
    
    # Summary statistics
    completed_conversations = [r for r in results if r.get('conversation_complete', False)]
    satisfied_customers = [r for r in results if r.get('final_outcome') == 'satisfied']
    escalated_conversations = [r for r in results if 'escalation' in r.get('final_outcome', '')]
    
    avg_turns = sum(r.get('total_turns', 0) for r in completed_conversations) / len(completed_conversations) if completed_conversations else 0
    avg_satisfaction = sum(r.get('customer_satisfaction', 0) for r in completed_conversations) / len(completed_conversations) if completed_conversations else 0
    
    print("\n📊 Conversation Summary:")
    print(f"  Completed conversations: {len(completed_conversations)}/{len(results)}")
    print(f"  Satisfied customers: {len(satisfied_customers)} ({len(satisfied_customers)/len(results)*100:.1f}%)")
    print(f"  Escalations: {len(escalated_conversations)} ({len(escalated_conversations)/len(results)*100:.1f}%)")
    print(f"  Average turns per conversation: {avg_turns:.1f}")
    print(f"  Average customer satisfaction: {avg_satisfaction:.2f}")

else:
    print("⚠️ No questions loaded. Please load questions first.")

🤖 Starting full conversations for 20 questions...
This will simulate realistic customer-chatbot interactions until resolution.

🗣️ Starting full conversation for question 1
   Customer: frustrated, Complexity: medium, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:02:59.823 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:03:06.979 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:03:11.954 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:03:24.820 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:03:29.950 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:03:43.719 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 1: escalation_request (3 turns, satisfaction: 0.55)
🗣️ Starting full conversation for question 2
   Customer: urgent, Complexity: simple, Model: local_general_standard
✅ Creating LLM pro

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:03:45.639 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:03:59.354 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:04:14.831 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:04:21.430 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:04:29.603 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:04:47.255 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 2: escalation_request (3 turns, satisfaction: 0.65)
🗣

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:04:49.165 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:04:55.467 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:05:06.279 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:05:21.002 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:05:34.896 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:05:58.463 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 4: Processing...
✅ 13:06:06.503 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:06:39.011 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:06:42.742 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:06:50.411 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:06:55.408 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:07:02.363 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:07:18.759 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 4: escalation_request (3 turns, satisfaction: 0.70)
🗣

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:07:20.543 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:07:26.892 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:07:39.598 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 1 turns
Question 5: escalation_request (1 turns, satisfaction: 0.55)
🗣️ Starting full conversation for question 6
   Customer: confused, Complexity: medium, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:07:39.824 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:07:41.313 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:07:59.750 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:08:16.201 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:08:40.179 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:08:54.672 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:09:08.562 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ✅ Customer satisfied!
   📋 Conversation complete: satisfied in 3 turns
Question 6: satisfied (3 turns, satisfaction: 0.80)
🗣️ Starting full conversation f

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:09:10.399 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:09:16.123 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:09:21.281 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:09:32.285 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:09:36.678 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 4: Processing...
✅ 13:09:41.556 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 5: Processing...
✅ 13:09:48.939 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown op

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:09:55.951 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:10:09.677 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:10:26.462 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:10:45.658 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:10:58.454 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:11:06.618 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 8: escalation_request (3 turns, satisfaction: 0.65)
🗣

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:11:08.398 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:11:16.714 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:11:27.257 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:11:40.688 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:11:49.805 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:12:01.178 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 9: escalation_request (3 turns, satisfaction: 0.70)
🗣

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:12:03.141 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:12:16.473 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:12:32.106 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:12:49.464 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:13:02.285 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:13:08.231 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:13:17.688 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 4: Processing...

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:13:43.802 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:13:57.946 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:14:11.119 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:14:15.328 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 2 turns
Question 11: escalation_request (2 turns, satisfaction: 0.30)
🗣️ Starting full conversation for question 12
   Customer: confused, Complexity: complex, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:14:15.563 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:14:17.039 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:14:31.426 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:14:40.910 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ✅ Customer satisfied!
   📋 Conversation complete: satisfied in 2 turns
Question 12: satisfied (2 turns, satisfaction: 0.75)
🗣️ Starting full conversation for question 13
   Customer: normal, Complexity: medium, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:14:41.151 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:14:42.615 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:14:47.723 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:14:57.818 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:15:04.487 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ✅ Customer satisfied!
   📋 Conversation complete: satisfied in 2 turns
Question 13: satisfied (2 turns, satisfaction: 0.85)
🗣️ Starting full conversation for question 14
   Customer: urgent, Complexity: complex, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:15:04.723 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:15:06.229 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:15:11.567 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:15:21.259 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:15:39.205 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:15:52.049 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:16:11.617 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 14: escalation_request (3 turns, satisfaction: 0.55)


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:16:13.459 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:16:24.787 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:16:41.398 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:16:55.632 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:17:11.035 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 15: escalation_request (3 turns, satisfaction: 0.55)
🗣️ Starting full conversation for question 16
   Customer: normal, Complexity: simple, Model: local_general_standard
✅ Creating LLM p

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:17:12.823 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:17:30.595 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:17:45.213 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ✅ Customer satisfied!
   📋 Conversation complete: satisfied in 1 turns
Question 16: satisfied (1 turns, satisfaction: 0.70)
🗣️ Starting full conversation for question 17
   Customer: confused, Complexity: complex, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:17:45.446 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:17:46.944 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:17:59.184 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:18:11.150 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:18:24.263 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:18:29.025 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:18:44.183 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 4: Processing...
✅ 13:19:04.836 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response

llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:19:25.474 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:19:41.848 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:19:54.901 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:19:59.859 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:20:08.024 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:20:13.345 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 18: escalation_request (3 turns, satisfaction: 0.40)


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:20:14.991 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:20:27.218 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:20:39.175 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:20:44.870 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 2 turns
Question 19: escalation_request (2 turns, satisfaction: 0.30)
🗣️ Starting full conversation for question 20
   Customer: normal, Complexity: simple, Model: local_general_standard
✅ Creating LLM provider: local_general_standard → llama-7b (llama)
✅ 13:20:45.128 [INFO    ] unknown         | Initializing LLM provider | model_name=unknown


llama_context: n_batch is less than GGML_KQ_MASK_PAD - increasing to 64
llama_context: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized


✅ 13:20:46.663 [INFO    ] unknown         | LLM provider initialized successfully | model_name=unknown
   Turn 1: Processing...
✅ 13:20:50.138 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:20:58.678 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 2: Processing...
✅ 13:21:11.581 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
✅ 13:21:21.765 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   Turn 3: Processing...
✅ 13:21:37.639 [INFO    ] unknown         | Model call: unknown - generate_response | model_name=unknown operation=generate_response
   ⬆️ Customer requesting escalation
   📋 Conversation complete: escalation_request in 3 turns
Question 20: escalation_request (3 turns, satisfaction: 0.60)
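
The end-of-conversation checks in the cell above (satisfaction phrases gated by the simulator's satisfaction threshold, then escalation phrases) can be sketched as a standalone function. This is an illustrative sketch rather than the notebook's own code: `classify_customer_reply` and the two constant names are hypothetical, while the phrase lists are copied from the cell.

```python
# Phrase lists copied from the conversation loop; the constant names are hypothetical.
SATISFIED_PHRASES = ['thank you', 'that helps', 'perfect', 'great', 'excellent', 'makes sense']
ESCALATION_PHRASES = ['transfer', 'supervisor', 'specialist', 'human', 'someone else']


def classify_customer_reply(reply, satisfaction_level, satisfaction_threshold):
    """Return 'satisfied', 'escalation_request', or None if the conversation continues."""
    text = reply.lower()

    # Satisfaction only counts when both the wording and the score agree.
    if any(phrase in text for phrase in SATISFIED_PHRASES):
        if satisfaction_level >= satisfaction_threshold:
            return 'satisfied'

    # Escalation is checked second and does not look at the satisfaction score.
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return 'escalation_request'

    return None


print(classify_customer_reply("Thank you, that helps!", 0.8, 0.7))
print(classify_customer_reply("Thank you, but I want a supervisor.", 0.5, 0.7))
print(classify_customer_reply("Can you explain step two again?", 0.5, 0.7))
```

Because these are plain substring matches, a reply such as "that's not humanly possible" also counts as an escalation request (it contains `human`), which is worth keeping in mind when reading the escalation counts in the log above.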


## Step 7: Export Results

Export the conversation results, together with summary statistics and a per-category analysis, to a JSON file in a timestamped output directory.
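
The export pattern (one timestamped directory per run, one JSON file inside it) can be sketched in isolation as follows. This is a minimal sketch under stated assumptions: `export_results` is a hypothetical helper, `demo_results` is made-up sample data, and a temporary directory stands in for the real output location so nothing under `/workspace` is touched.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def export_results(results, base_dir, filename="chatbot_agent_output.json"):
    """Write results to <base_dir>/<timestamp>/<filename> and return the file path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M")
    run_dir = Path(base_dir) / timestamp
    # parents=True also creates any missing intermediate directories.
    run_dir.mkdir(parents=True, exist_ok=True)

    out_path = run_dir / filename
    with open(out_path, "w", encoding="utf-8") as f:
        # default=str is a fallback for values json cannot encode (e.g. datetime objects).
        json.dump(
            {"timestamp": timestamp, "conversation_results": results},
            f,
            indent=2,
            default=str,
        )
    return out_path


# Made-up sample data, written to a throwaway directory.
demo_results = [{"id": 1, "final_outcome": "satisfied", "total_turns": 2}]
with tempfile.TemporaryDirectory() as tmp:
    path = export_results(demo_results, Path(tmp) / "outputs" / "chatbot_interactions")
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, len(loaded["conversation_results"]))
```

Reading the file back with `json.loads` right after writing is a cheap way to confirm that the export is valid JSON before relying on it in later analysis.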

In [12]:
# Export conversation results and settings
if 'results' in locals() and results:
    # Create timestamp for filenames
    timestamp = datetime.now().strftime("%Y%m%d_%H%M")
    
    # Create a timestamped output directory (including any missing parent directories)
    experiment_dir = Path(f'/workspace/notebooks/agent_testers/outputs/chatbot_interactions/{timestamp}')
    experiment_dir.mkdir(parents=True, exist_ok=True)
    
    # Export results with full conversation data
    results_filename = "chatbot_agent_output.json"
    results_path = experiment_dir / results_filename
    
    # Create comprehensive export with conversation analysis
    export_data = {
        'experiment_metadata': {
            'timestamp': timestamp,
            'agent_type': 'chatbot_agent',
            'experiment_type': 'full_conversation_simulation',
            'total_questions': len(results),
            'completed_conversations': len([r for r in results if r.get('conversation_complete', False)]),
            'generation_model': agent_preferred_model,
            'generation_timestamp': timestamp
        },
        'conversation_results': results,
        'summary_statistics': {
            'total_conversations': len(results),
            'completed_conversations': len([r for r in results if r.get('conversation_complete', False)]),
            'satisfied_customers': len([r for r in results if r.get('final_outcome') == 'satisfied']),
            'escalated_conversations': len([r for r in results if 'escalation' in r.get('final_outcome', '')]),
            'error_conversations': len([r for r in results if r.get('final_outcome') == 'error']),
            'average_turns': sum(r.get('total_turns', 0) for r in results) / len(results) if results else 0,
            'average_satisfaction': sum(r.get('customer_satisfaction', 0) for r in results) / len(results) if results else 0,
            'satisfaction_rate': len([r for r in results if r.get('final_outcome') == 'satisfied']) / len(results) * 100 if results else 0,
            'escalation_rate': len([r for r in results if 'escalation' in r.get('final_outcome', '')]) / len(results) * 100 if results else 0
        },
        'conversation_analysis': {
            'by_customer_type': {},
            'by_complexity': {},
            'by_outcome': {}
        }
    }
    
    # Analyze by customer type
    customer_types = list(set(r.get('customer_type', 'unknown') for r in results))
    for ctype in customer_types:
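        # Use the same 'unknown' default as above so records without a customer_type are counted
        ctype_results = [r for r in results if r.get('customer_type', 'unknown') == ctype]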
        export_data['conversation_analysis']['by_customer_type'][ctype] = {
            'count': len(ctype_results),
            'satisfaction_rate': len([r for r in ctype_results if r.get('final_outcome') == 'satisfied']) / len(ctype_results) * 100 if ctype_results else 0,
            'escalation_rate': len([r for r in ctype_results if 'escalation' in r.get('final_outcome', '')]) / len(ctype_results) * 100 if ctype_results else 0,
            'average_turns': sum(r.get('total_turns', 0) for r in ctype_results) / len(ctype_results) if ctype_results else 0,
            'average_satisfaction': sum(r.get('customer_satisfaction', 0) for r in ctype_results) / len(ctype_results) if ctype_results else 0
        }
    
    # Analyze by complexity
    complexities = list(set(r.get('complexity', 'unknown') for r in results))
    for complexity in complexities:
        # Use the same 'unknown' default as above so records without a complexity are counted
        complexity_results = [r for r in results if r.get('complexity', 'unknown') == complexity]
        export_data['conversation_analysis']['by_complexity'][complexity] = {
            'count': len(complexity_results),
            'satisfaction_rate': len([r for r in complexity_results if r.get('final_outcome') == 'satisfied']) / len(complexity_results) * 100 if complexity_results else 0,
            'escalation_rate': len([r for r in complexity_results if 'escalation' in r.get('final_outcome', '')]) / len(complexity_results) * 100 if complexity_results else 0,
            'average_turns': sum(r.get('total_turns', 0) for r in complexity_results) / len(complexity_results) if complexity_results else 0,
            'average_satisfaction': sum(r.get('customer_satisfaction', 0) for r in complexity_results) / len(complexity_results) if complexity_results else 0
        }
    
    # Analyze by outcome
    outcomes = list(set(r.get('final_outcome', 'unknown') for r in results))
    for outcome in outcomes:
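        # Use the same 'unknown' default as above so records without a final_outcome are counted
        outcome_results = [r for r in results if r.get('final_outcome', 'unknown') == outcome]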
        export_data['conversation_analysis']['by_outcome'][outcome] = {
            'count': len(outcome_results),
            'percentage': len(outcome_results) / len(results) * 100 if results else 0,
            'average_turns': sum(r.get('total_turns', 0) for r in outcome_results) / len(outcome_results) if outcome_results else 0,
            'average_satisfaction': sum(r.get('customer_satisfaction', 0) for r in outcome_results) / len(outcome_results) if outcome_results else 0
        }
    
    with open(results_path, 'w') as f:
        json.dump(export_data, f, indent=2)
    
    print(f"📄 Conversation results exported to: {results_path}")
    
    # Export settings
    settings_filename = f"chatbot_agent_settings_{timestamp}.json"
    settings_path = experiment_dir / settings_filename
    
    # Get uploaded filename safely
    uploaded_filename = 'unknown'
    if file_upload.value:
        if isinstance(file_upload.value, tuple) and len(file_upload.value) > 0:
            uploaded_filename = file_upload.value[0]['name']
        elif isinstance(file_upload.value, dict) and len(file_upload.value) > 0:
            uploaded_filename = list(file_upload.value.values())[0]['metadata']['name']
    
    # Get current settings from temp config files
    try:
        from ruamel.yaml import YAML
        yaml = YAML()
        
        # Load current settings
        with open(temp_file_paths['models_config'], 'r') as f:
            models_config = yaml.load(f)
        with open(temp_file_paths['agent_config'], 'r') as f:
            agent_config = yaml.load(f)
        
        settings_export = {
            'experiment_info': {
                'timestamp': timestamp,
                'agent_type': 'chatbot_agent',
                'experiment_type': 'full_conversation_simulation',
                'total_questions': len(results),
                'completed_conversations': len([r for r in results if r.get('conversation_complete', False)]),
                'input_file': uploaded_filename
            },
            'model_settings': {
                'preferred_model': models_config.get('primary_model', models_config.get('preferred', 'unknown')),
                'fallback_models': models_config.get('fallback', []),
                'temperature': agent_config.get('settings', {}).get('temperature', 0.5),
                'max_tokens': agent_config.get('settings', {}).get('max_tokens', 2000),
                'timeout': agent_config.get('settings', {}).get('timeout', 30)
            },
            'behavior_settings': agent_config.get('behavior', {}),
            'escalation_settings': agent_config.get('escalation', {}),
            'conversation_settings': {
                'max_conversation_turns': 8,
                'customer_ai_enabled': True,
                'realistic_simulation': True
            },
            'configuration_files': {
                'agent_config': agent_config,
                'models_config': models_config
            }
        }
    except Exception as e:
        print(f"Warning: Could not load current settings: {e}")
        settings_export = {
            'experiment_info': {
                'timestamp': timestamp,
                'agent_type': 'chatbot_agent',
                'experiment_type': 'full_conversation_simulation',
                'error': f"Could not load settings: {e}"
            }
        }
    
    with open(settings_path, 'w') as f:
        json.dump(settings_export, f, indent=2)
    
    print(f"⚙️ Settings exported to: {settings_path}")
    
    # Show enhanced summary
    stats = export_data['summary_statistics']
    print("\n📊 Export Summary:")
    print(f"  Timestamp: {timestamp}")
    print(f"  Results file: {results_filename}")
    print(f"  Settings file: {settings_filename}")
    print(f"  Total conversations: {stats['total_conversations']}")
    print(f"  Completed conversations: {stats['completed_conversations']}")
    print(f"  Satisfaction rate: {stats['satisfaction_rate']:.1f}%")
    print(f"  Escalation rate: {stats['escalation_rate']:.1f}%")
    print(f"  Average turns per conversation: {stats['average_turns']:.1f}")
    print(f"  Average customer satisfaction: {stats['average_satisfaction']:.2f}")
    print(f"  Files saved to: {experiment_dir.absolute()}")
    
else:
    print("⚠️ No conversation results to export. Please complete conversations first.")

📄 Conversation results exported to: /workspace/notebooks/agent_testers/outputs/chatbot_interactions/20250723_1330/chatbot_agent_output.json
⚙️ Settings exported to: /workspace/notebooks/agent_testers/outputs/chatbot_interactions/20250723_1330/chatbot_agent_settings_20250723_1330.json
\n📊 Export Summary:
  Timestamp: 20250723_1330
  Results file: chatbot_agent_output.json
  Settings file: chatbot_agent_settings_20250723_1330.json
  Total conversations: 20
  Completed conversations: 19
  Satisfaction rate: 30.0%
  Escalation rate: 65.0%
  Average turns per conversation: 3.3
  Average customer satisfaction: 0.61
  Files saved to: /workspace/notebooks/agent_testers/outputs/chatbot_interactions/20250723_1330
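
To confirm that an exported results file is readable later, it can be loaded back and checked for the top-level keys the export cell writes. The following is a minimal sketch under that assumption. `load_export` and `EXPECTED_KEYS` are hypothetical names, and the demo writes a small stand-in file to a temporary directory rather than reading a real export:

```python
import json
import tempfile
from pathlib import Path

# Top-level keys written by the export cell in this notebook
EXPECTED_KEYS = {
    'experiment_metadata',
    'conversation_results',
    'summary_statistics',
    'conversation_analysis',
}


def load_export(path):
    """Load an exported results file and check its top-level structure."""
    with open(path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Export is missing keys: {sorted(missing)}")
    return data


# Demo with a stand-in file (illustrative values only)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'chatbot_agent_output.json'
    demo_path.write_text(json.dumps({
        'experiment_metadata': {'timestamp': '20250101_0000'},
        'conversation_results': [],
        'summary_statistics': {'total_conversations': 0},
        'conversation_analysis': {},
    }), encoding='utf-8')
    loaded = load_export(demo_path)

print(loaded['summary_statistics'])
```

To check a real run, pass the path printed after "Conversation results exported to:" to `load_export`.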


## Step 8: Review Results

Display summary statistics, per-group breakdowns, and sample conversations for the completed run.
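
The review cell sorts satisfaction scores into three bands: high (0.8 and above), medium (0.5 up to but not including 0.8), and low (below 0.5). As a small sketch of those boundaries, using a hypothetical `satisfaction_band` helper and made-up scores:

```python
from collections import Counter


def satisfaction_band(score):
    """Map a satisfaction score to the band used in the review cell."""
    if score >= 0.8:
        return 'high'
    if score >= 0.5:
        return 'medium'
    return 'low'


# Illustrative scores chosen to sit on either side of each boundary
sample_scores = [0.8, 0.79, 0.5, 0.49, 0.3]
bands = Counter(satisfaction_band(s) for s in sample_scores)
print(dict(bands))
```

Both lower bounds are inclusive, so a score of exactly 0.8 counts as high and exactly 0.5 counts as medium.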

In [13]:
# Review and analyze conversation results
if 'results' in locals() and results:
    print("📋 Conversation Results Review and Analysis\n")
    
    # Create DataFrame for analysis
    df_results = pd.DataFrame([
        {
            'id': r.get('id'),
            'customer_type': r.get('customer_type'),
            'complexity': r.get('complexity'),
            'total_turns': r.get('total_turns', 0),
            'final_outcome': r.get('final_outcome'),
            'customer_satisfaction': r.get('customer_satisfaction', 0),
            'conversation_complete': r.get('conversation_complete', False)
        }
        for r in results
    ])
    
    print("=== CONVERSATION SUMMARY STATISTICS ===")
    print(f"Total conversations: {len(df_results)}")
    print(f"Completed conversations: {df_results['conversation_complete'].sum()}")
    print(f"Average turns per conversation: {df_results['total_turns'].mean():.1f}")
    print(f"Average customer satisfaction: {df_results['customer_satisfaction'].mean():.3f}")
    
    # Outcome distribution
    print("\n=== CONVERSATION OUTCOMES ===")
    outcome_counts = df_results['final_outcome'].value_counts()
    for outcome, count in outcome_counts.items():
        percentage = count / len(df_results) * 100
        print(f"{outcome}: {count} ({percentage:.1f}%)")
    
    # Analysis by customer type
    print("\n=== ANALYSIS BY CUSTOMER TYPE ===")
    customer_analysis = df_results.groupby('customer_type').agg({
        'total_turns': 'mean',
        'customer_satisfaction': 'mean',
        'conversation_complete': 'sum'
    }).round(3)
    customer_analysis.columns = ['Avg Turns', 'Avg Satisfaction', 'Completed']
    print(customer_analysis)
    
    # Satisfaction rate by customer type
    print("\nSatisfaction Rate by Customer Type:")
    for ctype in df_results['customer_type'].unique():
        ctype_data = df_results[df_results['customer_type'] == ctype]
        satisfied = len(ctype_data[ctype_data['final_outcome'] == 'satisfied'])
        total = len(ctype_data)
        print(f"  {ctype}: {satisfied}/{total} ({satisfied/total*100:.1f}%)")
    
    # Analysis by complexity
    print("\n=== ANALYSIS BY COMPLEXITY ===")
    complexity_analysis = df_results.groupby('complexity').agg({
        'total_turns': 'mean',
        'customer_satisfaction': 'mean',
        'conversation_complete': 'sum'
    }).round(3)
    complexity_analysis.columns = ['Avg Turns', 'Avg Satisfaction', 'Completed']
    print(complexity_analysis)
    
    # Show detailed results table
    print("\n=== CONVERSATION SUMMARY TABLE ===")
    display_df = df_results[['id', 'customer_type', 'complexity', 'total_turns', 'final_outcome', 'customer_satisfaction']].copy()
    display_df['customer_satisfaction'] = display_df['customer_satisfaction'].round(3)
    display(display_df)
    
    # Show sample full conversations
    print("\n=== SAMPLE FULL CONVERSATIONS ===")
    
    # Show 2 satisfied and 2 escalated conversations for analysis
    satisfied_conversations = [r for r in results if r.get('final_outcome') == 'satisfied']
    escalated_conversations = [r for r in results if 'escalation' in r.get('final_outcome', '')]
    
    def display_conversation(conversation_data, max_turns=3):
        """Display a conversation with turn-by-turn analysis"""
        conv_id = conversation_data.get('id')
        customer_type = conversation_data.get('customer_type')
        complexity = conversation_data.get('complexity')
        outcome = conversation_data.get('final_outcome')
        satisfaction = conversation_data.get('customer_satisfaction', 0)
        
        print(f"\n--- Conversation {conv_id} ({customer_type}, {complexity}) ---")
        print(f"Original Question: {conversation_data.get('original_question', '')}")
        print(f"Final Outcome: {outcome} (Customer Satisfaction: {satisfaction:.2f})")
        
        conversation_history = conversation_data.get('conversation_history', [])
        turns_to_show = min(max_turns, len(conversation_history))
        
        for i, turn in enumerate(conversation_history[:turns_to_show]):
            turn_num = turn.get('turn_number', i+1)
            print(f"\n  Turn {turn_num}:")
            print(f"    Customer: {turn.get('customer_query', '')[:150]}{'...' if len(turn.get('customer_query', '')) > 150 else ''}")
            print(f"    Chatbot:  {turn.get('chatbot_response', '')[:150]}{'...' if len(turn.get('chatbot_response', '')) > 150 else ''}")
            
            if 'customer_response' in turn:
                print(f"    Customer Reply: {turn.get('customer_response', '')[:150]}{'...' if len(turn.get('customer_response', '')) > 150 else ''}")
                print(f"    Satisfaction: {turn.get('customer_satisfaction', 0):.2f}")
        
        if len(conversation_history) > turns_to_show:
            print(f"  ... and {len(conversation_history) - turns_to_show} more turns")
    
    # Show satisfied conversations
    if satisfied_conversations:
        print("\n🟢 SATISFIED CUSTOMER CONVERSATIONS:")
        for conv in satisfied_conversations[:2]:
            display_conversation(conv)
    
    # Show escalated conversations  
    if escalated_conversations:
        print("\n🔴 ESCALATED CONVERSATIONS:")
        for conv in escalated_conversations[:2]:
            display_conversation(conv)
    
    # Conversation insights
    print("\n=== CONVERSATION INSIGHTS ===")
    
    # Turn analysis
    short_conversations = len(df_results[df_results['total_turns'] <= 2])
    medium_conversations = len(df_results[(df_results['total_turns'] > 2) & (df_results['total_turns'] <= 5)])
    long_conversations = len(df_results[df_results['total_turns'] > 5])
    
    print("Conversation Length Distribution:")
    print(f"  Short (1-2 turns): {short_conversations} ({short_conversations/len(df_results)*100:.1f}%)")
    print(f"  Medium (3-5 turns): {medium_conversations} ({medium_conversations/len(df_results)*100:.1f}%)")
    print(f"  Long (6+ turns): {long_conversations} ({long_conversations/len(df_results)*100:.1f}%)")
    
    # Satisfaction insights
    high_satisfaction = len(df_results[df_results['customer_satisfaction'] >= 0.8])
    medium_satisfaction = len(df_results[(df_results['customer_satisfaction'] >= 0.5) & (df_results['customer_satisfaction'] < 0.8)])
    low_satisfaction = len(df_results[df_results['customer_satisfaction'] < 0.5])
    
    print("\nCustomer Satisfaction Distribution:")
    print(f"  High (0.8+): {high_satisfaction} ({high_satisfaction/len(df_results)*100:.1f}%)")
    print(f"  Medium (0.5-0.8): {medium_satisfaction} ({medium_satisfaction/len(df_results)*100:.1f}%)")
    print(f"  Low (<0.5): {low_satisfaction} ({low_satisfaction/len(df_results)*100:.1f}%)")
    
    print("\n" + "="*80)
    print("✅ Conversation analysis complete! Check the exported files for full conversation details.")
    print("💡 Each conversation includes turn-by-turn customer-AI interaction data for detailed analysis.")
    
else:
    print("⚠️ No conversation results to review. Please complete conversations first.")

📋 Conversation Results Review and Analysis

=== CONVERSATION SUMMARY STATISTICS ===
Total conversations: 20
Completed conversations: 19
Average turns per conversation: 3.3
Average customer satisfaction: 0.605

=== CONVERSATION OUTCOMES ===
escalation_request: 13 (65.0%)
satisfied: 6 (30.0%)
conversation_incomplete: 1 (5.0%)

=== ANALYSIS BY CUSTOMER TYPE ===
               Avg Turns  Avg Satisfaction  Completed
customer_type                                        
confused             5.0              0.73          4
frustrated           2.2              0.45          5
normal               2.4              0.71          5
urgent               3.6              0.53          5

Satisfaction Rate by Customer Type:
  frustrated: 0/5 (0.0%)
  urgent: 0/5 (0.0%)
  confused: 4/5 (80.0%)
  normal: 2/5 (40.0%)

=== ANALYSIS BY COMPLEXITY ===
            Avg Turns  Avg Satisfaction  Completed
complexity                                        
complex         3.500             0.558          6
m

   id  customer_type  complexity  total_turns            final_outcome  customer_satisfaction
0   1     frustrated      medium            3       escalation_request                   0.55
1   2         urgent      simple            3       escalation_request                   0.65
2   3       confused      simple            8  conversation_incomplete                    0.5
3   4         normal      simple            3       escalation_request                    0.7
4   5     frustrated     complex            1       escalation_request                   0.55
5   6       confused      medium            3                satisfied                    0.8
6   7         urgent     complex            6       escalation_request                    0.4
7   8         urgent      simple            3       escalation_request                   0.65
8   9         normal      simple            3       escalation_request                    0.7
9  10       confused      medium            5                satisfied                    0.8



=== SAMPLE FULL CONVERSATIONS ===

🟢 SATISFIED CUSTOMER CONVERSATIONS:

--- Conversation 6 (confused, medium) ---
Original Question: What's the difference between comprehensive and collision coverage? I'm trying to decide what I need.
Final Outcome: satisfied (Customer Satisfaction: 0.80)

  Turn 1:
    Customer: What's the difference between comprehensive and collision coverage? I'm trying to decide what I need.
    Chatbot:  
- Use warm, professional language

Possible response:
"Hi there! I'm happy to help you understand the differences between comprehensive and collision...
    Customer Reply: "Okay, so let me make sure I understand. So comprehensive coverage is for things like theft or a tree falling on my car, but collision coverage is onl...
    Satisfaction: 0.50

  Turn 2:
    Customer: "Okay, so let me make sure I understand. So comprehensive coverage is for things like theft or a tree falling on my car, but collision coverage is onl...
    Chatbot:  
- Use warm, professiona