# Quality Agent Testing Notebook

This notebook provides a user-friendly interface for testing the Quality Agent with chatbot interactions.
It reads output from chatbot_tester.ipynb and evaluates the quality of each chatbot response.

## Features:
- Load and edit Quality Agent configuration settings
- Import chatbot conversation results from chatbot_tester.ipynb
- Process each conversation turn through the Quality Agent
- Generate quality assessments, scores, and improvement suggestions
- Analyze quality patterns across different customer types and scenarios
- Export enhanced results with quality metrics

## Getting Started:
1. Run cells in order from top to bottom
2. Edit configuration values as needed
3. Load chatbot conversation results from chatbot_tester exports
4. Review conversations before quality assessment
5. Run quality analysis and review detailed results

## Input Requirements:
Load conversation results exported from chatbot_tester.ipynb containing:
- Customer queries and chatbot responses
- Conversation metadata and turn-by-turn interactions
- Customer types, complexity levels, and satisfaction scores
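The fields listed above are the ones the Step 3 loader actually reads. Before uploading, you can run a quick pre-flight check that lists turns missing a query or response. The sketch below is illustrative. The key names (`conversation_results`, `conversation_history`, `customer_query`, `chatbot_response`, `turn_number`, `id`) are taken from the loader code in Step 3. `find_incomplete_turns` and the sample data are hypothetical.

```python
# Illustrative export in the shape the Step 3 loader understands (not real data).
SAMPLE_EXPORT = {
    "conversation_results": [
        {
            "id": 1,
            "customer_type": "frustrated",
            "complexity": "simple",
            "conversation_history": [
                {"turn_number": 1,
                 "customer_query": "Example question",
                 "chatbot_response": "Example answer",
                 "customer_satisfaction": 4},
                {"turn_number": 2,
                 "customer_query": "Example follow-up",
                 "chatbot_response": ""},
            ],
        }
    ]
}


def find_incomplete_turns(export: dict) -> list[tuple]:
    """Return (conversation_id, turn_number, missing_fields) for every turn
    lacking a non-empty customer_query or chatbot_response."""
    problems = []
    for idx, conv in enumerate(export.get("conversation_results", [])):
        conv_id = conv.get("id", idx + 1)  # same default as the loader
        for t_idx, turn in enumerate(conv.get("conversation_history", [])):
            missing = [key for key in ("customer_query", "chatbot_response")
                       if not turn.get(key)]
            if missing:
                problems.append((conv_id, turn.get("turn_number", t_idx + 1), missing))
    return problems


print(find_incomplete_turns(SAMPLE_EXPORT))
```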

In [1]:
# Import required libraries
import json
import os
import sys
from datetime import datetime
from pathlib import Path
import uuid

import ipywidgets as widgets
import pandas as pd
from ruamel.yaml import YAML
from IPython.display import display, HTML, clear_output

# Set the working directory to the root of the project
os.chdir('/workspace')

# Add workspace to path for imports
sys.path.insert(0, '/workspace')

# Import our system components
from src.nodes.quality_agent import QualityAgentNode
from src.core.config import ConfigManager
from src.core.context_manager import SQLiteContextProvider
from src.interfaces.core.state_schema import HybridSystemState

print("✅ All libraries imported successfully!")
print("Ready to start testing the Quality Agent.")

✅ All libraries imported successfully!
Ready to start testing the Quality Agent.


## Step 1: Load Configuration Settings

The following cell loads the Quality Agent's configuration files, prints the key settings, and copies the files to a temporary directory.
The originals under `/workspace/config` are never modified. You edit the temporary copies in Step 2.

In [2]:
# Load configuration from files and create temporary editable copies
config_base_path = Path('/workspace/config')
agent_config_path = config_base_path / 'agents' / 'quality_agent'
temp_config_dir = Path('/tmp/quality_agent_configs')

def load_and_create_temp_configs():
    """Load all configuration files and create temporary editable copies with comments preserved"""
    configs = {}
    
    # Create YAML instance for comment preservation
    yaml = YAML()
    yaml.preserve_quotes = True
    yaml.default_flow_style = False
    
    # Create temp directory
    temp_config_dir.mkdir(exist_ok=True)
    
    # Load original files for parsing (to access data)
    with open(agent_config_path / 'config.yaml', 'r') as f:
        configs['agent'] = yaml.load(f)
    
    with open(agent_config_path / 'prompts.yaml', 'r') as f:
        configs['prompts'] = yaml.load(f)
    
    with open(agent_config_path / 'models.yaml', 'r') as f:
        configs['models'] = yaml.load(f)
    
    # Load shared models for reference
    with open(config_base_path / 'shared' / 'models.yaml', 'r') as f:
        configs['shared_models'] = yaml.load(f)
    
    # Create temp file paths
    temp_agent_path = temp_config_dir / 'config.yaml'
    temp_prompts_path = temp_config_dir / 'prompts.yaml'
    temp_models_path = temp_config_dir / 'models.yaml'
    
    # Copy original files to temp directory to preserve comments and formatting
    import shutil
    shutil.copy2(agent_config_path / 'config.yaml', temp_agent_path)
    shutil.copy2(agent_config_path / 'prompts.yaml', temp_prompts_path)  
    shutil.copy2(agent_config_path / 'models.yaml', temp_models_path)
    
    return configs, {
        'agent_config': temp_agent_path,
        'prompts_config': temp_prompts_path,
        'models_config': temp_models_path
    }

# Load configurations and create temp files
configs, temp_file_paths = load_and_create_temp_configs()

print("📁 Configuration files loaded and temporary copies created with comments preserved!")
print(f"Agent name: {configs['agent']['agent']['name']}")
print(f"Agent version: {configs['agent']['agent']['version']}")

# Get preferred model from models config
preferred_model = "Unknown"
if 'primary_model' in configs['models']:
    preferred_model = configs['models']['primary_model']
elif 'preferred' in configs['models']:
    preferred_model = configs['models']['preferred']

print(f"Preferred model: {preferred_model}")

# Display key quality thresholds
quality_thresholds = configs['agent']['settings']['quality_thresholds']
print(f"\n🎯 Quality Assessment Thresholds:")
print(f"  Adequate score: {quality_thresholds['adequate_score']}")
print(f"  Adjustment needed: {quality_thresholds['adjustment_score']}")

print(f"\n💾 Temporary config files created at:")
for config_type, path in temp_file_paths.items():
    print(f"  {config_type}: {path}")
print(f"\n💡 These temp files retain original comments and can be edited directly in Step 2.")

📁 Configuration files loaded and temporary copies created with comments preserved!
Agent name: quality_agent
Agent version: 1.0.0
Preferred model: anthropic_general_standard

🎯 Quality Assessment Thresholds:
  Adequate score: 7.0
  Adjustment needed: 5.0

💾 Temporary config files created at:
  agent_config: /tmp/quality_agent_configs/config.yaml
  prompts_config: /tmp/quality_agent_configs/prompts.yaml
  models_config: /tmp/quality_agent_configs/models.yaml

💡 These temp files retain original comments and can be edited directly in Step 2.
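The preferred model printed above is an alias. The shared `models.yaml` maps it to a concrete model, which is how `anthropic_general_standard` becomes `claude-3-5-sonnet-20241022` in the Step 4 log. The sketch below shows that lookup, plus the alias check that the Save Models Config button performs in Step 2. Plain dicts stand in for the parsed YAML. The key names (`primary_model`, `preferred`, `model_aliases`, `models`, `type`, `model_preferences`) come from this notebook's code. The project's provider factory may resolve aliases differently, and both helper names are hypothetical.

```python
def resolve_model(models_cfg: dict, shared_cfg: dict) -> tuple:
    """Return (alias, concrete_model, provider) for the agent's primary model.

    Uses the same fallback as the notebook: 'primary_model', then 'preferred'."""
    alias = models_cfg.get("primary_model") or models_cfg.get("preferred")
    model_name = shared_cfg.get("model_aliases", {}).get(alias)
    details = shared_cfg.get("models", {}).get(model_name, {})
    return alias, model_name, details.get("type", "unknown")


def unknown_aliases(models_cfg: dict, shared_cfg: dict) -> list:
    """List (task, alias) pairs that are not defined under model_aliases."""
    known = shared_cfg.get("model_aliases", {})
    found = []
    primary = models_cfg.get("primary_model") or models_cfg.get("preferred")
    if primary and primary not in known:
        found.append(("primary", primary))
    for task, prefs in models_cfg.get("model_preferences", {}).items():
        task_primary = prefs.get("primary")
        if task_primary and task_primary not in known:
            found.append((task, task_primary))
    return found


shared = {
    "model_aliases": {"anthropic_general_standard": "claude-3-5-sonnet-20241022"},
    "models": {"claude-3-5-sonnet-20241022": {"type": "anthropic"}},
}
models = {
    "primary_model": "anthropic_general_standard",
    "model_preferences": {"quality_assessment": {"primary": "no_such_alias"}},
}
print(resolve_model(models, shared))
print(unknown_aliases(models, shared))
```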


## Step 2: Editable Configuration Settings

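Edit the raw YAML below to customize how the Quality Agent assesses chatbot responses.
Each editor shows one of the temporary copies created in Step 1. Its Save button validates the YAML and writes it back to that copy. Saved changes take effect when the Quality Agent is initialized in Step 4, so save before running that step, or re-run it afterwards.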

In [3]:
# Display an editable text area and a Save button for each temporary config file

def load_config_file_contents():
    """Load current config file contents from temp files"""
    with open(temp_file_paths['agent_config'], 'r') as f:
        agent_config_content = f.read()
    with open(temp_file_paths['prompts_config'], 'r') as f:
        prompts_config_content = f.read()
    with open(temp_file_paths['models_config'], 'r') as f:
        models_config_content = f.read()
    
    return agent_config_content, prompts_config_content, models_config_content

# Load current config file contents
agent_config_content, prompts_config_content, models_config_content = load_config_file_contents()

print("⚙️ Quality Agent Configuration File Editor")
print("Edit the YAML configuration files below and use the Save buttons to apply changes.")
print("Changes are saved to temporary files and will be used in quality assessment.\n")

# Create text areas for each config file
print("📄 1. Quality Agent Configuration (config.yaml)")
print("Contains: quality thresholds, assessment settings, escalation rules")

agent_config_editor = widgets.Textarea(
    value=agent_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='300px'),
    style={'description_width': '0px'}
)

def save_agent_config(button):
    """Save agent config changes with comments preserved"""
    try:
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        yaml.load(agent_config_editor.value)
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['agent_config'], 'w') as f:
            f.write(agent_config_editor.value)
        
        print("✅ Quality Agent config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in agent config: {e}")

agent_save_btn = widgets.Button(description="Save Agent Config", button_style='success')
agent_save_btn.on_click(save_agent_config)

display(agent_config_editor)
display(agent_save_btn)

print("\n📄 2. Quality Assessment Prompts Configuration (prompts.yaml)")
print("Contains: system prompts, quality assessment criteria, adjustment templates")

prompts_config_editor = widgets.Textarea(
    value=prompts_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='300px'),
    style={'description_width': '0px'}
)

def save_prompts_config(button):
    """Save prompts config changes with comments preserved"""
    try:
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        yaml.load(prompts_config_editor.value)
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['prompts_config'], 'w') as f:
            f.write(prompts_config_editor.value)
        
        print("✅ Prompts config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in prompts config: {e}")

prompts_save_btn = widgets.Button(description="Save Prompts Config", button_style='success')
prompts_save_btn.on_click(save_prompts_config)

display(prompts_config_editor)
display(prompts_save_btn)

print("\n📄 3. Quality Assessment Models Configuration (models.yaml)")
print("Contains: preferred models for quality assessment, response adjustment")

# Show available model aliases from shared models config
def display_available_models():
    """Display available model aliases and their actual models"""
    try:
        shared_models = configs['shared_models']
        
        # Extract model aliases and models sections
        model_aliases = shared_models.get('model_aliases', {})
        models = shared_models.get('models', {})
        
        if not model_aliases:
            print("❌ No model aliases found in shared configuration")
            return
            
        print("\n🔍 Available Model Aliases for Quality Assessment:")
        print("Use these aliases in your models configuration below:\n")
        
        # Group by provider for better organization
        providers = {}
        for alias, actual_model_name in model_aliases.items():
            # Get model details from models section
            model_details = models.get(actual_model_name, {})
            provider = model_details.get('type', 'unknown')
            description = model_details.get('description', '')
            
            if provider not in providers:
                providers[provider] = []
            providers[provider].append({
                'alias': alias,
                'model_name': actual_model_name,
                'description': description
            })
        
        # Display by provider with quality assessment recommendations
        for provider, provider_models in providers.items():
            print(f"📡 {provider.upper()} Provider:")
            for model in provider_models:
                desc = f" - {model['description']}" if model['description'] else ""
                print(f"  • {model['alias']} → {model['model_name']}{desc}")
            print()
        
        # Show current configuration
        try:
            yaml = YAML()
            yaml.preserve_quotes = True
            current_models_config = yaml.load(models_config_content)
            current_preferred = current_models_config.get('primary_model', current_models_config.get('preferred', 'unknown'))
            
            # Show model preferences for different tasks
            model_preferences = current_models_config.get('model_preferences', {})
        except Exception:
            current_preferred = 'unknown'
            model_preferences = {}
        
        print(f"📋 Current Models Configuration:")
        print(f"  Primary model: {current_preferred}")
        if model_preferences:
            print(f"  Task-specific preferences:")
            for task, prefs in model_preferences.items():
                print(f"    {task}: {prefs.get('primary', 'unknown')}")
        print()
        
        print("💡 Quality Assessment Model Recommendations:")
        print("  • Low temperature settings (0.1-0.3) for consistent quality scoring")
        print("  • Reasoning models for complex quality analysis")
        print("  • Standard models sufficient for basic quality checks")
        print("  • Budget models acceptable for rule-based assessments")
        print()
        
    except Exception as e:
        print(f"❌ Error loading available models: {e}")
        print("Continuing with models configuration editor...\n")

# Display available models before showing the editor
display_available_models()

models_config_editor = widgets.Textarea(
    value=models_config_content,
    description="",
    layout=widgets.Layout(width='100%', height='200px'),
    style={'description_width': '0px'}
)

def save_models_config(button):
    """Save models config changes with comments preserved"""
    try:
        yaml = YAML()
        yaml.preserve_quotes = True
        yaml.default_flow_style = False
        
        # Validate YAML syntax
        parsed_config = yaml.load(models_config_editor.value)
        
        # Additional validation for model aliases
        if isinstance(parsed_config, dict):
            preferred = parsed_config.get('primary_model') or parsed_config.get('preferred')
            
            # Get available aliases
            model_aliases = configs['shared_models'].get('model_aliases', {})
            
            # Check if preferred model exists
            if preferred and preferred not in model_aliases:
                print(f"⚠️ Warning: Preferred model '{preferred}' not found in available model aliases")
            
            # Check model preferences
            model_preferences = parsed_config.get('model_preferences', {})
            for task, prefs in model_preferences.items():
                task_primary = prefs.get('primary')
                if task_primary and task_primary not in model_aliases:
                    print(f"⚠️ Warning: {task} primary model '{task_primary}' not found in available model aliases")
        
        # Save to temp file (preserves comments in the editor content)
        with open(temp_file_paths['models_config'], 'w') as f:
            f.write(models_config_editor.value)
        
        print("✅ Models config saved successfully with comments preserved!")
        
    except Exception as e:
        print(f"❌ YAML syntax error in models config: {e}")

models_save_btn = widgets.Button(description="Save Models Config", button_style='success')
models_save_btn.on_click(save_models_config)

display(models_config_editor)
display(models_save_btn)

# Save All button for convenience
def save_all_configs(button):
    """Save all config changes at once"""
    save_agent_config(None)
    save_prompts_config(None)
    save_models_config(None)

print("\n💾 Save All Changes")
save_all_btn = widgets.Button(description="Save All Configs", button_style='info')
save_all_btn.on_click(save_all_configs)
display(save_all_btn)

print(f"\n💾 Temp config files location:")
for config_type, path in temp_file_paths.items():
    print(f"  {config_type}: {path}")

print("\n💡 Quality Assessment Tips:")
print("  • Lower adequate_score = more responses marked as adequate")
print("  • Higher adjustment_score = more attempts at response improvement")
print("  • Lower temperature = more consistent quality scoring")
print("  • Edit quality assessment prompts to customize evaluation criteria")
print("  • Model preferences allow different models for different quality tasks")

⚙️ Quality Agent Configuration File Editor
Edit the YAML configuration files below and use the Save buttons to apply changes.
Changes are saved to temporary files and will be used in quality assessment.

📄 1. Quality Agent Configuration (config.yaml)
Contains: quality thresholds, assessment settings, escalation rules


Textarea(value='# Quality Agent Configuration\n# Responsibility: Review chatbot answers and decide if adequate…

Button(button_style='success', description='Save Agent Config', style=ButtonStyle())


📄 2. Quality Assessment Prompts Configuration (prompts.yaml)
Contains: system prompts, quality assessment criteria, adjustment templates


Textarea(value='# Quality Agent Prompts\n\nsystem: |\n  You are a Quality Assessment Agent responsible for eva…

Button(button_style='success', description='Save Prompts Config', style=ButtonStyle())


📄 3. Quality Assessment Models Configuration (models.yaml)
Contains: preferred models for quality assessment, response adjustment

🔍 Available Model Aliases for Quality Assessment:
Use these aliases in your models configuration below:

📡 ANTHROPIC Provider:
  • anthropic_general_budget → claude-3-5-haiku-20241022 - Anthropic Claude 3.5 Haiku - fast and efficient
  • anthropic_general_standard → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_reasoning_premium → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_coding_premium → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning
  • anthropic_flagship → claude-3-5-sonnet-20241022 - Anthropic Claude 3.5 Sonnet - balanced performance and reasoning

📡 OPENAI Provider:
  • openai_general_standard → gpt-4 - OpenAI GPT-4 - highest quality, requires API key
  • openai_general_budget → 

Textarea(value='primary_model: "anthropic_general_standard"\n\nmodel_preferences:\n  quality_assessment:\n    …

Button(button_style='success', description='Save Models Config', style=ButtonStyle())


💾 Save All Changes


Button(button_style='info', description='Save All Configs', style=ButtonStyle())


💾 Temp config files location:
  agent_config: /tmp/quality_agent_configs/config.yaml
  prompts_config: /tmp/quality_agent_configs/prompts.yaml
  models_config: /tmp/quality_agent_configs/models.yaml

💡 Quality Assessment Tips:
  • Lower adequate_score = more responses marked as adequate
  • Higher adjustment_score = more attempts at response improvement
  • Lower temperature = more consistent quality scoring
  • Edit quality assessment prompts to customize evaluation criteria
  • Model preferences allow different models for different quality tasks
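The three save callbacks above differ only in which editor they read and which temp file they write. One possible refactor is a small factory that runs a parser on the text before writing, so a syntax error never overwrites the last good file. It also reports parse errors separately from file-write errors. This sketch uses only the standard library, with `json.loads` standing in for `YAML().load`. `make_saver` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def make_saver(parse: Callable[[str], Any], target: Path, label: str) -> Callable[[str], bool]:
    """Build a save function that writes text to `target` only if `parse` accepts it."""
    def save(text: str) -> bool:
        try:
            parse(text)  # raises on syntax errors; nothing has been written yet
        except Exception as exc:
            print(f"❌ Syntax error in {label}: {exc}")
            return False
        target.write_text(text)  # I/O errors propagate instead of being mislabeled
        print(f"✅ {label} saved")
        return True
    return save


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    save_cfg = make_saver(json.loads, path, "agent config")
    save_cfg('{"adequate_score": 7.0}')
    save_cfg('{"adequate_score": ')  # rejected; the file keeps its previous content
    print(path.read_text())
```

In the notebook, the equivalent would be `make_saver(YAML().load, temp_file_paths['agent_config'], 'agent config')`, wired up with `agent_save_btn.on_click(lambda _: saver(agent_config_editor.value))`.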


## Step 3: Load Chatbot Conversation Results

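Load conversation results exported from chatbot_tester.ipynb for quality assessment.
First upload the JSON file with the widget below. Then run the following cell, which reads whatever file the widget holds at the moment the cell runs.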

In [4]:
# File upload widget for loading conversation results
file_upload = widgets.FileUpload(
    accept='.json',
    multiple=False,
    description='Upload conversation results:'
)

# Instructions for file format
print("📝 Load Chatbot Conversation Results")
print("\n💡 How to get conversation data:")
print("1. Run chatbot_tester.ipynb and export conversation results")
print("2. Upload the exported JSON file below")
print("3. The file should contain conversation turns with queries and responses")
print("\n📄 Expected JSON format:")
print("- Array of conversations with conversation_history containing:")
print("- Each turn: {'customer_query': '...', 'chatbot_response': '...', 'turn_number': N}")
print("- Metadata: customer_type, complexity, conversation details")
print("\n📁 Upload your conversation results file:")
display(file_upload)

📝 Load Chatbot Conversation Results

💡 How to get conversation data:
1. Run chatbot_tester.ipynb and export conversation results
2. Upload the exported JSON file below
3. The file should contain conversation turns with queries and responses

📄 Expected JSON format:
- Array of conversations with conversation_history containing:
- Each turn: {'customer_query': '...', 'chatbot_response': '...', 'turn_number': N}
- Metadata: customer_type, complexity, conversation details

📁 Upload your conversation results file:


FileUpload(value=(), accept='.json', description='Upload conversation results:')
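The next cell reads the widget's value, whose shape differs between ipywidgets 7 and 8. It then unwraps whichever container key the export uses. The sketch below reproduces both steps with plain data, so you can test the parsing without the widget. In ipywidgets 8, `FileUpload.value` is a tuple of dicts with `name` and `content` (a `memoryview`). In ipywidgets 7 it is a dict keyed by filename, with `metadata` and `content` entries. `read_upload` and `unwrap_conversations` are hypothetical names, and the upload value is simulated.

```python
import json


def read_upload(value) -> tuple:
    """Return (filename, raw_bytes) from a FileUpload.value in either shape."""
    if isinstance(value, tuple) and value:  # ipywidgets 8
        item = value[0]
        name = item["name"]
    elif isinstance(value, dict) and value:  # ipywidgets 7
        item = next(iter(value.values()))
        name = item["metadata"]["name"]
    else:
        raise ValueError("no file uploaded")
    return name, bytes(item["content"])  # bytes() accepts memoryview or bytes


def unwrap_conversations(data) -> list:
    """Apply the same container-key fallbacks as load_conversations_from_file."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        for key in ("conversation_results", "conversations", "results"):
            if key in data:
                return data[key]
        return [data]  # a single conversation object
    raise TypeError(f"unexpected JSON top level: {type(data).__name__}")


# Simulated ipywidgets 8 value (illustrative, not a real upload)
fake_value = ({"name": "export.json",
               "content": memoryview(b'{"conversation_results": [{"id": 1}]}')},)
name, raw = read_upload(fake_value)
conversations = unwrap_conversations(json.loads(raw.decode("utf-8")))
print(name, conversations)
```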

In [5]:
# Load conversation results from uploaded file
conversation_data = []

def load_conversations_from_file(file_content, filename):
    """Load conversations from uploaded JSON file"""
    try:
        # Handle different content types
        if isinstance(file_content, memoryview):
            content_bytes = file_content.tobytes()
        elif hasattr(file_content, 'decode'):
            content_bytes = file_content
        else:
            content_bytes = str(file_content).encode('utf-8')
        
        # Decode to string and parse JSON
        data = json.loads(content_bytes.decode('utf-8'))
        
        # Handle different JSON formats
        if isinstance(data, dict):
            # Check for different export formats from chatbot_tester
            if 'conversation_results' in data:  # Key written by chatbot_tester.ipynb exports
                conversations = data['conversation_results']
            elif 'conversations' in data:
                conversations = data['conversations']
            elif 'results' in data:
                conversations = data['results']
            else:
                # Single conversation object
                conversations = [data]
        elif isinstance(data, list):
            conversations = data
        else:
            print(f"❌ Unexpected data format: {type(data)}")
            return []
            
        return conversations
            
    except json.JSONDecodeError as e:
        print(f"❌ JSON parsing error: {e}")
        return []
    except Exception as e:
        print(f"❌ Error loading file: {e}")
        return []

def extract_conversation_turns(conversations):
    """Extract individual turns from conversations for quality assessment"""
    turns = []
    
    for conv_idx, conversation in enumerate(conversations):
        conv_id = conversation.get('id', conv_idx + 1)
        customer_type = conversation.get('customer_type', 'unknown')
        complexity = conversation.get('complexity', 'medium')
        
        # Get conversation history
        history = conversation.get('conversation_history', [])
        
        # If no conversation_history, try to extract from other fields
        if not history:
            # Check for single turn conversation
            query = conversation.get('query') or conversation.get('customer_query') or conversation.get('original_question')
            response = conversation.get('ai_response') or conversation.get('chatbot_response') or conversation.get('response')
            
            if query and response:
                history = [{
                    'turn_number': 1,
                    'customer_query': query,
                    'chatbot_response': response
                }]
        
        # Extract turns from conversation history
        for turn_idx, turn in enumerate(history):
            turn_data = {
                'conversation_id': conv_id,
                'turn_number': turn.get('turn_number', turn_idx + 1),
                'customer_type': customer_type,
                'complexity': complexity,
                'customer_query': turn.get('customer_query', ''),
                'chatbot_response': turn.get('chatbot_response', ''),
                'original_customer_satisfaction': turn.get('customer_satisfaction'),
                'conversation_metadata': {
                    'total_turns': len(history),
                    'final_outcome': conversation.get('final_outcome'),
                    'original_question': conversation.get('original_question', ''),
                }
            }
            turns.append(turn_data)
    
    return turns

# Process uploaded file
if file_upload.value:
    uploaded_file = None
    filename = None
    file_content = None
    
    # Handle different file upload widget formats
    if isinstance(file_upload.value, tuple) and len(file_upload.value) > 0:
        uploaded_file = file_upload.value[0]
        filename = uploaded_file['name']
        file_content = uploaded_file['content']
    elif isinstance(file_upload.value, dict) and len(file_upload.value) > 0:
        uploaded_file = list(file_upload.value.values())[0]
        filename = uploaded_file['metadata']['name']
        file_content = uploaded_file['content']
    else:
        print(f"❌ Unable to read uploaded file format")
    
    if uploaded_file and filename and file_content is not None:
        print(f"📁 Loading conversations from: {filename}")
        
        raw_conversations = load_conversations_from_file(file_content, filename)
        
        if raw_conversations:
            conversation_data = extract_conversation_turns(raw_conversations)
            print(f"✅ Loaded {len(conversation_data)} conversation turns from {len(raw_conversations)} conversations")
            
            # Display conversation statistics only if we have data
            if conversation_data:
                df_preview = pd.DataFrame(conversation_data)
                print(f"\n📊 Conversation Turn Distribution:")
                print(f"  Total turns: {len(conversation_data)}")
                print(f"  Unique conversations: {df_preview['conversation_id'].nunique()}")
                print(f"  Customer types: {dict(df_preview['customer_type'].value_counts())}")
                print(f"  Complexity levels: {dict(df_preview['complexity'].value_counts())}")
                
                # Show sample turns
                print(f"\n📋 Sample Conversation Turns:")
                for i, turn in enumerate(conversation_data[:3]):
                    query_preview = turn['customer_query'][:80] + "..." if len(turn['customer_query']) > 80 else turn['customer_query']
                    response_preview = turn['chatbot_response'][:80] + "..." if len(turn['chatbot_response']) > 80 else turn['chatbot_response']
                    print(f"  Turn {turn['conversation_id']}.{turn['turn_number']} [{turn['customer_type']}]:")
                    print(f"    Query: {query_preview}")
                    print(f"    Response: {response_preview}")
                if len(conversation_data) > 3:
                    print(f"  ... and {len(conversation_data) - 3} more turns")
            else:
                print("⚠️ No conversation turns found in the loaded data")
                
        else:
            print("❌ No conversations loaded from file")
    else:
        print("❌ Error accessing uploaded file")
else:
    print("⚠️ Please upload a JSON file with conversation results from chatbot_tester.ipynb.")

📁 Loading conversations from: chatbot_agent_output.json
✅ Loaded 66 conversation turns from 20 conversations

📊 Conversation Turn Distribution:
  Total turns: 66
  Unique conversations: 20
  Customer types: {'confused': np.int64(25), 'urgent': np.int64(18), 'normal': np.int64(12), 'frustrated': np.int64(11)}
  Complexity levels: {'simple': np.int64(24), 'medium': np.int64(21), 'complex': np.int64(21)}

📋 Sample Conversation Turns:
  Turn 1.1 [frustrated]:
    Query: Why did my premium increase by $200? This is ridiculous - I haven't had any clai...
    Response: 
- Maintain a warm and friendly tone throughout.
  Turn 1.2 [frustrated]:
    Query: "I can't believe you are giving me this runaround! I demand to know why my premi...
    Response: 
- Use warm, professional language.
  Turn 1.3 [frustrated]:
    Query: This isn't working. I need to speak to someone who can actually help me!
    Response: 
- Use warm, friendly language

RESPONSE: 
Thank you for reaching out to us. I a...
  ...
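The sample previews above spill across several lines because the responses contain embedded newlines. The distribution counts appear as `np.int64(...)` because `value_counts()` returns NumPy integers, which NumPy 2 prints with that repr. The sketch below is a stdlib-only alternative: it produces single-line previews and counts that are plain `int`s. `one_line_preview` is a hypothetical helper, and the rows are illustrative, following the dict shape built by `extract_conversation_turns`.

```python
from collections import Counter


def one_line_preview(text: str, width: int = 80) -> str:
    """Collapse all whitespace (including newlines) and truncate to `width` chars."""
    flat = " ".join(text.split())
    return flat if len(flat) <= width else flat[:width] + "..."


turns = [  # illustrative rows, not real conversation data
    {"conversation_id": 1, "turn_number": 1, "customer_type": "frustrated",
     "complexity": "simple", "chatbot_response": "\n- Use warm language.\n\nRESPONSE: Hello"},
    {"conversation_id": 2, "turn_number": 1, "customer_type": "urgent",
     "complexity": "complex", "chatbot_response": "Sure."},
    {"conversation_id": 2, "turn_number": 2, "customer_type": "urgent",
     "complexity": "simple", "chatbot_response": "Done."},
]

# most_common() orders by descending count, like value_counts()
by_type = dict(Counter(t["customer_type"] for t in turns).most_common())
by_complexity = dict(Counter(t["complexity"] for t in turns).most_common())
print(by_type)
print(by_complexity)
for t in turns[:2]:
    print(f"{t['conversation_id']}.{t['turn_number']}: {one_line_preview(t['chatbot_response'], 30)}")
```

With pandas, `df['customer_type'].value_counts()` followed by converting each value with `int()` gives the same plain-int display.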

## Step 4: Initialize Quality Agent

Create the Quality Agent instance with the configured settings.

In [6]:
# Initialize Quality Agent with temporary configuration
def initialize_quality_agent():
    """Initialize Quality Agent using temporary configuration files"""
    try:
        # Build a config root with the layout ConfigManager expects:
        # shared/ plus agents/quality_agent/
        temp_config_root = temp_config_dir.parent / 'quality_agent_test_config'
        temp_config_root.mkdir(exist_ok=True)
        
        # Create the expected directory structure
        temp_shared_dir = temp_config_root / 'shared'
        temp_agents_dir = temp_config_root / 'agents' / 'quality_agent'
        temp_shared_dir.mkdir(parents=True, exist_ok=True)
        temp_agents_dir.mkdir(parents=True, exist_ok=True)
        
        # Copy shared config files from the main config
        import shutil
        main_shared_dir = Path('/workspace/config/shared')
        if main_shared_dir.exists():
            for config_file in main_shared_dir.glob('*.yaml'):
                shutil.copy2(config_file, temp_shared_dir)
        
        # Copy our temporary agent config files to the expected location
        shutil.copy2(temp_file_paths['agent_config'], temp_agents_dir / 'config.yaml')
        shutil.copy2(temp_file_paths['prompts_config'], temp_agents_dir / 'prompts.yaml')
        shutil.copy2(temp_file_paths['models_config'], temp_agents_dir / 'models.yaml')
        
        # Create ConfigManager pointing to temp config root
        config_manager = ConfigManager(config_dir=temp_config_root)
        
        # Initialize context provider (in-memory SQLite for testing;
        # SQLiteContextProvider is already imported in the first cell)
        context_provider = SQLiteContextProvider(db_path=":memory:")
        
        # Create Quality Agent
        quality_agent = QualityAgentNode(
            config_manager=config_manager,
            context_provider=context_provider
        )
        
        print("✅ Quality Agent initialized successfully!")
        print(f"  Agent name: {quality_agent.agent_config.name if quality_agent.agent_config else 'Unknown'}")
        print(f"  Agent version: {quality_agent.agent_config.version if quality_agent.agent_config else 'Unknown'}")
        print(f"  LLM provider: {quality_agent.llm_provider.model_name if quality_agent.llm_provider else 'None'}")
        
        return quality_agent, context_provider
        
    except Exception as e:
        print(f"❌ Error initializing Quality Agent: {e}")
        import traceback
        traceback.print_exc()
        return None, None

quality_agent = None
context_provider = None

print("🔧 Initializing Quality Agent with configured settings...")

# Initialize the quality agent
quality_agent, context_provider = initialize_quality_agent()

if quality_agent:
    # Display agent configuration
    if quality_agent.agent_config:
        agent_info = quality_agent.agent_config.__dict__
        settings = quality_agent.agent_config.settings
        print(f"\n📋 Quality Agent Configuration:")
        print(f"  Name: {agent_info.get('name', 'Unknown')}")
        print(f"  Description: {agent_info.get('description', 'Unknown')}")
        print(f"  Version: {agent_info.get('version', 'Unknown')}")
        
        print(f"  Quality thresholds:")
        quality_thresholds = settings.get('quality_thresholds', {})
        for threshold_name, threshold_value in quality_thresholds.items():
            print(f"    {threshold_name}: {threshold_value}")
        
        assessment_settings = settings.get('assessment', {})
        print(f"  Assessment settings:")
        print(f"    Use LLM evaluation: {assessment_settings.get('use_llm_evaluation', True)}")
        print(f"    Confidence threshold: {assessment_settings.get('confidence_threshold', 0.7)}")
        print(f"    Context weight: {assessment_settings.get('context_weight', 0.3)}")
    
    print(f"\n🚀 Quality Agent is ready for conversation assessment!")
else:
    print(f"\n❌ Quality Agent initialization failed. Please check configuration and try again.")

🔧 Initializing Quality Agent with configured settings...
✅ 20:14:43.466 [INFO    ] context_manager | SQLite context provider initialized | operation=__init__
✅ 20:14:43.494 [INFO    ] factory         | Attempting to create provider: anthropic_general_standard
✅ Creating LLM provider: anthropic_general_standard → claude-3-5-sonnet-20241022 (anthropic)
✅ 20:14:43.498 [INFO    ] claude-3-5-sonnet-20241022 | Initializing LLM provider | model_name=claude-3-5-sonnet-20241022
✅ 20:14:43.504 [INFO    ] claude-3-5-sonnet-20241022 | LLM provider initialized successfully | model_name=claude-3-5-sonnet-20241022
✅ 20:14:43.505 [INFO    ] factory         | Successfully created provider: anthropic_general_standard
✅ 20:14:43.506 [INFO    ] quality_agent   | Quality Agent LLM provider initialized | operation=initialize_llm_provider model_name=claude-3-5-sonnet-20241022
✅ Quality Agent initialized successfully!
  Agent name: quality_agent
  Agent version: 1.0.0
  LLM provider: claude-3-5-sonnet-2024102
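
The agent's own decision logic is not shown in this notebook. As a hypothetical illustration of how `quality_thresholds` like the ones printed above could map an overall score to the three decision labels this notebook analyzes (`adequate`, `needs_adjustment`, `human_intervention`), here is a minimal sketch. The threshold names `adequate_threshold` and `adjustment_threshold` and their values are assumptions, not the agent's real configuration keys.

```python
from typing import Mapping

# HYPOTHETICAL threshold names and values, for illustration only;
# the real keys are whatever the agent's quality_thresholds config defines.
EXAMPLE_THRESHOLDS = {"adequate_threshold": 7.0, "adjustment_threshold": 5.0}


def decide(score: float, thresholds: Mapping[str, float] = EXAMPLE_THRESHOLDS) -> str:
    """Map a 0-10 quality score to one of the notebook's decision labels."""
    if score >= thresholds["adequate_threshold"]:
        return "adequate"
    if score >= thresholds["adjustment_threshold"]:
        return "needs_adjustment"
    return "human_intervention"


for example_score in (8.5, 7.0, 6.2, 3.1):
    print(example_score, decide(example_score))
```

With these example values a score of exactly 7.0 counts as adequate; whether the real agent treats its thresholds as inclusive is not shown here.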

## Step 5: Run Quality Assessment

Process each conversation turn through the Quality Agent to generate quality scores and assessments.
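
Every turn handed to the agent must carry the fields that `create_state_for_turn` reads in the cell below. A minimal pre-flight sketch, using a plain dict as a stand-in for `HybridSystemState` (an assumption for illustration; `REQUIRED_TURN_FIELDS`, `missing_fields`, and `build_turn_state` are hypothetical helpers), rejects malformed turns before any LLM call is made:

```python
from datetime import datetime
from typing import Any

# Fields read by create_state_for_turn in the assessment cell
REQUIRED_TURN_FIELDS = (
    "conversation_id", "turn_number", "customer_query", "chatbot_response",
    "customer_type", "complexity", "conversation_metadata",
)


def missing_fields(turn: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent from a turn record."""
    return [name for name in REQUIRED_TURN_FIELDS if name not in turn]


def build_turn_state(turn: dict[str, Any]) -> dict[str, Any]:
    """Build a plain-dict state with the same keys as create_state_for_turn (hypothetical stand-in)."""
    missing = missing_fields(turn)
    if missing:
        raise KeyError(f"turn is missing fields: {missing}")
    conv, num = turn["conversation_id"], turn["turn_number"]
    return {
        "query_id": f"quality_test_{conv}_{num}",
        "user_id": f"user_{conv}",
        "session_id": f"session_{conv}",
        "timestamp": datetime.now(),
        "query": turn["customer_query"],
        "ai_response": turn["chatbot_response"],
        "customer_type": turn["customer_type"],
        "complexity": turn["complexity"],
        "conversation_metadata": turn["conversation_metadata"],
    }


example_turn = {
    "conversation_id": 1, "turn_number": 2,
    "customer_query": "Where is my order?",
    "chatbot_response": "Let me check that for you.",
    "customer_type": "frustrated", "complexity": "medium",
    "conversation_metadata": {},
}
print(build_turn_state(example_turn)["query_id"])  # quality_test_1_2
```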

In [7]:
# Run quality assessment on all conversation turns
quality_results = []

def create_state_for_turn(turn_data):
    """Create HybridSystemState for a conversation turn"""
    return HybridSystemState({
        "query_id": f"quality_test_{turn_data['conversation_id']}_{turn_data['turn_number']}",
        "user_id": f"user_{turn_data['conversation_id']}",
        "session_id": f"session_{turn_data['conversation_id']}",
        "timestamp": datetime.now(),
        "query": turn_data['customer_query'],
        "ai_response": turn_data['chatbot_response'],
        "customer_type": turn_data['customer_type'],
        "complexity": turn_data['complexity'],
        "conversation_metadata": turn_data['conversation_metadata']
    })

def assess_conversation_quality(conversation_turns):
    """Run quality assessment on conversation turns"""
    if not quality_agent:
        print("❌ Quality Agent not initialized. Please run the previous step.")
        return []
    
    if not conversation_turns:
        print("❌ No conversation turns to assess. Please load conversation data first.")
        return []
    
    print(f"🔍 Running quality assessment on {len(conversation_turns)} conversation turns...")
    
    assessed_turns = []
    
    for i, turn in enumerate(conversation_turns):
        try:
            # Create state for this turn
            state = create_state_for_turn(turn)
            
            print(f"\rProcessing turn {i+1}/{len(conversation_turns)} (Conv {turn['conversation_id']}.{turn['turn_number']})...", end='', flush=True)
            
            # Run quality assessment
            result_state = quality_agent(state)
            
            # Extract quality assessment results
            quality_assessment = result_state.get('quality_assessment', {})
            next_action = result_state.get('next_action', 'unknown')
            
            # Create enhanced turn data with quality metrics
            enhanced_turn = {
                **turn,  # Original turn data
                'quality_assessment': quality_assessment,
                'next_action': next_action,
                'quality_score': quality_assessment.get('overall_score', 0.0),
                'quality_decision': quality_assessment.get('decision', 'unknown'),
                'quality_confidence': quality_assessment.get('confidence', 0.0),
                'quality_reasoning': quality_assessment.get('reasoning', ''),
                'adjustment_needed': quality_assessment.get('adjustment_needed', False),
                'adjusted_response': quality_assessment.get('adjusted_response'),
                'assessment_timestamp': datetime.now().isoformat()
            }
            
            assessed_turns.append(enhanced_turn)
            
        except Exception as e:
            print(f"\n❌ Error assessing turn {turn['conversation_id']}.{turn['turn_number']}: {e}")
            # Add turn with error information
            error_turn = {
                **turn,
                'quality_assessment': {'error': str(e)},
                'quality_score': 0.0,
                'quality_decision': 'error',
                'quality_confidence': 0.0,
                'quality_reasoning': f'Assessment error: {str(e)}',
                'adjustment_needed': False,
                'assessment_timestamp': datetime.now().isoformat()
            }
            assessed_turns.append(error_turn)
    
    print(f"\n✅ Quality assessment completed for {len(assessed_turns)} turns!")
    return assessed_turns

# Run assessment if we have data and agent
if conversation_data and quality_agent:
    print("🚀 Starting quality assessment process...")
    quality_results = assess_conversation_quality(conversation_data)
    
    if quality_results:
        # Display summary statistics
        df_results = pd.DataFrame(quality_results)
        
        print(f"\n📊 Quality Assessment Summary:")
        print(f"  Total turns assessed: {len(quality_results)}")
        print(f"  Average quality score: {df_results['quality_score'].mean():.2f}")
        print(f"  Quality score range: {df_results['quality_score'].min():.2f} - {df_results['quality_score'].max():.2f}")
        
        # Decision distribution
        decision_counts = df_results['quality_decision'].value_counts()
        print(f"\n📋 Quality Decisions:")
        for decision, count in decision_counts.items():
            percentage = count / len(quality_results) * 100
            print(f"  {decision}: {count} ({percentage:.1f}%)")
        
        # Show sample assessments
        print(f"\n📝 Sample Quality Assessments:")
        for i, result in enumerate(quality_results[:3]):
            print(f"\n  Turn {result['conversation_id']}.{result['turn_number']} [{result['customer_type']}]:")
            print(f"    Quality Score: {result['quality_score']:.2f}/10.0")
            print(f"    Decision: {result['quality_decision']}")
            print(f"    Confidence: {result['quality_confidence']:.2f}")
            reasoning = result['quality_reasoning'][:150] + "..." if len(result['quality_reasoning']) > 150 else result['quality_reasoning']
            print(f"    Reasoning: {reasoning}")
            if result['adjustment_needed'] and result.get('adjusted_response'):
                adjusted = result['adjusted_response'][:100] + "..." if len(result['adjusted_response']) > 100 else result['adjusted_response']
                print(f"    Adjusted Response: {adjusted}")
        
        if len(quality_results) > 3:
            print(f"  ... and {len(quality_results) - 3} more assessments")
            
        print(f"\n✅ Quality assessment data ready for analysis and export!")
    
elif not conversation_data:
    print("⚠️ No conversation data loaded. Please upload conversation results in Step 3.")
elif not quality_agent:
    print("⚠️ Quality Agent not initialized. Please run Step 4 first.")
else:
    print("⚠️ Unable to start quality assessment. Please check previous steps.")

🚀 Starting quality assessment process...
🔍 Running quality assessment on 66 conversation turns...
Processing turn 1/66 (Conv 1.1)...✅ 20:15:13.302 [INFO    ] claude-3-5-sonnet-20241022 | Model call: claude-3-5-sonnet-20241022 - generate_response | model_name=claude-3-5-sonnet-20241022 operation=generate_response
❌ 20:15:13.303 [ERROR   ] context_manager | Failed to get context summary | user_id=user_1 session_id=session_1 operation=get_context_summary
✅ 20:15:13.304 [INFO    ] quality_agent   | Quality assessment completed | operation=quality_assessment query_id=quality_test_1_1 user_id=user_1 session_id=session_1
Processing turn 2/66 (Conv 1.2)...✅ 20:15:21.454 [INFO    ] claude-3-5-sonnet-20241022 | Model call: claude-3-5-sonnet-20241022 - generate_response | model_name=claude-3-5-sonnet-20241022 operation=generate_response
❌ 20:15:21.455 [ERROR   ] context_manager | Failed to get context summary | user_id=user_1 session_id=session_1 operation=get_context_summary
✅ 20:15:21.456 [INFO
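
In the assessment loop above, a turn that raises an exception is stored with `quality_decision='error'` and `quality_score=0.0`, so failed assessments pull the average down just like genuinely poor responses. A minimal sketch (the helper `summarize_scores` is hypothetical) that reports the decision distribution, the number of failed turns, and the mean with and without them:

```python
from collections import Counter
from statistics import mean


def summarize_scores(results: list[dict]) -> dict:
    """Summarize decisions and mean score, separating out turns whose assessment failed."""
    decisions = Counter(r["quality_decision"] for r in results)
    all_scores = [r["quality_score"] for r in results]
    valid_scores = [r["quality_score"] for r in results if r["quality_decision"] != "error"]
    return {
        "decisions": dict(decisions),
        "error_turns": decisions.get("error", 0),
        "mean_all": mean(all_scores) if all_scores else None,
        "mean_valid": mean(valid_scores) if valid_scores else None,
    }


sample = [
    {"quality_decision": "adequate", "quality_score": 8.0},
    {"quality_decision": "needs_adjustment", "quality_score": 6.0},
    {"quality_decision": "error", "quality_score": 0.0},
]
print(summarize_scores(sample))
```

Here the single error turn drops the overall mean from 7.0 to about 4.67, which is why reporting the two means side by side is useful.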

## Step 6: Analyze Quality Assessment Results

Break down quality scores and decisions by customer type and complexity level, and review example responses for each decision type.
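
The score-distribution section of the cell below assigns each score to a labeled band. Inclusive ranges with hand-picked upper bounds (for example 8.0-8.99 followed by 9.0-10.0) leave gaps: a score of 8.995 matches neither. A minimal sketch using half-open bands `[lower, next_lower)` and the standard library's `bisect` (band edges mirror the notebook's labels; `score_band` is a hypothetical helper):

```python
from bisect import bisect_right

# Lower edge of each band, ascending. A score belongs to the band whose edge is the
# largest value <= score, so bands are half-open [edge, next_edge) with no gaps.
BAND_EDGES = [0.0, 5.0, 6.0, 7.0, 8.0, 9.0]
BAND_LABELS = ["Poor", "Needs Improvement", "Acceptable", "Good", "Very Good", "Excellent"]


def score_band(score: float) -> str:
    """Return the band label for a 0-10 score; a perfect 10.0 lands in the top band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    return BAND_LABELS[bisect_right(BAND_EDGES, score) - 1]


for s in (8.995, 9.0, 7.0, 4.99, 10.0):
    print(s, score_band(s))
```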

In [8]:
# Detailed analysis of quality assessment results
if quality_results:
    print("📊 Quality Assessment Analysis Dashboard")
    print("=" * 60)
    
    # Create DataFrame for analysis
    df_quality = pd.DataFrame(quality_results)
    
    # Overall Quality Statistics
    print(f"\n🎯 OVERALL QUALITY STATISTICS")
    print(f"Total turns assessed: {len(df_quality)}")
    print(f"Mean quality score: {df_quality['quality_score'].mean():.2f}/10.0")
    print(f"Median quality score: {df_quality['quality_score'].median():.2f}/10.0")
    print(f"Standard deviation: {df_quality['quality_score'].std():.2f}")
    print(f"Score range: {df_quality['quality_score'].min():.2f} - {df_quality['quality_score'].max():.2f}")
    
    # Quality Decision Breakdown
    print(f"\n📋 QUALITY DECISION BREAKDOWN")
    decision_analysis = df_quality['quality_decision'].value_counts()
    for decision, count in decision_analysis.items():
        percentage = count / len(df_quality) * 100
        print(f"  {decision.replace('_', ' ').title()}: {count} ({percentage:.1f}%)")
    
    # Quality by Customer Type
    print(f"\n👥 QUALITY BY CUSTOMER TYPE")
    
    for customer_type in df_quality['customer_type'].unique():
        type_data = df_quality[df_quality['customer_type'] == customer_type]
        avg_score = type_data['quality_score'].mean()
        avg_confidence = type_data['quality_confidence'].mean()
        adjustment_rate = type_data['adjustment_needed'].sum() / len(type_data) * 100
        print(f"  {customer_type.title()}:")
        print(f"    Average quality: {avg_score:.2f}/10.0")
        print(f"    Average confidence: {avg_confidence:.2f}")
        print(f"    Adjustment rate: {adjustment_rate:.1f}%")
        print(f"    Sample size: {len(type_data)}")
    
    # Quality by Complexity
    print(f"\n🔢 QUALITY BY COMPLEXITY LEVEL")
    for complexity in df_quality['complexity'].unique():
        complexity_data = df_quality[df_quality['complexity'] == complexity]
        avg_score = complexity_data['quality_score'].mean()
        avg_confidence = complexity_data['quality_confidence'].mean()
        escalation_rate = len(complexity_data[complexity_data['quality_decision'] == 'human_intervention']) / len(complexity_data) * 100
        print(f"  {complexity.title()} Complexity:")
        print(f"    Average quality: {avg_score:.2f}/10.0")
        print(f"    Average confidence: {avg_confidence:.2f}")
        print(f"    Human escalation rate: {escalation_rate:.1f}%")
        print(f"    Sample size: {len(complexity_data)}")
    
    # Response Improvement Analysis
    print(f"\n🛠️ RESPONSE IMPROVEMENT ANALYSIS")
    adjustment_needed = df_quality['adjustment_needed'].sum()
    adjusted_responses = len([r for r in quality_results if r.get('adjusted_response')])
    print(f"Turns needing adjustment: {adjustment_needed} ({adjustment_needed/len(df_quality)*100:.1f}%)")
    print(f"Adjusted responses generated: {adjusted_responses}")
    
    # Show quality distribution
    print(f"\n📈 QUALITY SCORE DISTRIBUTION")
    quality_ranges = [
        (9.0, 10.0, "Excellent"),
        (8.0, 9.0, "Very Good"),
        (7.0, 8.0, "Good"),
        (6.0, 7.0, "Acceptable"),
        (5.0, 6.0, "Needs Improvement"),
        (0.0, 5.0, "Poor")
    ]
    
    # Bins are half-open [min, max) so fractional scores such as 8.995 are never dropped;
    # the top bin also includes a perfect 10.0
    scores = df_quality['quality_score']
    for min_score, max_score, label in quality_ranges:
        below_max = (scores <= max_score) if max_score == 10.0 else (scores < max_score)
        count = int(((scores >= min_score) & below_max).sum())
        percentage = count / len(df_quality) * 100
        print(f"  {label} ({min_score}-{max_score}): {count} ({percentage:.1f}%)")
    
    # Confidence vs Score Analysis
    print(f"\n🎯 CONFIDENCE vs QUALITY CORRELATION")
    high_conf_high_qual = len(df_quality[(df_quality['quality_confidence'] >= 0.8) & (df_quality['quality_score'] >= 8.0)])
    low_conf_low_qual = len(df_quality[(df_quality['quality_confidence'] <= 0.5) & (df_quality['quality_score'] <= 5.0)])
    print(f"High confidence + High quality: {high_conf_high_qual} ({high_conf_high_qual/len(df_quality)*100:.1f}%)")
    print(f"Low confidence + Low quality: {low_conf_low_qual} ({low_conf_low_qual/len(df_quality)*100:.1f}%)")
    
    # Show detailed examples of each quality decision type
    print(f"\n📝 DETAILED EXAMPLES BY QUALITY DECISION")
    
    # Adequate responses
    adequate_examples = df_quality[df_quality['quality_decision'] == 'adequate'].head(2)
    if len(adequate_examples) > 0:
        print(f"\n✅ ADEQUATE RESPONSES:")
        for idx, row in adequate_examples.iterrows():
            print(f"  Example {row['conversation_id']}.{row['turn_number']} (Score: {row['quality_score']:.1f}):")
            print(f"    Query: {row['customer_query'][:100]}...")
            print(f"    Response: {row['chatbot_response'][:100]}...")
            print(f"    Reasoning: {row['quality_reasoning'][:80]}...")
    
    # Needs adjustment
    adjustment_examples = df_quality[df_quality['quality_decision'] == 'needs_adjustment'].head(2)
    if len(adjustment_examples) > 0:
        print(f"\n🔧 NEEDS ADJUSTMENT:")
        for idx, row in adjustment_examples.iterrows():
            print(f"  Example {row['conversation_id']}.{row['turn_number']} (Score: {row['quality_score']:.1f}):")
            print(f"    Query: {row['customer_query'][:100]}...")
            print(f"    Original: {row['chatbot_response'][:100]}...")
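            # Missing values become NaN in the DataFrame, and NaN is truthy, so check for a real string
            adjusted = row.get('adjusted_response')
            if isinstance(adjusted, str) and adjusted:
                print(f"    Adjusted: {adjusted[:100]}...")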
            print(f"    Reasoning: {row['quality_reasoning'][:80]}...")
    
    # Human intervention
    escalation_examples = df_quality[df_quality['quality_decision'] == 'human_intervention'].head(2)
    if len(escalation_examples) > 0:
        print(f"\n🚨 HUMAN INTERVENTION:")
        for idx, row in escalation_examples.iterrows():
            print(f"  Example {row['conversation_id']}.{row['turn_number']} (Score: {row['quality_score']:.1f}):")
            print(f"    Query: {row['customer_query'][:100]}...")
            print(f"    Response: {row['chatbot_response'][:100]}...")
            print(f"    Reasoning: {row['quality_reasoning'][:80]}...")
    
    # Show summary table
    print(f"\n📊 QUALITY ASSESSMENT SUMMARY TABLE")
    summary_df = df_quality[['conversation_id', 'turn_number', 'customer_type', 'complexity', 
                           'quality_score', 'quality_decision', 'quality_confidence', 
                           'adjustment_needed']].copy()
    summary_df['quality_score'] = summary_df['quality_score'].round(1)
    summary_df['quality_confidence'] = summary_df['quality_confidence'].round(2)
    
    display(summary_df.head(10))
    
    if len(summary_df) > 10:
        print(f"... showing first 10 of {len(summary_df)} assessed turns")
    
    print(f"\n" + "="*60)
    print(f"✅ Quality analysis complete! Use Step 7 to export detailed results.")

else:
    print("⚠️ No quality assessment results to analyze. Please run quality assessment in Step 5 first.")

📊 Quality Assessment Analysis Dashboard

🎯 OVERALL QUALITY STATISTICS
Total turns assessed: 66
Mean quality score: 7.00/10.0
Median quality score: 7.00/10.0
Standard deviation: 0.00
Score range: 7.00 - 7.00

📋 QUALITY DECISION BREAKDOWN
  Adequate: 66 (100.0%)

👥 QUALITY BY CUSTOMER TYPE
  Frustrated:
    Average quality: 7.00/10.0
    Average confidence: 0.70
    Adjustment rate: 0.0%
    Sample size: 11
  Urgent:
    Average quality: 7.00/10.0
    Average confidence: 0.70
    Adjustment rate: 0.0%
    Sample size: 18
  Confused:
    Average quality: 7.00/10.0
    Average confidence: 0.70
    Adjustment rate: 0.0%
    Sample size: 25
  Normal:
    Average quality: 7.00/10.0
    Average confidence: 0.70
    Adjustment rate: 0.0%
    Sample size: 12

🔢 QUALITY BY COMPLEXITY LEVEL
  Medium Complexity:
    Average quality: 7.00/10.0
    Average confidence: 0.70
    Human escalation rate: 0.0%
    Sample size: 21
  Simple Complexity:
    Average quality: 7.00/10.0
    Average confidence: 0

|   | conversation_id | turn_number | customer_type | complexity | quality_score | quality_decision | quality_confidence | adjustment_needed |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | frustrated | medium | 7.0 | adequate | 0.7 | False |
| 1 | 1 | 2 | frustrated | medium | 7.0 | adequate | 0.7 | False |
| 2 | 1 | 3 | frustrated | medium | 7.0 | adequate | 0.7 | False |
| 3 | 2 | 1 | urgent | simple | 7.0 | adequate | 0.7 | False |
| 4 | 2 | 2 | urgent | simple | 7.0 | adequate | 0.7 | False |
| 5 | 2 | 3 | urgent | simple | 7.0 | adequate | 0.7 | False |
| 6 | 3 | 1 | confused | simple | 7.0 | adequate | 0.7 | False |
| 7 | 3 | 2 | confused | simple | 7.0 | adequate | 0.7 | False |
| 8 | 3 | 3 | confused | simple | 7.0 | adequate | 0.7 | False |
| 9 | 3 | 4 | confused | simple | 7.0 | adequate | 0.7 | False |


... showing first 10 of 66 assessed turns

✅ Quality analysis complete! Use Step 7 to export detailed results.


## Step 7: Export Quality Assessment Results

Save the enhanced conversation data with quality metrics, the configuration used, a summary CSV, and a plain-text report to a timestamped output directory.
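
Writing each run into its own timestamped directory keeps results from different configurations side by side. A minimal standard-library sketch (`write_run` and `_json_default` are hypothetical helpers): `mkdir(parents=True, exist_ok=True)` creates any missing parent folders, and the `default` hook unwraps objects that expose `.item()`, as NumPy scalars such as pandas `value_counts()` results do, into plain Python numbers instead of turning them into strings the way `default=str` does.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any


def _json_default(obj: Any) -> Any:
    """Fallback for json.dump: unwrap scalar-like objects, format datetimes, else str()."""
    item = getattr(obj, "item", None)
    if callable(item):
        try:
            return item()  # e.g. NumPy int64/float64 -> Python int/float
        except (TypeError, ValueError):
            pass  # e.g. multi-element arrays; fall through to str()
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)


def write_run(base_dir: Path, payload: dict,
              filename: str = "quality_assessment_results.json") -> Path:
    """Write payload as JSON to base_dir/<YYYYmmdd_HHMMSS>/filename and return the path."""
    run_dir = base_dir / datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)  # also creates any missing parent folders
    path = run_dir / filename
    with open(path, "w") as f:
        json.dump(payload, f, indent=2, default=_json_default)
    return path


# Usage: write into a throwaway directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    out_path = write_run(Path(tmp) / "quality_evals",
                         {"exported": datetime(2025, 7, 23, 20, 40), "scores": [7.0]})
    written = json.loads(out_path.read_text())

print(written)  # {'exported': '2025-07-23T20:40:00', 'scores': [7.0]}
```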

In [10]:
# Export quality assessment results and settings to a timestamped output directory
def export_quality_results():
    """Export quality assessment results and configuration to JSON files"""
    if not quality_results:
        print("❌ No quality assessment results to export.")
        return
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    output_dir = Path(f'/workspace/notebooks/agent_testers/outputs/quality_evals/{timestamp}')
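    # parents=True creates outputs/quality_evals/ on the first run instead of raising FileNotFoundError
    output_dir.mkdir(parents=True, exist_ok=True)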

    # Prepare export data
    export_data = {
        'metadata': {
            'export_timestamp': datetime.now().isoformat(),
            'quality_agent_version': quality_agent.agent_config.version if quality_agent and quality_agent.agent_config else '1.0.0',
            'total_turns_assessed': len(quality_results),
            'average_quality_score': sum(r['quality_score'] for r in quality_results) / len(quality_results),
            'assessment_model': quality_agent.llm_provider.model_name if quality_agent and quality_agent.llm_provider else 'unknown',
            'quality_thresholds': quality_agent.agent_config.settings.get('quality_thresholds', {}) if quality_agent and quality_agent.agent_config else {}
        },
        'quality_assessments': quality_results,
        'summary_statistics': {
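            # Cast NumPy int64 counts to int so json.dump(default=str) doesn't write them as strings
            'decision_distribution': {
                decision: int(count)
                for decision, count in pd.DataFrame(quality_results)['quality_decision'].value_counts().items()
            },
            'score_statistics': {
                'mean': sum(r['quality_score'] for r in quality_results) / len(quality_results),
                # Series.median() averages the two middle scores when the count is even
                'median': float(pd.Series([r['quality_score'] for r in quality_results]).median()),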
                'min': min(r['quality_score'] for r in quality_results),
                'max': max(r['quality_score'] for r in quality_results)
            },
            'customer_type_analysis': {},
            'complexity_analysis': {}
        }
    }
    
    # Add customer type analysis
    df_quality = pd.DataFrame(quality_results)
    for customer_type in df_quality['customer_type'].unique():
        type_data = df_quality[df_quality['customer_type'] == customer_type]
        export_data['summary_statistics']['customer_type_analysis'][customer_type] = {
            'count': len(type_data),
            'average_quality_score': type_data['quality_score'].mean(),
            'adjustment_rate': type_data['adjustment_needed'].sum() / len(type_data)
        }
    
    # Add complexity analysis
    for complexity in df_quality['complexity'].unique():
        complexity_data = df_quality[df_quality['complexity'] == complexity]
        export_data['summary_statistics']['complexity_analysis'][complexity] = {
            'count': len(complexity_data),
            'average_quality_score': complexity_data['quality_score'].mean(),
            'escalation_rate': len(complexity_data[complexity_data['quality_decision'] == 'human_intervention']) / len(complexity_data)
        }
    
    # Export main results
    results_filename = f'quality_assessment_results.json'
    results_path = output_dir / results_filename
    
    with open(results_path, 'w') as f:
        json.dump(export_data, f, indent=2, default=str)
    
    print(f"✅ Quality assessment results exported to: {results_path}")
    
    # Export configuration used
    config_export = {
        'export_timestamp': datetime.now().isoformat(),
        'agent_config': {},
        'prompts_config': {},
        'models_config': {}
    }
    
    # Read the config files used for this run (temp_file_paths is set in an earlier step)
    yaml = YAML()
    for config_key in ('agent_config', 'prompts_config', 'models_config'):
        try:
            with open(temp_file_paths[config_key], 'r') as f:
                config_export[config_key] = yaml.load(f)
        except Exception as e:
            # Keep exporting the other files, but report which config could not be read
            print(f"⚠️ Could not read {config_key}: {e}")
    
    config_filename = f'quality_agent_config.json'
    config_path = output_dir / config_filename
    
    with open(config_path, 'w') as f:
        json.dump(config_export, f, indent=2, default=str)
    
    print(f"✅ Configuration settings exported to: {config_path}")
    
    # Export summary CSV for easy analysis
    summary_df = pd.DataFrame(quality_results)
    csv_filename = f'quality_summary.csv'
    csv_path = output_dir / csv_filename
    
    # Select key columns for CSV
    csv_columns = ['conversation_id', 'turn_number', 'customer_type', 'complexity',
                   'quality_score', 'quality_decision', 'quality_confidence', 
                   'adjustment_needed', 'customer_query', 'chatbot_response', 
                   'quality_reasoning']
    
    summary_csv = summary_df[csv_columns].copy()
    summary_csv.to_csv(csv_path, index=False)
    
    print(f"✅ Summary CSV exported to: {csv_path}")
    
    # Generate quality report
    report_filename = f'quality_report.txt'
    report_path = output_dir / report_filename
    
    with open(report_path, 'w') as f:
        f.write("QUALITY ASSESSMENT REPORT\n")
        f.write("=" * 50 + "\n\n")
        f.write(f"Export Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Total Turns Assessed: {len(quality_results)}\n")
        f.write(f"Average Quality Score: {export_data['metadata']['average_quality_score']:.2f}/10.0\n")
        f.write(f"Assessment Model: {export_data['metadata']['assessment_model']}\n\n")
        
        f.write("DECISION DISTRIBUTION:\n")
        for decision, count in export_data['summary_statistics']['decision_distribution'].items():
            percentage = count / len(quality_results) * 100
            f.write(f"  {decision.replace('_', ' ').title()}: {count} ({percentage:.1f}%)\n")
        f.write("\n")
        
        f.write("CUSTOMER TYPE ANALYSIS:\n")
        for customer_type, stats in export_data['summary_statistics']['customer_type_analysis'].items():
            f.write(f"  {customer_type.title()}:\n")
            f.write(f"    Sample Size: {stats['count']}\n")
            f.write(f"    Average Quality: {stats['average_quality_score']:.2f}/10.0\n")
            f.write(f"    Adjustment Rate: {stats['adjustment_rate']*100:.1f}%\n")
        f.write("\n")
        
        f.write("COMPLEXITY ANALYSIS:\n")
        for complexity, stats in export_data['summary_statistics']['complexity_analysis'].items():
            f.write(f"  {complexity.title()} Complexity:\n")
            f.write(f"    Sample Size: {stats['count']}\n")
            f.write(f"    Average Quality: {stats['average_quality_score']:.2f}/10.0\n")
            f.write(f"    Escalation Rate: {stats['escalation_rate']*100:.1f}%\n")
    
    print(f"✅ Quality report exported to: {report_path}")
    
    return {
        'results_file': str(results_path),
        'config_file': str(config_path),
        'csv_file': str(csv_path),
        'report_file': str(report_path)
    }

# Export results if available
if quality_results:
    print("💾 Exporting Quality Assessment Results and Configuration")
    print("=" * 60)
    
    export_files = export_quality_results()
    
    if export_files:
        print(f"\n📁 All files exported successfully!")
        print(f"\n📋 Export Summary:")
        print(f"  Results JSON: {export_files['results_file']}")
        print(f"  Configuration: {export_files['config_file']}")
        print(f"  Summary CSV: {export_files['csv_file']}")
        print(f"  Quality Report: {export_files['report_file']}")
        
        print(f"\n💡 File Usage:")
        print(f"  • Results JSON: Complete data for further analysis or integration")
        print(f"  • Configuration: Settings used for quality assessment")
        print(f"  • Summary CSV: Import into Excel, Google Sheets, or data analysis tools")
        print(f"  • Quality Report: Human-readable summary for stakeholder review")
        
        print(f"\n🔄 Next Steps:")
        print(f"  • Use results to fine-tune quality thresholds")
        print(f"  • Analyze patterns to improve chatbot response quality")
        print(f"  • Compare quality scores across different configurations")
        print(f"  • Share quality report with stakeholders")
        
    else:
        print("❌ Export failed. Please check for errors above.")

else:
    print("⚠️ No quality assessment results to export. Please run quality assessment in Step 5 first.")

💾 Exporting Quality Assessment Results and Configuration
✅ Quality assessment results exported to: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_assessment_results.json
✅ Configuration settings exported to: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_agent_config.json
✅ Summary CSV exported to: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_summary.csv
✅ Quality report exported to: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_report.txt

📁 All files exported successfully!

📋 Export Summary:
  Results JSON: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_assessment_results.json
  Configuration: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_agent_config.json
  Summary CSV: /workspace/notebooks/agent_testers/outputs/quality_evals/20250723_204038/quality_summary.csv
  Quality Report:
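
One of the suggested next steps is comparing quality scores across configurations. A minimal sketch, assuming two `quality_assessment_results.json` files written by `export_quality_results` (the helpers `load_run` and `compare_runs` are hypothetical), reads each file's `metadata.average_quality_score` and `summary_statistics.decision_distribution` and reports the differences. The `int()` conversion also accepts counts that were serialized as strings, which `default=str` does for NumPy integers.

```python
import json
import tempfile
from pathlib import Path


def load_run(path: Path) -> dict:
    """Load the average score and decision counts from an exported results file."""
    data = json.loads(Path(path).read_text())
    decisions = data["summary_statistics"]["decision_distribution"]
    return {
        "average": float(data["metadata"]["average_quality_score"]),
        "decisions": {k: int(v) for k, v in decisions.items()},  # handles "66" or 66
    }


def compare_runs(baseline: Path, candidate: Path) -> dict:
    """Return candidate-minus-baseline changes in average score and in each decision count."""
    a, b = load_run(baseline), load_run(candidate)
    labels = sorted(set(a["decisions"]) | set(b["decisions"]))
    return {
        "average_delta": b["average"] - a["average"],
        "decision_deltas": {k: b["decisions"].get(k, 0) - a["decisions"].get(k, 0) for k in labels},
    }


# Usage with two made-up example files
with tempfile.TemporaryDirectory() as tmp:
    base, cand = Path(tmp) / "base.json", Path(tmp) / "cand.json"
    base.write_text(json.dumps({"metadata": {"average_quality_score": 7.0},
                                "summary_statistics": {"decision_distribution": {"adequate": "66"}}}))
    cand.write_text(json.dumps({"metadata": {"average_quality_score": 6.5},
                                "summary_statistics": {"decision_distribution": {"adequate": 60, "needs_adjustment": 6}}}))
    result = compare_runs(base, cand)

print(result)  # {'average_delta': -0.5, 'decision_deltas': {'adequate': -6, 'needs_adjustment': 6}}
```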