# Evaluation Workflow

This notebook outlines the evaluation process for the Persona Consistent Chatbot. It loads the consistency and engagement benchmark sets, scores the model on persona consistency, engagement, and dialogue quality, and saves a combined summary table to CSV.
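
The notebook does not show how `persona_consistency` computes its score. One common approach in persona-chat work runs a natural language inference (NLI) model over each (persona sentence, response) pair, maps the labels entailment, neutral, and contradiction to +1, 0, and -1, and averages them. The sketch below assumes those NLI labels are already available. The function name `consistency_score` and the label strings are hypothetical and are not taken from `src.evaluation`.

```python
from collections.abc import Iterable

# Hypothetical mapping from NLI labels to per-pair scores (+1 / 0 / -1).
LABEL_SCORES = {"entailment": 1, "neutral": 0, "contradiction": -1}


def consistency_score(labels: Iterable[str]) -> float:
    """Average the NLI-derived scores over (persona sentence, response) pairs.

    Assumes one NLI label per pair. This is an illustrative sketch, not the
    implementation in src.evaluation.persona_consistency.
    """
    scores = [LABEL_SCORES[label] for label in labels]
    if not scores:
        raise ValueError("need at least one labelled pair")
    return sum(scores) / len(scores)


# Two entailed pairs, one neutral, one contradiction: (1 + 0 - 1 + 1) / 4.
print(consistency_score(["entailment", "neutral", "contradiction", "entailment"]))  # prints 0.25
```

A score near +1 means responses mostly agree with the persona, and negative values mean contradictions outweigh agreements.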

In [None]:
# Import necessary libraries
from pathlib import Path

import pandas as pd

from src.evaluation import persona_consistency, engagement_metrics, dialogue_quality
from src.utils import logger

# Set up logging
logger.setup_logging()

# Load evaluation datasets (JSON Lines: one record per line)
consistency_data = pd.read_json('data/benchmarks/consistency_test.jsonl', lines=True)
engagement_data = pd.read_json('data/benchmarks/engagement_test.jsonl', lines=True)

# Evaluate persona consistency
consistency_results = persona_consistency.evaluate(consistency_data)
logger.log('Persona consistency evaluation completed.')

# Evaluate engagement metrics
engagement_results = engagement_metrics.evaluate(engagement_data)
logger.log('Engagement metrics evaluation completed.')

# Evaluate dialogue quality. No separate dialogue-quality benchmark is loaded,
# so this reuses the consistency test set.
dialogue_results = dialogue_quality.evaluate(consistency_data)
logger.log('Dialogue quality evaluation completed.')

# Compile results: one column per evaluation. If each evaluate() returns a
# dict of metric name -> score, the metric names become the row index.
results_summary = {
    'Consistency': consistency_results,
    'Engagement': engagement_results,
    'Dialogue Quality': dialogue_results,
}
results_df = pd.DataFrame(results_summary)

# Save results to CSV, creating the output directory if needed and keeping
# the index so the metric names are not dropped
output_path = Path('outputs/results/evaluation_summary.csv')
output_path.parent.mkdir(parents=True, exist_ok=True)
results_df.to_csv(output_path, index_label='metric')
logger.log(f'Evaluation results saved to {output_path}')

# Display results
results_df.head()

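The evaluation cell passes each `evaluate()` result to `pd.DataFrame`, so the shape of the summary table depends on what those functions return, which the notebook does not show. The self-contained sketch below replays the same flow under stated assumptions: an in-memory JSONL string stands in for the benchmark files, the `persona` and `response` fields are made up, and the hypothetical `stub_evaluate` returns a dict of metric name to score (mean response length in words and the fraction of responses ending in a question mark). None of these come from `src.evaluation`.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical benchmark records: one JSON object per line (JSONL).
JSONL = "\n".join([
    '{"persona": "I love hiking.", "response": "Have you been to the Alps?"}',
    '{"persona": "I love hiking.", "response": "I hiked a new trail yesterday."}',
])


def stub_evaluate(df: pd.DataFrame) -> dict[str, float]:
    """Toy evaluator standing in for the src.evaluation modules."""
    words = df["response"].str.split().str.len()
    return {
        "avg_response_words": float(words.mean()),
        "question_rate": float(df["response"].str.endswith("?").mean()),
    }


# lines=True parses one record per line, as in the notebook's read_json calls.
data = pd.read_json(io.StringIO(JSONL), lines=True)

summary = {
    "Consistency": stub_evaluate(data),
    "Engagement": stub_evaluate(data),
}

# Dict of dicts: outer keys become columns, inner keys (metric names) the index.
results_df = pd.DataFrame(summary)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "evaluation_summary.csv"
    # Keep the index so the metric names survive the round trip.
    results_df.to_csv(out, index_label="metric")
    reloaded = pd.read_csv(out, index_col="metric")

print(reloaded)
```

With a dict of dicts, the metric names live only in the index, which is why the sketch writes the CSV with `index_label='metric'`; passing `index=False` would drop them and leave unlabeled rows of numbers.
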
## Conclusion

This notebook evaluates the chatbot on persona consistency, engagement, and dialogue quality and writes the combined scores to `outputs/results/evaluation_summary.csv` for further analysis and reporting.