Objective
Create a working example that demonstrates the complete evaluation workflow and can serve as a template or tutorial.
Tasks
Example Design
Implementation
Documentation
Distribution
Example Scenario (Proposed)
Use Case: Customer Support Chatbot for SaaS Product
Dataset: Common customer questions
- "How do I reset my password?"
- "What's included in the premium plan?"
- "How do I cancel my subscription?"
- "Is there a mobile app?"
- etc.
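One possible shape for the dataset is sketched below. The `EvalCase` class and the `DATASET` name are hypothetical, chosen only for illustration. The questions are the ones listed above; the reference answers are left unset because this proposal does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalCase:
    """One dataset row (hypothetical structure, not a fixed schema)."""
    question: str
    # Reference answer to compare against; None until the product team supplies it.
    expected: Optional[str] = None


# Questions taken from the proposed scenario; reference answers intentionally omitted.
DATASET = [
    EvalCase("How do I reset my password?"),
    EvalCase("What's included in the premium plan?"),
    EvalCase("How do I cancel my subscription?"),
    EvalCase("Is there a mobile app?"),
]

if __name__ == "__main__":
    for case in DATASET:
        print(case.question, "->", case.expected)
```

Keeping `expected` optional lets the same record type serve both the reference-based evaluators and the LLM-judged ones, which may not need a reference answer.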
Evaluators:
- Simple: Exact match for factual answers
- Simple: Semantic similarity for paraphrased answers
- LLM: Answer completeness (1-5 scale)
- LLM: Tone appropriateness (professional, helpful)
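The evaluators above could start from something like the following minimal sketch. All function names are hypothetical. Note that `lexical_similarity` uses `difflib.SequenceMatcher` from the standard library, which measures character-level overlap; it is only a stand-in for true semantic similarity, which would likely need an embedding model. `parse_rubric_score` assumes the LLM judge replies with free text that contains its 1-5 rating as a number.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text.strip().lower())


def exact_match(expected: str, actual: str) -> bool:
    """Simple evaluator: True when the normalized strings are identical."""
    return normalize(expected) == normalize(actual)


def lexical_similarity(expected: str, actual: str) -> float:
    """Simple evaluator: character-overlap ratio in [0.0, 1.0].

    Stand-in only: this is lexical, not semantic, similarity.
    """
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


def parse_rubric_score(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Return the first integer in [low, high] found in an LLM judge's reply.

    Returns None when the reply contains no in-range number, so the caller
    can flag the case for review instead of silently recording a score.
    """
    for match in re.finditer(r"\d+", reply):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None


if __name__ == "__main__":
    print(exact_match("<reference answer>", "  <Reference   Answer> "))
    print(lexical_similarity("<reference answer>", "<model answer>"))
    print(parse_rubric_score("Completeness: 4 out of 5"))
```

The LLM call itself is deliberately left out, since the proposal does not name a provider or prompt; only the score parsing is shown.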
Acceptance Criteria