Create Working Example for Evaluation Features #797

@maxtechera

Objective

Create a working example demonstrating the complete evaluation workflow that can be used as a template or tutorial.

Tasks

Example Design

  • Choose a realistic use case (e.g., customer support chatbot evaluation)
  • Design sample dataset with diverse test cases
  • Select appropriate evaluators for the use case
  • Define expected outcomes and success criteria

Implementation

  • Create sample dataset (10-20 rows)
  • Configure 2-3 simple evaluators (e.g., exact match, semantic similarity)
  • Configure 1-2 LLM evaluators (e.g., answer quality, helpfulness)
  • Create chatflow/agentflow to be evaluated
  • Run evaluation and capture results
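The run-and-capture step could be prototyped outside the UI with something like the sketch below. It assumes the flow under test can be wrapped as a plain callable that maps a question to an answer. The names `run_evaluation`, `pass_rate`, `input`, and `expected_output` are hypothetical and not taken from the product.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Evaluator = Callable[[str, str], bool]  # (expected, actual) -> passed?


def run_evaluation(
    dataset: List[Row],
    flow: Callable[[str], str],
    evaluators: Dict[str, Evaluator],
) -> List[dict]:
    """Call the flow once per row and record each evaluator's verdict."""
    results = []
    for row in dataset:
        actual = flow(row["input"])
        verdicts = {
            name: check(row["expected_output"], actual)
            for name, check in evaluators.items()
        }
        results.append({"input": row["input"], "actual": actual, "verdicts": verdicts})
    return results


def pass_rate(results: List[dict], evaluator_name: str) -> float:
    """Fraction of rows that passed the named evaluator (0.0 for an empty run)."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r["verdicts"][evaluator_name])
    return passed / len(results)


if __name__ == "__main__":
    # Stub flow standing in for the real chatflow/agentflow (assumption).
    def stub_flow(question: str) -> str:
        return "<answer from the flow under test>"

    dataset = [
        {"input": "Is there a mobile app?", "expected_output": "<expected answer>"},
    ]
    evaluators = {"exact_match": lambda expected, actual: expected == actual}
    results = run_evaluation(dataset, stub_flow, evaluators)
    print(pass_rate(results, "exact_match"))
```

Keeping the per-row verdicts alongside the aggregate makes it easier to document how to interpret failures, not just the overall pass rate.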

Documentation

  • Document the example scenario and goals
  • Provide step-by-step instructions to recreate
  • Document expected results and how to interpret them
  • Add troubleshooting section
  • Export example as template (if possible)

Distribution

  • Add to user documentation as a tutorial
  • Consider adding as marketplace template
  • Create video walkthrough (optional)

Example Scenario (Proposed)

Use Case: Customer Support Chatbot for SaaS Product

Dataset: Common customer questions

  • "How do I reset my password?"
  • "What's included in the premium plan?"
  • "How do I cancel my subscription?"
  • "Is there a mobile app?"
  • etc.
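As a starting point, the questions above could be serialized to a CSV dataset roughly as follows. This is a sketch only: the column names `input` and `expected_output` are assumptions, and the expected answers are placeholders to be replaced with the product's real support answers.

```python
import csv
import io

# Questions come from the proposed scenario; the expected answers are
# placeholders, not real product answers.
ROWS = [
    ("How do I reset my password?", "<expected answer: password reset steps>"),
    ("What's included in the premium plan?", "<expected answer: premium plan features>"),
    ("How do I cancel my subscription?", "<expected answer: cancellation steps>"),
    ("Is there a mobile app?", "<expected answer: mobile app availability>"),
]


def build_dataset_csv(rows):
    """Serialize (input, expected_output) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["input", "expected_output"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_dataset_csv(ROWS))
```

The full example would extend `ROWS` to the 10-20 rows called for under Implementation.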

Evaluators:

  • Simple: Exact match for factual answers
  • Simple: Semantic similarity for paraphrased answers
  • LLM: Answer completeness (1-5 scale)
  • LLM: Tone appropriateness (professional, helpful)
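To make the evaluator list concrete, here is a minimal sketch of what each kind might compute. It is illustrative only and not the product's implementation. `token_overlap` is a crude lexical stand-in for real semantic similarity (which would normally use embeddings), and the LLM part only builds a judge prompt and parses a 1-5 score, without calling any model. All function names are hypothetical.

```python
import re
from typing import Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def exact_match(expected: str, actual: str) -> bool:
    """Exact match after case and whitespace normalization."""
    return _normalize(expected) == _normalize(actual)


def token_overlap(expected: str, actual: str) -> float:
    """Jaccard overlap of word sets; a lexical proxy, NOT embedding similarity."""
    a = set(re.findall(r"\w+", expected.lower()))
    b = set(re.findall(r"\w+", actual.lower()))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def completeness_prompt(question: str, answer: str) -> str:
    """Build a judge prompt for answer completeness (hypothetical wording)."""
    return (
        "Rate how completely the answer addresses the customer question.\n"
        "Reply with a single integer from 1 (incomplete) to 5 (complete).\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the judge's reply, else None."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(exact_match("Yes", " yes "))
    print(token_overlap("reset your password", "reset password"))
    print(parse_score("Score: 4"))
```

Returning `None` for an unparseable reply lets the example surface judge failures separately from low scores, which may be worth covering in the troubleshooting section.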

Acceptance Criteria

  • Example covers a common use case
  • All evaluation features are demonstrated
  • Step-by-step instructions are clear
  • Results interpretation is explained
  • Example can be easily replicated by users
