
Conversation

@OzBenSimhonTraceloop (Contributor) commented Dec 8, 2025

Important

Updates evaluator-library.mdx with new evaluators and detailed descriptions for comprehensive AI output assessment.

  • Agent Evaluators:
    • Added Agent Efficiency, Agent Flow Quality, Agent Goal Accuracy, Agent Goal Completeness, and Agent Tool Error Detector with detailed descriptions and implementations.
  • Answer Quality Evaluators:
    • Added Answer Completeness, Answer Correctness, Answer Relevancy, Faithfulness, and Semantic Similarity with detailed descriptions and implementations.
  • Conversation Evaluators:
    • Added Conversation Quality, Intent Change, Topic Adherence, Context Relevance, and Instruction Adherence with detailed descriptions and implementations.
  • Safety & Security Evaluators:
    • Added PII Detector, Secrets Detector, Profanity Detector, Prompt Injection Detector, Toxicity Detector, and Sexism Detector with detailed descriptions and implementations.
  • Format Validators:
    • Added JSON Validator, SQL Validator, Regex Validator, and Placeholder Regex with detailed descriptions and implementations (see the illustrative sketch after this list).
  • Text Metrics:
    • Added Word Count, Word Count Ratio, Char Count, Char Count Ratio, and Perplexity with detailed descriptions and implementations.
  • Specialized Evaluators:
    • Added LLM as a Judge, Tone Detection, and Uncertainty with detailed descriptions and implementations.
  • Custom Evaluators:
    • Updated section to include Custom Metric creation and input/output specifications.
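
To make two of the simpler categories concrete, here is a minimal, illustrative sketch of how a format validator (JSON Validator) and a text metric (Word Count Ratio) could be scored. The function names, return shapes, and scoring conventions below are assumptions for illustration only, not the library's actual API; see evaluators/evaluator-library.mdx for the documented implementations.

```python
import json
import re


def json_validator(output: str) -> dict:
    """Pass/fail check: does the model output parse as valid JSON?"""
    try:
        json.loads(output)
        return {"passed": True, "reason": "Output is valid JSON"}
    except json.JSONDecodeError as exc:
        return {"passed": False, "reason": f"Invalid JSON: {exc}"}


def word_count_ratio(output: str, reference: str) -> float:
    """Ratio of output word count to reference word count (0.0 if the reference is empty)."""
    output_words = len(re.findall(r"\S+", output))
    reference_words = len(re.findall(r"\S+", reference))
    return output_words / reference_words if reference_words else 0.0


print(json_validator('{"answer": 42}'))                                    # passed: True
print(word_count_ratio("short answer", "a much longer reference answer"))  # 0.4
```

Checks like these are deterministic and cheap to run, which is usually why they are grouped separately from the LLM-backed evaluators in the other categories.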

This description was created by Ellipsis for 3e2122b.

Summary by CodeRabbit

  • Documentation
    • Significantly expanded Evaluator Library with comprehensive new categories: Agent Quality, Answer Quality, Safety & Security, Formatting, Conversation, and Specialized evaluators.
    • Added numerous new evaluators including Answer Completeness, Faithfulness, Semantic Similarity, Toxicity Detection, PII Detection, Secrets Detection, and Prompt Injection Detection (a pattern-based PII detection sketch follows this list).
    • Enhanced documentation structure with clearer organization and improved implementation guidance for easier navigation.
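
As a point of reference for the safety-oriented additions, the sketch below shows one common way a pattern-based PII detector can be scored. It is illustrative only and assumes a plain regex approach; the detectors documented in evaluators/evaluator-library.mdx may use different techniques, patterns, and return shapes.

```python
import re

# Illustrative patterns only; production detectors use broader pattern sets and/or NER models.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}


def pii_detector(output: str) -> dict:
    """Flag the output if any known PII pattern matches, reporting which kinds were found."""
    findings = {
        name: pattern.findall(output)
        for name, pattern in PII_PATTERNS.items()
        if pattern.search(output)
    }
    return {"passed": not findings, "findings": findings}


print(pii_detector("Contact me at jane.doe@example.com"))
# {'passed': False, 'findings': {'email': ['jane.doe@example.com']}}
```

A Secrets Detector can follow the same shape, with patterns for API keys and tokens instead of personal identifiers.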


coderabbitai bot (Contributor) commented Dec 8, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The Evaluator Library documentation is comprehensively restructured: it introduces new evaluator categories (Agent Evaluators, Conversation Evaluators, Specialized Evaluators, and Formatting Evaluators), reorganizes the Answer Quality, Text Metrics, and Safety & Security subsections, and adds detailed descriptions, scoring mechanisms, and implementation guidance throughout.

Changes

Cohort / File(s): Evaluator Library Documentation Restructuring (evaluators/evaluator-library.mdx)

Summary: Significant reorganization and expansion of evaluator categories:
  • New Agent Evaluators: Efficiency, Flow Quality, Goal Accuracy, Goal Completeness, Tool Error Detector
  • Expanded Answer Quality Evaluators: Completeness, Correctness, Relevancy, Faithfulness, Semantic Similarity
  • New Text Metrics section: Word Count, Char Count, Perplexity
  • Reorganized Safety & Security Evaluators: PII, Secrets, Profanity, Toxicity, Sexism Detectors
  • New Formatting Evaluators: Prompt Injection, validation tools
  • New Conversation Evaluators: Quality, Intent Change, Topic Adherence, Context Relevance, Instruction Adherence
  • New Specialized Evaluators: LLM as Judge, Tone Detection, Uncertainty

Includes detailed descriptions, scoring implementations, and usage guidance updates. A generic sketch of the LLM-as-a-judge scoring pattern follows below.
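
The Specialized Evaluators entry above mentions LLM as Judge; the sketch below shows the generic shape of that pattern. It is a hedged illustration, not Traceloop's implementation: the judge prompt, the gpt-4o model choice, and the 1-5 scale are assumptions, and the OpenAI client is used only as a familiar example of calling a judge model.

```python
from openai import OpenAI  # assumes the official openai package and an OPENAI_API_KEY in the environment

client = OpenAI()

JUDGE_PROMPT = """You are an impartial judge. Rate how well the ANSWER satisfies the QUESTION
on a scale from 1 (poor) to 5 (excellent). Reply with a single digit only.

QUESTION: {question}
ANSWER: {answer}"""


def llm_as_judge(question: str, answer: str, model: str = "gpt-4o") -> int:
    """Ask a judge model for a 1-5 quality score and parse its single-digit reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    reply = response.choices[0].message.content.strip()
    score = int(reply[0])          # naive parse; real evaluators validate the reply format
    return min(max(score, 1), 5)   # clamp to the expected 1-5 range


# Hypothetical usage:
# score = llm_as_judge("What is the capital of France?", "Paris is the capital of France.")
```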

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas requiring attention:
    • Verify accuracy and completeness of new evaluator descriptions and implementations
    • Confirm all reorganized sections maintain consistency in formatting and terminology
    • Review reorganized Safety & Security and Answer Quality categories for proper categorization and no duplicate coverage
    • Check that Usage section updates align with the expanded evaluator categories

Poem

🐰✨ The library's grown with many new friends,
Agent, Conversation, Safety—the list extends!
Evaluators sorted in neat, tidy rows,
Organization blooms where structure still grows! 📚


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 08d39f7 and 3e2122b.

📒 Files selected for processing (1)
  • evaluators/evaluator-library.mdx (3 hunks)


@OzBenSimhonTraceloop merged commit d4eec83 into main on Dec 8, 2025; 3 of 4 checks passed.
ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 3e2122b in 1 minute and 52 seconds.
  • Reviewed 284 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
1. evaluators/evaluator-library.mdx:24
  • Draft comment:
    Clarify evaluation criteria and thresholds for 'Agent Efficiency'; consider adding brief details on what constitutes redundant steps or tool calls.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence that it is useful = 0%, below the 50% threshold. The comment asks for clarification on evaluation criteria and thresholds, which effectively asks the PR author to explain or provide more information. This violates the rule against asking the PR author to explain or confirm their intention, so the comment was removed.
2. evaluators/evaluator-library.mdx:103
  • Draft comment:
    Confirm the removal of 'Custom LLM Judge' from the Custom Evaluators section is intentional and update documentation/changelog if needed.
  • Reason this comment was not posted:
    Comment was on unchanged code.
3. evaluators/evaluator-library.mdx:197
  • Draft comment:
    Consider adding version details or reference links for implementation technologies (e.g., 'Custom GPT-4o prompt', 'Ragas metrics') for better traceability.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%.
4. evaluators/evaluator-library.mdx:1
  • Draft comment:
    Overall structure update is clear; consider adding a summary table of evaluators for quick navigation.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%.

Workflow ID: wflow_vMltL9c5obOKXLZz


