feat: ragas evals CLI #2086

shahules786 · 2025-06-21T05:46:21Z

❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: vibrant_naur                                                     │
│ Dataset: rag_dataset (30 rows)                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
  Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric   ┃ Current ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ accuracy │   0.933 │
└──────────┴─────────┘
         Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ fail or pass │ fail     │      26 │
│              │ pass     │       4 │
└──────────────┴──────────┴─────────┘
✓ Experiment results displayed
✓ Evaluation completed successfully
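The invocation above passes an evaluation file as a positional argument, then a dataset name and a comma-separated metric list. The comparison run also adds a --baseline flag. Below is a minimal argparse sketch of that command-line shape. It is an illustration only. The actual ragas CLI may use a different framework, and the choice to make --dataset and --metrics required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags used in the examples;
    # not the actual ragas implementation.
    parser = argparse.ArgumentParser(prog="ragas")
    subcommands = parser.add_subparsers(dest="command", required=True)

    evals = subcommands.add_parser("evals", help="Run an evaluation file")
    evals.add_argument("eval_file", help="Path to the evaluation script")
    # Assumption: dataset and metrics are mandatory in this sketch.
    evals.add_argument("--dataset", required=True, help="Dataset name")
    evals.add_argument(
        "--metrics",
        required=True,
        # Split "accuracy,fail_or_pass" into ["accuracy", "fail_or_pass"]
        type=lambda value: [m.strip() for m in value.split(",") if m.strip()],
        help="Comma-separated metric names",
    )
    # Optional experiment name to compare the new run against
    evals.add_argument("--baseline", default=None, help="Baseline experiment name")
    return parser
```

For example, build_parser().parse_args(["evals", "test_app/evals/app_eval.py", "--dataset", "rag_dataset", "--metrics", "accuracy,fail_or_pass"]) yields metrics == ["accuracy", "fail_or_pass"] and baseline None.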

❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass --baseline suspicious_babbage
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Baseline: suspicious_babbage
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
Comparing against baseline: suspicious_babbage
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: pedantic_mccarthy                                                │
│ Dataset: rag_dataset (30 rows)                                               │
│ Baseline: suspicious_babbage                                                 │
╰──────────────────────────────────────────────────────────────────────────────╯
                Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric   ┃ Current ┃ Baseline ┃  Delta ┃ Gate ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ accuracy │   0.900 │    1.000 │ ▼0.100 │ fail │
└──────────┴─────────┴──────────┴────────┴──────┘
                  Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃ Baseline ┃ Delta ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ fail or pass │ fail     │      26 │       25 │    ▲1 │
│              │ pass     │       4 │        5 │    ▼1 │
└──────────────┴──────────┴─────────┴──────────┴───────┘
✓ Comparison completed
✓ Evaluation completed successfully
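The comparison output reports, for each numerical metric, the current value, the baseline value, a signed delta, and a gate. For categorical metrics it reports per-category counts and deltas. Below is a minimal sketch of how such a summary could be computed. It assumes the following, none of which is taken from the ragas source: numerical metrics are averaged over rows, categorical metrics are counted per category, and the gate passes only when the metric did not drop below the baseline. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def summarize(rows, numeric, categorical):
    # Assumed aggregation: mean for numerical metrics,
    # per-category counts for categorical metrics.
    summary = {name: mean(row[name] for row in rows) for name in numeric}
    for name in categorical:
        summary[name] = Counter(row[name] for row in rows)
    return summary


def arrow(delta):
    # ▲ for an increase, ▼ for a decrease, nothing for no change
    if delta > 0:
        return "▲"
    if delta < 0:
        return "▼"
    return ""


def compare_numeric(current, baseline):
    delta = current - baseline
    # Hypothetical gate rule: fail whenever the metric regressed.
    gate = "pass" if delta >= 0 else "fail"
    return f"{arrow(delta)}{abs(delta):.3f}", gate


def compare_categorical(current, baseline):
    # Per-category count delta, covering categories seen in either run.
    result = {}
    for category in sorted(set(current) | set(baseline)):
        delta = current.get(category, 0) - baseline.get(category, 0)
        result[category] = f"{arrow(delta)}{abs(delta)}"
    return result
```

With the numbers from the comparison run, compare_numeric(0.900, 1.000) returns ("▼0.100", "fail"), and comparing the fail/pass counts 26/4 against 25/5 gives {"fail": "▲1", "pass": "▼1"}. Both results match the tables above.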

greptile-apps

PR Summary

Added a new CLI for running Ragas evaluations from the command line.

Added the ragas evals command in cli.py, with options for selecting a dataset and metrics and for comparing against a baseline experiment
Replaced tqdm with the Rich library for progress tracking in base.py and experiments.py
Added a get_fields_by_type method in dataset.py to filter fields by type
Modified metric response models in discrete.py, ranking.py, and numeric.py to improve output formatting
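The summary mentions a get_fields_by_type method for type-based field filtering. Its real signature is not shown on this page and may differ; for example, it may operate on Pydantic models. Below is a minimal sketch using a standard-library dataclass. The EvalRow schema and the standalone get_fields_by_type function are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class EvalRow:
    # Hypothetical row schema; field names are illustrative only.
    question: str
    accuracy: float
    fail_or_pass: str


def get_fields_by_type(model, target_type):
    # Resolve annotations (including string annotations) and keep
    # the names of fields whose declared type matches exactly.
    hints = get_type_hints(model)
    return [f.name for f in fields(model) if hints[f.name] is target_type]
```

Here get_fields_by_type(EvalRow, float) returns ["accuracy"]. A filter like this is one way to separate numerical from categorical columns before building result tables.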

9 files reviewed, 4 comments

experimental/ragas_experimental/dataset.py

experimental/pyproject.toml

experimental/ragas_experimental/cli.py

experimental/ragas_experimental/project/experiments.py

jjmachan

Overall looks good, but do address the comments from Greptile and mine before merging it in.

experimental/ragas_experimental/project/experiments.py

experimental/ragas_experimental/cli.py

shahules786 added 12 commits June 18, 2025 12:29

b23b989 llm as prompt as optional
cba6fcb added CLI
421581a left todos
260d686 update cli to handle categorical metrics
129eae2 add cli to pyproject
9913fdc remove create command
89b763e remove evals subcommand
7b2ee4a added evals subcommand
8d17d52 replace tqdm with rich
c261fde made progress rich
423a119 added rich table
a37e9b5 add console

shahules786 requested review from jjmachan and removed request for jjmachan June 22, 2025 21:49

f10f45a Merge branch 'main' into ragas-cli

shahules786 marked this pull request as ready for review June 22, 2025 22:13

dosubot bot added the size:XL label Jun 22, 2025

shahules786 requested a review from jjmachan June 22, 2025 22:13

greptile-apps bot reviewed Jun 23, 2025

jjmachan reviewed Jun 24, 2025

shahules786 added 4 commits June 25, 2025 18:37

b476c13 removed imports
90d9c7e simplify gate
983798b fix comments
054b1a9 refactor code

shahules786 requested a review from jjmachan June 26, 2025 01:57

jjmachan approved these changes Jun 27, 2025

jjmachan and others added 4 commits July 2, 2025 12:02

39e4df4 Merge branch 'main' into ragas-cli
be10815 fix an issue
cd716da add back alignment
5d6edf8 clear error reporting

shahules786 merged commit 356d6bf into explodinggradients:main Jul 3, 2025
6 checks passed