feat: ragas evals CLI #2086

Merged
merged 21 commits into from
Jul 3, 2025

Conversation

shahules786
Member

@shahules786 shahules786 commented Jun 21, 2025

❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: vibrant_naur                                                     │
│ Dataset: rag_dataset (30 rows)                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
  Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric   ┃ Current ┃
┡━━━━━━━━━━╇━━━━━━━━━┩
│ accuracy │   0.933 │
└──────────┴─────────┘
         Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ fail or pass │ fail     │      26 │
│              │ pass     │       4 │
└──────────────┴──────────┴─────────┘
✓ Experiment results displayed
✓ Evaluation completed successfully
❯ ragas evals test_app/evals/app_eval.py --dataset rag_dataset --metrics accuracy,fail_or_pass --baseline suspicious_babbage
Running evaluation: test_app/evals/app_eval.py
Dataset: rag_dataset
Baseline: suspicious_babbage
Getting dataset: rag_dataset
✓ Loaded dataset with 30 rows
✓ Completed experiments successfully
Comparing against baseline: suspicious_babbage
╭────────────────────────── Ragas Evaluation Results ──────────────────────────╮
│ Experiment: pedantic_mccarthy                                                │
│ Dataset: rag_dataset (30 rows)                                               │
│ Baseline: suspicious_babbage                                                 │
╰──────────────────────────────────────────────────────────────────────────────╯
                Numerical Metrics
┏━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Metric   ┃ Current ┃ Baseline ┃  Delta ┃ Gate ┃
┡━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ accuracy │   0.900 │    1.000 │ ▼0.100 │ fail │
└──────────┴─────────┴──────────┴────────┴──────┘
                  Categorical Metrics
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Metric       ┃ Category ┃ Current ┃ Baseline ┃ Delta ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ fail or pass │ fail     │      26 │       25 │    ▲1 │
│              │ pass     │       4 │        5 │    ▼1 │
└──────────────┴──────────┴─────────┴──────────┴───────┘
✓ Comparison completed
✓ Evaluation completed successfully
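
The Delta and Gate columns in the baseline run above can be read roughly as follows. This is a minimal sketch, not the code from this PR: `compare_numeric` and `compare_categorical` are hypothetical names, and the gate rule (fail whenever the current value drops below the baseline) is an assumption inferred from the single row shown.

```python
from collections import Counter


def compare_numeric(current: float, baseline: float) -> dict:
    """Compare one numerical metric against its baseline value."""
    delta = round(current - baseline, 3)
    # Assumed gate rule: any drop below the baseline counts as a failure.
    gate = "pass" if delta >= 0 else "fail"
    return {"current": current, "baseline": baseline, "delta": delta, "gate": gate}


def compare_categorical(current: list[str], baseline: list[str]) -> dict:
    """Return the per-category change in counts (current minus baseline)."""
    cur, base = Counter(current), Counter(baseline)
    return {cat: cur[cat] - base[cat] for cat in sorted(set(cur) | set(base))}


# Values taken from the comparison run shown above.
numeric = compare_numeric(current=0.900, baseline=1.000)
categorical = compare_categorical(
    current=["fail"] * 26 + ["pass"] * 4,
    baseline=["fail"] * 25 + ["pass"] * 5,
)
print(numeric)
print(categorical)
```

A negative numerical delta corresponds to the ▼ marker in the table, and a positive categorical delta to ▲.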

@shahules786 shahules786 requested review from jjmachan and removed request for jjmachan June 22, 2025 21:49
@shahules786 shahules786 marked this pull request as ready for review June 22, 2025 22:13
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 22, 2025
@shahules786 shahules786 requested a review from jjmachan June 22, 2025 22:13
Contributor

@greptile-apps greptile-apps bot left a comment

PR Summary

Added a new CLI for running Ragas evaluations from the command line.

  • Added ragas evals command in cli.py with support for dataset selection and metric comparison
  • Updated progress tracking from tqdm to Rich library for enhanced visual feedback in base.py and experiments.py
  • Added get_fields_by_type method in dataset.py for type-based field filtering
  • Modified metric response models in discrete.py, ranking.py, and numeric.py to improve output formatting
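
To illustrate what type-based field filtering could look like, here is a minimal sketch. It is not the implementation in dataset.py, whose signature is not shown in this conversation: the `EvalRow` dataclass and the standalone function form of `get_fields_by_type` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class EvalRow:
    """Hypothetical row model, used only for this illustration."""
    question: str
    response: str
    accuracy: float
    fail_or_pass: str


def get_fields_by_type(model: type, target_type: type) -> list[str]:
    """Return the names of fields annotated with exactly target_type."""
    hints = get_type_hints(model)
    return [name for name, hint in hints.items() if hint is target_type]


# Split the fields into numeric and text columns by annotation.
numeric_fields = get_fields_by_type(EvalRow, float)
text_fields = get_fields_by_type(EvalRow, str)
print(numeric_fields)
print(text_fields)
```

A filter like this would let a results table separate numerical metrics from categorical ones without hard-coding field names.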

9 files reviewed, 4 comments

Member

@jjmachan jjmachan left a comment

Overall looks good, but please address the comments from greptile and from me before merging.

@shahules786 shahules786 requested a review from jjmachan June 26, 2025 01:57
@shahules786 shahules786 merged commit 356d6bf into explodinggradients:main Jul 3, 2025
6 checks passed