Add configurable judge model support to cody-bench #7979

Rohithgilla12 · 2025-05-23T06:18:53Z

🎯 Problem
The cody-bench command currently hardcodes the LLM judge model to anthropic/claude-3-5-sonnet-20240620, limiting flexibility for users who want to experiment with different models for evaluation or reduce costs by using smaller models like Claude Haiku.
💡 Solution
This PR adds a new `--judge-model` CLI option that lets users choose the model used for LLM-as-a-judge evaluations. When the option is omitted, the existing default model is used, so current behavior is unchanged.
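
The intended flag semantics are shown below as a hand-rolled parser. This is a hypothetical sketch, not the PR's code: the real cody-bench CLI may be built on an argument-parsing library. `parseJudgeModel`, `JudgeCliOptions`, and `FLAG` are illustrative names. The only detail taken from this PR is the default model ID that was previously hardcoded.

```typescript
// Hypothetical sketch: illustrates how `--judge-model` is meant to behave,
// not how cody-bench actually parses its arguments.
const DEFAULT_JUDGE_MODEL = 'anthropic/claude-3-5-sonnet-20240620'
const FLAG = '--judge-model'

interface JudgeCliOptions {
    judgeModel: string
}

// Accepts `--judge-model <id>` or `--judge-model=<id>`; falls back to the
// previously hardcoded model when the flag is absent.
function parseJudgeModel(argv: readonly string[]): JudgeCliOptions {
    for (const [i, arg] of argv.entries()) {
        let value: string | undefined
        if (arg === FLAG) {
            value = argv[i + 1]
        } else if (arg.startsWith(`${FLAG}=`)) {
            value = arg.slice(FLAG.length + 1)
        } else {
            continue
        }
        // Reject a missing or empty value, or a following flag mistaken for the model ID.
        if (value === undefined || value === '' || value.startsWith('--')) {
            throw new Error(`${FLAG} requires a model ID`)
        }
        return { judgeModel: value }
    }
    return { judgeModel: DEFAULT_JUDGE_MODEL }
}
```

For example, `parseJudgeModel(['--judge-model', 'anthropic/claude-3-haiku-20240307'])` returns that ID, and `parseJudgeModel([])` returns the default. The Haiku ID is only an illustration; check which model IDs your instance accepts.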

🔄 Backward Compatibility
✅ No breaking changes - existing code continues to work unchanged
✅ Default behavior preserved - same model used when option not specified
✅ Constructor backward compatibility - existing `LlmJudge` instantiations work without passing a model
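
Constructor backward compatibility of this kind usually comes from an optional parameter with a default value. The sketch below assumes that approach. It is hypothetical: the actual `LlmJudge` may receive its model differently. A TypeScript default parameter applies both when the argument is omitted and when `undefined` is passed, so an unset CLI option can be forwarded directly.

```typescript
// Hypothetical sketch of an optional constructor parameter; the real
// LlmJudge in cody-bench may be structured differently.
const DEFAULT_JUDGE_MODEL = 'anthropic/claude-3-5-sonnet-20240620'

class LlmJudge {
    readonly model: string

    // The default applies when the argument is omitted or is `undefined`.
    constructor(model: string = DEFAULT_JUDGE_MODEL) {
        this.model = model
    }
}

const legacy = new LlmJudge() // existing call sites: behavior unchanged
const fromUnsetOption = new LlmJudge(undefined) // e.g. `--judge-model` not given
const custom = new LlmJudge('anthropic/claude-3-opus-20240229') // illustrative ID
```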

📋 Validation Checklist
[x] CLI option parsing works correctly
[x] Default model behavior maintained
[x] Custom models passed through correctly
[x] Strategy integration functions properly
[x] TypeScript types are correct
[x] All tests pass
[x] Linting passes

🎯 Benefits
Cost optimization - Users can choose cheaper models like Claude Haiku for large-scale evaluations
Quality tuning - Users can select Claude Opus for highest quality judging when needed
Experimentation - Researchers can compare different models' judging capabilities
Future-proofing - Easy to add support for new models as they become available

Test plan

Added unit tests

✓ 48 tests passed | 3 skipped (51 total)
✓ All linting checks pass
✓ Build verification successful

- Add `judgeModel` option to `CodyBenchOptions`
- Update `LlmJudge` to accept a custom model
- Modify evaluation strategies to use the specified model
- Add `--judge-model` CLI option
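
The sketch below shows how a single `judgeModel` option could reach every evaluation strategy. Only `CodyBenchOptions`, `judgeModel`, and `LlmJudge` come from this PR. The strategy names `chat` and `fix`, the `EvaluationStrategy` type, and `judgeModelsFor` are hypothetical. Each strategy builds its judge from the shared options instead of hardcoding a model ID, so one flag controls all of them.

```typescript
// Hypothetical sketch: strategy names and helper functions are invented for illustration.
const DEFAULT_JUDGE_MODEL = 'anthropic/claude-3-5-sonnet-20240620'

interface CodyBenchOptions {
    judgeModel?: string // undefined means "use the default judge model"
    // ...other benchmark options omitted
}

class LlmJudge {
    readonly model: string
    constructor(model: string = DEFAULT_JUDGE_MODEL) {
        this.model = model
    }
}

// A strategy receives the shared options and creates its own judge from them.
type EvaluationStrategy = (options: CodyBenchOptions) => LlmJudge

const strategies: Record<string, EvaluationStrategy> = {
    chat: options => new LlmJudge(options.judgeModel),
    fix: options => new LlmJudge(options.judgeModel),
}

// Reports which judge model each strategy would use for the given options.
function judgeModelsFor(options: CodyBenchOptions): Record<string, string> {
    const result: Record<string, string> = {}
    for (const [name, strategy] of Object.entries(strategies)) {
        result[name] = strategy(options).model
    }
    return result
}
```

With `judgeModelsFor({})`, every strategy falls back to the default model. With `judgeModelsFor({ judgeModel: 'some/model' })`, every strategy uses `some/model`.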

Rohithgilla12 added 2 commits May 23, 2025 11:39

feat(bench): 🧠 Add custom model support for LLM judging

2c39227

test: add tests for model selection

c097761
