Add configurable judge model support to cody-bench #7979

Open
wants to merge 2 commits into
base: main


@Rohithgilla12 Rohithgilla12 commented May 23, 2025

🎯 Problem
The cody-bench command currently hardcodes the LLM judge model to anthropic/claude-3-5-sonnet-20240620, limiting flexibility for users who want to experiment with different models for evaluation or reduce costs by using smaller models like Claude Haiku.
💡 Solution
This PR adds a new --judge-model CLI option that lets users specify which model to use for LLM-as-a-judge evaluations, while maintaining backward compatibility with the existing default.
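
A minimal sketch of how the option could resolve to a model ID, falling back to the current default when the flag is absent. The `--judge-model` flag and the default model string come from this description; `resolveJudgeModel` and the hand-rolled argument loop are hypothetical stand-ins for the real CLI parsing in cody-bench, which is not reproduced here.

```typescript
// Default model named in the PR description.
const DEFAULT_JUDGE_MODEL = 'anthropic/claude-3-5-sonnet-20240620'

// Hypothetical helper (not the actual cody-bench code): resolve the judge
// model from raw CLI arguments, falling back to the default.
function resolveJudgeModel(argv: string[]): string {
    for (let i = 0; i < argv.length; i++) {
        const arg = argv[i]
        // Support the `--judge-model=<id>` form.
        if (arg.startsWith('--judge-model=')) {
            const value = arg.slice('--judge-model='.length)
            if (value.length === 0) {
                throw new Error('--judge-model requires a value')
            }
            return value
        }
        // Support the `--judge-model <id>` form.
        if (arg === '--judge-model') {
            const value = argv[i + 1]
            if (value === undefined || value.startsWith('--')) {
                throw new Error('--judge-model requires a value')
            }
            return value
        }
    }
    // Flag not given: behave exactly as before this PR.
    return DEFAULT_JUDGE_MODEL
}
```

With no flag, the helper returns the existing default, so current invocations behave the same; passing `--judge-model some-provider/some-model` (a placeholder ID) returns that string unchanged.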

🔄 Backward Compatibility
✅ No breaking changes - existing code continues to work unchanged
✅ Default behavior preserved - same model used when option not specified
✅ Constructor backward compatibility - existing LlmJudge instantiation works
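
One way to get this kind of constructor compatibility is an optional parameter with a default value. The sketch below assumes that approach; `LlmJudgeSketch` is a hypothetical stand-in, and the real `LlmJudge` constructor may take other arguments that are omitted here.

```typescript
// Stand-in for the real LlmJudge; only the constructor shape matters here.
class LlmJudgeSketch {
    // Default model named in the PR description.
    static readonly defaultModel = 'anthropic/claude-3-5-sonnet-20240620'

    public readonly model: string

    // The model parameter is optional, so `new LlmJudgeSketch()` keeps working.
    constructor(model: string = LlmJudgeSketch.defaultModel) {
        this.model = model
    }
}

// Existing call sites: no argument, default model.
const legacyJudge = new LlmJudgeSketch()

// New call sites: explicit model (placeholder ID).
const customJudge = new LlmJudgeSketch('some-provider/some-model')
```

Because the default is applied inside the constructor, callers that pass `undefined` (for example, an unset option) also get the original model.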

📋 Validation Checklist
[x] CLI option parsing works correctly
[x] Default model behavior maintained
[x] Custom models passed through correctly
[x] Strategy integration functions properly
[x] TypeScript types are correct
[x] All tests pass
[x] Linting passes

🎯 Benefits
Cost optimization - Users can choose cheaper models like Claude Haiku for large-scale evaluations
Quality tuning - Users can select Claude Opus for highest quality judging when needed
Experimentation - Researchers can compare different models' judging capabilities
Future-proofing - Easy to add support for new models as they become available

Test plan

  • Added unit tests
✓ 48 tests passed | 3 skipped (51 total)
✓ All linting checks pass
✓ Build verification successful

- Add `judgeModel` option to CodyBenchOptions
- Update LlmJudge to accept custom model
- Modify evaluation strategies to use specified model
- Add `--judge-model` CLI option
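
The commits above thread one optional value from the options object through to the judge. A rough sketch of that flow follows, under the assumption that strategies simply forward `judgeModel`; `BenchOptionsSketch`, `makeJudge`, and `runStrategySketch` are hypothetical names, and only `judgeModel` and the default model string come from this PR.

```typescript
// Hypothetical subset of the options object; the real CodyBenchOptions has more fields.
interface BenchOptionsSketch {
    judgeModel?: string
}

// Hypothetical factory standing in for constructing an LlmJudge.
// An omitted or undefined model falls back to the default from the PR description.
function makeJudge(model: string = 'anthropic/claude-3-5-sonnet-20240620'): { model: string } {
    return { model }
}

// A strategy forwards options.judgeModel without inspecting it, so the
// default lives in exactly one place (the judge itself).
function runStrategySketch(options: BenchOptionsSketch): string {
    const judge = makeJudge(options.judgeModel)
    return judge.model
}
```

Keeping the fallback inside the judge rather than in each strategy means a strategy added later cannot silently pick a different default.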