-
Notifications
You must be signed in to change notification settings - Fork 89
[GuideLLM Refactor] Reenablement of scenarios and fixes for benchmark package and CLI pathways #414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GuideLLM Refactor] Reenablement of scenarios and fixes for benchmark package and CLI pathways #414
Conversation
…or the latest state of refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the benchmarking system to use a unified BenchmarkGenerativeTextArgs
configuration object for managing benchmark execution parameters. The key changes enable scenario-based configuration loading, improve the separation between internal benchmarking state and external configuration, and simplify the CLI/API interface.
Key Changes:
- Introduced
BenchmarkGenerativeTextArgs
as the single source of truth for benchmark configuration, replacing scattered parameters - Renamed
BenchmarkArgs
toBenchmarkerArgs
to distinguish internal runtime state from external configuration - Added scenario loading capabilities with built-in scenario discovery
- Updated CLI to load configuration from scenario files with command-line overrides
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
src/guidellm/utils/cli.py | Enhanced parse_json to handle key=value pairs and plain strings in addition to JSON |
src/guidellm/presentation/data_models.py | Added null checks for model name and request/output fields during report generation |
src/guidellm/data/deserializers/deserializer.py | Modified deserializer to prioritize non-HuggingFace deserializers, saving HuggingFace as fallback |
src/guidellm/benchmark/types.py | Removed legacy type aliases file (moved to entrypoints.py) |
src/guidellm/benchmark/schemas.py | Added BenchmarkGenerativeTextArgs class, renamed BenchmarkArgs to BenchmarkerArgs , enhanced documentation |
src/guidellm/benchmark/scenarios/rag.json | Wrapped data field value in array for consistency |
src/guidellm/benchmark/scenarios/chat.json | Wrapped data field value in array for consistency |
src/guidellm/benchmark/scenarios/init.py | New module for built-in scenario discovery and loading |
src/guidellm/benchmark/scenario.py | Removed legacy scenario management code (replaced by new Args-based approach) |
src/guidellm/benchmark/profile.py | Updated documentation, fixed rate handling for single values vs lists |
src/guidellm/benchmark/entrypoints.py | Refactored benchmark_generative_text to accept BenchmarkGenerativeTextArgs , moved type aliases here |
src/guidellm/benchmark/benchmarker.py | Updated to use BenchmarkerArgs instead of BenchmarkArgs |
src/guidellm/benchmark/init.py | Updated exports to reflect renamed classes and new scenario functions |
src/guidellm/main.py | Major CLI refactor to support scenario loading and simplified option handling |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few issues, but otherwise looks good. Tested with a synchronous text run.
7e83e07
to
44051aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of these changes are ones that I've wanted for a while. It's also nice to see the additional documentation.
Tested a scenario and a non-scenario run, and it works.
Summary
Changed the benchmarking entrypoint to take in an Args object which is now used to load scenarios. It enables a single source of truth in addition to being able to save the exact configurations in the report output.
Details
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##
)