Releases: kubernetes-sigs/inference-perf
v0.1.1
What's Changed
- Simplified local report storage by @SachinVarghese in #118
- Add basic Helm Chart by @jjk-g in #114
- Add support for api_key #116 by @andresC98 in #120
- feat: migrate scripts from Makefile to PDM scripts by @rudeigerc in #122
- Add Support for Querying Metrics from Google Managed Prometheus and Additional PromQL Filters now Configurable by @Bslabe123 in #121
- fix: improve Docker build workflow with better secret handling and debugging by @wangchen615 in #128
- fix: improve release workflow with proper changelog and Docker image handling by @wangchen615 in #129
- Add qps observability, fractional rates by @jjk-g in #125
- Add config.md file to provide detailed descriptions of config.yml parameters by @liyuerich in #131
- Prometheus query fixes and examples update by @SachinVarghese in #130
- Add ability to specify datagen bounds for ShareGPT by @jjk-g in #137
- Add the ability to analyze reports and produce charts by @achandrasekar in #135
- update datasets type by @liyuerich in #136
- Fix a parsing issue with streaming requests by @achandrasekar in #140
- feat: add support for s3 storage by @omerap12 in #147
- Add detection for model_name and tokenizer by @jjk-g in #145
- update example config yml files by @liyuerich in #149
- Fix QPS accuracy at lower rates by @jjk-g in #143
- Update Helm chart by @jjk-g in #154
- Autocalc total_count by @jjk-g in #155
- fix: only assign tokenizer to model_name when not configured by @ExplorerRay in #160
- feat: Add Python package publishing to release workflow by @wangchen615 in #153
- Point config.yml to mounted configmap file by @Bslabe123 in #158
New Contributors
- @andresC98 made their first contribution in #120
- @rudeigerc made their first contribution in #122
- @liyuerich made their first contribution in #131
- @omerap12 made their first contribution in #147
- @ExplorerRay made their first contribution in #160
Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.1
Docker Image
quay.io/inference-perf/inference-perf:v0.1.1
Python Package
pip install inference-perf==0.1.1
v0.1.0
We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:
- Highly scalable: can benchmark large production inference deployments at up to 10k QPS.
- Reports the key metrics needed to measure LLM performance.
- Supports real-world and synthetic datasets.
- Supports different APIs and multiple model servers.
- Supports specifying exact input and output length distributions to simulate different scenarios - Gaussian distributions, fixed lengths, and min/max bounds are all supported.
- Generates different load patterns and can benchmark specific cases like burst traffic, scaling to saturation, and other autoscaling / routing scenarios.
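The dataset, distribution, and load-pattern features above are all driven by a YAML config file. A hedged sketch of what such a config might look like is below - the field names are illustrative assumptions, not the exact schema; see the config.md documentation added in #131 for the authoritative parameter reference:

```yaml
# Illustrative sketch only - field names are assumptions, not the exact schema.
api: completion              # which API to benchmark
data:
  type: synthetic            # or a real-world dataset such as ShareGPT
  input_distribution:        # token-length distribution for generated prompts
    mean: 512
    std_dev: 128
    min: 16
    max: 1024
load:
  type: constant             # load pattern to generate
  stages:                    # multi-stage runs chain rates to exercise
    - rate: 10               #   autoscaling / saturation scenarios
      duration: 60           # seconds per stage
    - rate: 100
      duration: 60
```

Multi-stage runs like the one sketched here let a single benchmark ramp through several request rates, which is how cases such as scaling to saturation can be exercised.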
What's Changed
- Add directory structure for the tool by @achandrasekar in #1
- Add Makefile and typecheck presubmit by @Bslabe123 in #3
- Fix Makefile Typo by @Bslabe123 in #6
- Add design document to the repo by @achandrasekar in #4
- Add default python gitignore by @sjmonson in #11
- Make Inference-Perf Package-able / Use Modern Python Tooling by @sjmonson in #13
- Added Abstract Type for Metrics Client by @Bslabe123 in #7
- Add Chen Wang to OWNERS by @terrytangyuan in #20
- Inference perf basic load run implementation by @SachinVarghese in #21
- Adding vLLM Client to inference perf runner by @SachinVarghese in #27
- Add HF ShareGPT Data Generator by @vivekk16 in #33
- Add Unit Testing Maketargets and Unit Testing Github Workflow by @Bslabe123 in #19
- Mock metrics client implementation by @SachinVarghese in #32
- Parameterization of CLI tool using config file by @SachinVarghese in #34
- Add SachinVarghese as approver, add owner aliases by @achandrasekar in #36
- Containerize the benchmark by @achandrasekar in #38
- Added demo example for vLLM Server and shareGPT datagen component by @SachinVarghese in #37
- Fix: Raising error for api type mismatch by @SachinVarghese in #44
- Add Custom Tokenizer by @vivekk16 in #43
- Multi-stage performance run by @SachinVarghese in #49
- Update README.md with meeting time / recording links by @achandrasekar in #54
- Add support for cluster-local benchmarking by @Bslabe123 in #60
- Update DataGenerator to Handle Both Chat and Completion APIs by @Bslabe123 in #58
- Lint and type check fixes by @SachinVarghese in #62
- Add StorageClient abstract type and GCS Client Implementation by @Bslabe123 in #61
- Added Prometheus client to get model server metrics by @aish1331 in #64
- Add support for different input distributions with a synthetic dataset by @achandrasekar in #66
- Automatically Populate Missing Fields in Config by @Bslabe123 in #71
- Generic model server client config by @SachinVarghese in #72
- Request Lifecycle Report Generation by @Bslabe123 in #77
- Add output distribution to synthetic data generator by @achandrasekar in #79
- Improved Logging for Writing Report Files by @Bslabe123 in #80
- Add the option to ignore end of sequence by @achandrasekar in #83
- Add GitHub Release Workflow and Changelog Configuration by @wangchen615 in #41
- Improved abstractions for perf project by @SachinVarghese in #84
- Add issue templates for the repo by @achandrasekar in #90
- docs: Update link to Slack channel in README.md by @terrytangyuan in #91
- Add random data generator by @achandrasekar in #94
- Multi-stage report generation for Prometheus Metrics by @aish1331 in #95
- Add Docker build and push workflows for PRs and releases by @wangchen615 in #97
- Add shared prefix generator to benchmark prefix caching by @achandrasekar in #98
- Added throughput metrics to output report by @Bslabe123 in #101
- Basic code test setup by @SachinVarghese in #96
- Enable Docker Build Workflow on Push to Main Branch by @wangchen615 in #102
- Fix Docker Tag Generation by Using env.QUAY_USERNAME in Workflow by @wangchen615 in #105
- Update Quay.io Organization Name in Docker Build Workflow by @wangchen615 in #106
- Add Support for Streaming Requests to Completions API by @Bslabe123 in #103
- Add multiprocess, multithreaded loadgen by @jjk-g in #99
- Update documentation to cover newer capabilities by @achandrasekar in #104
- Use logging methods with levels instead of print by @shotarok in #110
- Merge the latest fixes to the release branch by @achandrasekar in #115
New Contributors
- @achandrasekar made their first contribution in #1
- @Bslabe123 made their first contribution in #3
- @sjmonson made their first contribution in #11
- @terrytangyuan made their first contribution in #20
- @SachinVarghese made their first contribution in #21
- @vivekk16 made their first contribution in #33
- @aish1331 made their first contribution in #64
- @wangchen615 made their first contribution in #41
- @jjk-g made their first contribution in #99
- @shotarok made their first contribution in #110
Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0