Releases: kubernetes-sigs/inference-perf
v0.1.1
What's Changed
- Simplified local report storage by @SachinVarghese in #118
- Add basic Helm Chart by @jjk-g in #114
- Add support for api_key #116 by @andresC98 in #120
- feat: migrate scripts from Makefile to PDM scripts by @rudeigerc in #122
- Add Support for Querying Metrics from Google Managed Prometheus and Additional PromQL Filters now Configurable by @Bslabe123 in #121
- fix: improve Docker build workflow with better secret handling and debugging by @wangchen615 in #128
- fix: improve release workflow with proper changelog and Docker image handling by @wangchen615 in #129
- Add qps observability, fractional rates by @jjk-g in #125
- Add config.md file to provide detailed descriptions of config.yml parameters by @liyuerich in #131
- Prometheus query fixes and examples update by @SachinVarghese in #130
- Add ability to specify datagen bounds for ShareGPT by @jjk-g in #137
- Add the ability to analyze reports and produce charts by @achandrasekar in #135
- update datasets type by @liyuerich in #136
- Fix a parsing issue with streaming requests by @achandrasekar in #140
- feat: add support for s3 storage by @omerap12 in #147
- Add detection for model_name and tokenizer by @jjk-g in #145
- update example config yml files by @liyuerich in #149
- Fix QPS accuracy at lower rates by @jjk-g in #143
- Update Helm chart by @jjk-g in #154
- Autocalc total_count by @jjk-g in #155
- fix: only assign tokenizer to model_name when not configured by @ExplorerRay in #160
- feat: Add Python package publishing to release workflow by @wangchen615 in #153
- Point config.yml to mounted configmap file by @Bslabe123 in #158
New Contributors
- @andresC98 made their first contribution in #120
- @rudeigerc made their first contribution in #122
- @liyuerich made their first contribution in #131
- @omerap12 made their first contribution in #147
- @ExplorerRay made their first contribution in #160
Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.1
Docker Image
quay.io/inference-perf/inference-perf:v0.1.1
Python Package
pip install inference-perf==0.1.1
v0.1.0
We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:
- Highly scalable: can benchmark large production inference deployments at up to 10k QPS.
- Reports the key metrics needed to measure LLM performance.
- Supports real-world and synthetic datasets.
- Supports different APIs and multiple model servers.
- Supports specifying exact input and output length distributions to simulate different scenarios - Gaussian distributions, fixed lengths, and min/max bounds are all supported.
- Generates different load patterns and can benchmark specific cases like burst traffic, scaling to saturation, and other autoscaling / routing scenarios.
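The dataset, distribution, and load-pattern features above are all driven by a YAML config file. A hedged sketch of what such a config might look like is below - the field names are illustrative assumptions, not the exact schema; see the config.md documentation added in #131 for the authoritative parameter reference:

```yaml
# Illustrative sketch only - field names are assumptions, not the exact schema.
api: completion              # which API to benchmark
data:
  type: synthetic            # or a real-world dataset such as ShareGPT
  input_distribution:        # token-length distribution for generated prompts
    mean: 512
    std_dev: 128
    min: 16
    max: 1024
load:
  type: constant             # load pattern to generate
  stages:                    # multi-stage runs chain rates to exercise
    - rate: 10               #   autoscaling / saturation scenarios
      duration: 60           # seconds per stage
    - rate: 100
      duration: 60
```

Multi-stage runs like the one sketched here let a single benchmark ramp through several request rates, which is how cases such as scaling to saturation can be exercised.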
What's Changed
- Add directory structure for the tool by @achandrasekar in #1
- Add Makefile and typecheck presubmit by @Bslabe123 in #3
- Fix Makefile Typo by @Bslabe123 in #6
- Add design document to the repo by @achandrasekar in #4
- Add default python gitignore by @sjmonson in #11
- Make Inference-Perf Package-able / Use Modern Python Tooling by @sjmonson in #13
- Added Abstract Type for Metrics Client by @Bslabe123 in #7
- Add Chen Wang to OWNERS by @terrytangyuan in #20
- Inference perf basic load run implementation by @SachinVarghese in #21
- Adding vLLM Client to inference perf runner by @SachinVarghese in #27
- Add HF ShareGPT Data Generator by @vivekk16 in #33
- Add Unit Testing Maketargets and Unit Testing Github Workflow by @Bslabe123 in #19
- Mock metrics client implementation by @SachinVarghese in #32
- Parameterization of CLI tool using config file by @SachinVarghese in #34
- Add SachinVarghese as approver, add owner aliases by @achandrasekar in #36
- Containerize the benchmark by @achandrasekar in #38
- Added demo example for vLLM Server and shareGPT datagen component by @SachinVarghese in #37
- Fix: Raising error for api type mismatch by @SachinVarghese in #44
- Add Custom Tokenizer by @vivekk16 in #43
- Multi-stage performance run by @SachinVarghese in #49
- Update README.md with meeting time / recording links by @achandrasekar in #54
- Add support for cluster-local benchmarking by @Bslabe123 in #60
- Update DataGenerator to Handle Both Chat and Completion APIs by @Bslabe123 in #58
- Lint and type check fixes by @SachinVarghese in #62
- Add StorageClient abstract type and GCS Client Implementation by @Bslabe123 in #61
- Added Prometheus client to get model server metrics by @aish1331 in #64
- Add support for different input distributions with a synthetic dataset by @achandrasekar in #66
- Automatically Populate Missing Fields in Config by @Bslabe123 in #71
- Generic model server client config by @SachinVarghese in #72
- Request Lifecycle Report Generation by @Bslabe123 in #77
- Add output distribution to synthetic data generator by @achandrasekar in #79
- Improved Logging for Writing Report Files by @Bslabe123 in #80
- Add the option to ignore end of sequence by @achandrasekar in #83
- Add GitHub Release Workflow and Changelog Configuration by @wangchen615 in #41
- Improved abstractions for perf project by @SachinVarghese in #84
- Add issue templates for the repo by @achandrasekar in #90
- docs: Update link to Slack channel in README.md by @terrytangyuan in #91
- Add random data generator by @achandrasekar in #94
- Multi-stage report generation for Prometheus Metrics by @aish1331 in #95
- Add Docker build and push workflows for PRs and releases by @wangchen615 in #97
- Add shared prefix generator to benchmark prefix caching by @achandrasekar in #98
- Added throughput metrics to output report by @Bslabe123 in #101
- Basic code test setup by @SachinVarghese in #96
- Enable Docker Build Workflow on Push to Main Branch by @wangchen615 in #102
- Fix Docker Tag Generation by Using env.QUAY_USERNAME in Workflow by @wangchen615 in #105
- Update Quay.io Organization Name in Docker Build Workflow by @wangchen615 in #106
- Add Support for Streaming Requests to Completions API by @Bslabe123 in #103
- Add multiprocess, multithreaded loadgen by @jjk-g in #99
- Update documentation to cover newer capabilities by @achandrasekar in #104
- Use logging methods with levels instead of print by @shotarok in #110
- Merge the latest fixes to the release branch by @achandrasekar in #115
New Contributors
- @achandrasekar made their first contribution in #1
- @Bslabe123 made their first contribution in #3
- @sjmonson made their first contribution in #11
- @terrytangyuan made their first contribution in #20
- @SachinVarghese made their first contribution in #21
- @vivekk16 made their first contribution in #33
- @aish1331 made their first contribution in #64
- @wangchen615 made their first contribution in #41
- @jjk-g made their first contribution in #99
- @shotarok made their first contribution in #110
Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0