Releases: kubernetes-sigs/inference-perf
v0.1.0
We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:
- Highly scalable: can benchmark large production inference deployments at up to 10k QPS.
- Reports the key metrics needed to measure LLM performance.
- Supports different real-world and synthetic datasets.
- Supports multiple APIs and multiple model servers.
- Supports specifying exact input and output length distributions to simulate different scenarios: Gaussian, fixed-length, and min-max distributions are all supported.
- Generates different load patterns and can benchmark specific cases such as burst traffic, scaling to saturation, and other autoscaling/routing scenarios.
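The input/output length distributions above can be pictured with a short sketch. This is an illustration of the idea (sampling a token length from a clamped Gaussian), not inference-perf's actual implementation; the function name and parameters are hypothetical.

```python
import random

def sample_length(mean: int, stddev: int, min_len: int, max_len: int) -> int:
    """Sample a token length from a Gaussian, clamped to [min_len, max_len].

    Illustrative only: fixed-length is the degenerate case where
    stddev=0 and min_len == max_len == mean.
    """
    n = int(round(random.gauss(mean, stddev)))
    return max(min_len, min(max_len, n))

random.seed(0)  # deterministic for the example
lengths = [sample_length(mean=512, stddev=128, min_len=16, max_len=1024)
           for _ in range(10000)]
```

Sampling a length per request this way lets a synthetic dataset mimic the prompt/completion size mix of a real workload while staying reproducible under a fixed seed.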
What's Changed
- Add directory structure for the tool by @achandrasekar in #1
- Add Makefile and typecheck presubmit by @Bslabe123 in #3
- Fix Makefile Typo by @Bslabe123 in #6
- Add design document to the repo by @achandrasekar in #4
- Add default python gitignore by @sjmonson in #11
- Make Inference-Perf Package-able / Use Modern Python Tooling by @sjmonson in #13
- Added Abstract Type for Metrics Client by @Bslabe123 in #7
- Add Chen Wang to OWNERS by @terrytangyuan in #20
- Inference perf basic load run implementation by @SachinVarghese in #21
- Adding vLLM Client to inference perf runner by @SachinVarghese in #27
- Add HF ShareGPT Data Generator by @vivekk16 in #33
- Add Unit Testing Maketargets and Unit Testing Github Workflow by @Bslabe123 in #19
- Mock metrics client implementation by @SachinVarghese in #32
- Parameterization of CLI tool using config file by @SachinVarghese in #34
- Add SachinVarghese as approver, add owner aliases by @achandrasekar in #36
- Containerize the benchmark by @achandrasekar in #38
- Added demo example for vLLM Server and shareGPT datagen component by @SachinVarghese in #37
- Fix: Raising error for api type mismatch by @SachinVarghese in #44
- Add Custom Tokenizer by @vivekk16 in #43
- Multi-stage performance run by @SachinVarghese in #49
- Update README.md with meeting time / recording links by @achandrasekar in #54
- Add support for cluster-local benchmarking by @Bslabe123 in #60
- Update DataGenerator to Handle Both Chat and Completion APIs by @Bslabe123 in #58
- Lint and type check fixes by @SachinVarghese in #62
- Add StorageClient abstract type and GCS Client Implementation by @Bslabe123 in #61
- Added Prometheus client to get model server metrics by @aish1331 in #64
- Add support for different input distributions with a synthetic dataset by @achandrasekar in #66
- Automatically Populate Missing Fields in Config by @Bslabe123 in #71
- Generic model server client config by @SachinVarghese in #72
- Request Lifecycle Report Generation by @Bslabe123 in #77
- Add output distribution to synthetic data generator by @achandrasekar in #79
- Improved Logging for Writing Report Files by @Bslabe123 in #80
- Add the option to ignore end of sequence by @achandrasekar in #83
- Add GitHub Release Workflow and Changelog Configuration by @wangchen615 in #41
- Improved abstractions for perf project by @SachinVarghese in #84
- Add issue templates for the repo by @achandrasekar in #90
- docs: Update link to Slack channel in README.md by @terrytangyuan in #91
- Add random data generator by @achandrasekar in #94
- Multi-stage report generation for Prometheus Metrics by @aish1331 in #95
- Add Docker build and push workflows for PRs and releases by @wangchen615 in #97
- Add shared prefix generator to benchmark prefix caching by @achandrasekar in #98
- Added throughput metrics to output report by @Bslabe123 in #101
- Basic code test setup by @SachinVarghese in #96
- Enable Docker Build Workflow on Push to Main Branch by @wangchen615 in #102
- Fix Docker Tag Generation by Using env.QUAY_USERNAME in Workflow by @wangchen615 in #105
- Update Quay.io Organization Name in Docker Build Workflow by @wangchen615 in #106
- Add Support for Streaming Requests to Completions API by @Bslabe123 in #103
- Add multiprocess, multithreaded loadgen by @jjk-g in #99
- Update documentation to cover newer capabilities by @achandrasekar in #104
- Use logging methods with levels instead of print by @shotarok in #110
- Merge the latest fixes to the release branch by @achandrasekar in #115
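Several of the changes above (notably the multiprocess, multithreaded load generator in #99) revolve around open-loop load generation: requests are fired on a fixed schedule regardless of how long responses take, so a slow server cannot throttle the offered load. A minimal single-process sketch of that idea, assuming a placeholder `send_request` in place of a real model-server call (this is not the tool's implementation):

```python
import asyncio
import time

async def send_request(i: int) -> int:
    # Placeholder for a non-blocking HTTP call to a model server.
    await asyncio.sleep(0)
    return i

async def run_constant_qps(qps: float, duration_s: float) -> list[int]:
    """Fire requests at a fixed rate for duration_s seconds (open loop)."""
    interval = 1.0 / qps
    start = time.perf_counter()
    tasks = []
    i = 0
    while time.perf_counter() - start < duration_s:
        tasks.append(asyncio.create_task(send_request(i)))
        i += 1
        # Schedule the next send relative to the start time so timing
        # errors do not accumulate; sleep(0) if we are already behind.
        next_send = start + i * interval
        await asyncio.sleep(max(0.0, next_send - time.perf_counter()))
    return await asyncio.gather(*tasks)

results = asyncio.run(run_constant_qps(qps=100, duration_s=0.5))
```

Scheduling sends against an absolute timeline (rather than sleeping a fixed interval after each send) is what lets an open-loop generator hold its target rate even when individual requests are slow.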
New Contributors
- @achandrasekar made their first contribution in #1
- @Bslabe123 made their first contribution in #3
- @sjmonson made their first contribution in #11
- @terrytangyuan made their first contribution in #20
- @SachinVarghese made their first contribution in #21
- @vivekk16 made their first contribution in #33
- @aish1331 made their first contribution in #64
- @wangchen615 made their first contribution in #41
- @jjk-g made their first contribution in #99
- @shotarok made their first contribution in #110
Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0