Releases: kubernetes-sigs/inference-perf

v0.1.1

01 Aug 22:43
fd21242

What's Changed

New Contributors

Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.1

Docker Image

quay.io/inference-perf/inference-perf:v0.1.1

Python Package

pip install inference-perf==v0.1.1

v0.1.0

26 Jun 18:59

We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:

  • Highly scalable: can benchmark large production inference deployments, generating up to 10k QPS.
  • Reports the key metrics needed to measure LLM performance.
  • Supports both real-world and synthetic datasets.
  • Supports different APIs and multiple model servers.
  • Supports specifying exact input and output length distributions to simulate different scenarios: Gaussian, fixed-length, and min-max cases are all supported.
  • Generates different load patterns and can benchmark specific cases like burst traffic, scaling to saturation, and other autoscaling / routing scenarios.
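To illustrate the distribution support above, the following is a minimal sketch of how input and output token lengths might be sampled from the listed distribution types. This is illustrative only: the `sample_length` helper and its spec keys are hypothetical, not inference-perf's actual configuration schema or implementation.

```python
import random

def sample_length(spec: dict) -> int:
    """Sample a token length from a distribution spec.

    Hypothetical helper for illustration; the keys below are not
    inference-perf's real config schema.
    """
    kind = spec["type"]
    if kind == "fixed":
        # Fixed length: every request uses the same token count.
        return spec["value"]
    if kind == "gaussian":
        # Gaussian distribution around a mean, clamped to at least 1 token.
        n = round(random.gauss(spec["mean"], spec["stddev"]))
        return max(1, n)
    if kind == "minmax":
        # Uniform sampling between a minimum and maximum length.
        return random.randint(spec["min"], spec["max"])
    raise ValueError(f"unknown distribution type: {kind}")

# Example: Gaussian input lengths, min-max output lengths.
input_lengths = [
    sample_length({"type": "gaussian", "mean": 512, "stddev": 64})
    for _ in range(1000)
]
output_lengths = [
    sample_length({"type": "minmax", "min": 64, "max": 256})
    for _ in range(1000)
]
```

Combining a length distribution like this with a load pattern (constant, burst, ramp) is what lets a benchmark reproduce scenarios such as saturation or autoscaling behavior.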

What's Changed

New Contributors

Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0