Releases: kubernetes-sigs/inference-perf

v0.1.0

05 Jun 23:20
5186b7a

We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:

  • Highly scalable: can benchmark large production inference deployments, generating up to 10k QPS.
  • Reports the key metrics needed to measure LLM performance.
  • Supports both real-world and synthetic datasets.
  • Supports multiple APIs and multiple model servers.
  • Supports specifying exact input and output length distributions to simulate different scenarios: Gaussian distributions, fixed lengths, and min-max ranges are all supported.
  • Generates different load patterns and can benchmark specific cases such as burst traffic, scaling to saturation, and other autoscaling / routing scenarios.
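To make the length-distribution feature concrete, here is a minimal illustrative sketch of sampling request lengths from a Gaussian clipped to a min-max range. This is not inference-perf's actual API, just an example of the kind of distribution the bullets above describe; the function name and parameters are hypothetical:

```python
import random

def sample_length(mean: float, stddev: float, min_len: int, max_len: int) -> int:
    """Sample a token length from a Gaussian, clipped to [min_len, max_len].

    A fixed length is the special case stddev == 0 (every sample equals mean).
    Note: function name and signature are illustrative, not inference-perf's API.
    """
    value = random.gauss(mean, stddev)
    return max(min_len, min(max_len, round(value)))

# Example: input lengths centered at 512 tokens, bounded to [64, 1024].
lengths = [sample_length(512, 128, 64, 1024) for _ in range(1000)]
```

A benchmark driver could then attach one sampled input length and one sampled output length to each generated request.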

What's Changed

New Contributors

Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0