
v0.1.0

@achandrasekar achandrasekar released this 05 Jun 23:20
5186b7a

We are excited to announce the initial release of Inference Perf v0.1.0! This release comes with the following key features:

  • Highly scalable: can benchmark large production inference deployments by generating up to 10k QPS.
  • Reports the key metrics needed to measure LLM performance.
  • Supports different real-world and synthetic datasets.
  • Supports different APIs and multiple model servers.
  • Supports specifying an exact input and output length distribution to simulate different scenarios: Gaussian distributions, fixed lengths, and min/max bounds are all supported.
  • Generates different load patterns and can benchmark specific cases such as burst traffic, scaling to saturation, and other autoscaling / routing scenarios.
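To illustrate the techniques behind two of the bullets above (length distributions and load generation), here is a minimal, self-contained Python sketch. It is not inference-perf's actual implementation or API; the function names and parameters are hypothetical, and it only shows the general idea: sampling token lengths from a Gaussian clamped to min/max bounds, and spacing requests with exponential inter-arrival gaps to hit a target QPS.

```python
import random

def sample_lengths(n, mean, stddev, min_len, max_len):
    """Hypothetical sketch: sample n token lengths from a Gaussian,
    clamped to [min_len, max_len] (the min-max case in the release notes).
    Setting stddev=0 degenerates to the fixed-length case."""
    return [
        max(min_len, min(max_len, round(random.gauss(mean, stddev))))
        for _ in range(n)
    ]

def poisson_arrivals(qps, duration_s):
    """Hypothetical sketch: yield request send times (in seconds) for a
    Poisson arrival process at the target QPS, a common way to generate
    steady load; bursts can be modeled by raising qps for short windows."""
    t = 0.0
    while True:
        # Exponential inter-arrival gaps give Poisson arrivals at rate qps.
        t += random.expovariate(qps)
        if t >= duration_s:
            return
        yield t

# Example: prompt lengths around 512 tokens, and one second of 100 QPS load.
lengths = sample_lengths(1000, mean=512, stddev=128, min_len=16, max_len=1024)
send_times = list(poisson_arrivals(qps=100, duration_s=1.0))
```

A real benchmarker would drive an async HTTP client from `send_times` and record per-request latency and throughput; the sketch stops at the generation step.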

What's Changed

New Contributors

Full Changelog: https://github.com/kubernetes-sigs/inference-perf/commits/v0.1.0