The Inference Perf project aims to provide a GenAI inference performance benchmarking tool. It came out of wg-serving and is sponsored by SIG Scalability. See the proposal for more info.
This project is currently in development.
You can configure inference-perf to run with different data generation and load generation configurations today. Please see config.yml and the examples in /examples.
Supported datasets include the following:
- ShareGPT (for a real-world conversational dataset)
- Synthetic (for specific input / output distributions)
- Mock (for testing)
Similarly, load generation can be configured to run with different request rates and durations. You can also run multiple stages with different request rates and durations within a single run.
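As a minimal illustrative sketch only (field names, dataset identifiers, and values below are assumptions for illustration; config.yml and /examples remain the authoritative reference), a configuration combining a dataset with a multi-stage load might look like:

```
# Illustrative sketch only -- field and value names are assumptions;
# see config.yml and /examples for the authoritative schema.
data:
  type: shareGPT        # dataset used to generate prompts (e.g. synthetic, mock)
load:
  type: constant
  stages:               # multiple stages with different rates/durations in one run
  - rate: 1             # requests per second
    duration: 30        # seconds
  - rate: 2
    duration: 30
```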
- Set up a virtual environment and install inference-perf (a sketch for creating the environment follows this list)

  ```
  pip install .
  ```

- Run the inference-perf CLI with a configuration file

  ```
  inference-perf --config_file config.yml
  ```

- See more examples in /examples
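A minimal sketch of the first step, assuming Python's built-in venv module (any equivalent virtual environment tooling works):

```
# Create and activate a virtual environment, then install from the repo root.
python -m venv .venv
source .venv/bin/activate
pip install .
```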
- Build the container

  ```
  docker build -t inference-perf .
  ```

- Run the container

  ```
  docker run -it --rm -v $(pwd)/config.yml:/workspace/config.yml inference-perf
  ```
To run in a Kubernetes cluster, refer to the guide in /deploy.
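As a rough sketch only, assuming /deploy contains plain Kubernetes manifests that can be applied as-is (the guide in /deploy is authoritative and may require configuration first):

```
# Assumption: the manifests in /deploy apply directly with kubectl.
kubectl apply -f deploy/
```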
Our community meeting is weekly on Thursdays, alternating between 09:00 and 11:30 PDT (Zoom Link, Meeting Notes, Meeting Recordings).
We currently use the #inference-perf channel in the Kubernetes Slack workspace for communication.
Contributions are welcome; thanks for joining us!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.