
Inference Perf

The Inference Perf project aims to provide a GenAI inference performance benchmarking tool. It came out of wg-serving and is sponsored by SIG Scalability. See the proposal for more information.

Status

This project is currently in development.

Getting Started

Configuration

You can configure inference-perf to run with different data generation and load generation configurations. See config.yml and the examples in /examples.

Supported datasets include the following:

  • ShareGPT (a real-world conversational dataset)
  • Synthetic (for specific input/output distributions)
  • Mock (for testing)

Similarly, load generation can be configured with different request rates and durations, and a single run can include multiple stages, each with its own request rate and duration; a sketch of such a config follows.
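
As a rough illustration, one file can pair a dataset choice with a multi-stage load profile. The field names below are assumptions made for this sketch, not the authoritative schema; consult config.yml and /examples for the real layout.

    # Hypothetical sketch -- field and value names are assumed, not authoritative.
    data:
      type: shareGPT        # or a synthetic or mock dataset
    load:
      stages:
        - rate: 1           # requests per second (assumed unit)
          duration: 30      # seconds (assumed unit)
        - rate: 5
          duration: 60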

Run locally

  • Set up a virtual environment and install inference-perf

    pip install .
    
  • Run inference-perf CLI with a configuration file

    inference-perf --config_file config.yml
    
  • See the examples in /examples for more configurations; a combined end-to-end sketch of these steps follows.
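
For reference, the steps above combine into the following end-to-end sketch. The virtual-environment commands are standard Python tooling; the install and CLI invocation are exactly the commands shown above.

    python3 -m venv .venv              # create an isolated environment
    source .venv/bin/activate          # activate it (bash/zsh)
    pip install .                      # install inference-perf from the repo root
    inference-perf --config_file config.yml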

Run in a Docker container

  • Build the container

    docker build -t inference-perf .
    
  • Run the container (a variant with a custom config path is sketched after this list)

    docker run -it --rm -v $(pwd)/config.yml:/workspace/config.yml inference-perf
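
If your configuration lives at a different path, and assuming the image's entrypoint is the inference-perf CLI so that extra arguments are passed through to it, the run command might look like:

    # Assumes the container entrypoint forwards arguments to the CLI.
    docker run -it --rm \
      -v $(pwd)/my-config.yml:/workspace/my-config.yml \
      inference-perf --config_file /workspace/my-config.yml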
    
    

Run in Kubernetes cluster

Refer to the guide in /deploy.
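
The manifests in /deploy are authoritative. Purely for orientation, a benchmark run maps naturally onto a Kubernetes Job; everything in the sketch below (image tag, ConfigMap name, mount path) is a hypothetical stand-in rather than the project's actual deployment.

    # Hypothetical sketch only -- see /deploy for the real manifests.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: inference-perf
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: inference-perf
              image: inference-perf:latest                 # assumed image tag
              args: ["--config_file", "/workspace/config.yml"]
              volumeMounts:
                - name: config
                  mountPath: /workspace/config.yml
                  subPath: config.yml
          volumes:
            - name: config
              configMap:
                name: inference-perf-config                # assumed ConfigMap holding config.yml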

Contributing

Our community meeting is held weekly on Thursdays, alternating between 09:00 and 11:30 PDT (Zoom Link, Meeting Notes, Meeting Recordings).

We currently use the #inference-perf channel in the Kubernetes Slack workspace for communication.

Contributions are welcome; thanks for joining us!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.
