
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark #3277

Merged
merged 40 commits into vllm-project:main on Mar 27, 2024

Conversation

ywang96
Collaborator

@ywang96 ywang96 commented Mar 8, 2024

Reopens #3194, which was closed due to fork cleanup.

Additional changes not mentioned in #3194:

  • Save individual token latencies (ITL) per request to the result JSON for further debugging
  • Default the vLLM benchmark to the OpenAI Completions API
  • When the server streams token by token, count output tokens by counting the actual SSE token events instead of tokenizing; otherwise fall back to tokenizing the output text, since some API servers do not stream token by token (see the sketch after this list)
  • Misc fixes/workarounds for API servers that don't strictly follow the OpenAI API
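
For reference, here is a minimal sketch of what that token counting could look like. It assumes an aiohttp ClientSession, an OpenAI-style /v1/completions streaming response, and a Hugging Face-style tokenizer with an encode() method; the names are illustrative, not the script's actual code.

import json

async def count_output_tokens(session, url, payload, tokenizer,
                              streams_token_by_token=True):
    # Accumulate the generated text and count SSE data events.
    output_text = ""
    num_sse_events = 0
    async with session.post(url, json=payload) as response:
        async for raw_line in response.content:
            chunk = raw_line.decode("utf-8").strip()
            if not chunk.startswith("data: "):
                continue
            data = chunk[len("data: "):]
            if data == "[DONE]":
                break
            delta = json.loads(data)["choices"][0].get("text", "")
            if delta:
                output_text += delta
                num_sse_events += 1
    if streams_token_by_token:
        # One SSE event per generated token.
        return num_sse_events
    # Fallback for servers that batch several tokens per event.
    return len(tokenizer.encode(output_text))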

To run the benchmark on the sonnet dataset, specify --dataset-name sonnet and --dataset-path <path to sonnet.txt>. Input, output, and prefix lengths can be specified with command-line arguments.
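
Conceptually, the sonnet dataset is used to build prompts that share a fixed prefix so prefix caching gets exercised. The following is only a rough sketch of that idea; the function name, parameters, and defaults are illustrative, not the script's actual interface.

import random

def sample_sonnet_requests(dataset_path, num_requests,
                           prefix_len_lines=30, input_len_lines=60):
    """Build prompts that share a common prefix to exercise prefix caching."""
    with open(dataset_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    # Every request starts with the same fixed prefix...
    prefix = "\n".join(lines[:prefix_len_lines])
    requests = []
    for _ in range(num_requests):
        # ...followed by a randomly sampled continuation of input_len_lines lines.
        start = random.randrange(prefix_len_lines, len(lines) - input_len_lines)
        body = "\n".join(lines[start:start + input_len_lines])
        requests.append(prefix + "\n" + body)
    return requests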

@ywang96 ywang96 marked this pull request as ready for review March 18, 2024 07:28
@ywang96
Collaborator Author

ywang96 commented Mar 18, 2024

@simon-mo Please take a look whenever you're free, thanks!

cc @robertgshaw2-neuralmagic in case you want to use this version of the server benchmark script - feel free to review it as well, thanks!

"--dataset-name",
type=str,
default="sharegpt",
choices=["sharegpt", "sonnet"],
Contributor

Would it make sense to, rather than selecting the dataset by name, specify whether we want to read from JSON or from a text file and provide a path to it? Or is that overkill?

Collaborator Author

@ywang96 ywang96 Mar 21, 2024

Hmm - I think we can add the option to read from a JSON or text file path, and that would be nice for sure, but the problem is we'd then have to use a single base prompt for all datasets. Also, certain benchmark arguments (e.g., configuring input, output & prefix lengths) only apply to certain datasets.

I haven't come up with a good solution, so for now I'm keeping them as separate datasets. I really like the idea of a dataset registry, where the way we sample prompts and output lengths from each dataset lives outside the main benchmark script and can be reused by other benchmark scripts as well, but I haven't put too much thought into it yet. A rough sketch of what I mean is below.
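
Purely illustrative (not part of this PR): a registry could map dataset names to sampling functions so per-dataset arguments stay out of the main script.

# Illustrative sketch of a dataset registry; not part of this PR.
from typing import Callable, Dict, List, Tuple

# Each sampler returns (prompt, expected_output_len) pairs.
SampleFn = Callable[..., List[Tuple[str, int]]]
DATASET_REGISTRY: Dict[str, SampleFn] = {}

def register_dataset(name: str):
    def decorator(fn: SampleFn) -> SampleFn:
        DATASET_REGISTRY[name] = fn
        return fn
    return decorator

@register_dataset("sharegpt")
def sample_sharegpt(dataset_path: str, num_requests: int, **kwargs):
    ...  # parse the ShareGPT JSON and sample conversations

@register_dataset("sonnet")
def sample_sonnet(dataset_path: str, num_requests: int,
                  prefix_len: int = 200, **kwargs):
    ...  # build shared-prefix prompts from sonnet.txt

# The benchmark script would then dispatch on --dataset-name:
# requests = DATASET_REGISTRY[args.dataset_name](args.dataset_path, args.num_prompts)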

@ywang96
Collaborator Author

ywang96 commented Mar 27, 2024

@simon-mo I've addressed your comments and added --dataset back so it stays backward compatible for now, with a deprecation warning (rough sketch below). PTAL, thanks!
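
For illustration only, one way such a compatibility shim could be wired up (hypothetical sketch, not the PR's exact code):

# Hypothetical sketch of keeping --dataset as a deprecated alias.
import argparse
import warnings

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default=None,
                    help="Path to the ShareGPT dataset. Deprecated: use "
                         "--dataset-name and --dataset-path instead.")
parser.add_argument("--dataset-name", type=str, default="sharegpt",
                    choices=["sharegpt", "sonnet"])
parser.add_argument("--dataset-path", type=str, default=None)
args = parser.parse_args()

if args.dataset is not None:
    # Keep old invocations working while nudging users to the new flags.
    warnings.warn("--dataset is deprecated; use --dataset-name and "
                  "--dataset-path instead.", stacklevel=2)
    args.dataset_name = "sharegpt"
    args.dataset_path = args.dataset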

@simon-mo simon-mo merged commit 45b6ef6 into vllm-project:main Mar 27, 2024
33 checks passed
@dgisser

dgisser commented Apr 3, 2024

@ywang96 is there any plan to post the metrics you've calculated based on these benchmark scripts (or the slides from the vllm meetup)?

@ywang96
Collaborator Author

ywang96 commented Apr 3, 2024

@ywang96 is there any plan to post the metrics you've calculated based on these benchmark scripts (or the slides from the vllm meetup)?

There is, stay tuned!
