
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark #3277

Merged
merged 40 commits into vllm-project:main on Mar 27, 2024

Conversation

ywang96
Collaborator

@ywang96 ywang96 commented Mar 8, 2024

Reopens #3194, which was closed due to fork cleanup.

Additional changes not mentioned in #3194:

  • Save individual token latencies (ITL) per request to the result JSON for further debugging
  • Default the vLLM benchmark to the OpenAI Completions API
  • When the server streams token by token, count output tokens by counting the actual SSE token events instead of tokenizing; otherwise fall back to tokenizing the output text, since some API servers do not stream token by token (see the sketch after this list)
  • Misc fixes/workarounds for API servers that don't strictly follow the OpenAI API
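
For reference, here is a minimal sketch of what that token counting could look like. It assumes an aiohttp ClientSession, an OpenAI-style /v1/completions streaming response, and a Hugging Face-style tokenizer with an encode() method; the names are illustrative, not the script's actual code.

import json

async def count_output_tokens(session, url, payload, tokenizer,
                              streams_token_by_token=True):
    # Accumulate the generated text and count SSE data events.
    output_text = ""
    num_sse_events = 0
    async with session.post(url, json=payload) as response:
        async for raw_line in response.content:
            chunk = raw_line.decode("utf-8").strip()
            if not chunk.startswith("data: "):
                continue
            data = chunk[len("data: "):]
            if data == "[DONE]":
                break
            delta = json.loads(data)["choices"][0].get("text", "")
            if delta:
                output_text += delta
                num_sse_events += 1
    if streams_token_by_token:
        # One SSE event per generated token.
        return num_sse_events
    # Fallback for servers that batch several tokens per event.
    return len(tokenizer.encode(output_text))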

To run the benchmark on the sonnet dataset, specify --dataset-name sonnet and --dataset-path <path to sonnet.txt>. Input, output, and prefix lengths can be specified with command-line arguments.
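
Conceptually, the sonnet dataset is used to build prompts that share a fixed prefix so prefix caching gets exercised. The following is only a rough sketch of that idea; the function name, parameters, and defaults are illustrative, not the script's actual interface.

import random

def sample_sonnet_requests(dataset_path, num_requests,
                           prefix_len_lines=30, input_len_lines=60):
    """Build prompts that share a common prefix to exercise prefix caching."""
    with open(dataset_path) as f:
        lines = [line.strip() for line in f if line.strip()]
    # Every request starts with the same fixed prefix...
    prefix = "\n".join(lines[:prefix_len_lines])
    requests = []
    for _ in range(num_requests):
        # ...followed by a randomly sampled continuation of input_len_lines lines.
        start = random.randrange(prefix_len_lines, len(lines) - input_len_lines)
        body = "\n".join(lines[start:start + input_len_lines])
        requests.append(prefix + "\n" + body)
    return requests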

@ywang96 ywang96 marked this pull request as ready for review March 18, 2024 07:28
@ywang96
Collaborator Author

ywang96 commented Mar 18, 2024

@simon-mo Please take a look whenever you're free, thanks!

cc @robertgshaw2-neuralmagic in case you want to use this version of the server benchmark script - feel free to review it as well, thanks!

"--dataset-name",
type=str,
default="sharegpt",
choices=["sharegpt", "sonnet"],
Contributor

Would it make sense to, rather than selecting the dataset by name, specify whether we want to read from JSON or from a text file and provide a path to it? Or is that overkill?

Collaborator Author

@ywang96 ywang96 Mar 21, 2024

Hmm - I think we can add the option to read from a JSON or text file path, and that would be nice for sure, but the problem is we'd then have to use a single base prompt for all datasets. Also, certain benchmark arguments (e.g., configuring input, output & prefix lengths) only apply to certain datasets.

I haven't come up with a good solution, so for now I'm keeping them as separate datasets. I really like the idea of a dataset registry, where the way we sample prompts and output lengths from each dataset lives outside the main benchmark script and can be reused by other benchmark scripts as well, but I haven't put too much thought into it yet. A rough sketch of what I mean is below.
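
Purely illustrative (not part of this PR): a registry could map dataset names to sampling functions so per-dataset arguments stay out of the main script.

# Illustrative sketch of a dataset registry; not part of this PR.
from typing import Callable, Dict, List, Tuple

# Each sampler returns (prompt, expected_output_len) pairs.
SampleFn = Callable[..., List[Tuple[str, int]]]
DATASET_REGISTRY: Dict[str, SampleFn] = {}

def register_dataset(name: str):
    def decorator(fn: SampleFn) -> SampleFn:
        DATASET_REGISTRY[name] = fn
        return fn
    return decorator

@register_dataset("sharegpt")
def sample_sharegpt(dataset_path: str, num_requests: int, **kwargs):
    ...  # parse the ShareGPT JSON and sample conversations

@register_dataset("sonnet")
def sample_sonnet(dataset_path: str, num_requests: int,
                  prefix_len: int = 200, **kwargs):
    ...  # build shared-prefix prompts from sonnet.txt

# The benchmark script would then dispatch on --dataset-name:
# requests = DATASET_REGISTRY[args.dataset_name](args.dataset_path, args.num_prompts)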

@ywang96
Collaborator Author

ywang96 commented Mar 27, 2024

@simon-mo I've addressed your comments and added --dataset back so it stays backward compatible for now, with a deprecation warning (rough sketch below). PTAL, thanks!
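
For illustration only, one way such a compatibility shim could be wired up (hypothetical sketch, not the PR's exact code):

# Hypothetical sketch of keeping --dataset as a deprecated alias.
import argparse
import warnings

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default=None,
                    help="Path to the ShareGPT dataset. Deprecated: use "
                         "--dataset-name and --dataset-path instead.")
parser.add_argument("--dataset-name", type=str, default="sharegpt",
                    choices=["sharegpt", "sonnet"])
parser.add_argument("--dataset-path", type=str, default=None)
args = parser.parse_args()

if args.dataset is not None:
    # Keep old invocations working while nudging users to the new flags.
    warnings.warn("--dataset is deprecated; use --dataset-name and "
                  "--dataset-path instead.", stacklevel=2)
    args.dataset_name = "sharegpt"
    args.dataset_path = args.dataset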

@simon-mo simon-mo merged commit 45b6ef6 into vllm-project:main Mar 27, 2024
33 checks passed
@dgisser

dgisser commented Apr 3, 2024

@ywang96 is there any plan to post the metrics you've calculated based on these benchmark scripts (or the slides from the vllm meetup)?

@ywang96
Collaborator Author

ywang96 commented Apr 3, 2024

@ywang96 is there any plan to post the metrics you've calculated based on these benchmark scripts (or the slides from the vllm meetup)?

There is, stay tuned!
