feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark #3277
Conversation
@simon-mo Please take a look whenever you're free, thanks! cc @robertgshaw2-neuralmagic in case you want to use this version of the server benchmark script - feel free to review it as well, thanks!
"--dataset-name", | ||
type=str, | ||
default="sharegpt", | ||
choices=["sharegpt", "sonnet"], |
Would it make sense, rather than selecting a dataset by name, to specify whether we want to read from a JSON file or a text file and provide a path to it? Or is that overkill?
Hmm - I think we can add an option to read from a JSON file or a path, and that would be nice for sure, but the problem is we'd then have to use a single user base prompt for all datasets. Also, certain benchmark arguments (e.g., configurable input, output & prefix lengths) are only used for certain datasets.
I haven't come up with a good solution, so for now I've kept them as separate datasets. I really like the idea of a dataset registry, where the logic for sampling prompts and output lengths from the datasets lives outside the main benchmark script so we can reuse it for other benchmark scripts as well, but I haven't put too much thought into it yet.
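As a rough illustration of the registry idea described above (none of these names exist in the PR; the samplers and the `args.num_prompts` flag are hypothetical):

```python
# Hypothetical dataset registry: each sampler takes a dataset path and a
# request count and returns (prompt, input_len, output_len) tuples.
from typing import Callable, Dict, List, Tuple

SampleFn = Callable[[str, int], List[Tuple[str, int, int]]]
DATASET_REGISTRY: Dict[str, SampleFn] = {}

def register_dataset(name: str):
    def wrap(fn: SampleFn) -> SampleFn:
        DATASET_REGISTRY[name] = fn
        return fn
    return wrap

@register_dataset("sharegpt")
def sample_sharegpt(path: str, n: int) -> List[Tuple[str, int, int]]:
    ...  # parse the ShareGPT JSON and sample n conversations

@register_dataset("sonnet")
def sample_sonnet(path: str, n: int) -> List[Tuple[str, int, int]]:
    ...  # build shared-prefix prompts from the sonnet text file

# The main script would then dispatch on the CLI flag, e.g.:
# requests = DATASET_REGISTRY[args.dataset_name](args.dataset_path, args.num_prompts)
```

This would let each benchmark script reuse the samplers while keeping dataset-specific arguments (such as prefix length) local to the sampler that needs them.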
@simon-mo I've addressed your comments as well as adding …
@ywang96 Is there any plan to post the metrics you've calculated based on these benchmark scripts (or the slides from the vLLM meetup)?
There is - stay tuned!
Reopens #3194, which was closed due to fork cleanup.
Additional changes not mentioned in #3194:
- …`vllm` benchmark to OpenAI Completion API
- ~~If streaming is supported, count output tokens by counting actual token streaming SSE events instead of tokenization~~ Will apply tokenization on the output to count tokens, since some API servers do not stream token by token.
- To run the benchmark on the sonnet dataset, specify `--dataset-name sonnet` and `--dataset-path <path to sonnet.txt>`. Input, output, and prefix lengths can be specified with command-line args.
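As a rough illustration of the post-hoc token counting mentioned in the second bullet (not the PR's exact code; the tokenizer name is only an example):

```python
# Sketch of counting output tokens by re-tokenizing the generated text,
# assuming a HuggingFace tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def count_output_tokens(generated_text: str) -> int:
    # Re-tokenize the full completion rather than counting SSE chunks,
    # since some API servers batch multiple tokens per streamed event.
    return len(tokenizer(generated_text).input_ids)
```

An invocation on the sonnet dataset would then look something like `python benchmark_serving.py --dataset-name sonnet --dataset-path sonnet.txt` (script name assumed).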