Add benchmark serving to CI #2505

simon-mo · 2024-01-19T04:43:38Z

Currently we download the full ShareGPT dataset but send only 20 prompts. Consider either using a smaller dataset or generating dummy inputs.
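For the dummy-input option, a rough sketch could look like the following. This is only an illustration: `make_dummy_prompts`, its parameters, and the word list are hypothetical and not part of the existing benchmark script, and the prompt length is counted in whitespace-separated words rather than tokenizer tokens.

```python
import random

# Hypothetical helper (an assumption for illustration, not vLLM code):
# builds deterministic synthetic prompts so CI does not need to download
# the full ShareGPT file just to send a handful of requests.
def make_dummy_prompts(num_prompts=20, words_per_prompt=64, seed=0):
    rng = random.Random(seed)  # local RNG keeps the global random state untouched
    vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
    return [
        " ".join(rng.choice(vocab) for _ in range(words_per_prompt))
        for _ in range(num_prompts)
    ]


if __name__ == "__main__":
    prompts = make_dummy_prompts()
    print(len(prompts), "prompts,", len(prompts[0].split()), "words each")
```

Seeding the generator keeps the inputs identical across CI runs, so timing differences come from the server rather than from the prompts.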

zhuohan123

LGTM!

simon-mo · 2024-01-20T04:20:16Z

Working!

Serving Benchmarks
Namespace(backend='vllm', host='localhost', port=8000, endpoint='/v1/completions', model='meta-llama/Llama-2-7b-chat-hf', dataset='./ShareGPT_V3_unfiltered_cleaned_split.json', tokenizer='meta-llama/Llama-2-7b-chat-hf', best_of=1, use_beam_search=False, num_prompts=20, request_rate=inf, seed=0, trust_remote_code=False)

Total time: 58.91 s
Throughput: 0.34 requests/s
Average latency: 18.55 s
Average latency per token: 0.05 s
Average latency per output token: 0.10 s
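As a sanity check on the numbers above, the reported throughput is consistent with dividing the number of prompts by the total wall-clock time. The variable names below are illustrative, and the formula is an assumption about how the figure was derived rather than a quote from the benchmark script.

```python
# Values copied from the benchmark output above.
num_prompts = 20
total_time_s = 58.91

# Assumed definition: completed requests divided by total wall-clock time.
throughput = num_prompts / total_time_s

print(f"Throughput: {throughput:.2f} requests/s")  # prints "Throughput: 0.34 requests/s"
```

With request_rate=inf all 20 requests are submitted up front, which would explain why the average latency (18.55 s) is a sizeable fraction of the total time.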

Add benchmark serving to CI

c6ce0b1

zhuohan123 approved these changes Jan 19, 2024


simon-mo added 5 commits January 19, 2024 19:00

use curl instead

2cf7518

install wget

089cd2c

install curl as well

a95e134

fix typo

a9ba604

fix

6c48369

simon-mo merged commit 00efdc8 into vllm-project:main Jan 20, 2024
16 checks passed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Add benchmark serving to CI (vllm-project#2505)

d6f77ed
