Currently, vLLM's serving benchmark script supports multiple backends, and its overall functionality is quite rich.

It relies on `backend_request_func` and `get_tokenizer`. `backend_request_func` is self-contained in a separate file, but to use `get_tokenizer` we need to clone the repository or install the vLLM Python package.

When we use the vLLM script to benchmark other backends, we typically do not want to depend on vLLM components: no cloning the repository, no installing a Python package.

May I submit a PR that moves `get_tokenizer` into `backend_request_func`? Do you think this is okay, or do you have any other suggestions? Thanks.
Hey @zhyncs! First of all, thanks for the feedback on benchmark_serving.py; I'm glad you like its functionality.
Yes, I think it totally makes sense to decouple the serving benchmark from vLLM, since it doesn't really have any dependency on the library itself other than reusing `get_tokenizer`. Like you said, the only thing needed should be copying the code for `get_tokenizer` into `backend_request_func`, so we won't need to install vLLM itself at all.
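The copy described above could be sketched roughly as follows. This is a hedged, minimal stand-in, not the actual vLLM helper (which also handles tokenizer modes, revisions, and slow-tokenizer warnings, all omitted here); it assumes Hugging Face `transformers` is the only dependency needed, and defers the import so the benchmark module itself imports cleanly without vLLM or transformers installed:

```python
from typing import Any


def get_tokenizer(tokenizer_name: str, trust_remote_code: bool = False) -> Any:
    """Standalone stand-in for vLLM's get_tokenizer (illustrative sketch).

    The transformers import is deferred so that merely importing the
    benchmark script requires neither vLLM nor transformers; the
    dependency is only hit when a tokenizer is actually loaded.
    """
    from transformers import AutoTokenizer  # deferred third-party import

    return AutoTokenizer.from_pretrained(
        tokenizer_name, trust_remote_code=trust_remote_code
    )
```

With something like this living in backend_request_func.py, the benchmark script could drop its `from vllm.transformers_utils.tokenizer import get_tokenizer` import entirely.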
Referenced code: vllm/benchmarks/benchmark_serving.py, lines 37 to 42 in 845a3f2, and vllm/vllm/transformers_utils/tokenizer.py, line 57 in 845a3f2. cc @ywang96 @simon-mo