
[Feature]: Decouple the benchmark script from the components of vLLM #5586

Closed
zhyncs opened this issue Jun 17, 2024 · 1 comment · Fixed by #5588

zhyncs commented Jun 17, 2024

🚀 The feature, motivation and pitch

Currently, vLLM's benchmark script supports multiple backends and offers fairly rich functionality.

It relies on backend_request_func and get_tokenizer. backend_request_func is self-contained in a separate file, but get_tokenizer lives inside vLLM, so using it requires cloning the repository or installing the Python package.

from backend_request_func import (ASYNC_REQUEST_FUNCS, RequestFuncInput,
                                  RequestFuncOutput)
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase
from vllm.transformers_utils.tokenizer import get_tokenizer

def get_tokenizer(

When we use the vLLM benchmark script to benchmark other backends, we would prefer not to depend on vLLM components at all, i.e., not have to clone the repository or install the Python package.

May I submit a PR that extracts the get_tokenizer function into backend_request_func? Does this sound reasonable, or do you have other suggestions? Thanks.
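
For illustration, a minimal sketch of what a vendored get_tokenizer inside backend_request_func.py could look like, assuming only Hugging Face transformers as a dependency (parameter names here are illustrative, not the actual vLLM implementation):

# Hypothetical sketch of a vendored get_tokenizer for backend_request_func.py,
# depending only on Hugging Face transformers (not the actual vLLM code).
from typing import Union

from transformers import (AutoTokenizer, PreTrainedTokenizer,
                          PreTrainedTokenizerFast)


def get_tokenizer(
    pretrained_model_name_or_path: str,
    trust_remote_code: bool = False,
) -> Union[PreTrainedTokenizer, PreTrainedTokenizerFast]:
    """Load the model's tokenizer via transformers, without importing vLLM."""
    return AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path,
        trust_remote_code=trust_remote_code)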

@ywang96 @simon-mo

Alternatives

No response

Additional context

No response

ywang96 commented Jun 17, 2024

Hey @zhyncs! First of all, thanks for the feedback on benchmark_serving.py; I'm glad you like its functionality.

Yes, I think it makes total sense to decouple the serving benchmark from vLLM, since it doesn't really depend on the library itself other than reusing get_tokenizer. Like you said, the only thing needed should be copying the code for get_tokenizer into backend_request_func so we won't need to install vLLM at all.
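
With that change, the import block at the top of benchmark_serving.py would presumably shrink to something like the following (hypothetical, mirroring the imports quoted above but with no vllm import):

from backend_request_func import (ASYNC_REQUEST_FUNCS, RequestFuncInput,
                                  RequestFuncOutput, get_tokenizer)
from tqdm.asyncio import tqdm
from transformers import PreTrainedTokenizerBase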

Happy to review your PR!
