[Doc] Add doc for running vLLM on the cloud #426
Conversation
Thanks for your contribution! Left some comments.
docs/source/serving/run_on_cloud.rst (outdated diff)

python -u -m vllm.entrypoints.api_server \
    --model $MODEL_NAME \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --tokenizer hf-internal-testing/llama-tokenizer 2>&1 | tee api_server.log &
Make the tokenizer an env var as well?
Sounds good. Changed. Thanks!
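For context, a minimal sketch of what the revised run section might look like once the tokenizer is pulled into an environment variable (the variable name TOKENIZER is an assumption from this exchange, not confirmed by the diff):

# Sketch only: TOKENIZER is a hypothetical env var name defined in the task's envs section.
run: |
  python -u -m vllm.entrypoints.api_server \
    --model $MODEL_NAME \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --tokenizer $TOKENIZER 2>&1 | tee api_server.log &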
docs/source/serving/run_on_cloud.rst (outdated diff)

conda activate vllm
echo 'Starting vllm api server...'
python -u -m vllm.entrypoints.api_server \
    --model $MODEL_NAME \
Where do we specify $MODEL_NAME?
It is defined in the envs section above. :)
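For readers without the full diff: SkyPilot task YAMLs declare environment variables in a top-level envs section, which the run commands then reference. A sketch of what that section might look like here (both values are illustrative assumptions, not taken from the diff):

# Sketch of the envs section of the SkyPilot task YAML; values are assumptions.
envs:
  MODEL_NAME: decapoda-research/llama-13b-hf  # illustrative; the doc's actual default may differ
  TOKENIZER: hf-internal-testing/llama-tokenizer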
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
LGTM! Can you rename run_on_cloud.rst to run_on_sky.rst before we merge? @Michaelvll
@Michaelvll @zhuohan123 Sorry for the late response, but why don't we use a single GPU and a smaller model for the example?
@WoosukKwon That sounds good to me. Do you think a single A100 with LLaMA-13B would work?
This should be great!
Done. PTAL @WoosukKwon @zhuohan123
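For reference, switching the example to a single GPU presumably amounts to a resources change along these lines in the SkyPilot task YAML (a sketch; the accelerator spec follows SkyPilot's accelerators syntax, and A100:1 mirrors the suggestion above):

# Sketch: request one A100 for the single-GPU LLaMA-13B example.
resources:
  accelerators: A100:1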