
Add an option to disable Ray when using a single GPU #23

Closed
WoosukKwon opened this issue Apr 2, 2023 · 0 comments · Fixed by #51
WoosukKwon (Collaborator) commented:
When working with a single GPU, Ray is not useful. Therefore, it would be beneficial to have an option to disable Ray in such scenarios.
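A minimal sketch of what such an option might look like. The names here (`LocalWorker`, `init_workers`, `parallel_size`, `disable_ray`) are illustrative, not vLLM's actual API; the idea is simply to run the worker in-process when there is only one GPU, and fall back to Ray actors otherwise:

```python
class LocalWorker:
    """Runs model execution in-process, with no Ray actor overhead."""

    def execute(self, fn, *args, **kwargs):
        # Direct function call instead of a remote actor invocation.
        return fn(*args, **kwargs)


def init_workers(parallel_size: int, disable_ray: bool = False):
    """Create workers; bypass Ray entirely for a single-GPU run."""
    if parallel_size == 1 and disable_ray:
        # Single GPU: no distributed scheduling needed.
        return [LocalWorker()]

    # Multi-GPU path: import Ray lazily so single-GPU users
    # never pay the import or ray.init() cost.
    import ray
    ray.init(ignore_reinit_error=True)
    # One Ray actor per GPU would be created here (omitted in this sketch).
    raise NotImplementedError("Ray actor setup omitted in this sketch")
```

The lazy `import ray` inside the function is deliberate: with `disable_ray=True` and one GPU, Ray is never imported or initialized at all.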

@WoosukKwon WoosukKwon changed the title Add no-ray option for single gpu usage Add an option to disable Ray when using a single GPU Apr 2, 2023
@zhuohan123 zhuohan123 self-assigned this Apr 22, 2023
slyalin pushed a commit to slyalin/vllm that referenced this issue Apr 4, 2024
…envino

Use PagedAttentionExtension from OV without contrib dependency
z103cb referenced this issue in z103cb/opendatahub_vllm May 9, 2024
…ubi (opendatahub-io#23)

Changes:
- vLLM v0.4.2 was published today, update our build to use pre-built
libs from their wheel
- bump other dependencies in the image build (base UBI image, miniforge,
flash attention, grpcio-tools, accelerate)
- little cleanup to remove `PYTORCH_` args that are no longer used

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
z103cb referenced this issue in opendatahub-io/vllm May 9, 2024
…ubi (#23)

(Same change set as the commit above.)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
z103cb pushed a commit to dtrifiro/vllm that referenced this issue May 15, 2024
Dockerfile: use fixed vllm-provided nccl version
fxmarty pushed a commit to fxmarty/vllm-public that referenced this issue Jun 12, 2024
Removed HIP specific matvec logic that is duplicated from tuned_gemm.py and doesn't support bf16