Use CuPy for CUDA graphs #2811
Conversation
```diff
@@ -71,7 +73,7 @@ def init_process_group(world_size: int, rank: int, host: str,
     if isinstance(cupy, Exception):
         raise ImportError(
-            "NCCLBackend is not available. Please install cupy.") from cupy
+            "NCCLBackend is not available. Please install cupy==13.0.0.") from cupy
```
Suggested change:

```diff
-            "NCCLBackend is not available. Please install cupy==13.0.0.") from cupy
+            "NCCLBackend is not available. Please install cupy-cuda12x==13.0.0.") from cupy
```
Thanks for pointing it out. Actually, there are two issues:
- I changed the PR to use cupy 12.3 instead of 13.0, because cupy 13.0 does not support Python 3.8 (I wasn't able to find a wheel on PyPI).
- Users need to install different versions of cupy depending on their environment. For example, CUDA 11.8 users should install cupy-cuda11x, and ROCm users should install cupy-rocm.
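The per-environment packaging issue above could be handled by suggesting the right wheel for the detected platform. Below is a minimal, hypothetical sketch (not part of the PR; the helper name and the exact version pins are illustrative), following CuPy's published wheel naming (cupy-cuda11x, cupy-cuda12x, cupy-rocm-5-0, ...):

```python
# Hypothetical helper: pick the CuPy wheel to recommend for a given platform,
# since a single "cupy" pin cannot cover CUDA 11.x, CUDA 12.x, and ROCm users.
def cupy_package_for(platform: str, major_version: int) -> str:
    """Return an install spec to suggest in the ImportError message.

    platform: "cuda" or "rocm"; major_version: e.g. 11 or 12 for CUDA.
    The ==12.3.0 pin mirrors the cupy 12.3 version chosen in this PR.
    """
    if platform == "cuda":
        # CuPy ships one wheel per CUDA major version, e.g. cupy-cuda12x.
        return f"cupy-cuda{major_version}x==12.3.0"
    if platform == "rocm":
        # ROCm wheels are versioned per ROCm release, e.g. cupy-rocm-5-0.
        return f"cupy-rocm-{major_version}-0==12.3.0"
    raise ValueError(f"unsupported platform: {platform}")
```

The error message in the diff could then interpolate this suggestion instead of hard-coding one wheel name.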
@WoosukKwon FYI: custom allreduce doesn't work in all cases (e.g., 8 PCIe GPUs), so this fix might be needed anyway.
I think #2731 might be related.
LGTM
This is a temporary fix for the memory leak issue when using CUDA graphs without the custom all-reduce kernel. The PR uses CuPy's NCCL bindings instead of PyTorch's NCCL.
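The `isinstance(cupy, Exception)` check in the diff relies on a deferred-import pattern: the import failure is captured at module load time and only raised when the backend is actually requested. A minimal, self-contained sketch of that pattern (the `load_optional` and `get_backend` names are illustrative, not vLLM's API):

```python
import importlib


def load_optional(module_name: str):
    """Import a module, returning the exception instead of raising on failure.

    This lets module import succeed even when an optional dependency is
    missing; the stored exception is re-raised later with a helpful message.
    """
    try:
        return importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or a driver error at import time
        return exc


def get_backend(module_name: str = "cupy"):
    """Return the optional module, or raise a descriptive ImportError."""
    mod = load_optional(module_name)
    if isinstance(mod, Exception):
        raise ImportError(
            "NCCLBackend is not available. Please install cupy.") from mod
    return mod
```

Chaining with `from mod` preserves the original import failure in the traceback, so users see both the actionable message and the underlying cause.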