Use CuPy for CUDA graphs #2811

Merged
merged 20 commits into from
Feb 13, 2024


WoosukKwon
Collaborator

This is a temporary fix for the memory leak that occurs when using CUDA graphs without the custom all-reduce kernel. The PR uses CuPy NCCL instead of PyTorch NCCL.

@@ -71,7 +73,7 @@ def init_process_group(world_size: int, rank: int, host: str,

     if isinstance(cupy, Exception):
         raise ImportError(
-            "NCCLBackend is not available. Please install cupy.") from cupy
+            "NCCLBackend is not available. Please install cupy==13.0.0.") from cupy
Collaborator

Suggested change
- "NCCLBackend is not available. Please install cupy==13.0.0.") from cupy
+ "NCCLBackend is not available. Please install cupy-cuda12x==13.0.0.") from cupy

Collaborator Author

Thanks for pointing it out. Actually, there are two issues:

  1. I changed the PR to use CuPy 12.3 instead of 13.0 because CuPy 13.0 does not support Python 3.8 (I wasn't able to find a wheel on PyPI).
  2. Users need to install a different CuPy package depending on their environment. For example, CUDA 11.8 users should install cupy-cuda11x, and ROCm users should install cupy-rocm.
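
The two points above, together with the guarded import in the diff, could be combined roughly as follows. This is only a sketch, not the code in this PR: `suggest_cupy_package`, `import_or_capture`, and `require_module` are hypothetical helper names, the CUDA-to-package mapping covers only the cases mentioned in this thread, and the "cupy-rocm" string is taken from the comment as written rather than from a verified wheel name.

```python
import importlib
from typing import Optional


def suggest_cupy_package(cuda_version: Optional[str] = None,
                         is_rocm: bool = False) -> str:
    """Hypothetical helper: map an environment to a CuPy package name."""
    if is_rocm:
        # The thread refers to this family simply as "cupy-rocm"; the exact
        # wheel name for a given ROCm release is not specified there.
        return "cupy-rocm"
    if cuda_version is None:
        raise ValueError("cuda_version is required for CUDA environments")
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(f"no package mapping for CUDA {cuda_version}")
    return f"cupy-cuda{major}x"


def import_or_capture(module_name: str):
    """Return the imported module, or the ImportError raised on import."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the exception so the failure can be reported later, at the
        # point where the module is actually needed.
        return exc


def require_module(module, package_hint: str):
    """Raise an ImportError naming the package if the earlier import failed."""
    if isinstance(module, Exception):
        raise ImportError(
            f"NCCLBackend is not available. Please install {package_hint}."
        ) from module
    return module
```

A caller might then write `require_module(import_or_capture("cupy"), suggest_cupy_package("11.8"))`, so that the error message names the package matching the user's environment instead of a single hard-coded one.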

@hanzhi713
Contributor

@WoosukKwon FYI, custom all-reduce doesn't work in all cases (e.g. 8 PCIe GPUs), so this fix might be needed anyway.

@NikolaBorisov
Contributor

I think #2731 might be related.

WoosukKwon marked this pull request as ready for review February 13, 2024 09:27
Collaborator

Yard1 left a comment

LGTM

WoosukKwon merged commit a463c33 into main Feb 13, 2024
19 checks passed
WoosukKwon deleted the add-cupy branch February 13, 2024 19:32