RuntimeError: [1] is setting up NCCL communicator and retreiving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Broken pipe #221
Labels
good first issue
Good for newcomers
Comments
Delete the experiment directory, or try commenting out the following code: wespeaker/wespeaker/bin/train.py Lines 60 to 61 in 6550a2a
It runs with a single GPU, but it fails with multiple GPUs.
It looks like an NCCL problem.
wespeaker/wespeaker/bin/train.py Line 52 in 6550a2a
Try switching to gloo.
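A minimal sketch of what switching the backend could look like, assuming the referenced line is where the process group is initialized and that the job is launched so that `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` are already set. The `DIST_BACKEND` variable and both helper names are hypothetical and are not part of wespeaker:

```python
import os


def pick_backend(default="nccl"):
    """Return the backend named in DIST_BACKEND (a hypothetical variable), else the default."""
    backend = os.environ.get("DIST_BACKEND", default).lower()
    if backend not in ("nccl", "gloo"):
        raise ValueError(f"unsupported backend: {backend}")
    return backend


def init_distributed():
    """Initialize the default process group with the selected backend."""
    # Imported lazily so that pick_backend() can be used without PyTorch installed.
    import torch.distributed as dist

    dist.init_process_group(backend=pick_backend())
```

Running with `DIST_BACKEND=gloo` would then bypass NCCL entirely, which helps confirm whether NCCL is the cause. Note that the PyTorch documentation describes the `device_ids` argument of `dist.barrier` as valid only for the NCCL backend, so a call such as `dist.barrier(device_ids=[gpu])` may need to become a plain `dist.barrier()` under gloo.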
It is probably stuck at dist.barrier(device_ids=[gpu]).
Putting NCCL_P2P_DISABLE=1 in front of the script made multi-GPU training work.
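For anyone launching the training script from Python rather than from a shell, the same workaround can be sketched as below. This is only an illustration: the helper name and the `run.sh` launch command are hypothetical, and the only thing taken from this thread is the `NCCL_P2P_DISABLE=1` setting itself.

```python
import os


def env_with_p2p_disabled(base_env=None):
    """Return a copy of the environment with NCCL peer-to-peer transport disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_P2P_DISABLE"] = "1"  # same effect as prefixing the command with NCCL_P2P_DISABLE=1
    return env


# Example launch (hypothetical script name):
# import subprocess
# subprocess.run(["bash", "run.sh"], env=env_with_p2p_disabled(), check=True)
```

Passing a modified copy to the child process leaves the parent environment untouched, which mirrors the one-off `NCCL_P2P_DISABLE=1 bash run.sh` form in a shell.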
Closed