Multi-GPU environment #21

Closed
deepsmallsea033 opened this issue Mar 18, 2021 · 2 comments

Comments

@deepsmallsea033

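Hello, I followed your tutorial, pulled the image, and finished the installation. Then I ran
import paddle
paddle.fluid.install_check.run_check()
and it reported an error. The log is as follows: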
///////////////////////////////////////////////////////
Running Verify Fluid Program ...
W0318 01:57:14.200362 29 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 11.0, Runtime API Version: 10.0
W0318 01:57:14.200688 29 device_context.cc:260] device: 0, cuDNN Version: 7.6.
Your Paddle Fluid works well on SINGLE GPU or CPU.
/usr/local/python3.5.1/lib/python3.5/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
WARNING:root:Your Paddle Fluid has some problem with multiple GPU. This may be caused by:

There is only 1 or 0 GPU visible on your Device;
No.1 or No.2 GPU or both of them are occupied now
Wrong installation of NVIDIA-NCCL2, please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
Original Error is:

C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
3 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)

Error Message Summary:
ExternalError: Nccl error, unhandled system error at (/paddle/paddle/fluid/platform/nccl_helper.h:114)

Your Paddle Fluid is installed successfully ONLY for SINGLE GPU or CPU!
Let's start deep Learning with Paddle Fluid now
////////////////////////////////////////////////////////////////////
I plan to train on multiple GPUs, but right now I can only use one card. Did you use multiple GPUs when you trained?

@yeyupiaoling
Owner

@deepsmallsea033 I do train on multiple GPUs. This error most likely means NCCL is not installed correctly. See this documentation:
https://www.paddlepaddle.org.cn/documentation/docs/zh/1.8/install/pip/linux-pip.html#erkaishianzhuang
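
The log above lists three possible causes: too few visible GPUs, busy GPUs, or a broken NCCL install. Below is a minimal sketch, not from this thread, of how one might narrow them down before reinstalling anything. It assumes the standard `CUDA_VISIBLE_DEVICES` and `NCCL_DEBUG` environment variables. The helper names `visible_gpu_ids` and `enable_nccl_debug` are hypothetical. The Paddle call is the same install check used in the question and is skipped if Paddle cannot be imported.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset.

    Hypothetical helper: None means the variable is not set, so CUDA exposes
    every GPU on the machine; an empty list means no GPU is exposed.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


def enable_nccl_debug(env=None):
    """Ask NCCL for verbose logs unless the caller already chose a level.

    Hypothetical helper: NCCL reads NCCL_DEBUG when it initializes, so this
    has to run before the multi-GPU check starts.
    """
    env = os.environ if env is None else env
    env.setdefault("NCCL_DEBUG", "INFO")
    return env["NCCL_DEBUG"]


if __name__ == "__main__":
    enable_nccl_debug()
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset; all GPUs should be visible")
    elif len(ids) < 2:
        print("Fewer than two GPUs are exposed to this process:", ids)
    else:
        print("GPUs exposed to this process:", ids)

    try:
        import paddle.fluid as fluid
    except ImportError:
        print("paddle is not installed in this environment")
    else:
        # Same check as in the question; NCCL now logs more detail on failure.
        fluid.install_check.run_check()
```

With `NCCL_DEBUG=INFO` set, NCCL usually prints more context around an "unhandled system error", which helps show whether the cause is the installation or the container environment.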

@deepsmallsea033
Author

OK, thank you.
