Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR #3

johndpope · 2020-05-23T03:38:15Z

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

root@92aefbd387d9:/home/playing_smplifyx# python3 smplifyx/easy_run.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
no cuda
F0523 03:36:17.552206 109 cudnn_relu_layer.cpp:13] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
@ 0x7f82525590cd google::LogMessage::Fail()
@ 0x7f825255af33 google::LogMessage::SendToLog()
@ 0x7f8252558c28 google::LogMessage::Flush()
@ 0x7f825255b999 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f8251442af0 caffe::CuDNNReLULayer<>::LayerSetUp()
@ 0x7f82515335ed caffe::Net<>::Init()
@ 0x7f82515356ee caffe::Net<>::Net()
@ 0x7f82532e829a op::NetCaffe::initializationOnThread()
@ 0x7f82532c70b8 op::HandExtractorCaffe::netInitializationOnThread()
@ 0x7f82532c8ac3 op::HandExtractorNet::initializationOnThread()
@ 0x7f8253346cf1 op::Worker<>::initializationOnThreadNoException()
@ 0x7f8253346e20 op::SubThread<>::initializationOnThread()
@ 0x7f8253349178 op::Thread<>::initializationOnThread()
@ 0x7f8253349347 op::Thread<>::threadFunction()
@ 0x7f82dcfcf66f (unknown)
@ 0x7f82e1feb6db start_thread
@ 0x7f82e232488f clone
Aborted (core dumped)

johndpope · 2020-05-23T08:01:32Z

I thought the problem was here:

smplify-x_in_docker/Dockerfile

Line 1 in d490bbc

FROM nvidia/cudagl:10.0-devel-ubuntu18.04

references the older CUDA 10.0 image -
the up-to-date one is 10.2, which is what my driver reports:
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
https://hub.docker.com/r/nvidia/cudagl

Going to retry with:

FROM nvidia/cudagl:10.2-devel-ubuntu18.04

**Step 1/37 : FROM nvidia/cudagl:10.2-devel-ubuntu18.04**
10.2-devel-ubuntu18.04: Pulling from nvidia/cudagl
7ddbc47eeb70: Already exists 
c1bbdc448b72: Already exists 
8c3b70e39044: Already exists 
45d437916d57: Already exists 
d8f1569ddae6: Already exists 
902fc5ce8229: Pull complete 
ae1bb79c5cfc: Pull complete 
fa6605c8fe7a: Downloading [==>                                                ]  35.31MB/707.1MB
0508a679d339: Downloading [====>                                              ]  69.73MB/821.6MB
569964daa9b3: Download complete 
e013d8934c2e: Download complete 
dfe43e9df471: Downloading [==========>                                        ]  13.34MB/64.24MB
5e6c9b9d746f: Waiting 
1d445ebc7a6a: Waiting
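
A side note on the version mismatch: as far as I know, the "CUDA Version" in the nvidia-smi header is the highest CUDA runtime the installed driver supports, and NVIDIA drivers are backward compatible with older toolkits. So a 10.0 image on a 10.2-capable driver should normally be fine, and the tag bump may not be needed. A minimal sketch of that rule (the function names are made up for illustration):

```python
def parse_version(text):
    """Turn a version string such as '10.2' into a comparable tuple (10, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_image(driver_cuda, image_cuda):
    """Return True if the driver's max supported CUDA version covers the image.

    driver_cuda: the 'CUDA Version' shown in the nvidia-smi header.
    image_cuda:  the toolkit version in the Docker image tag.
    """
    return parse_version(image_cuda) <= parse_version(driver_cuda)


# Values from this thread: the host driver reports 10.2, the Dockerfile uses 10.0.
print(driver_supports_image("10.2", "10.0"))  # True
print(driver_supports_image("10.0", "10.2"))  # False
```

This only covers the driver/toolkit relationship; it says nothing about cuDNN versions or available memory.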

But it seems I'm out of GPU memory:
TimoSaemann/ENet#64

UPDATE - it may be a config issue with OpenPose:
BVLC/caffe#5701

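This line in openpose_wrapper.py
sys.path.append('/home/ubuntu/users/weiwei/anaconda3/envs/smplify-x/bin/python')
doesn't seem right: it looks like the path to a Python interpreter on someone else's machine, not a directory containing the OpenPose module.
Need to keep digging.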

# Import Openpose (Windows/Ubuntu/OSX)
import os
import sys
from sys import platform

dir_path = os.path.dirname(os.path.realpath(__file__))
try:
    # Windows Import
    if platform == "win32":
        # Change these variables to point to the correct folder (Release/x64 etc.)
        sys.path.append(dir_path + '/../../python/openpose/Release')
        os.environ['PATH'] = os.environ['PATH'] + ';' + dir_path + '/../../x64/Release;' + dir_path + '/../../bin;'
        import pyopenpose as op
    else:
        # Change these variables to point to the correct folder (Release/x64 etc.)
        # sys.path.append('../../python')
        # If you run `make install` (default path is `/usr/local/python` for Ubuntu), you can also access the OpenPose/python module from there. This will install OpenPose and the python library at your desired installation path. Ensure that this is in your python path in order to use it.
        sys.path.append('/home/ubuntu/users/weiwei/anaconda3/envs/smplify-x/bin/python')
        from openpose import pyopenpose as op
except ImportError as e:
    print('Error: OpenPose library could not be found. Did you enable `BUILD_PYTHON` in CMake and have this Python script in the right folder?')
    raise e
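
One way to sanity-check that path: a `sys.path` entry has to be a directory that contains the module, so something ending in `bin/python` can never help the import. Here is a rough sketch that filters a list of candidate directories down to the ones holding an `openpose` package. The helper name and the candidate list are my own assumptions, not part of the repo:

```python
import os


def find_openpose_dirs(candidates):
    """Return the candidate directories that contain an `openpose` package or module."""
    hits = []
    for path in candidates:
        if not os.path.isdir(path):
            continue  # not a directory, so useless as a sys.path entry
        package_dir = os.path.join(path, "openpose")
        if os.path.isdir(package_dir) or os.path.isfile(package_dir + ".py"):
            hits.append(path)
    return hits


# Hypothetical candidates; adjust to wherever OpenPose was built or installed.
candidates = [
    "/home/ubuntu/users/weiwei/anaconda3/envs/smplify-x/bin/python",  # from the wrapper
    "/usr/local/python",  # default `make install` location per the comment above
]
print(find_openpose_dirs(candidates))
```

Whatever directory this returns is the one worth passing to `sys.path.append`. An empty list suggests the Python bindings were never built or installed in any of the places checked.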

UPDATE 2

After setting hand = False,
I get a different error:
Check failed: error == cudaSuccess (2 vs. 0) out of memory

openpose_wrapper.py 

# Custom Params (refer to include/openpose/flags.hpp for more parameters)
params = dict()
params["model_folder"] = "/root/openpose/models"
params["net_resolution"] = "-1x320"
params["face"] = True
params["hand"] = False

Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
no cuda
F0523 14:21:08.142108   184 cudnn_conv_layer.cpp:52] Check failed: error == **cudaSuccess** (2 vs. 0)  **out of memory**
*** Check failure stack trace: ***
    @     0x7f5d777670cd  google::LogMessage::Fail()
    @     0x7f5d77768f33  google::LogMessage::SendToLog()
    @     0x7f5d77766c28  google::LogMessage::Flush()
    @     0x7f5d77769999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f5d76682313  caffe::CuDNNConvolutionLayer<>::LayerSetUp()
    @     0x7f5d7678989d  caffe::Net<>::Init()
    @     0x7f5d7678b99e  caffe::Net<>::Net()
    @     0x7f5d784f65ea  op::NetCaffe::initializationOnThread()
    @     0x7f5d7851a6f4  op::addCaffeNetOnThread()
    @     0x7f5d7851bbce  op::PoseExtractorCaffe::netInitializationOnThread()
    @     0x7f5d78521773  op::PoseExtractorNet::initializationOnThread()
    @     0x7f5d78516471  op::PoseExtractor::initializationOnThread()
    @     0x7f5d78511271  op::WPoseExtractor<>::initializationOnThread()
    @     0x7f5d78555041  op::Worker<>::initializationOnThreadNoException()
    @     0x7f5d78555170  op::SubThread<>::initializationOnThread()
    @     0x7f5d785574c8  op::Thread<>::initializationOnThread()
    @     0x7f5d78557697  op::Thread<>::threadFunction()
    @     0x7f5e0217266f  (unknown)
    @     0x7f5e0718e6db  start_thread
    @     0x7f5e074c788f  clone
Aborted (core dumped)

I think I need a new graphics card - I have an NVIDIA K2200 with 4 GB of GPU RAM.
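
Before buying hardware, it may be worth trying a smaller network input. The sketch below builds a params dict with the face and hand networks off and a reduced `net_resolution`. The helper name and the default height of 160 are assumptions on my part and untested on this card; a lower resolution also tends to cost accuracy.

```python
def low_memory_params(model_folder, net_height=160):
    """Build an OpenPose params dict aimed at a small GPU.

    net_height is rounded down to a multiple of 16 (minimum 16), since OpenPose
    documents net_resolution values as multiples of 16. A width of -1 lets
    OpenPose derive the width from the image aspect ratio.
    """
    height = max(16, (int(net_height) // 16) * 16)
    return {
        "model_folder": model_folder,
        "net_resolution": "-1x{}".format(height),
        "face": False,  # the face and hand networks each need extra GPU memory
        "hand": False,
    }


params = low_memory_params("/root/openpose/models", net_height=170)
print(params["net_resolution"])  # -1x160
```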

After setting both hand and face to False,

I still get errors:

python3 smplifyx/easy_run.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
no cuda
F0523 14:24:43.595528   257 syncedmem.hpp:39] Check failed: error == cudaSuccess (700 vs. 0)  an illegal memory access was encountered
*** Check failure stack trace: ***
    @     0x7feb7c9a40cd  google::LogMessage::Fail()
    @     0x7feb7c9a5f33  google::LogMessage::SendToLog()
    @     0x7feb7c9a3c28  google::LogMessage::Flush()
    @     0x7feb7c9a6999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7feb7b9f2183  caffe::SyncedMemory::~SyncedMemory()
    @     0x7feb7b84a9c2  boost::detail::sp_counted_impl_p<>::dispose()
    @     0x7feb7b84dfb1  boost::shared_ptr<>::reset<>()
    @     0x7feb7b855dcf  caffe::Blob<>::Reshape()
    @     0x7feb7b88feb7  caffe::BaseConvolutionLayer<>::Reshape()
    @     0x7feb7b8c0fff  caffe::CuDNNConvolutionLayer<>::Reshape()
    @     0x7feb7b9b50ed  caffe::Net<>::Reshape()
    @     0x7feb7d73208a  op::NetCaffe::forwardPass()
    @     0x7feb7d75a67a  op::PoseExtractorCaffe::forwardPass()
    @     0x7feb7d7535b5  op::PoseExtractor::forwardPass()
    @     0x7feb7d75043e  op::WPoseExtractor<>::work()
    @     0x7feb7d7922b9  op::Worker<>::checkAndWork()
    @     0x7feb7d792443  op::SubThread<>::workTWorkers()
    @     0x7feb7d79c0a8  op::SubThreadQueueInOut<>::work()
    @     0x7feb7d794721  op::Thread<>::threadFunction()
    @     0x7fec0325766f  (unknown)
    @     0x7fec0c3856db  start_thread
    @     0x7fec0c6be88f  clone
Aborted (core dumped)

UPDATE 4
I changed hands / face to False in easy_run.py:

def get_keypoints(image_np, use_hands=False, use_face=False,
                   use_face_contour=False):
    op_datum = openpose_wrapper.detect_keypoints(image_np)

root@e330b8076e9d:/home/playing_smplifyx# python3 smplifyx/easy_run.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
no cuda
Found Trained Model: vposer/snapshots/TR00_E096.pt
tipe img
<class 'numpy.ndarray'>
Camera initialization done after 2.0845
Camera initialization final loss 123.5452
Stage:   0%|                                               | 0/5 [00:00<?, ?it/s]
Orientation:   0%|                                         | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "smplifyx/easy_run.py", line 145, in <module>
    main(**args)
  File "smplifyx/easy_run.py", line 134, in main
    **args)
  File "/home/playing_smplifyx/smplifyx/fit_single_frame.py", line 446, in fit_single_frame
    use_vposer=use_vposer)
  File "/home/playing_smplifyx/smplifyx/fitting.py", line 180, in run_fitting
    loss = optimizer.step(closure)
  File "/home/playing_smplifyx/smplifyx/optimizers/lbfgs_ls.py", line 280, in step
    orig_loss = closure()
  File "/home/playing_smplifyx/smplifyx/fitting.py", line 258, in fitting_func
    **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/playing_smplifyx/smplifyx/fitting.py", line 377, in forward
    if self.use_joints_conf else
RuntimeError: The size of tensor a (118) must match the size of tensor b (25) at non-singleton dimension 1
root@e330b8076e9d:/home/playing_smplifyx#
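
The 118 vs. 25 mismatch looks like a keypoint-count disagreement rather than a GPU problem. Assuming the usual OpenPose BODY_25 layout and the way I understand SMPLify-X to slice the face keypoints, 118 is body + both hands + face, and 25 is body only. So presumably the fitting side was still configured for hands and face while the detector now returned body-only keypoints. That is my reading of the numbers, not something confirmed here. The arithmetic:

```python
BODY_25 = 25       # OpenPose BODY_25 body keypoints
HAND = 21          # keypoints per hand
FACE = 51          # face keypoints used without the jaw contour
FACE_CONTOUR = 17  # jaw-contour points, only counted with use_face_contour


def expected_joint_count(use_hands=True, use_face=True, use_face_contour=False):
    """Number of 2D keypoints the fitting code would expect for these flags."""
    count = BODY_25
    if use_hands:
        count += 2 * HAND
    if use_face:
        count += FACE
        if use_face_contour:
            count += FACE_CONTOUR
    return count


print(expected_joint_count())                                 # 118
print(expected_joint_count(use_hands=False, use_face=False))  # 25
```

If that is what is happening, the hands / face flags need to agree everywhere they are set, not just in `get_keypoints`.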

UPDATE 5 -
Running nvidia-smi on my host, I could see /usr/lib/xorg/Xorg was taking up a lot of VRAM (2 GB of my 4 GB). I killed it, and this got me further.
@wells-wei-wei can I ask how much VRAM your card has?
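
For anyone wanting to check this before launching the container, here is a small sketch that reads total / used / free memory per GPU through nvidia-smi's CSV query mode. The function names are mine, and the sample string at the bottom is illustrative, not real output from this machine:

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse `memory.total,memory.used,memory.free` CSV rows (MiB, no header, no units)."""
    rows = []
    for line in csv_text.strip().splitlines():
        total, used, free = (int(field) for field in line.split(","))
        rows.append({"total_mib": total, "used_mib": used, "free_mib": free})
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory figures; needs the NVIDIA driver tools on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_memory(out)


# Illustrative input only: a 4 GB card with about 2 GB already in use.
print(parse_gpu_memory("4043, 2050, 1993\n"))
```

Calling `query_gpu_memory()` on the host gives a quick picture of how much memory is left for OpenPose once the desktop has taken its share.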

wells-wei-wei · 2020-05-25T02:00:12Z

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

wells-wei-wei · 2020-05-25T02:03:15Z

I don't know why your CUDA setup causes so many problems. My host is CentOS with CUDA 10.1, and CUDA 10.0 in the image runs fine.

johndpope · 2020-05-25T07:27:59Z

How much VRAM do you have?

johndpope closed this as completed May 23, 2020
