cudaGraphicsGLRegisterImage failed: cudaErrorNotSupported #4

Closed
rcrvano opened this issue Apr 26, 2019 · 6 comments

rcrvano commented Apr 26, 2019

Hello!

I'm trying to build and use Octopus with the following configuration:

  • GeForce RTX 2080 Ti
  • Ubuntu 18.04
  • Cuda 10.0.130, Nvidia driver 410.48, cuDNN 7.5.1.10
  • Python 2.7.15rc1, pip 9.0.1

I successfully built DIRT using these commands:

git clone https://github.com/pmh47/dirt.git
cd dirt
mkdir build ; cd build
vim ../csrc/CMakeLists.txt
#add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
cmake ../csrc
vim CMakeFiles/rasterise.dir/flags.make
#add CUDA_FLAGS = -DNDEBUG
make
cd ..
pip install -e .

The dirt tests (lighting_tests.py and square_test.py) passed without any errors.

However, when I try to run the Octopus tests, the run ends with:

...
2019-04-25 07:57:54.313242: I /opt/dirt/csrc/gl_common.h:84] successfully created new GL context on thread 0x7f371f9f6700 (EGL = 1.5, GL = 4.6.0 NVIDIA 410.48, renderer = GeForce RTX 2080 Ti/PCIe/SSE2)
2019-04-25 07:57:54.322804: I /opt/dirt/csrc/rasterise_egl.cpp:266] reinitialised framebuffer with size 1080 x 1080
2019-04-25 07:57:54.333154: I /opt/dirt/csrc/gl_common.h:66] selected egl device #0 to match cuda device #0 for thread 0x7f371c9bd700
2019-04-25 07:57:54.360844: I /opt/dirt/csrc/gl_common.h:84] successfully created new GL context on thread 0x7f371c9bd700 (EGL = 1.5, GL = 4.6.0 NVIDIA 410.48, renderer = GeForce RTX 2080 Ti/PCIe/SSE2)
2019-04-25 07:57:54.366606: F /opt/dirt/csrc/rasterise_grad_egl.cpp:194] cudaGraphicsGLRegisterImage failed: cudaErrorNotSupported
run_demo.sh: line 2: 27784 Aborted                 (core dumped) python infer_single.py sample data/sample/segmentations data/sample/keypoints --out_dir out

What could be the problem?

P.S.: Here is the output of
ls -l /usr/lib/*/*GL*

-rw-r--r-- 1 root root   67900 мая 23  2018 /usr/lib/girepository-1.0/GstGL-1.0.typelib
lrwxrwxrwx 1 root root      20 фев  9 01:02 /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
-rw-r--r-- 1 root root  242840 фев  9 01:02 /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0.0.0
lrwxrwxrwx 1 root root      23 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 -> libEGL_nvidia.so.410.48
-rwxr-xr-x 1 root root 1031552 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.410.48
lrwxrwxrwx 1 root root      41 апр 25 07:40 /usr/lib/x86_64-linux-gnu/libEGL.so -> /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
lrwxrwxrwx 1 root root      41 апр 25 08:48 /usr/lib/x86_64-linux-gnu/libEGL.so.1 -> /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
-rw-r--r-- 1 root root   80448 авг 15  2018 /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
lrwxrwxrwx 1 root root      22 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLdispatch.so -> libGLdispatch.so.0.0.0
lrwxrwxrwx 1 root root      22 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 -> libGLdispatch.so.0.0.0
-rw-r--r-- 1 root root  612792 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0.0.0
lrwxrwxrwx 1 root root      29 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.410.48
-rwxr-xr-x 1 root root   60200 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.410.48
lrwxrwxrwx 1 root root      21 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so -> libGLESv1_CM.so.1.0.0
lrwxrwxrwx 1 root root      21 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.0.0
-rw-r--r-- 1 root root   43328 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
lrwxrwxrwx 1 root root      26 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.410.48
-rwxr-xr-x 1 root root  111400 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.410.48
lrwxrwxrwx 1 root root      18 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLESv2.so -> libGLESv2.so.2.0.0
lrwxrwxrwx 1 root root      18 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLESv2.so.2 -> libGLESv2.so.2.0.0
-rw-r--r-- 1 root root   72000 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
-rw-r--r-- 1 root root     671 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGL.la
lrwxrwxrwx 1 root root      14 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGL.so -> libGL.so.1.0.0
lrwxrwxrwx 1 root root      40 апр 25 07:36 /usr/lib/x86_64-linux-gnu/libGL.so.1 -> /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
-rw-r--r-- 1 root root  567624 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
lrwxrwxrwx 1 root root      15 апр 24 20:17 /usr/lib/x86_64-linux-gnu/libGLU.so.1 -> libGLU.so.1.3.1
-rw-r--r-- 1 root root  453352 мая 22  2016 /usr/lib/x86_64-linux-gnu/libGLU.so.1.3.1
lrwxrwxrwx 1 root root      23 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 -> libGLX_nvidia.so.410.48
lrwxrwxrwx 1 root root      20 фев  9 01:02 /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rw-r--r-- 1 root root  479992 фев  9 01:02 /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0
lrwxrwxrwx 1 root root      23 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 -> libGLX_nvidia.so.410.48
-rwxr-xr-x 1 root root 1270576 апр 25 05:54 /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.410.48
lrwxrwxrwx 1 root root      15 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLX.so -> libGLX.so.0.0.0
lrwxrwxrwx 1 root root      41 апр 25 08:49 /usr/lib/x86_64-linux-gnu/libGLX.so.0 -> /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0
-rw-r--r-- 1 root root   68144 авг 15  2018 /usr/lib/x86_64-linux-gnu/libGLX.so.0.0.0
lrwxrwxrwx 1 root root      18 авг 15  2018 /usr/lib/x86_64-linux-gnu/libOpenGL.so -> libOpenGL.so.0.0.0
lrwxrwxrwx 1 root root      18 авг 15  2018 /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 -> libOpenGL.so.0.0.0
-rw-r--r-- 1 root root  186688 авг 15  2018 /usr/lib/x86_64-linux-gnu/libOpenGL.so.0.0.0

rcrvano commented Apr 27, 2019

Please check this issue
pmh47/dirt#16

Paul Henderson (the DIRT developer) gave me this solution:

I reproduced this on a similar configuration; the issue is indeed with using too much memory. Octopus doesn't use allow_growth, so tensorflow grabs all the GPU memory, which leaves DIRT unable to perform its own small allocations. You can patch octopus with the following:

diff --git a/infer_single.py b/infer_single.py
index bfcb292..3a02d8f 100644
--- a/infer_single.py
+++ b/infer_single.py
@@ -6,6 +6,8 @@ from glob import glob
 from lib.io import openpose_from_file, read_segmentation, write_mesh
 from model.octopus import Octopus
 
+import tensorflow as tf
+
 
 def main(weights, name, segm_dir, pose_dir, out_dir, opt_pose_steps, opt_shape_steps):
     segm_files = sorted(glob(os.path.join(segm_dir, '*.png')))
@@ -14,6 +16,8 @@ def main(weights, name, segm_dir, pose_dir, out_dir, opt_pose_steps, opt_shape_s
     if len(segm_files) != len(pose_files) or len(segm_files) == len(pose_files) == 0:
         exit('Inconsistent input.')
 
+    tf.keras.backend.set_session(tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))))
+
     model = Octopus(num=len(segm_files))
     model.load(weights)

With this patch everything works fine for me. Please add it to the Octopus code so that it works on GPU cards with more than 10 GB of memory.
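
To check whether TensorFlow really has reserved almost all GPU memory, as described above, one option is to query nvidia-smi while the script is running. The sketch below is only an illustration: the helper names `parse_gpu_memory` and `gpu_memory` are made up for this example, and it assumes `nvidia-smi` is on the PATH and reports plain integers (MiB) for each GPU.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, as bare numbers in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn lines such as '123, 456' into (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = [int(field.strip()) for field in line.split(",")]
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return one (used_mib, total_mib) tuple per GPU."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_gpu_memory(output)


if __name__ == "__main__":
    for index, (used, total) in enumerate(gpu_memory()):
        print("GPU %d: %d / %d MiB used" % (index, used, total))
```

If the used figure is close to the total before the patch and much lower after it, that is consistent with the explanation above.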


thmoa commented Apr 29, 2019

DIRT is not part of this project. Please open a ticket at the DIRT project.
If this option improves this project, please submit a pull request.

thmoa closed this as completed Apr 29, 2019

rcrvano commented Apr 29, 2019

The fixes should be made in the Octopus code:

file infer_single.py:

import tensorflow as tf

and add inside the main function:

tf.keras.backend.set_session(tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))))
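
For anyone who would rather not edit infer_single.py, a possible alternative is the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which TensorFlow reads when it sets up its GPU allocator. This is a sketch under assumptions: the variable is only honored by TensorFlow releases that include it, and this thread does not say which release is installed, so it may have no effect here. The helper name `force_gpu_allow_growth` is made up for this example.

```python
import os


def force_gpu_allow_growth(env=os.environ):
    """Request on-demand GPU memory allocation from TensorFlow.

    Call this before `import tensorflow`, so the variable is already set
    when TensorFlow initialises the GPU. A value the user has already
    set is left unchanged.
    """
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


if __name__ == "__main__":
    print(force_gpu_allow_growth())
```

The same variable can also be exported in the shell before starting run_demo.sh. The set_session call shown above remains the fix that was actually confirmed in this thread.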

Please read his answer:

I reproduced this on a similar configuration; the issue is indeed with using too much memory. Octopus doesn't use allow_growth, so tensorflow grabs all the GPU memory, which leaves DIRT unable to perform its own small allocations. You can patch octopus with the following:


thmoa commented Apr 29, 2019

Please feel free to submit a pull request.


rcrvano commented Apr 29, 2019

Now all works fine. Thanks!


OOF-dura commented Jun 6, 2020

Now all works fine. Thanks!

Hi! Could you share your package versions? Are you using TensorFlow 1.13.1?
