
compile lib_kernel/lib_fast_nms/fast_nms using GPU V100 #35

Closed
emedinac opened this issue Jun 27, 2018 · 5 comments

emedinac commented Jun 27, 2018

Hi,
I tried to compile this code on two V100 GPUs using sm_70, and I get this warning during compilation and this error when I run test.py:

/usr/local/cuda-9.0/bin/../targets/x86_64-linux/include/sm_30_intrinsics.hpp(213): here was declared deprecated ("__shfl_down() is not valid on compute_70 and above, and should be replaced with __shfl_down_sync().To continue using __shfl_down(), specify virtual architecture compute_60 when targeting sm_70 and above, for example, using the pair of compiler options: -arch=compute_60 -code=sm_70.")
NotFoundError: /home/edgar/light_head_rcnn/lib/lib_kernel/lib_fast_nms/fast_nms.so: undefined symbol: _ZN10tensorflow7strings6StrCatERKNS0_8AlphaNumE

Also, when I use -arch=compute_60 -code=sm_70, I get this warning during compilation and the same error when I run test.py:

/usr/local/cuda/bin/../targets/x86_64-linux/include/sm_30_intrinsics.hpp(213): here was declared deprecated ("__shfl_down() is deprecated in favor of __shfl_down_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
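
For reference, the warning itself points at the proper fix on sm_70: replace the __shfl_down() calls in the kernel source with __shfl_down_sync(). A minimal warp-reduction sketch of the pattern (hypothetical code, not the actual fast_nms kernel):

// Hypothetical warp-sum reduction illustrating the replacement; adapt to the real kernel.
__device__ float warp_reduce_sum(float val) {
    // 0xffffffff = full-warp mask: all 32 lanes take part in the shuffle (CUDA 9+).
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);  // was: __shfl_down(val, offset)
    return val;
}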

The compile commands are:

CUDA_PATH=/usr/local/cuda-9.0/
nvcc -std=c++11 -c -o nms_op.cu.o nms_op.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=compute_60 -code=sm_70 --expt-relaxed-constexpr -Wno-deprecated-declarations

bl0 commented Jul 2, 2018

After a long time of debugging, I found the solution:
Edit the file /src/detection/lib/lib_kernel/lib_fast_nms/make.sh, replace -D_GLIBCXX_USE_CXX11_ABI=0 with -D_GLIBCXX_USE_CXX11_ABI=1, and recompile; the annoying problem disappears.

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=1 -o fast_nms.so nms_op.cc \
        nms_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 -L$TF_LIB -ltensorflow_framework -I$TF_INC/external/nsync/public
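
The undefined tensorflow::strings::StrCat symbol is the typical signature of a C++11 string ABI mismatch: a Conda or source-built TensorFlow is usually compiled with the new ABI, while the shipped make.sh passes -D_GLIBCXX_USE_CXX11_ABI=0, so the op references a differently mangled symbol than the one libtensorflow_framework.so exports. A standalone illustration of the mangling difference (hypothetical file name, not part of this repo):

// abi_demo.cc -- functions returning std::string get an extra [abi:cxx11] tag
// in their mangled name when built with the new ABI.
#include <string>

std::string make_greeting() { return "hi"; }

// Build twice and compare the exported symbol:
//   g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c abi_demo.cc && nm abi_demo.o | grep make_greeting
//     -> _Z13make_greetingv
//   g++ -D_GLIBCXX_USE_CXX11_ABI=1 -c abi_demo.cc && nm abi_demo.o | grep make_greeting
//     -> _Z13make_greetingB5cxx11v
// The same happens to tensorflow::strings::StrCat, so both sides must agree on the flag.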

emedinac commented Jul 4, 2018

It worked, thanks. Also, I recommend working as root, because I initially installed TensorFlow using Conda.

fay0505 commented Sep 12, 2018

@bl0 Hello, your solution worked, thanks! Can you explain why it works?

bl0 commented Sep 12, 2018

I just searched the TensorFlow issue tracker. The following page may help:
tensorflow/tensorflow#20899 (comment)

fay0505 commented Sep 12, 2018

@bl0 Thanks!
