Question about undefined symbol #6

Closed
WenFuLee opened this issue Apr 8, 2019 · 18 comments

@WenFuLee

WenFuLee commented Apr 8, 2019

Below is the error message I got. I'm not sure how to fix it. Could you help me with this? Thanks.

====
UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 61, in <module>
    from upsnet.models import *
  File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
    from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
    from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
  File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
    from upsnet.operators.modules.deform_conv import DeformConv
  File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
    from upsnet.operators.functions.deform_conv import DeformConvFunction
  File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
    from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

@YuwenXiong
Contributor

The most likely reason is that the PyTorch version you used to build the operators differs from the PyTorch version you use to run experiments. Please double-check your Python env and PyTorch version, then try rebuilding the operators (don't forget to delete the upsnet/operators/build folder first).
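A minimal sketch of that environment check, for anyone comparing the build-time and run-time setups. It only reports which interpreter is running and which torch that interpreter would import. The helper name `describe_env` is made up for illustration and is not part of UPSNet. Run it once with the python used for init.sh and once with the python used for training, then compare the output.

```python
import importlib.util
import sys
from typing import Dict, Optional


def describe_env() -> Dict[str, Optional[str]]:
    """Report the running interpreter and the torch it would import (if any)."""
    info: Dict[str, Optional[str]] = {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "torch_version": None,
        "torch_path": None,
    }
    spec = importlib.util.find_spec("torch")
    if spec is not None:
        import torch  # imported lazily so the script also runs without torch

        info["torch_version"] = torch.__version__
        info["torch_path"] = spec.origin
    return info


for key, value in describe_env().items():
    print(f"{key}: {value}")
```

If the two runs print different `python` or `torch_path` values, the operators were probably built against a different installation than the one used at run time.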

@WenFuLee
Author

WenFuLee commented Apr 8, 2019

Thanks for the reply.

Below are the versions of my Python and PyTorch.
python 3.6.8
pytorch 0.4.1

Also, I just followed your suggestions.
(1) Delete upsnet/operators/build folder first
(2) Run "init.sh" to rebuild the operators
(3) Run the experiment.

But still got the same issue. Is there anything I missed or misunderstood?
Also, when building the operators, I got warnings below. Does it matter?
"cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++"

Thanks.

@YuwenXiong
Contributor

Try running python build_deform_conv.py build_ext --inplace and python build_roialign.py build_ext --inplace manually, making sure your python is the one with pytorch 0.4.1. Then start python under upsnet/operators/_ext/deform_conv (again, make sure it is the python with pytorch 0.4.1) and execute import torch and import deform_conv_cuda manually. There should be no problem if your environment is set up correctly.

The warning can simply be ignored.
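The manual import test above can also be scripted. This is a small sketch, not part of the repository: `try_import` is a hypothetical helper that imports a module by name and returns the error text instead of raising, so the undefined-symbol message is easy to copy into an issue. It assumes it is run from upsnet/operators/_ext/deform_conv, so that deform_conv_cuda is importable by name.

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Import a module by name and report success or the ImportError text."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, "ok"


# torch is imported first, as in the comment above, then the compiled extension.
for name in ("torch", "deform_conv_cuda"):
    ok, detail = try_import(name)
    print(f"{name}: {'imported' if ok else 'FAILED'} ({detail})")
```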

@WenFuLee
Author

WenFuLee commented Apr 8, 2019

Below is the result of following your suggestions.
Would you mind telling me what might be the reasons for this environment issue? Thanks.

====
~/UPSNet_ROOT/upsnet/operators/_ext/deform_conv$ python
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> print(torch.__version__)
0.4.1
>>> import deform_conv_cuda
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /home/wen-fulee/UPSNet_ROOT/upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
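The mangled names in these errors carry a useful hint. Below is a rough sketch, not a full demangler: `leading_nested_name` and `references_cxx11_abi` are hypothetical helpers that decode only the leading length-prefixed components of an Itanium-ABI nested name and check for the `__cxx11` namespace. A symbol that mentions `__cxx11` refers to libstdc++'s newer `std::__cxx11::basic_string`, so the extension asking for it was compiled with the new string ABI. For a complete decoding, `c++filt` is the usual tool.

```python
import re
from typing import List


def leading_nested_name(symbol: str) -> List[str]:
    """Decode the leading <length><identifier> parts of a _ZN... symbol.

    Decoding stops at the first part that is not length-prefixed
    (for example the constructor marker C1), so this is only a partial view.
    """
    if not symbol.startswith("_ZN"):
        return []
    parts: List[str] = []
    pos = 3
    while True:
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            break
        length = int(match.group())
        start = pos + match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return parts


def references_cxx11_abi(symbol: str) -> bool:
    """True if the symbol mentions the std::__cxx11 inline namespace."""
    return "__cxx11" in symbol


# The two undefined symbols reported so far in this thread.
symbols = [
    "_ZN2at19UndefinedTensorImpl10_singletonE",
    "_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE",
]
for sym in symbols:
    print("::".join(leading_nested_name(sym)), "| new string ABI:", references_cxx11_abi(sym))
```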

@YuwenXiong
Contributor

Can you show me the output of the operator build?

@WenFuLee
Author

WenFuLee commented Apr 8, 2019

Do you mean this?

====

~/UPSNet_ROOT/upsnet/operators$ python build_deform_conv.py build_ext --inplace
running build_ext
building 'deform_conv_cuda' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_cuda.cpp -o build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o -DTORCH_EXTENSION_NAME=deform_conv_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/deform_conv_kernel.cu -o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -O2 -DTORCH_EXTENSION_NAME=deform_conv_cuda --compiler-options '-fPIC' -std=c++11
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/deform_conv_cuda.o build/temp.linux-x86_64-3.6/src/deform_conv_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so -> 

====

~/UPSNet_ROOT/upsnet/operators$ python build_roialign.py build_ext --inplace
running build_ext
building 'roi_align_cuda' extension
gcc -pthread -B /opt/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_cuda.cpp -o build/temp.linux-x86_64-3.6/src/roi_align_cuda.o -DTORCH_EXTENSION_NAME=roi_align_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/roi_align_cuda.cpp: In function ‘int roi_align_forward_cuda(int, int, int, float, at::Tensor, at::Tensor, at::Tensor)’:
src/roi_align_cuda.cpp:58:7: warning: unused variable ‘batch_size’ [-Wunused-variable]
   int batch_size = features.size(0);
       ^~~~~~~~~~
/usr/local/cuda/bin/nvcc -I/home/wen-fulee/UPSNet_ROOT/upsnet/operators/src -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/opt/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/anaconda3/include/python3.6m -c src/roi_align_kernel.cu -o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -O2 -DTORCH_EXTENSION_NAME=roi_align_cuda --compiler-options '-fPIC' -std=c++11
g++ -pthread -shared -B /opt/anaconda3/compiler_compat -L/opt/anaconda3/lib -Wl,-rpath=/opt/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/src/roi_align_cuda.o build/temp.linux-x86_64-3.6/src/roi_align_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/roi_align_cuda.cpython-36m-x86_64-linux-gnu.so ->

@YuwenXiong
Contributor

I think you can find the solution here: pytorch/extension-cpp#6 (comment). I'm surprised that -D_GLIBCXX_USE_CXX11_ABI=0 doesn't appear in your compile arguments, since it does on my side. Please check whether your gcc version is > 5.1; if it is, I think that is the cause. You can manually add '-D_GLIBCXX_USE_CXX11_ABI=0' to https://github.com/uber-research/UPSNet/blob/master/upsnet/operators/build_deform_conv.py, L51 and L52. The same applies to build_roialign.py.
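The gcc version check can be sketched as follows. The helper names are made up for illustration. The version text is assumed to come from something like `gcc -dumpversion`, and the threshold reflects that upstream GCC introduced the new libstdc++ string ABI in 5.1 and enables it by default from that release on (distributions can configure a different default, so treat the result as a hint only). The comparison uses 5.1 and later rather than strictly greater.

```python
import re
from typing import Tuple


def parse_gcc_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from gcc version output."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def defaults_to_new_abi(version: Tuple[int, ...]) -> bool:
    """Upstream GCC 5.1 and later default to the new libstdc++ ABI."""
    return version >= (5, 1)


# Sample values taken from this thread: 7.3.0 (reporter) and 4.9.4 (maintainer).
for sample in ("7.3.0", "4.9.4"):
    version = parse_gcc_version(sample)
    label = "new ABI by default" if defaults_to_new_abi(version) else "old ABI only"
    print(sample, "->", label)
```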

@WenFuLee
Author

WenFuLee commented Apr 8, 2019

Should I add -D_GLIBCXX_USE_CXX11_ABI=0 or _GLIBCXX_USE_CXX11_ABI=0?
In the post you shared, they seem to use _GLIBCXX_USE_CXX11_ABI=0 instead.

@YuwenXiong
Contributor

_GLIBCXX_USE_CXX11_ABI is the macro name; as a compiler argument it should be written -D_GLIBCXX_USE_CXX11_ABI=0, with the -D prefix.
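To make the macro-versus-argument distinction concrete, here is a small sketch. It assumes the build script passes a dict of per-compiler flag lists keyed by 'cxx' and 'nvcc', which is a layout torch.utils.cpp_extension accepts for extra_compile_args. The starting flag values below are illustrative only, and the actual contents of build_deform_conv.py may differ. `define_flag` and `add_flag` are hypothetical helpers.

```python
from typing import Dict, List


def define_flag(macro: str, value: object) -> str:
    """Turn a macro name and value into a -D compiler argument."""
    return f"-D{macro}={value}"


def add_flag(extra_compile_args: Dict[str, List[str]], flag: str) -> Dict[str, List[str]]:
    """Return a copy with the flag appended to every compiler's list (once)."""
    return {
        tool: (list(args) if flag in args else list(args) + [flag])
        for tool, args in extra_compile_args.items()
    }


# Illustrative starting flags, not copied from the real build script.
args = {"cxx": ["-std=c++11"], "nvcc": ["-O2", "-std=c++11"]}
abi_flag = define_flag("_GLIBCXX_USE_CXX11_ABI", 0)
print(add_flag(args, abi_flag))
```

After rebuilding, the flag should be visible in both the gcc and the nvcc command lines of the build output.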

@WenFuLee
Author

WenFuLee commented Apr 8, 2019

Thanks. I may have made a little progress, but I still get the error below. I googled it, and it might be related to my CUDA version: NVlabs/PWC-Net#11
My current CUDA version is: release 10.0, V10.0.130
Could this be the reason?

~/UPSNet_ROOT$ python upsnet/upsnet_end2end_train.py --cfg upsnet/experiments/upsnet_resnet50_coco.yaml
upsnet/../upsnet/config/config.py:180: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  exp_config = edict(yaml.load(f))
Traceback (most recent call last):
  File "upsnet/upsnet_end2end_train.py", line 61, in <module>
    from upsnet.models import *
  File "upsnet/../upsnet/models/__init__.py", line 1, in <module>
    from .resnet_upsnet import resnet_50_upsnet, resnet_101_upsnet
  File "upsnet/../upsnet/models/resnet_upsnet.py", line 22, in <module>
    from upsnet.models.resnet import get_params, resnet_rcnn, ResNetBackbone
  File "upsnet/../upsnet/models/resnet.py", line 21, in <module>
    from upsnet.operators.modules.deform_conv import DeformConv
  File "upsnet/../upsnet/operators/modules/deform_conv.py", line 22, in <module>
    from upsnet.operators.functions.deform_conv import DeformConvFunction
  File "upsnet/../upsnet/operators/functions/deform_conv.py", line 21, in <module>
    from .._ext.deform_conv import deform_conv_cuda
ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

@YuwenXiong
Contributor

I have never seen this issue before. Changing the CUDA version would probably solve it. On my side, CUDA 9.1 with gcc 4.9.4 works.
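One way to check for a CUDA mismatch is to compare the toolkit release reported by `nvcc --version` with the CUDA version PyTorch reports in `torch.version.cuda`. The sketch below covers only the comparison. The helper names are hypothetical, and the PyTorch value "9.0.176" is an assumed example, not something reported in this thread.

```python
import re
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Pull the 'release X.Y' number out of nvcc --version output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def same_major_minor(toolkit: Optional[str], torch_cuda: Optional[str]) -> bool:
    """Compare only the major.minor parts; None never matches."""
    if toolkit is None or torch_cuda is None:
        return False
    return toolkit.split(".")[:2] == torch_cuda.split(".")[:2]


# "release 10.0, V10.0.130" is the nvcc output quoted in this thread.
toolkit = parse_nvcc_release("release 10.0, V10.0.130")
# Assumed example value; read the real one from torch.version.cuda.
torch_cuda = "9.0.176"
print("nvcc:", toolkit, "| torch:", torch_cuda, "| match:", same_major_minor(toolkit, torch_cuda))
```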

@dongzhang89

Hi, have you solved this problem? I ran into the same one and have no idea how to fix it.

@WenFuLee
Author

After downgrading CUDA to 9.1, this was solved.

@whw19950510

Hi, I have the same issue. I followed the instructions cited here (adding the flags after cxx and nvcc), but it still does not work. My torch version and CUDA version are all the same. Any further suggestions? Thanks.

@dongzhang89

Hi, I have the same issue. I followed the instructions cited here (adding the flags after cxx and nvcc), but it still does not work. My torch version and CUDA version are all the same. Any further suggestions? Thanks.

Please check the CUDA path set in your shell profile.
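A quick way to see which CUDA installation the current shell actually exposes is to list the CUDA-related entries of the relevant environment variables. This is a sketch under assumptions: the variable names PATH, LD_LIBRARY_PATH and CUDA_HOME are the usual ones on Linux, but your profile may set others, and `cuda_entries` is a hypothetical helper.

```python
import os
from typing import Dict, List, Mapping


def cuda_entries(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """List the path entries that mention 'cuda' for each variable of interest."""
    found: Dict[str, List[str]] = {}
    for var in ("PATH", "LD_LIBRARY_PATH", "CUDA_HOME"):
        entries = env.get(var, "").split(os.pathsep)
        found[var] = [entry for entry in entries if "cuda" in entry.lower()]
    return found


for var, entries in cuda_entries(os.environ).items():
    print(f"{var}: {entries if entries else 'no CUDA entries'}")
```

If the entries point at different CUDA versions (for example one version on PATH and another on LD_LIBRARY_PATH), nvcc and the runtime libraries may come from different toolkits.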

@lfdeep

lfdeep commented Jun 12, 2019

After downgrading CUDA to 9.1, this was solved.

Hello, your CUDA is 9.1, so which version of gcc can run the network?

@WenFuLee
Author

My version is GCC 7.3.0.

@gaussiangit

ImportError: upsnet/../upsnet/operators/_ext/deform_conv/deform_conv_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs

Also, -D_GLIBCXX_USE_CXX11_ABI=0 is present in the compile arguments.

Torch version 1.0.1
GCC 7.3.0
CUDA 9.0

What could be the problem?
