
Segmentation Fault when launching the server with custom built TensorRT plugins #2227

Closed
zmy1116 opened this issue Nov 5, 2020 · 8 comments

Comments

zmy1116 commented Nov 5, 2020

Description
I want to serve a TensorRT model with custom-built plugins on Triton Server. It generates a segmentation fault immediately.

I can confirm that the TensorRT model plan and the plugin are built correctly; we are currently using this TensorRT model in our production environment.

I have successfully set up other TensorRT models that do not require custom plugins on Triton Server, so I think the problem is isolated to custom plugins.

I can reproduce the issue with the example detectionLayerPlugin in the NVIDIA TensorRT repo.

Triton Information

  • For Triton Server, I am using the NGC container nvcr.io/nvidia/tritonserver:20.10-py3 directly
  • To build the TensorRT model plan and the custom plugin, I am using the NGC container nvcr.io/nvidia/tensorrt:20.10-py3
  • All the work is done on an AWS g4dn.xlarge instance, which has a T4 GPU.

To Reproduce
I use the example plugin detectionLayerPlugin from the NVIDIA TensorRT repo https://github.com/NVIDIA/TensorRT/tree/master/plugin as the custom plugin that causes the issue.

To facilitate your test, I created a repo with all the necessary files:
https://github.com/zmy1116/triton_server_custom_plugin_issue

I basically put the following files under the plugin folder:
https://github.com/NVIDIA/TensorRT/blob/master/plugin/detectionLayerPlugin/detectionLayerPlugin.cpp
https://github.com/NVIDIA/TensorRT/blob/master/plugin/detectionLayerPlugin/detectionLayerPlugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/plugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/checkMacrosPlugin.cpp
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/checkMacrosPlugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/kernels/maskRCNNKernels.cu
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/kernels/maskRCNNKernels.h

To build the plugin, inside the TensorRT container, it's the standard:

mkdir build
cd build
cmake ..
make
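
As a quick sanity check after the build (a sketch; the build directory and the libtestplugins.so name come from the reproduction repo above), you can confirm the shared library was produced and that it exports the plugin creator symbols before preloading it into Triton:

# list the built library and inspect its dynamic symbol table;
# the DetectionLayer plugin creator should appear in the output
ls -lh build/libtestplugins.so
nm -D --defined-only build/libtestplugins.so | grep -iE 'plugin|detectionlayer'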

To launch Triton Server within the Triton Server container, assuming the model repository is at /ubuntu/model_repository and the plugin is at /ubuntu/libtestplugins.so:

LD_PRELOAD=/ubuntu/libtestplugins.so tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false

In the model repository, just put any model so that Triton Server will launch; the model does not need to call the custom plugin, since the error occurs before any model is loaded.
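
For reference, a minimal repository layout that satisfies this looks like the following (the model name and file are illustrative; any model Triton can load is enough to trigger the crash):

model_repository/
└── any_model/
    ├── config.pbtxt        # standard Triton model configuration
    └── 1/                  # numeric version directory
        └── model.plan      # or the model file for whichever backend you use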

Expected behavior
The segmentation fault shows up immediately:

I1105 06:27:12.184359 1991 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1105 06:27:12.189796 1991 metrics.cc:193]   GPU 0: Tesla T4
I1105 06:27:12.362048 1991 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7fd970000000' with size 268435456
I1105 06:27:12.362442 1991 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864
Segmentation fault (core dumped)

Please let me know if you need any additional information and I will get back to you ASAP.
Thank you

CoderHam (Contributor) commented Nov 5, 2020

Could you use gdb and share a backtrace for the segfault?
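
For example, something like this (a sketch; substitute the paths to your plugin library and model repository):

gdb --args tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false
(gdb) set environment LD_PRELOAD /ubuntu/libtestplugins.so
(gdb) run
(...wait for the segfault)
(gdb) backtrace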

zmy1116 (Author) commented Nov 5, 2020

@CoderHam, thanks for the quick response

Below is the gdb backtrace for the segfault.

Starting program: /opt/tritonserver/bin/tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

process 759 is executing new program: /opt/tritonserver/bin/tritonserver
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
I1105 22:34:42.144986 759 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1105 22:34:42.150359 759 metrics.cc:193]   GPU 0: Tesla T4
[New Thread 0x7fff3d1a1700 (LWP 760)]
[New Thread 0x7fff2dfff700 (LWP 761)]
[New Thread 0x7fff0ffff700 (LWP 762)]
I1105 22:34:42.323176 759 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7ffef0000000' with size 268435456
I1105 22:34:42.323549 759 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864

Thread 1 "tritonserver" received signal SIGSEGV, Segmentation fault.
0x00007fffce510d90 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
(gdb) backtrace
#0  0x00007fffce510d90 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
#1  0x00007fffce50abd6 in initLibNvInferPlugins () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
#2  0x00007ffff69fc98d in nvidia::inferenceserver::PlanBackendFactory::Create(std::shared_ptr<nvidia::inferenceserver::BackendConfig> const&, std::unique_ptr<nvidia::inferenceserver::PlanBackendFactory, std::default_delete<nvidia::inferenceserver::PlanBackendFactory> >*)::{lambda()#1}::operator()() const ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#3  0x00007ffff63ca907 in __pthread_once_slow (once_control=0x7fffffffc1cc, init_routine=0x7ffff57798a0 <__once_proxy>) at pthread_once.c:116
#4  0x00007ffff69fcae8 in nvidia::inferenceserver::PlanBackendFactory::Create(std::shared_ptr<nvidia::inferenceserver::BackendConfig> const&, std::unique_ptr<nvidia::inferenceserver::PlanBackendFactory, std::default_delete<nvidia::inferenceserver::PlanBackendFactory> >*) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#5  0x00007ffff690bcc9 in nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle::Create(nvidia::inferenceserver::InferenceServer*, double, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<nvidia::inferenceserver::BackendConfig>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<nvidia::inferenceserver::BackendConfig> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::unique_ptr<nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle, std::default_delete<nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle> >*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#6  0x00007ffff6914182 in nvidia::inferenceserver::ModelRepositoryManager::Create(nvidia::inferenceserver::InferenceServer*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, float, bool, bool, bool, double, std::unique_ptr<nvidia::inferenceserver::ModelRepositoryManager, std::default_delete<nvidia::inferenceserver::ModelRepositoryManager> >*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#7  0x00007ffff6943fdf in nvidia::inferenceserver::InferenceServer::Init() () from /opt/tritonserver/bin/../lib/libtritonserver.so
#8  0x00007ffff694d1f3 in TRITONSERVER_ServerNew () from /opt/tritonserver/bin/../lib/libtritonserver.so
#9  0x00005555555cab3a in main ()


I'm not really familiar with C/C++/gdb, so I'm not sure if I'm giving you the right information. I did the following:

export LD_PRELOAD=/ubuntu/pluggin_test/build/libtestplugins.so
gdb tritonserver
run --model-repository=/ubuntu/model_repository --strict-model-config=false
(..seg fault happens)
backtrace

Please let me know if this is not what you want.

Thanks.

CoderHam (Contributor) commented Nov 18, 2020

@zmy1116 can you share the model you are using with the plugin shared library?

zmy1116 (Author) commented Nov 19, 2020

@CoderHam It does not really matter what you put in the model repository, even if none of the models in the repository uses the plugin. From the output above you can see that the error happens before any model is loaded.

That said, I have tested with a repository containing only the dummy example from the Triton Server repo:
https://github.com/triton-inference-server/server/tree/master/docs/examples/model_repository/simple_string

As you can see, this model does not use the custom plugin. However, when starting Triton Server with the plugin preloaded, the segmentation fault still occurs.

Thanks

CoderHam (Contributor) commented:

@zmy1116 I tried loading your shared library with trtexec (you can find it inside the TensorRT container) and saw a segfault even in this case. This confirms that the issue is inside your plugin or TensorRT, not in Triton Server. Please file a ticket against TensorRT. They may be able to find issues with your build script that are not obvious to me.

LD_PRELOAD=/data/libtestplugins.so /usr/src/tensorrt/bin/trtexec --loadEngine=/data/model.plan
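
If your trtexec build supports it, the --plugins option (available in recent trtexec versions; treat its presence in your exact container as an assumption) is an alternative to LD_PRELOAD for loading the library explicitly:

/usr/src/tensorrt/bin/trtexec --loadEngine=/data/model.plan --plugins=/data/libtestplugins.so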

zmy1116 (Author) commented Nov 20, 2020

@CoderHam thanks for the directions.

Actually, none of my custom-built plugins seem to work with trtexec directly on the command line (I have tested multiple versions of TensorRT).

In our current production environment, we run TensorRT models directly in Python. I just do ctypes.CDLL("libtestplugins.so") before loading the TensorRT engine, and they just work. I agree there is probably something wrong with my build script.

tianq01 commented Dec 25, 2020

@zmy1116 I hit the same issue. Have you got a solution?
Thanks.

zmy1116 (Author) commented Jan 4, 2021

@tianq01 It appears that this specific problem no longer exists in the 20.12 versions (TensorRT, Triton).
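
For anyone hitting the same problem, the corresponding NGC images are the following (tags follow the standard NGC naming; verify against the NGC catalog):

docker pull nvcr.io/nvidia/tensorrt:20.12-py3
docker pull nvcr.io/nvidia/tritonserver:20.12-py3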
