
Segmentation Fault when launching the server with custom built TensorRT plugins #2227

Closed
zmy1116 opened this issue Nov 5, 2020 · 8 comments

Comments

zmy1116 commented Nov 5, 2020

Description
I want to serve a TensorRT model with custom-built plugins on Triton Server. It generates a segmentation fault immediately.

I can confirm that the TensorRT model plan and the plugin are built correctly; we are currently using this TensorRT model in our production environment.

I have successfully set up other TensorRT models that do not require custom plugins on Triton Server, so I think the problem is isolated to custom plugins.

I can reproduce the issue with the example detectionLayerPlugin in the NVIDIA TensorRT repo.

Triton Information

  • For Triton Server, I am using the NGC container nvcr.io/nvidia/tritonserver:20.10-py3 directly
  • To build the TensorRT model plan and the custom plugin, I am using the NGC container nvcr.io/nvidia/tensorrt:20.10-py3
  • All the work is done on an AWS g4dn.xlarge instance, which has a T4 GPU.

To Reproduce
I use the example plugin detectionLayerPlugin from the NVIDIA TensorRT repo https://github.com/NVIDIA/TensorRT/tree/master/plugin as the custom plugin that causes the issue.

To facilitate your test, I created a repo with all the necessary files:
https://github.com/zmy1116/triton_server_custom_plugin_issue

I basically put the following files under the plugin folder:
https://github.com/NVIDIA/TensorRT/blob/master/plugin/detectionLayerPlugin/detectionLayerPlugin.cpp
https://github.com/NVIDIA/TensorRT/blob/master/plugin/detectionLayerPlugin/detectionLayerPlugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/plugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/checkMacrosPlugin.cpp
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/checkMacrosPlugin.h
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/kernels/maskRCNNKernels.cu
https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/kernels/maskRCNNKernels.h

To build the plugin, inside the TensorRT container, it's the standard:

mkdir build
cd build
cmake ..
make
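
As a quick sanity check after the build (a sketch; the build directory and the libtestplugins.so name come from the reproduction repo above), you can confirm the shared library was produced and that it exports the plugin creator symbols before preloading it into Triton:

# list the built library and inspect its dynamic symbol table;
# the DetectionLayer plugin creator should appear in the output
ls -lh build/libtestplugins.so
nm -D --defined-only build/libtestplugins.so | grep -iE 'plugin|detectionlayer'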

To launch Triton Server within the Triton Server container, assuming the model repository is at /ubuntu/model_repository and the plugin is at /ubuntu/libtestplugins.so:

LD_PRELOAD=/ubuntu/libtestplugins.so tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false

In the model repository, just put any model so that Triton Server will launch; the model does not need to call the custom plugin, since the error occurs before any model is loaded.
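
For reference, a minimal repository layout that satisfies this looks like the following (the model name and file are illustrative; any model Triton can load is enough to trigger the crash):

model_repository/
└── any_model/
    ├── config.pbtxt        # standard Triton model configuration
    └── 1/                  # numeric version directory
        └── model.plan      # or the model file for whichever backend you use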

Expected behavior
The segmentation fault shows up immediately:

I1105 06:27:12.184359 1991 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1105 06:27:12.189796 1991 metrics.cc:193]   GPU 0: Tesla T4
I1105 06:27:12.362048 1991 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7fd970000000' with size 268435456
I1105 06:27:12.362442 1991 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864
Segmentation fault (core dumped)

Please let me know if you need any additional information and I will get back to you ASAP.
Thank you

CoderHam (Contributor) commented Nov 5, 2020

Could you use gdb and share a backtrace for the segfault?
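
For example, something like this (a sketch; substitute the paths to your plugin library and model repository):

gdb --args tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false
(gdb) set environment LD_PRELOAD /ubuntu/libtestplugins.so
(gdb) run
(...wait for the segfault)
(gdb) backtrace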

zmy1116 (Author) commented Nov 5, 2020

@CoderHam, thanks for the quick response

Below is the gdb backtrace for the segfault.

Starting program: /opt/tritonserver/bin/tritonserver --model-repository=/ubuntu/model_repository --strict-model-config=false
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

process 759 is executing new program: /opt/tritonserver/bin/tritonserver
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
I1105 22:34:42.144986 759 metrics.cc:184] found 1 GPUs supporting NVML metrics
I1105 22:34:42.150359 759 metrics.cc:193]   GPU 0: Tesla T4
[New Thread 0x7fff3d1a1700 (LWP 760)]
[New Thread 0x7fff2dfff700 (LWP 761)]
[New Thread 0x7fff0ffff700 (LWP 762)]
I1105 22:34:42.323176 759 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7ffef0000000' with size 268435456
I1105 22:34:42.323549 759 cuda_memory_manager.cc:98] CUDA memory pool is created on device 0 with size 67108864

Thread 1 "tritonserver" received signal SIGSEGV, Segmentation fault.
0x00007fffce510d90 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
(gdb) backtrace
#0  0x00007fffce510d90 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
#1  0x00007fffce50abd6 in initLibNvInferPlugins () from /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7
#2  0x00007ffff69fc98d in nvidia::inferenceserver::PlanBackendFactory::Create(std::shared_ptr<nvidia::inferenceserver::BackendConfig> const&, std::unique_ptr<nvidia::inferenceserver::PlanBackendFactory, std::default_delete<nvidia::inferenceserver::PlanBackendFactory> >*)::{lambda()#1}::operator()() const ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#3  0x00007ffff63ca907 in __pthread_once_slow (once_control=0x7fffffffc1cc, init_routine=0x7ffff57798a0 <__once_proxy>) at pthread_once.c:116
#4  0x00007ffff69fcae8 in nvidia::inferenceserver::PlanBackendFactory::Create(std::shared_ptr<nvidia::inferenceserver::BackendConfig> const&, std::unique_ptr<nvidia::inferenceserver::PlanBackendFactory, std::default_delete<nvidia::inferenceserver::PlanBackendFactory> >*) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#5  0x00007ffff690bcc9 in nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle::Create(nvidia::inferenceserver::InferenceServer*, double, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<nvidia::inferenceserver::BackendConfig>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<nvidia::inferenceserver::BackendConfig> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::unique_ptr<nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle, std::default_delete<nvidia::inferenceserver::ModelRepositoryManager::BackendLifeCycle> >*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#6  0x00007ffff6914182 in nvidia::inferenceserver::ModelRepositoryManager::Create(nvidia::inferenceserver::InferenceServer*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, float, bool, bool, bool, double, std::unique_ptr<nvidia::inferenceserver::ModelRepositoryManager, std::default_delete<nvidia::inferenceserver::ModelRepositoryManager> >*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#7  0x00007ffff6943fdf in nvidia::inferenceserver::InferenceServer::Init() () from /opt/tritonserver/bin/../lib/libtritonserver.so
#8  0x00007ffff694d1f3 in TRITONSERVER_ServerNew () from /opt/tritonserver/bin/../lib/libtritonserver.so
#9  0x00005555555cab3a in main ()


I'm not really familiar with C/C++/gdb, so I'm not sure if I'm giving you the right information. I did the following:

export LD_PRELOAD=/ubuntu/pluggin_test/build/libtestplugins.so
gdb tritonserver
run --model-repository=/ubuntu/model_repository --strict-model-config=false
(..seg fault happens)
backtrace

Please let me know if this is not what you want.

Thanks.

CoderHam (Contributor) commented Nov 18, 2020

@zmy1116 can you share the model you are using with the plugin shared library?

zmy1116 (Author) commented Nov 19, 2020

@CoderHam It does not really matter what you put in the model repository, even if none of the models in the repository uses the plugin. From the output above you can see that the error happens before any model is loaded.

That said, I have tested with a repository containing only the dummy example from the Triton Server repo:
https://github.com/triton-inference-server/server/tree/master/docs/examples/model_repository/simple_string

As you can see, this model does not use the custom plugin. However, when starting Triton Server with the plugin preloaded, the segmentation fault still occurs.

Thanks

CoderHam (Contributor) commented:

@zmy1116 I tried loading your shared library with trtexec (you can find it inside the TensorRT container) and saw a segfault even in this case. This confirms that the issue is inside your plugin or TensorRT, not in Triton Server. Please file a ticket against TensorRT. They may be able to find issues with your build script that are not obvious to me.

LD_PRELOAD=/data/libtestplugins.so /usr/src/tensorrt/bin/trtexec --loadEngine=/data/model.plan
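
If your trtexec build supports it, the --plugins option (available in recent trtexec versions; treat its presence in your exact container as an assumption) is an alternative to LD_PRELOAD for loading the library explicitly:

/usr/src/tensorrt/bin/trtexec --loadEngine=/data/model.plan --plugins=/data/libtestplugins.so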

zmy1116 (Author) commented Nov 20, 2020

@CoderHam thanks for the directions.

Actually, none of my custom-built plugins seem to work with trtexec directly on the command line (I have tested multiple versions of TensorRT).

In our current production environment, we run TensorRT models directly in Python. I just do ctypes.CDLL("libtestplugins.so") before loading the TensorRT engine, and they just work. I agree there is probably something wrong with my build script.

tianq01 commented Dec 25, 2020

@zmy1116 I hit the same issue. Have you got a solution?
Thanks.

zmy1116 (Author) commented Jan 4, 2021

@tianq01 It appears that this specific problem no longer exists in the 20.12 versions (TensorRT, Triton).
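
For anyone hitting the same problem, the corresponding NGC images are the following (tags follow the standard NGC naming; verify against the NGC catalog):

docker pull nvcr.io/nvidia/tensorrt:20.12-py3
docker pull nvcr.io/nvidia/tritonserver:20.12-py3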
