python backend crash #4857

Closed
Tsingjie89 opened this issue Sep 8, 2022 · 4 comments

@Tsingjie89
Tsingjie89 commented Sep 8, 2022

Description
The Python backend may crash when multiple model instances are configured in CPU mode.

Triton Information
What version of Triton are you using?
22.04

Are you using the Triton container or did you build it yourself?
The Triton container.

To Reproduce
Recognition model config:
name: "rec_ch_cpu"
backend: "paddle"
max_batch_size: 6

input [
  {
    name: "x",
    data_type: TYPE_FP32,
    dims: [3, 48, -1]
  }
]
output [
  {
    name: "softmax_5.tmp_0",
    data_type: TYPE_FP32,
    dims: [-1, 6625]
  }
]

instance_group [
  {
    count: 4
    kind: KIND_CPU
  }
]

optimization {
  execution_accelerators {
    cpu_execution_accelerator: [
      {
        name: "mkldnn"
        parameters { key: "cpu_threads" value: "5" }
      }
    ]
  }
}

Python backend config:
name: "ocr_lite_rec"
backend: "python"

input [
  {
    name: "INPUT_0"
    data_type: TYPE_STRING
    dims: [-1]
  }
]
input [
  {
    name: "INPUT_1"
    data_type: TYPE_STRING
    dims: [-1]
  }
]

output [
  {
    name: "OUTPUT"
    data_type: TYPE_STRING
    dims: [-1]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]

Test data:
84 images, 50 bboxes per image

Expected behavior
Core dump info:
Core was generated by `/opt/tritonserver/backends/python/triton_python_backend_stub /workspace/ocr_lit'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005594dc325f49 in boost::intrusive::bstree_algorithms_base<boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true> >::next_node(boost::interprocess::offset_ptr<boost::intrusive::compact_rbtree_node<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul> >, long, unsigned long, 0ul> const&) ()
[Current thread is 1 (Thread 0x7f98c1cfc000 (LWP 125682))]
(gdb)
(gdb) bt
#0 0x00005594dc325f49 in boost::intrusive::bstree_algorithms_base<boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true> >::next_node(boost::interprocess::offset_ptr<boost::intrusive::compact_rbtree_node<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul> >, long, unsigned long, 0ul> const&) ()
#1 0x00005594dc32e8b5 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) ()
#2 0x00005594dc32eea4 in std::_Function_handler<void (char*), triton::backend::python::AllocatedSharedMemory triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr(char*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(char*)#1}>::_M_invoke(std::_Any_data const&, char*&&) ()
#3 0x00005594dc33f2bd in triton::backend::python::PbTensor::~PbTensor() ()
#4 0x00005594dc343261 in std::_Sp_counted_ptr_inplace<triton::backend::python::PbTensor, std::allocator<triton::backend::python::PbTensor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#5 0x00005594dc319e55 in triton::backend::python::InferRequest::~InferRequest() ()
#6 0x00005594dc319f76 in std::_Sp_counted_ptr<triton::backend::python::InferRequest*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#7 0x00005594dc31b598 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#8 0x00005594dc31b71a in pybind11::class_<triton::backend::python::InferRequest, std::shared_ptr<triton::backend::python::InferRequest> >::dealloc(pybind11::detail::value_and_holder&) ()
#9 0x00005594dc309c27 in pybind11::detail::clear_instance(_object*) ()
#10 0x00005594dc30aba3 in pybind11_object_dealloc ()
#11 0x00007f98c2334dd3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#12 0x00007f98c254c865 in _PyGen_Send () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#13 0x00007f98b8839ef9 in ?? () from /usr/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
#14 0x00007f98b88390ac in ?? () from /usr/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so
#15 0x00007f98c255db1b in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#16 0x00007f98c246a8a3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#17 0x00007f98c251444f in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#18 0x00007f98c255d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#19 0x00007f98c2332f48 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#20 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#21 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#22 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#23 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#24 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#25 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#26 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#27 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#28 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#29 0x00007f98c233506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#30 0x00007f98c2329d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#31 0x00007f98c232b018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#32 0x00007f98c247fe3b in _PyEval_EvalCodeWithName () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#33 0x00007f98c255d114 in _PyFunction_Vectorcall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#34 0x00007f98c255d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#35 0x00005594dc31f565 in pybind11::object pybind11::detail::object_api<pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr> >::operator()<(pybind11::return_value_policy)1, pybind11::object&>(pybind11::object&) const ()
#36 0x00005594dc313c95 in triton::backend::python::Stub::Execute(triton::backend::python::RequestBatch*, triton::backend::python::ResponseBatch*, long*) ()
#37 0x00005594dc317bf6 in triton::backend::python::Stub::RunCommand() ()
#38 0x00005594dc2fd160 in main ()
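
Most of the frames above are anonymous libpython entries, so it can help to pull out only the Python backend stub frames when reading the trace. The following is a minimal sketch of such a filter, not a Triton tool: the function name `backend_frames` and the `SAMPLE` variable are made up for illustration, and the sample lines are frames #3, #5, #11, #36 and #38 copied from the backtrace above.

```python
import re

# Matches gdb frame lines such as "#3 0x0000... in symbol (args) ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.*)$")

# A few frames copied from the backtrace above (illustrative subset only).
SAMPLE = """\
#3 0x00005594dc33f2bd in triton::backend::python::PbTensor::~PbTensor() ()
#5 0x00005594dc319e55 in triton::backend::python::InferRequest::~InferRequest() ()
#11 0x00007f98c2334dd3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#36 0x00005594dc313c95 in triton::backend::python::Stub::Execute(triton::backend::python::RequestBatch*, triton::backend::python::ResponseBatch*, long*) ()
#38 0x00005594dc2fd160 in main ()
"""


def backend_frames(backtrace, needle="triton::backend::python::"):
    """Return (frame_number, short_symbol) for frames whose symbol mentions `needle`.

    `short_symbol` is the text from the first occurrence of `needle`
    up to (not including) the first opening parenthesis after it.
    """
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # not a frame line (gdb prompt, thread info, ...)
        number, symbol = int(match.group(1)), match.group(2)
        if needle in symbol:
            start = symbol.index(needle)
            short = symbol[start:].split("(", 1)[0]
            frames.append((number, short))
    return frames


if __name__ == "__main__":
    for number, symbol in backend_frames(SAMPLE):
        print(f"#{number}: {symbol}")
```

Run against the full trace, this keeps the frames that show the crash happening while a `PbTensor` owned by an `InferRequest` is released during `Stub::Execute`. That only summarizes what the trace already says and does not identify a root cause.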

@krishung5
Contributor

Hi @Tsingjie89, thanks for sharing the config and backtrace. Are you able to reproduce this with our newest container? Could you also share the model you are using and the steps to reproduce the issue? That would really help us investigate this further.

@Tsingjie89
Author

Hi @krishung5,
The crash still occurs with the container nvcr.io/nvidia/tritonserver:22.08-py3 and the same model config.
Core dump info:
Core was generated by `/opt/tritonserver/backends/python/triton_python_backend_stub /workspace/ocr_lit'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000557252ba953e in boost::intrusive::multiset_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, void>::insert(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>, boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl&) ()
[Current thread is 1 (Thread 0x7f1ca37fe000 (LWP 24676))]
(gdb)
(gdb)
(gdb) bt
#0 0x0000557252ba953e in boost::intrusive::multiset_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, void>::insert(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>, boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl&) ()
#1 0x0000557252baa44a in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) ()
#2 0x0000557252baa934 in std::_Function_handler<void (char*), triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr(char*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(char*)#1}>::_M_invoke(std::_Any_data const&, char*&&) ()
#3 0x0000557252bddbdd in triton::backend::python::PbTensor::~PbTensor() ()
#4 0x0000557252be16d1 in std::_Sp_counted_ptr_inplace<triton::backend::python::PbTensor, std::allocator<triton::backend::python::PbTensor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#5 0x0000557252bbe7ed in triton::backend::python::InferRequest::~InferRequest() ()
#6 0x0000557252bae750 in pybind11::cpp_function::initialize<triton::backend::python::pybind11_init_c_python_backend_utils(pybind11::module_&)::{lambda(std::shared_ptr<triton::backend::python::InferRequest>&)#2}::operator()(std::shared_ptr<triton::backend::python::InferRequest>&) const::{lambda()#1}, std::shared_ptr<triton::backend::python::InferResponse> >(triton::backend::python::pybind11_init_c_python_backend_utils(pybind11::module_&)::{lambda(std::shared_ptr<triton::backend::python::InferRequest>&)#2}::operator()(std::shared_ptr<triton::backend::python::InferRequest>&) const::{lambda()#1}&&, std::shared_ptr<triton::backend::python::InferResponse> ()())::{lambda(pybind11::detail::function_record)#1}::_FUN(pybind11::detail) ()
#7 0x0000557252b93985 in pybind11::cpp_function::initialize_generic(std::unique_ptr<pybind11::detail::function_record, pybind11::cpp_function::InitializingFunctionRecordDeleter>&&, char const*, std::type_info const* const*, unsigned long)::{lambda(void*)#1}::_FUN(void*) ()
#8 0x00007f1f7bb0abb3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#9 0x00007f1f7bac6245 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#10 0x00007f1f7bad5774 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#11 0x00007f1f7baadfbf in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#12 0x00007f1f7b8e1cb0 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#13 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#14 0x00007f1f7bb0d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#15 0x00007f1f7b8dfa7a in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#16 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#17 0x00007f1f7b8d9d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#18 0x00007f1f7b8db018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#19 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#20 0x00007f1f7b8d9d6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#21 0x00007f1f7b8db018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#22 0x00007f1f7b8e506b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#23 0x00007f1f7bb0de2b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#24 0x00007f1f7bb0d830 in PyVectorcall_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#25 0x00007f1f7b97bc01 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#26 0x00007f1f7b9e251b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#27 0x00007f1f7b64b609 in start_thread (arg=) at pthread_create.c:477
#28 0x00007f1f7b570133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@krishung5
Contributor

Thank you @Tsingjie89 for the reply. Apart from the model config you shared, we would also need the model file, model.py in this case, to reproduce the issue. Could you also provide the steps to reproduce the issue, i.e. the full tritonserver ... command you run to get the coredump?

@dyastremsky
Contributor

Closing due to inactivity. Please let us know to reopen the issue if you'd like to follow up.
