Storage hang at exit #3434

Closed
kikimo opened this issue Dec 8, 2021 · 0 comments
Labels: type/bug (Type: something is unexpected)

kikimo commented Dec 8, 2021

Storage may hang forever at exit. This happens regularly in our tests when a storage leader instance is killed while data is being inserted. The problem might be caused by a deadlock in Nebula's signal handler; we think it might also exist in metad and graphd, since they share a similar signal-handling mechanism. When this happens, we always see a stack similar to the one below:
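For illustration, here is a minimal sketch of one common way to avoid this class of hang. It is not Nebula's code and not necessarily the fix that was adopted. The trace suggests that SIGTERM was delivered on a thrift worker thread, and that the handler then ran the full shutdown on that same thread and ended up waiting for the workers (itself included) to exit. The sketch assumes the termination signals can be blocked before any thread is started, so that no worker ever runs the handler, and that a single thread can collect them with `sigwait`. `DemoServer`, `blockTerminationSignals` and `runUntilSignalled` are hypothetical names used only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <vector>

// Hypothetical stand-in for a server whose stop() joins its worker threads.
class DemoServer {
 public:
  explicit DemoServer(int workers) {
    for (int i = 0; i < workers; ++i) {
      workers_.emplace_back([this] {
        // Simulated request loop: runs until shutdown is requested.
        while (!stopping_.load(std::memory_order_acquire)) {
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      });
    }
  }

  // Must never run on a worker thread, because it joins all of them.
  void stop() {
    stopping_.store(true, std::memory_order_release);
    for (auto& t : workers_) {
      if (t.joinable()) t.join();
    }
    workers_.clear();
  }

  ~DemoServer() { stop(); }

 private:
  std::atomic<bool> stopping_{false};
  std::vector<std::thread> workers_;
};

// Blocks SIGTERM and SIGINT in the calling thread. Threads created
// afterwards inherit the mask, so none of them can be interrupted.
inline sigset_t blockTerminationSignals() {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGTERM);
  sigaddset(&set, SIGINT);
  int rc = pthread_sigmask(SIG_BLOCK, &set, nullptr);
  assert(rc == 0);
  (void)rc;
  return set;
}

// Starts the demo server, waits synchronously for a termination signal,
// then shuts down from this (non-worker) thread. Returns the signal number.
inline int runUntilSignalled(int workers, bool selfSignal) {
  sigset_t set = blockTerminationSignals();
  DemoServer server(workers);
  if (selfSignal) {
    raise(SIGTERM);  // demo only: stays pending because it is blocked
  }
  int sig = 0;
  int rc = sigwait(&set, &sig);  // no asynchronous handler is involved
  assert(rc == 0);
  (void)rc;
  server.stop();  // safe: this thread is not one of the workers
  return sig;
}
```

Calling `runUntilSignalled(4, true)` starts four workers, consumes the pending SIGTERM and returns after all workers have been joined. In a real daemon the main thread is usually busy serving, so the `sigwait` call would more likely live in a dedicated thread. The backtrace from one such hang follows.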

(rr) bt
#0  0x0000000070000002 in syscall_traced ()
#1  0x0000250910a18725 in _raw_syscall () at /root/src/rr/src/preload/raw_syscall.S:120
#2  0x0000250910a144ff in traced_raw_syscall (call=call@entry=0x7f3c1c8fd970) at /root/src/rr/src/preload/syscallbuf.c:278
#3  0x0000250910a15f98 in sys_statfs (call=<optimized out>) at /root/src/rr/src/preload/syscallbuf.c:3148
#4  syscall_hook_internal (call=0x7f3c1c8fd970) at /root/src/rr/src/preload/syscallbuf.c:3415
#5  syscall_hook (call=0x7f3c1c8fd970) at /root/src/rr/src/preload/syscallbuf.c:3454
#6  0x0000250910a14340 in _syscall_hook_trampoline () at /root/src/rr/src/preload/syscall_hook.S:313
#7  0x0000250910a1439f in __morestack () at /root/src/rr/src/preload/syscall_hook.S:458
#8  0x0000250910a143bb in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /root/src/rr/src/preload/syscall_hook.S:477
#9  0x00002509109f537c in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x250910833a2c) at ../sysdeps/nptl/futex-internal.h:183
#10 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x250910833930, cond=0x250910833a00) at pthread_cond_wait.c:508
#11 __pthread_cond_wait (cond=0x250910833a00, mutex=0x250910833930) at pthread_cond_wait.c:638
#12 0x0000000004ed23e0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#13 0x00000000046d867b in apache::thrift::concurrency::ThreadManager::Impl::removeWorkerImpl(std::unique_lock<std::mutex>&, unsigned long, bool) ()
#14 0x00000000046d9c1b in apache::thrift::concurrency::ThreadManager::Impl::stopImpl(bool) ()
#15 0x00000000046e09f9 in apache::thrift::concurrency::PriorityThreadManager::PriorityImpl::join() ()
#16 0x00000000045deee7 in apache::thrift::ThriftServer::~ThriftServer() ()
#17 0x00000000045df3e2 in apache::thrift::ThriftServer::~ThriftServer() ()
#18 0x0000000002be4223 in std::default_delete<apache::thrift::ThriftServer>::operator() (this=0x250910812ac0, __ptr=0x25091081e000) at /usr/include/c++/9/bits/unique_ptr.h:81
#19 0x0000000002be7100 in std::unique_ptr<apache::thrift::ThriftServer, std::default_delete<apache::thrift::ThriftServer> >::reset (this=0x250910812ac0, __p=0x25091081e000) at /usr/include/c++/9/bits/unique_ptr.h:402
#20 0x0000000004110b8a in nebula::raftex::RaftexService::waitUntilStop (this=0x250910812ab0) at /root/src/nebula/src/kvstore/raftex/RaftexService.cpp:141
#21 0x00000000040652a2 in nebula::kvstore::NebulaStore::~NebulaStore (this=0x250910837400, __in_chrg=<optimized out>) at /root/src/nebula/src/kvstore/NebulaStore.cpp:40
#22 0x00000000040654fe in nebula::kvstore::NebulaStore::~NebulaStore (this=0x250910837400, __in_chrg=<optimized out>) at /root/src/nebula/src/kvstore/NebulaStore.cpp:48
#23 0x0000000002be442f in std::default_delete<nebula::kvstore::KVStore>::operator() (this=0x25091082f670, __ptr=0x250910837400) at /usr/include/c++/9/bits/unique_ptr.h:81
#24 0x0000000002bdf1b8 in std::unique_ptr<nebula::kvstore::KVStore, std::default_delete<nebula::kvstore::KVStore> >::reset (this=0x25091082f670, __p=0x250910837400) at /usr/include/c++/9/bits/unique_ptr.h:402
#25 0x0000000002bd76f7 in nebula::storage::StorageServer::stop (this=0x25091082f600) at /root/src/nebula/src/storage/StorageServer.cpp:351
#26 0x0000000002bbaf2d in signalHandler (sig=15) at /root/src/nebula/src/daemons/StorageDaemon.cpp:183
#27 0x0000000002bbad2f in <lambda(nebula::SignalHandler::GeneralSignalInfo*)>::operator()(nebula::SignalHandler::GeneralSignalInfo *) const (__closure=0x5d93360 <nebula::SignalHandler::get()::instance+448>, info=0x7f3c1c8fe030) at /root/src/nebula/src/daemons/StorageDaemon.cpp:174
#28 0x0000000002bc2ad8 in std::_Function_handler<void(nebula::SignalHandler::GeneralSignalInfo*), setupSignalHandler()::<lambda(nebula::SignalHandler::GeneralSignalInfo*)> >::_M_invoke(const std::_Any_data &, nebula::SignalHandler::GeneralSignalInfo *&&) (__functor=...,
    __args#0=@0x7f3c1c8fdfe0: 0x7f3c1c8fe030) at /usr/include/c++/9/bits/std_function.h:300
#29 0x0000000003f5abc8 in std::function<void (nebula::SignalHandler::GeneralSignalInfo*)>::operator()(nebula::SignalHandler::GeneralSignalInfo*) const (this=0x5d93360 <nebula::SignalHandler::get()::instance+448>, __args#0=0x7f3c1c8fe030)
    at /usr/include/c++/9/bits/std_function.h:688
#30 0x0000000003f5a6e1 in nebula::SignalHandler::handleGeneralSignal (this=0x5d931a0 <nebula::SignalHandler::get()::instance>, sig=15, info=0x7f3c1c8fe1f0) at /root/src/nebula/src/common/base/SignalHandler.cpp:106
#31 0x0000000003f5a669 in nebula::SignalHandler::doHandle (this=0x5d931a0 <nebula::SignalHandler::get()::instance>, sig=15, info=0x7f3c1c8fe1f0, uctx=0x7f3c1c8fe0c0) at /root/src/nebula/src/common/base/SignalHandler.cpp:100
#32 0x0000000003f5a5c5 in nebula::SignalHandler::handlerHook (sig=15, info=0x7f3c1c8fe1f0, uctx=0x7f3c1c8fe0c0) at /root/src/nebula/src/common/base/SignalHandler.cpp:81
#33 <signal handler called>
#34 0x0000000070000002 in syscall_traced ()
#35 0x0000250910a18725 in _raw_syscall () at /root/src/rr/src/preload/raw_syscall.S:120
#36 0x0000250910a144ff in traced_raw_syscall (call=call@entry=0x7f3c1c8fefa0) at /root/src/rr/src/preload/syscallbuf.c:278
#37 0x0000250910a15f98 in sys_statfs (call=<optimized out>) at /root/src/rr/src/preload/syscallbuf.c:3148
#38 syscall_hook_internal (call=0x7f3c1c8fefa0) at /root/src/rr/src/preload/syscallbuf.c:3415
#39 syscall_hook (call=0x7f3c1c8fefa0) at /root/src/rr/src/preload/syscallbuf.c:3454
#40 0x0000250910a14340 in _syscall_hook_trampoline () at /root/src/rr/src/preload/syscall_hook.S:313
#41 0x0000250910a1439f in __morestack () at /root/src/rr/src/preload/syscall_hook.S:458
#42 0x0000250910a143bb in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /root/src/rr/src/preload/syscall_hook.S:477
#43 0x00002509109f92d5 in __libc_write (nbytes=8, buf=0x2c941f48e08, fd=31) at ../sysdeps/unix/sysv/linux/write.c:26
#44 __libc_write (fd=31, buf=0x2c941f48e08, nbytes=8) at ../sysdeps/unix/sysv/linux/write.c:24
#45 0x000000000496e61e in folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::notifyFd() ()
#46 0x000000000496829e in folly::EventBase::runInEventBaseThread(folly::Function<void ()>) ()
#47 0x00000000045a922b in apache::thrift::HandlerCallbackBase::sendReply(folly::IOBufQueue) ()
#48 0x00000000041e3581 in apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse>::doResult (this=0x241a228b6a80, r=...) at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/async/AsyncProcessor.h:865
#49 0x00000000041dd581 in apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse>::result (this=0x241a228b6a80, r=...) at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/async/AsyncProcessor.h:568
#50 0x00000000041d76d6 in apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse>::complete (this=0x241a228b6a80, r=...) at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/async/AsyncProcessor.h:851
#51 0x00000000041d44c9 in apache::thrift::detail::si::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::operator()(folly::Try<nebula::raftex::cpp2::AppendLogResponse> &&) const (this=0x14613bee3d10, _ret=...)
    at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/GeneratedCodeHelper.h:1265
#52 0x00000000041d7753 in folly::Future<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::operator()(folly::Executor::KeepAlive<folly::Executor> &&, folly::Try<nebula::raftex::cpp2::AppendLogResponse> &&) (this=0x14613bee3d10, t=...) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:961
#53 0x00000000041dd63d in folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>; apache::thrift::detail::si::CallbackPtr<F> = std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >; typename folly::drop_unit<typename folly::invoke_detail::traits<F>::result<>::value_type>::type = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse; typename folly::futures::detail::tryCallableResult<T, F>::value_type = folly::Unit]::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> >::invoke<folly::Executor::KeepAlive<folly::Executor>, folly::Try<nebula::raftex::cpp2::AppendLogResponse> >(void) (this=0x14613bee3d10) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:134
#54 0x00000000041dd696 in folly::futures::detail::detail_msvc_15_7_workaround::invoke<folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>, folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> >, nebula::raftex::cpp2::AppendLogResponse>(folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>; apache::thrift::detail::si::CallbackPtr<F> = std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >; typename folly::drop_unit<typename folly::invoke_detail::traits<F>::result<>::value_type>::type = 
nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse; typename folly::futures::detail::tryCallableResult<T, F>::value_type = folly::Unit]::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>, folly::futures::detail::CoreCallbackState<folly::Unit, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>; apache::thrift::detail::si::CallbackPtr<F> = std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >; typename folly::drop_unit<typename folly::invoke_detail::traits<F>::result<>::value_type>::type = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse; typename folly::futures::detail::tryCallableResult<T, F>::value_type = folly::Unit]::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> > &, folly::Executor::KeepAlive<folly::Executor> &&, folly::Try<nebula::raftex::cpp2::AppendLogResponse> &&) (state=..., ka=..., t=...) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:327
#55 0x00000000041dd6e4 in folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::<lambda()>::operator()(void) const (this=0x2c941f49240)
    at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:402
#56 0x00000000041e9555 in folly::makeTryWithNoUnwrap<folly::futures::detail::FutureBase<T>::thenImplementation(F&&, R, folly::futures::detail::InlineContinuation) [with F = folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; R = folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> mutable::<lambda()> >(folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::<lambda()> &&) (f=...) at /root/src/nebula/build/third-party/install/include/folly/Try-inl.h:257
#57 0x00000000041e3880 in folly::makeTryWith<folly::futures::detail::FutureBase<T>::thenImplementation(F&&, R, folly::futures::detail::InlineContinuation) [with F = folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; R = folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> mutable::<lambda()> >(folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::<lambda()> &&) (f=...) at /root/src/nebula/build/third-party/install/include/folly/Try-inl.h:270
#58 0x00000000041dd7ce in folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>::operator()(folly::Executor::KeepAlive<folly::Executor> &&, folly::Try<nebula::raftex::cpp2::AppendLogResponse> &&) (this=0x14613bee3d10, ka=..., t=...) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:401
#59 0x00000000041e9738 in folly::futures::detail::Core<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::futures::detail::CoreBase&, folly::Executor::KeepAlive<folly::Executor>&&, folly::exception_wrapper*)>::operator()(folly::futures::detail::CoreBase &, folly::Executor::KeepAlive<folly::Executor> &&, folly::exception_wrapper *) (this=0x14613bee3d10, coreBase=..., ka=..., ew=0x0) at /root/src/nebula/build/third-party/install/include/folly/futures/detail/Core.h:583
#60 0x00000000041f21ec in folly::detail::function::FunctionTraits<void(folly::futures::detail::CoreBase&, folly::Executor::KeepAlive<folly::Executor>&&, folly::exception_wrapper*)>::callSmall<folly::futures::detail::Core<T>::setCallback(F&&, std::shared_ptr<folly::RequestContext>&&, folly::futures::detail::InlineContinuation) [with F = folly::futures::detail::FutureBase<T>::thenImplementation(F&&, R, folly::futures::detail::InlineContinuation) [with F = folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; R = folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::futures::detail::CoreBase&, 
folly::Executor::KeepAlive<>&&, folly::exception_wrapper*)> >(folly::detail::function::CallArg, folly::detail::function::CallArg, folly::detail::function::CallArg, folly::detail::function::Data &) (args#0=..., args#1=..., args#2=0x0, p=...)
    at /root/src/nebula/build/third-party/install/include/folly/Function.h:371
#61 0x000000000492ac9c in ?? ()
#62 0x000000000492ba84 in folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State) ()
#63 0x000000000492bfbf in folly::futures::detail::CoreBase::setCallback_(folly::Function<void (folly::futures::detail::CoreBase&, folly::Executor::KeepAlive<folly::Executor>&&, folly::exception_wrapper*)>&&, std::shared_ptr<folly::RequestContext>&&, folly::futures::detail::InlineContinuation) ()
#64 0x00000000041e983c in folly::futures::detail::Core<nebula::raftex::cpp2::AppendLogResponse>::setCallback<folly::futures::detail::FutureBase<T>::thenImplementation(F&&, R, folly::futures::detail::InlineContinuation) [with F = folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; R = folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> >(folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> &&, std::shared_ptr<folly::RequestContext> &&, folly::futures::detail::InlineContinuation) (this=0x14613bee3d00,
    func=..., context=..., allowInline=folly::futures::detail::InlineContinuation::permit) at /root/src/nebula/build/third-party/install/include/folly/futures/detail/Core.h:586
#65 0x00000000041e39c0 in folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::setCallback_<folly::futures::detail::FutureBase<T>::thenImplementation(F&&, R, folly::futures::detail::InlineContinuation) [with F = folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; R = folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> >(folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> &&, folly::futures::detail::InlineContinuation) (this=0x2c941f49748, func=...,
    allowInline=folly::futures::detail::InlineContinuation::permit) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:303
#66 0x00000000041dd9de in folly::futures::detail::FutureBase<nebula::raftex::cpp2::AppendLogResponse>::thenImplementation<folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void> >(folly::Future<nebula::raftex::cpp2::AppendLogResponse>::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> &&, folly::futures::detail::tryExecutorCallableResult<nebula::raftex::cpp2::AppendLogResponse, folly::Future<T>::thenTryInline(F&&) && [with F = apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>; apache::thrift::detail::si::CallbackPtr<F> = std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >; typename folly::drop_unit<typename 
folly::invoke_detail::traits<F>::result<>::value_type>::type = nebula::raftex::cpp2::AppendLogResponse]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>; T = nebula::raftex::cpp2::AppendLogResponse; typename folly::futures::detail::tryCallableResult<T, F>::value_type = folly::Unit]::<lambda(folly::Executor::KeepAlive<folly::Executor>&&, folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)>, void>, folly::futures::detail::InlineContinuation) (this=0x2c941f49748, func=..., allowInline=folly::futures::detail::InlineContinuation::permit)
    at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:393
#67 0x00000000041d781b in folly::Future<nebula::raftex::cpp2::AppendLogResponse>::thenTryInline<apache::thrift::detail::si::async_tm(apache::thrift::ServerInterface*, apache::thrift::detail::si::CallbackPtr<F>, F&&) [with F = nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()>]::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> >(apache::thrift::detail::si::<lambda(folly::Try<nebula::raftex::cpp2::AppendLogResponse>&&)> &&) (this=0x2c941f49748, func=...) at /root/src/nebula/build/third-party/install/include/folly/futures/Future-inl.h:965
#68 0x00000000041d45b0 in apache::thrift::detail::si::async_tm<nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog(std::unique_ptr<apache::thrift::HandlerCallback<nebula::raftex::cpp2::AppendLogResponse> >, const nebula::raftex::cpp2::AppendLogRequest&)::<lambda()> >(apache::thrift::ServerInterface *, apache::thrift::detail::si::CallbackPtr, nebula::raftex::cpp2::RaftexServiceSvIf::<lambda()> &&) (si=0x250910812ab8, callback=..., f=...) at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/GeneratedCodeHelper.h:1263
#69 0x00000000041d2acd in nebula::raftex::cpp2::RaftexServiceSvIf::async_tm_appendLog (this=0x250910812ab0, callback=..., p_req=...) at /root/src/nebula/build/src/interface/gen-cpp2/RaftexService.cpp:55
#70 0x00000000041dbbf3 in nebula::raftex::cpp2::RaftexServiceAsyncProcessor::process_appendLog<apache::thrift::CompactProtocolReader, apache::thrift::CompactProtocolWriter> (this=0x38aa18e3c460, req=..., serializedRequest=..., ctx=0x38aa18e19f10, eb=0x38aa18dc7000,
    tm=0x2509107fa1d0) at /root/src/nebula/build/src/interface/gen-cpp2/RaftexService.tcc:113
#71 0x00000000041df85c in apache::thrift::GeneratedAsyncProcessor::makeEventTaskForRequest<nebula::raftex::cpp2::RaftexServiceAsyncProcessor>(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*, apache::thrift::RpcKind, void (nebula::raftex::cpp2::RaftexServiceAsyncProcessor::*)(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*), nebula::raftex::cpp2::RaftexServiceAsyncProcessor*, apache::thrift::Tile*)::{lambda(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>)#1}::operator()(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>) (this=0x45fc3a8eec00, rq=...) at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/async/AsyncProcessor.h:698
#72 0x00000000041f74d2 in folly::detail::function::FunctionTraits<void (std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>)>::callBig<apache::thrift::GeneratedAsyncProcessor::makeEventTaskForRequest<nebula::raftex::cpp2::RaftexServiceAsyncProcessor>(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*, apache::thrift::RpcKind, void (nebula::raftex::cpp2::RaftexServiceAsyncProcessor::*)(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*), nebula::raftex::cpp2::RaftexServiceAsyncProcessor*, apache::thrift::Tile*)::{lambda(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>)#1}>(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>&&, folly::detail::function::Data&) (args#0=..., p=...) at /root/src/nebula/build/third-party/install/include/folly/Function.h:385
#73 0x00000000045a8fa2 in apache::thrift::EventTask::run() ()
#74 0x00000000041d9bdf in apache::thrift::GeneratedAsyncProcessor::processInThread<nebula::raftex::cpp2::RaftexServiceAsyncProcessor>(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*, apache::thrift::RpcKind, void (nebula::raftex::cpp2::RaftexServiceAsyncProcessor::*)(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*), nebula::raftex::cpp2::RaftexServiceAsyncProcessor*)::{lambda()#1}::operator()() const (this=0x14613bea4490)
    at /root/src/nebula/build/third-party/install/include/thrift/lib/cpp2/async/AsyncProcessor.h:773
#75 0x00000000041eac5e in folly::detail::function::FunctionTraits<void ()>::callSmall<apache::thrift::GeneratedAsyncProcessor::processInThread<nebula::raftex::cpp2::RaftexServiceAsyncProcessor>(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*, apache::thrift::RpcKind, void (nebula::raftex::cpp2::RaftexServiceAsyncProcessor::*)(std::unique_ptr<apache::thrift::ResponseChannelRequest, apache::thrift::RequestsRegistry::Deleter>, apache::thrift::SerializedRequest&&, apache::thrift::Cpp2RequestContext*, folly::EventBase*, apache::thrift::concurrency::ThreadManager*), nebula::raftex::cpp2::RaftexServiceAsyncProcessor*)::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /root/src/nebula/build/third-party/install/include/folly/Function.h:371
#76 0x00000000045ae447 in virtual thunk to apache::thrift::concurrency::FunctionRunner::run() ()
#77 0x00000000046e3843 in apache::thrift::concurrency::ThreadManager::Impl::Worker::run() ()
#78 0x00000000046e749d in apache::thrift::concurrency::PthreadThread::threadMain(void*) ()
#79 0x00002509109ee609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#80 0x00005d55579c6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  1. The thread in question is a Thrift worker thread that was chosen to run the signal handler.
  2. The signal handler causes the StorageServer to stop.
  3. The StorageServer waits for the ThriftServer to stop.
  4. The ThriftServer waits for all of its worker threads to stop.

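Here we have a perfect circular wait, i.e. a deadlock: worker -> signal handler -> StorageServer -> ThriftServer -> worker. The worker cannot finish until the signal handler running on it returns, and the handler cannot return until that same worker has stopped. That might be the reason why storage hangs at exit.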

Why does the signal handler run in this worker thread? The kernel may choose any thread that has not blocked the signal to execute the handler. Excerpted from the signal(7) man page:

A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked.

image
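One common way to keep a process-directed signal away from worker threads is to block it in every thread and consume it synchronously in a single dedicated thread. The following is only a minimal sketch under that assumption, not the actual nebula code; the helper names `blockStopSignals` and `waitForStopSignal` are hypothetical. It relies on the POSIX calls `pthread_sigmask` and `sigwait`.

```cpp
#include <csignal>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: blocks SIGINT and SIGTERM in the calling thread.
// Threads created afterwards inherit this mask, so calling it early in
// main() keeps these signals from being delivered to any worker thread.
sigset_t blockStopSignals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Hypothetical helper: waits synchronously for one of the blocked signals
// and returns its number, or -1 if sigwait fails. No asynchronous handler
// runs, so the caller is free to stop servers and join threads afterwards.
int waitForStopSignal(const sigset_t& set) {
    int signo = 0;
    if (sigwait(&set, &signo) != 0) {
        return -1;
    }
    return signo;
}
```

With this layout, the thread that calls `waitForStopSignal` (for example the main thread, after all servers have been started) is the only place where shutdown begins, so it never runs on top of a worker that the shutdown itself has to wait for.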

and we can see how a signal handler is executed:

image

To fix this problem, we strongly recommend not doing heavy work such as wait() or join() in a signal handler. A signal handler should return as soon as possible; you should never make it wait.
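The recommendation can be sketched as a handler that only records the request, with the blocking shutdown done later from a normal thread. This is an illustrative sketch under stated assumptions, not the project's implementation: `DemoServer`, `installStopHandler`, `stopIfRequested` and `gStopRequested` are hypothetical names, and `DemoServer` merely stands in for a server whose `stop()` has to join worker threads.

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <signal.h>
#include <thread>

// Hypothetical flag: the only state the handler touches.
static volatile std::sig_atomic_t gStopRequested = 0;

// The handler records the request and returns immediately.
// It never waits, joins, or takes a lock.
extern "C" void onStopSignal(int /*signo*/) {
    gStopRequested = 1;
}

// Installs the handler with sigaction; returns true on success.
bool installStopHandler(int signo) {
    struct sigaction sa {};
    sa.sa_handler = onStopSignal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}

// Hypothetical stand-in for a server whose stop() joins a worker thread.
struct DemoServer {
    std::atomic<bool> running{false};
    std::thread worker;

    void start() {
        running = true;
        worker = std::thread([this] {
            while (running) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Blocking call: safe here because it is never invoked from the handler.
    void stop() {
        running = false;
        if (worker.joinable()) {
            worker.join();
        }
    }

    ~DemoServer() { stop(); }
};

// Called from an ordinary thread (for example a loop in main) to perform
// the blocking shutdown once the flag has been set by the handler.
bool stopIfRequested(DemoServer& server) {
    if (!gStopRequested) {
        return false;
    }
    server.stop();
    return true;
}
```

Even if the kernel runs `onStopSignal` on a worker thread, the handler returns at once, so the worker can still finish and the later `join()` does not wait on itself.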

