Description
Please check the FAQ documentation before raising an issue
Please check the FAQ documentation and old issues before raising an issue in case someone has asked the same question that you are asking.
Describe the bug (must be provided)
storaged hangs on terminate. This seems to be a bug introduced by #2843.
Your Environments (must be provided)
- OS: uname -a
- Compiler: g++ --version or clang++ --version
- CPU: lscpu
- Commit id (e.g. a3ffc7d8)
How To Reproduce (must be provided)
Start a cluster and execute kill ${PID_OF_STORAGED}. The storaged process does not exit, and pstack shows:
...
thread: 0, lwp: 71129, type: 0
#0 0x00007ffff7e57376 in __pthread_cond_wait()+534 in /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:183
#1 0x0000000004b187f0 in _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE!()+15 in /root/src/nebula/build/bin/nebula-storaged
#2 0x0000000002b8c048 in _ZZN6nebula7storage16AdminTaskManager21handleUnreportedTasksEvENKUlvE_clEv!()+117 in /root/src/nebula/build/bin/nebula-storaged at AdminTaskManager.cpp:47
#3 0x0000000002bb07ee in std::__invoke_impl<void, nebula::storage::AdminTaskManager::handleUnreportedTasks()::<lambda()> >()+29 in /root/src/nebula/build/bin/nebula-storaged at invoke.h:60
#4 0x0000000002bb07a4 in std::__invoke<nebula::storage::AdminTaskManager::handleUnreportedTasks()::<lambda()> >()+29 in /root/src/nebula/build/bin/nebula-storaged at invoke.h:95
#5 0x0000000002bb0752 in std::thread::_Invoker<std::tuple<nebula::storage::AdminTaskManager::handleUnreportedTasks()::<lambda()> > >::_M_invoke<0>()+37 in /root/src/nebula/build/bin/nebula-storaged at thread:244
#6 0x0000000002bb0708 in std::thread::_Invoker<std::tuple<nebula::storage::AdminTaskManager::handleUnreportedTasks()::<lambda()> > >::operator()()+21 in /root/src/nebula/build/bin/nebula-storaged at thread:251
#7 0x0000000002bb0502 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<nebula::storage::AdminTaskManager::handleUnreportedTasks()::<lambda()> > > >::_M_run()+29 in /root/src/nebula/build/bin/nebula-storaged at thread:195
#8 0x0000000004b8f2b4 in execute_native_thread_routine!()+19 in /root/src/nebula/build/bin/nebula-storaged
#9 0x00007ffff7e50609 in start_thread()+216 in /lib/x86_64-linux-gnu/libpthread.so.0 at pthread_create.c:477
#10 0x00007ffff7d77293 in __GI___clone!()+66 in /lib/x86_64-linux-gnu/libc.so.6 at clone.S:95
...
The code around AdminTaskManager.cpp:47:
nebula/src/storage/admin/AdminTaskManager.cpp
Lines 39 to 54 in 9462d35
void AdminTaskManager::handleUnreportedTasks() {
  using futTuple =
      std::tuple<JobID, TaskID, std::string, folly::Future<StatusOr<nebula::cpp2::ErrorCode>>>;
  if (env_ == nullptr) return;
  unreportedAdminThread_.reset(new std::thread([this] {
    bool ifAny = true;
    while (true) {
      std::unique_lock<std::mutex> lk(unreportedMutex_);
      if (!ifAny) unreportedCV_.wait(lk);
      ifAny = false;
      std::unique_ptr<kvstore::KVIterator> iter;
      auto kvRet = env_->adminStore_->scan(&iter);
      if (kvRet != nebula::cpp2::ErrorCode::SUCCEEDED || iter == nullptr) continue;
      std::vector<std::string> keys;
      std::vector<futTuple> futVec;
      for (; iter->valid(); iter->next()) {
and the log shows:
I1007 07:58:03.991796 75386 NebulaStore.cpp:49] ~NebulaStore()
I1007 07:58:03.992041 75485 StorageServer.cpp:269] The admin service stopped
I1007 07:58:03.992151 75486 StorageServer.cpp:294] The internal storage service stopped
I1007 07:58:03.992262 75484 StorageServer.cpp:240] The storage service stopped
I1007 07:58:03.992362 75386 StorageDaemon.cpp:147] The storage Daemon stopped
It seems that the thread unreportedAdminThread_, which waits on the condition variable unreportedCV_, is blocking the whole process from exiting.
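One common way to avoid this kind of hang is to pair the condition variable with a stop flag and wait on a predicate, so that shutdown can wake the thread and join it. The following is only a minimal sketch of that pattern, not the actual fix in Nebula: the class UnreportedTaskWorker and its members (pending_, stopped_, rounds_, notify(), shutdown()) are hypothetical names made up for illustration, and the real scan-and-report work is replaced by a counter.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the background thread in AdminTaskManager.
// None of these names come from the Nebula code base.
class UnreportedTaskWorker {
 public:
  ~UnreportedTaskWorker() { shutdown(); }

  void start() {
    worker_ = std::thread([this] {
      std::unique_lock<std::mutex> lk(mutex_);
      while (true) {
        // Wake up when there is work OR when shutdown was requested.
        // The predicate also covers spurious wakeups and notifications
        // that arrive before the thread starts waiting.
        cv_.wait(lk, [this] { return stopped_ || pending_; });
        if (stopped_) return;  // exit path that the original loop lacks
        pending_ = false;
        ++rounds_;  // placeholder for scanning and reporting tasks
      }
    });
  }

  // Signal that there may be unreported tasks to process.
  void notify() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      pending_ = true;
    }
    cv_.notify_one();
  }

  // Ask the worker to exit and wait for it; safe to call more than once.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      stopped_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

  bool running() const { return worker_.joinable(); }

  int rounds() {
    std::lock_guard<std::mutex> lk(mutex_);
    return rounds_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool pending_ = false;
  bool stopped_ = false;
  int rounds_ = 0;
  std::thread worker_;
};
```

In this sketch, calling shutdown() from the daemon's stop path (or from the destructor) sets the flag under the mutex before notifying, so the worker cannot miss the wakeup and the join returns. A real implementation would probably release the lock while doing the scan so that notify() is not blocked by long-running work.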
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Provide logs and configs, or any other context to trace the problem.