storage crash, possible mem corruption #3192

Closed
kikimo opened this issue Oct 23, 2021 · 3 comments

kikimo commented Oct 23, 2021


Describe the bug (must be provided)

storaged crashes during a stress test.

Your Environments (must be provided)

How To Reproduce (must be provided)

Steps to reproduce the behavior:

  1. Set up a nebula cluster with 3 storage + 1 meta + 1 graph.
  2. Run nebula-stresser with 1024 concurrent clients inserting edges, while triggering leader changes.
  3. Wait until one storaged crashes (in about 15 - 30 min).


Additional context

stderr output:

pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
pure virtual method called
terminate called recursively
*** Aborted at 1634788446 (Unix time, try 'date -d @1634788446') ***
*** Signal 6 (SIGABRT) (0x395d71) received by PID 3759473 (pthread TID 0x7f506e9ff700) (linux TID 3759501) (maybe from PID 3759473, UID 0) (code: -6), stack trace: ***
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN5folly10symbolizer17getStackTraceSafeEPmm+0x31)[0x4665941]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN5folly10symbolizer21SafeStackTracePrinter15printStackTraceEb+0x1b)[0x465ae7b]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x4658bd2]
/lib64/libpthread.so.0(+0xf62f)[0x7f5079b3c62f]
/lib64/libc.so.6(gsignal+0x37)[0x7f5079795387]
/lib64/libc.so.6(abort+0x147)[0x7f5079796a77]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0xec)[0x4bbc34c]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN10__cxxabiv111__terminateEPFvvE+0x5)[0x4b29fb5]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZSt9terminatev+0x10)[0x4b2a000]
/data/src/wwl/nebula/build/bin/nebula-storaged(__cxa_pure_virtual+0x1e)[0x4b2878e]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage27ChainAddEdgesProcessorLocal27replaceNullWithDefaultValueERNS0_4cpp215AddEdgesRequestE+0x8f3)[0x2c9f41f]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage27ChainAddEdgesProcessorLocal14prepareRequestERKNS0_4cpp215AddEdgesRequestE+0x240)[0x2c9b282]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage27ChainAddEdgesProcessorLocal7processERKNS0_4cpp215AddEdgesRequestE+0x20)[0x2c99eca]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x2c95587]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x2c955f8]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage27ChainAddEdgesGroupProcessor7processERKNS0_4cpp215AddEdgesRequestE+0xa9)[0x2c95115]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage26GraphStorageServiceHandler20future_chainAddEdgesERKNS0_4cpp215AddEdgesRequestE+0x53)[0x2ac23a3]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x320f6b9]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x321df57]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x3214fa0]
/data/src/wwl/nebula/build/bin/nebula-storaged[0x3215193]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage4cpp223GraphStorageServiceSvIf22async_tm_chainAddEdgesESt10unique_ptrIN6apache6thrift15HandlerCallbackINS1_12ExecResponseEEESt14default_deleteIS8_EERKNS1_15AddEdgesRequestE+0x65)[0x320f729]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6nebula7storage4cpp233GraphStorageServiceAsyncProcessor21process_chainAddEdgesIN6apache6thrift20BinaryProtocolReaderENS5_20BinaryProtocolWriterEEEvSt10unique_ptrINS5_22ResponseChannelRequestENS5_16RequestsRegistry7DeleterEEONS5_17SerializedRequestEPNS5_18Cpp2RequestContextEPN5folly9EventBaseEPNS5_11concurrency13ThreadManagerE+0x24b)[0x3222a23]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZZN6apache6thrift23GeneratedAsyncProcessor23makeEventTaskForRequestIN6nebula7storage4cpp233GraphStorageServiceAsyncProcessorEEESt10shared_ptrINS0_9EventTaskEESt10unique_ptrINS0_22ResponseChannelRequestENS0_16RequestsRegistry7DeleterEEONS0_17SerializedRequestEPNS0_18Cpp2RequestContextEPN5folly9EventBaseEPNS0_11concurrency13ThreadManagerENS0_7RpcKindEMT_FvSE_SG_SI_SL_SO_EPSQ_PNS0_4TileEENUlSE_E_clESE_+0x1c0)[0x322eb86]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN5folly6detail8function14FunctionTraitsIFvSt10unique_ptrIN6apache6thrift22ResponseChannelRequestENS5_16RequestsRegistry7DeleterEEEE7callBigIZNS5_23GeneratedAsyncProcessor23makeEventTaskForRequestIN6nebula7storage4cpp233GraphStorageServiceAsyncProcessorEEESt10shared_ptrINS5_9EventTaskEES9_ONS5_17SerializedRequestEPNS5_18Cpp2RequestContextEPNS_9EventBaseEPNS5_11concurrency13ThreadManagerENS5_7RpcKindEMT_FvS9_SN_SP_SR_SU_EPSW_PNS5_4TileEEUlS9_E_EEvOS9_RNS1_4DataE+0x43)[0x325cfcc]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6apache6thrift9EventTask3runEv+0x39)[0x4214649]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZZN6apache6thrift23GeneratedAsyncProcessor15processInThreadIN6nebula7storage4cpp233GraphStorageServiceAsyncProcessorEEEvSt10unique_ptrINS0_22ResponseChannelRequestENS0_16RequestsRegistry7DeleterEEONS0_17SerializedRequestEPNS0_18Cpp2RequestContextEPN5folly9EventBaseEPNS0_11concurrency13ThreadManagerENS0_7RpcKindEMT_FvSB_SD_SF_SI_SL_EPSN_ENKUlvE_clEv+0x24)[0x321ee68]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallIZN6apache6thrift23GeneratedAsyncProcessor15processInThreadIN6nebula7storage4cpp233GraphStorageServiceAsyncProcessorEEEvSt10unique_ptrINS7_22ResponseChannelRequestENS7_16RequestsRegistry7DeleterEEONS7_17SerializedRequestEPNS7_18Cpp2RequestContextEPNS_9EventBaseEPNS7_11concurrency13ThreadManagerENS7_7RpcKindEMT_FvSI_SK_SM_SO_SR_EPST_EUlvE_EEvRNS1_4DataE+0x1f)[0x3248203]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6apache6thrift11concurrency14FunctionRunner3runEv+0x78)[0x421b6b8]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6apache6thrift11concurrency13ThreadManager4Impl6Worker3runEv+0x141)[0x434aee1]
/data/src/wwl/nebula/build/bin/nebula-storaged(_ZN6apache6thrift11concurrency13PthreadThread10threadMainEPv+0xb7)[0x434eea7]
/lib64/libpthread.so.0(+0x7ea4)[0x7f5079b34ea4]
/lib64/libc.so.6(clone+0x6c)[0x7f507985d9fc]
(safe mode, symbolizer not available)
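
Since the trace was printed in safe mode without a symbolizer, the frames above show mangled names. A minimal sketch of how they could be made readable, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper names `extractSymbol` and `demangle` are hypothetical and not part of nebula:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Pull the mangled symbol out of a frame of the form
//   /path/to/binary(_ZSymbol+0xOFFSET)[0xADDRESS]
// Returns an empty string when the frame carries no symbol name.
std::string extractSymbol(const std::string& frame) {
    const auto open = frame.find('(');
    const auto plus = frame.find('+', open);
    if (open == std::string::npos || plus == std::string::npos) {
        return "";
    }
    return frame.substr(open + 1, plus - open - 1);
}

// Demangle one Itanium C++ ABI symbol. Returns the input unchanged
// if the demangler rejects it.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

Feeding each frame through `demangle(extractSymbol(frame))` should show, for example, that the frame just below `__cxa_pure_virtual` is `ChainAddEdgesProcessorLocal::replaceNullWithDefaultValue`. Piping the trace through `c++filt` gives the same result without writing any code.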
@kikimo kikimo added the type/bug Type: something is unexpected label Oct 23, 2021
@Sophie-Xie Sophie-Xie added this to the v3.0.0 milestone Oct 25, 2021
@critical27

A possibly related problem: https://discuss.nebula-graph.com.cn/t/topic/6200.

liuyu85cn commented Oct 25, 2021

This is not specific to TOSS: it can also be reproduced with normal inserts if the leader changes quickly.

#0  0x0000000002b4fb2b in __gnu_cxx::__normal_iterator<nebula::Expression**, std::vector<nebula::Expression*, std::allocator<nebula::Expression*> > >::__normal_iterator (this=0x7f02a36fabd8, __i=<error reading variable>) at /data/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/stl_iterator.h:783
#1  0x0000000002b49254 in std::vector<nebula::Expression*, std::allocator<nebula::Expression*> >::begin (this=0x0) at /data/vesoft/toolset/gcc/7.5.0/include/c++/7.5.0/bits/stl_vector.h:564
#2  0x0000000003c59299 in nebula::FunctionCallExpression::clone (this=0x7f02674a8f00) at /root/src/wwl/nebula-local/src/common/expression/FunctionCallExpression.h:76
#3  0x0000000003e19604 in nebula::RowWriterV2::checkUnsetFields (this=0x7f02a36fadd0) at /root/src/wwl/nebula-local/src/codec/RowWriterV2.cpp:787
#4  0x0000000003e19e8b in nebula::RowWriterV2::finish (this=0x7f02a36fadd0) at /root/src/wwl/nebula-local/src/codec/RowWriterV2.cpp:885
#5  0x0000000002afd75b in nebula::storage::BaseProcessor<nebula::storage::cpp2::ExecResponse>::encodeRowVal (this=0x7f02a2226200, schema=0x7f02920283f0, propNames=..., props=..., wRet=@0x7f02a36faf9c: nebula::WriteResult::SUCCEEDED) at /root/src/wwl/nebula-local/src/storage/BaseProcessor-inl.h:182
#6  0x0000000002b264e8 in nebula::storage::AddEdgesProcessor::doProcess (this=0x7f02a2226200, req=...) at /root/src/wwl/nebula-local/src/storage/mutate/AddEdgesProcessor.cpp:121
#7  0x0000000002b255a4 in nebula::storage::AddEdgesProcessor::process (this=0x7f02a2226200, req=...) at /root/src/wwl/nebula-local/src/storage/mutate/AddEdgesProcessor.cpp:56
#8  0x0000000002ae1d57 in nebula::storage::GraphStorageServiceHandler::future_addEdges (this=0x7f0278400010, req=...) at /root/src/wwl/nebula-local/src/storage/GraphStorageServiceHandler.cpp:93
#9  0x000000000322d402 in nebula::storage::cpp2::GraphStorageServiceSvIf::<lambda()>::operator()(void) const (__closure=0x7f02a36fb7b0) at /root/src/wwl/nebula-local/build/src/interface/gen-cpp2/GraphStorageService.cpp:99
#10 0x000000000323a838 in folly::makeFutureWith<nebula::storage::cpp2::GraphStorageServiceSvIf::async_tm_addEdges(std::unique_ptr<apache::thrift::HandlerCallback<nebula::storage::cpp2::ExecResponse> >, const nebula::storage::cpp2::AddEdgesRequest&)::<lambda()> >(nebula::storage::cpp2::GraphStorageServiceSvIf::<lambda()> &&) (func=...) at /root/src/wwl/nebula-local/build/third-party/install/include/folly/futures/Future-inl.h:1241
#11 0x000000000

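We found corrupted memory while debugging the core dump. Look at _M_start, _M_finish and _M_end_of_storage: _M_start points at a function (folly::IOBuf::freeInternalBuf) and _M_end_of_storage is null.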

(gdb) p args_->args_
$1 = {<std::_Vector_base<nebula::Expression*, std::allocator<nebula::Expression*> >> = {_M_impl = {<std::allocator<nebula::Expression*>> = {<__gnu_cxx::new_allocator<nebula::Expression*>> = {<No data fields>}, <No data fields>}, _M_start = 0x45f90b0 <folly::IOBuf::freeInternalBuf(void*, void*)>,
      _M_finish = 0x7f02674a8f00, _M_end_of_storage = 0x0}}, <No data fields>}
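
To make it explicit why this dump cannot be a live vector: libstdc++'s `std::vector` keeps three pointers that must satisfy `_M_start <= _M_finish <= _M_end_of_storage`, with both distances a multiple of the element size. A minimal sketch of that sanity check follows; `VectorImpl` and `checkVectorImpl` are hypothetical names used only for illustration, and the check assumes the libstdc++ layout shown in the gdb output:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// The three raw pointers libstdc++ stores inside a std::vector,
// kept as integers so values copied from gdb can be checked directly.
struct VectorImpl {
    std::uintptr_t start;           // _M_start
    std::uintptr_t finish;          // _M_finish
    std::uintptr_t end_of_storage;  // _M_end_of_storage
};

// Returns an empty string if the triple could belong to a live vector
// whose elements are elemSize bytes, otherwise the violated invariant.
std::string checkVectorImpl(const VectorImpl& v, std::size_t elemSize) {
    if (elemSize == 0) {
        return "element size must be non-zero";
    }
    if (v.start == 0 && v.finish == 0 && v.end_of_storage == 0) {
        return "";  // a default-constructed, empty vector
    }
    if (v.start == 0) {
        return "_M_start is null but the other pointers are not";
    }
    if (v.finish < v.start) {
        return "_M_finish < _M_start";
    }
    if (v.end_of_storage < v.finish) {
        return "_M_end_of_storage < _M_finish";
    }
    if ((v.finish - v.start) % elemSize != 0) {
        return "size is not a multiple of the element size";
    }
    if ((v.end_of_storage - v.start) % elemSize != 0) {
        return "capacity is not a multiple of the element size";
    }
    return "";
}
```

With the values from the dump, `checkVectorImpl({0x45f90b0, 0x7f02674a8f00, 0x0}, sizeof(void*))` reports `_M_end_of_storage < _M_finish`, which is consistent with the object having been freed and its memory overwritten. Note also that `_M_finish` equals the `this` pointer of frame #2.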

@Sophie-Xie

#3553
