Every tserver in universe crashes when triggering a "live queries" #7033

Open
tylarb opened this issue Jan 29, 2021 · 5 comments
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@tylarb
Contributor

tylarb commented Jan 29, 2021

Jira Link: DB-4880
This issue occurred when the platform was on 2.4 and the server was on 2.2.

Before the tservers were updated to 2.4, visiting the "live queries" tab caused every tserver in the universe (three, in this case) to crash, resulting in a full database outage.

Backtrace from one of the cores:

(gdb) bt
#0 0x00007f3fee36e198 in yb::rpc::ConnectionContextWithCallId::DumpPB (this=this@entry=0xdaa1a20, req=...,
resp=resp@entry=0xb2dd620) at ../../src/yb/rpc/rpc_with_call_id.cc:32
#1 0x00007f3ff8044b49 in yb::cqlserver::CQLConnectionContext::DumpPB (this=0xdaa1a20, req=..., resp=0xb2dd620)
at ../../src/yb/yql/cql/cqlserver/cql_rpc.cc:139
#2 0x00007f3fee325dd5 in yb::rpc::Connection::DumpPB (this=0x31c3f330, req=..., resp=0xb2dd620)
at ../../src/yb/rpc/connection.cc:366
#3 0x00007f3fee3535c9 in operator() (reactor=0x5b7af00, __closure=0x12e8bcc0) at ../../src/yb/rpc/reactor.cc:294
#4 yb::rpc::RunFunctionTask<yb::rpc::Reactor::DumpRunningRpcs(const yb::rpc::DumpRunningRpcsRequestPB&, yb::rpc::DumpRunningRpcsResponsePB*)::<lambda(yb::rpc::Reactor*)> >::Run(yb::rpc::Reactor *) (this=0x12e8bc90, reactor=0x5b7af00)
at ../../src/yb/rpc/reactor.cc:923
#5 0x00007f3fee3582c3 in yb::rpc::Reactor::AsyncHandler (this=0x5b7af00, watcher=..., revents=<optimized out>)
at ../../src/yb/rpc/reactor.cc:357
#6 0x00007f3feb8cdb6b in ev_invoke_pending ()
from /home/yugabyte/yb-software/yugabyte-2.2.5.0-b2-centos-x86_64/lib/yb-thirdparty/libev.so.4
#7 0x00007f3feb8d1c7a in ev_run ()
from /home/yugabyte/yb-software/yugabyte-2.2.5.0-b2-centos-x86_64/lib/yb-thirdparty/libev.so.4
#8 0x00007f3fee3530bc in run (flags=0, this=0x5b7af98)
at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20200707012818-49ca690c1f-centos/installed/common/include/ev++.h:211
#9 yb::rpc::Reactor::RunThread (this=0x5b7af00) at ../../src/yb/rpc/reactor.cc:482
#10 0x00007f3fecbc903f in operator() (this=0x6af1c78)
at /home/yugabyte/yb-software/yugabyte-2.2.5.0-b2-centos-x86_64/linuxbrew-xxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#11 yb::Thread::SuperviseThread (arg=0x6af1c20) at ../../src/yb/util/thread.cc:759
#12 0x00007f3fe73e5694 in start_thread (arg=0x7f3fbe2c6700) at pthread_create.c:333
#13 0x00007f3fe6b2241d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Core is available.

@mbautin
Collaborator

mbautin commented Jan 29, 2021

void ConnectionContextWithCallId::DumpPB(const DumpRunningRpcsRequestPB& req,
                                         RpcConnectionPB* resp) {
  for (const auto &entry : calls_being_handled_) {
    entry.second->DumpPB(req, resp->add_calls_in_flight());
  }
}

Should we be checking for some null pointers here?

@bmatican
Contributor

I was talking to @spolitov last week, since this issue is likely a duplicate of #6869.

Neither of us could understand, though, how we'd end up with nulls in there, or when the objects could become null. For example, we could check for null, and the objects could still get invalidated after the check.
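
To make the concern above concrete: if the map holds non-owning pointers, destroying a call does not reset the pointer stored in the map, so a null check passes even when the object is gone. The sketch below is only an illustration of that point, not the YugabyteDB code. `TrackedCall` and `CountLiveCalls` are hypothetical names, and storing `std::weak_ptr` is an assumption made for the example, not a proposed fix.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Hypothetical stand-in for an in-flight call; not the real InboundCall.
struct TrackedCall {
  uint64_t id;
};

// Counts the entries whose call object still exists.
// With weak_ptr the map can detect a destroyed call; with a raw pointer
// the stored value would stay non-null and a null check would not help.
inline int CountLiveCalls(
    const std::map<uint64_t, std::weak_ptr<TrackedCall>>& calls) {
  int live = 0;
  for (const auto& entry : calls) {
    // lock() yields an empty shared_ptr once the call has been destroyed.
    if (std::shared_ptr<TrackedCall> call = entry.second.lock()) {
      ++live;  // *call is safe to use here; the shared_ptr keeps it alive
    }
  }
  return live;
}
```

For example, after inserting two calls and releasing the owner of one of them, `CountLiveCalls` reports one live entry even though the map still contains two keys.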

@tedyu
Contributor

tedyu commented Jan 29, 2021

How about adding the following check?

diff --git a/src/yb/rpc/rpc_with_call_id.cc b/src/yb/rpc/rpc_with_call_id.cc
index 58ade7118..c3d0f5b7b 100644
--- a/src/yb/rpc/rpc_with_call_id.cc
+++ b/src/yb/rpc/rpc_with_call_id.cc
@@ -29,7 +29,12 @@ ConnectionContextWithCallId::ConnectionContextWithCallId() {}
 void ConnectionContextWithCallId::DumpPB(const DumpRunningRpcsRequestPB& req,
                                          RpcConnectionPB* resp) {
   for (const auto &entry : calls_being_handled_) {
-    entry.second->DumpPB(req, resp->add_calls_in_flight());
+    auto call = entry.second;
+    if (call) {
+      call->DumpPB(req, resp->add_calls_in_flight());
+    } else {
+      LOG(WARNING) << "call ID " << entry.first << " doesn't have call";
+    }
   }
 }

@spolitov
Contributor

Should we be checking for some null pointers here?

I don't think so.
We insert in calls_being_handled_ only here:

Status ConnectionContextWithCallId::Store(InboundCall* call) {
  uint64_t call_id = ExtractCallId(call);
  if (!calls_being_handled_.emplace(call_id, call).second) {

So entry.second cannot be null.
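
The argument above relies on how `emplace` behaves: it returns a pair whose `.second` is `false` when the key already exists, and the pointer is dereferenced to obtain the call ID before it is stored. A minimal sketch of that pattern follows, using hypothetical names (`PendingCall`, `CallRegistry`) rather than the real classes. It only shows that entries are non-null at insertion time and says nothing about what happens to the pointed-to object afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an in-flight call.
struct PendingCall {
  uint64_t id;
};

// Hypothetical registry mirroring the insert-only-in-Store pattern.
class CallRegistry {
 public:
  // Returns true if the call was stored, false if its ID is already tracked.
  bool Store(PendingCall* call) {
    // Reading call->id dereferences the pointer, so a null call would
    // fail here, before anything is inserted into the map.
    const uint64_t call_id = call->id;
    // emplace returns pair<iterator, bool>; .second is false on a duplicate key.
    return calls_.emplace(call_id, call).second;
  }

  // True if any stored pointer is null (should never be the case here).
  bool HasNullEntry() const {
    for (const auto& entry : calls_) {
      if (entry.second == nullptr) return true;
    }
    return false;
  }

  std::size_t size() const { return calls_.size(); }

 private:
  std::unordered_map<uint64_t, PendingCall*> calls_;
};
```

Storing two calls with the same ID leaves the first entry in place and rejects the second, so the map never gains an entry through any path other than a successful `Store`.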

@rthallamko3 rthallamko3 added the area/docdb YugabyteDB core features label Jan 3, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage and removed status/awaiting-triage Issue awaiting triage labels Jan 3, 2023
@rthallamko3
Contributor

@tylarb, do you know if anyone else has reported this issue? Not sure if it is still relevant.


7 participants