Summary:
Before the refactor in 583053e943922b97cc47f510ddf4ad137230754a, any failure in `tablet_server_main_impl` would result in a `LOG(FATAL)` and immediately crash the program. After that commit, failures in `StartServices` return a non-ok `Status` and call the destructor on the locally allocated Services struct. This results in destructors called on partially initialized services, exposing an existing check-fail bug in `CQLServer::Start` and `CQLServer::Shutdown`.
This diff refactors `StartServices` so it doesn't own the `Services` struct. This way on an early return service destructors are not invoked.
Instead we will hit the `LOG_AND_RETURN_FROM_MAIN_NOT_OK` wrapping the `StartServices` call, thus skipping any destructor invocations. The `LOG_AND_RETURN_FROM_MAIN_NOT_OK` contains a `LOG(FATAL)` invocation on non-ok `Status` as long as glogging is initialized, so it will immediately abort the program on non-ok `Status`.
Jira: DB-18651
Test Plan:
Check fail stack trace observed by qa, indicating shutdown calls via destructor on early return from `StartServices`:
```
F20251013 00:29:26 ../../src/yb/rpc/acceptor.cc:131] Check failed: sockets_to_add_.empty()
@ 0xaaaac089d5a4 google::LogMessage::SendToLog()
@ 0xaaaac089e4c4 google::LogMessage::Flush()
@ 0xaaaac089ec1c google::LogMessageFatal::~LogMessageFatal()
@ 0xaaaac1c68cec yb::rpc::Acceptor::Shutdown()
@ 0xaaaac1c688f8 yb::rpc::Acceptor::~Acceptor()
@ 0xaaaac1c97358 yb::rpc::Messenger::ShutdownAcceptor()
@ 0xaaaac1dc5540 yb::server::RpcAndWebServerBase::~RpcAndWebServerBase()
@ 0xaaaac26bd714 yb::cqlserver::CQLServer::~CQLServer()
@ 0xaaaac22e5db0 yb::tserver::Services::~Services()
@ 0xaaaac22e3cd4 yb::tserver::StartServices()
@ 0xaaaac084cf04 main
@ 0xffff82592340 __libc_start_call_main
@ 0xffff82592418 __libc_start_main_alias_2
@ 0xaaaac06d2034 (unknown)
```
With this diff[0], and this patch, the std error for the tserver on startup is:
```
% yb-ctl create --rf 1 --tserver_flags "TEST_webserver_fail_on_startup=true"
```
```
F1017 10:49:56.988938 4164516544 tablet_server_main_impl.cc:480] Illegal state (yb/server/webserver.cc:463): foo
Fatal failure details written to /Users/zdrudi/yugabyte-data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.FATAL.details.2025-10-17T10_49_56.pid9311.txt
F20251017 10:49:56 ../../src/yb/tserver/tablet_server_main_impl.cc:480] Illegal state (yb/server/webserver.cc:463): foo
@ 0x10a5d6510 google::LogDestination::LogToSinks()
@ 0x10a5d5654 google::LogMessage::SendToLog()
@ 0x10a5d6034 google::LogMessage::Flush()
@ 0x10a5d9f3c google::LogMessageFatal::~LogMessageFatal()
@ 0x10a5d6ee0 google::LogMessageFatal::~LogMessageFatal()
@ 0x104f3d654 yb::tserver::TabletServerMain()
@ 0x1903020e0 start
*** Check failure stack trace: ***
@ 0x10a5d5814 google::LogMessage::SendToLog()
@ 0x10a5d6034 google::LogMessage::Flush()
@ 0x10a5d9f3c google::LogMessageFatal::~LogMessageFatal()
@ 0x10a5d6ee0 google::LogMessageFatal::~LogMessageFatal()
@ 0x104f3d654 yb::tserver::TabletServerMain()
@ 0x1903020e0 start
```
On master and the same diff [0], there is no std err and we see in the INFO logs messages about destructors being called:
```
I1017 10:54:45.258898 4164516544 service_pool.cc:146] yb.stateful_service.PgAutoAnalyzeService: yb::rpc::ServicePoolImpl created at 0x1029dc9a0
I1017 10:54:45.258975 4164516544 webserver.cc:365] Starting webserver on 127.0.0.1:9000
I1017 10:54:45.258980 4164516544 webserver.cc:374] Document root: /Users/zdrudi/code/yugabyte-db/www
I1017 10:54:45.258991 4164516544 webserver.cc:360] Webserver listen spec is 127.0.0.1:9000
I1017 10:54:45.259048 4164516544 tablet_server.cc:777] TabletServer shutting down...
```
[0]
```
% git diff
diff --git a/src/yb/server/webserver.cc b/src/yb/server/webserver.cc
index ce660c3b88..6c9279bcee 100644
--- a/src/yb/server/webserver.cc
+++ b/src/yb/server/webserver.cc
@@ -117,6 +117,10 @@ DEFINE_test_flag(bool, use_custom_varz, false,
"Cause /varz endpoint of TServers to output for selected gflags their values at "
"startup rather than their current values.");
+DEFINE_test_flag(
+ bool, webserver_fail_on_startup, false,
+ "If set fail webserver startup");
+
namespace yb {
using std::string;
@@ -455,6 +459,9 @@ Status Webserver::Impl::Start() {
// default callback. That method unpacks the pointer to this and calls the real
// callback.
context_ = sq_start(&callbacks, reinterpret_cast<void*>(this), &options[0]);
+ if (FLAGS_TEST_webserver_fail_on_startup) {
+ return STATUS(IllegalState, "foo");
+ }
// Restore the child signal handler so wait() works properly.
signal(SIGCHLD, sig_chld);
```
Reviewers: bkolagani, hsunder, asrivastava
Reviewed By: asrivastava
Subscribers: ybase, rthallam
Differential Revision: https://phorge.dev.yugabyte.com/D47478