Update build not to be hard-coded to require brew on linux #1

Closed
hengestone opened this issue Nov 6, 2017 · 4 comments

@hengestone

Use CMake to detect library dependencies and fail if they are not found.

@mbautin
Collaborator

mbautin commented Nov 6, 2017

@hengestone Great point, we'll definitely look into this! The reason to introduce Linuxbrew was that CentOS is our primary target production platform, but the default toolchain on CentOS 7 is quite outdated. And it turns out that Linuxbrew has a pretty recent Boost version as well. Building without Linuxbrew on CentOS might still be challenging, but we might be able to build without it e.g. on a recent version of Ubuntu. What platform are you trying to build on, and what does your toolchain look like, by the way?

@hengestone
Author

hengestone commented Nov 7, 2017

Totally understood - there should just be an alternative for environments that are more up to date and able to install newer packages. I'm on Ubuntu (17.10), which is used by a large majority of cloud platforms. See e.g. ZDNet or G+.

I should add: congratulations on the public Beta, wish you lots of success!

yugabyte-ci pushed a commit that referenced this issue Feb 2, 2018
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```
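For illustration only, here is a minimal sketch of the fix described in the summary, using simplified stand-in types rather than the actual yb/rocksdb classes (the real filter is the lambda in TabletPeer::InitTabletPeer seen at the top of the trace):

```
// Hedged sketch: when an (unexpectedly empty) immutable memtable has no
// frontiers set, log an error and allow it to be flushed instead of crashing.
// FrontierInfo, MemTable, and MakeFlushFilter are simplified stand-ins.
#include <functional>
#include <iostream>
#include <optional>

struct FrontierInfo {};  // stand-in for the per-memtable frontier metadata

struct MemTable {
  std::optional<FrontierInfo> frontiers;  // expected to be unset only when empty
  const std::optional<FrontierInfo>& Frontiers() const { return frontiers; }
};

// Filter handed to PickMemtablesToFlush(): returns true if the memtable may be flushed.
std::function<bool(const MemTable&)> MakeFlushFilter() {
  return [](const MemTable& memtable) {
    if (!memtable.Frontiers().has_value()) {
      // Previously this case crashed; now we log and flush the memtable anyway.
      std::cerr << "Memtable has no frontiers set; flushing it anyway" << std::endl;
      return true;
    }
    // Normal case (details elided): decide based on the memtable's frontiers.
    return true;
  };
}
```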

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
@mbautin
Collaborator

mbautin commented Mar 19, 2018

@hengestone We have made changes to our build script and YugaByte DB should now build on Ubuntu 17.10. Please give it a try when you have a chance and let us know how it goes! By the way, at this point, we're still doing all of our official testing and releases on CentOS, but the resulting Linux package works on both CentOS and Ubuntu.

@hengestone
Author

Thanks!

The build process gets a lot further now. I'll file a separate bug for the compile failure.

yugabyte-ci pushed a commit that referenced this issue Nov 30, 2018
…EANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.
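
As a rough illustration of this change (names and types below are simplified stand-ins, not the actual YugabyteDB signatures), returning a complete result forces the caller to handle a lookup failure before using the tablet peer:

```
#include <iostream>
#include <memory>
#include <string>
#include <variant>

struct TabletPeer { std::string id; };
using TabletPeerPtr = std::shared_ptr<TabletPeer>;
struct LookupError { std::string message; };          // stand-in for yb::Status
using LookupResult = std::variant<TabletPeerPtr, LookupError>;

// Before: the helper responded on error and returned a value the CLEANUP path
// never checked. After: return the complete result and let the caller decide.
LookupResult LookupTabletPeer(const std::string& tablet_id) {
  if (tablet_id.empty()) {
    return LookupError{"tablet not found"};
  }
  return std::make_shared<TabletPeer>(TabletPeer{tablet_id});
}

void HandleCleanup(const std::string& tablet_id) {
  LookupResult result = LookupTabletPeer(tablet_id);
  if (auto* error = std::get_if<LookupError>(&result)) {
    // Respond with the error instead of continuing with a null peer, which is
    // what previously led to the FATAL in CheckMemoryPressure.
    std::cerr << "CLEANUP failed: " << error->message << std::endl;
    return;
  }
  TabletPeerPtr peer = std::get<TabletPeerPtr>(result);
  std::cout << "Running CLEANUP on tablet " << peer->id << std::endl;
}
```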

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Apr 29, 2019
Summary:
Enable building on gcc 7 on Ubuntu 17.10
- Extending the FALLTHROUGH_INTENDED macro appropriately (a sketch follows after this list)
- Removing some unused code that gcc 7 complains about
- Adding "#include <functional>" to a bunch of files because otherwise gcc 7 does not find
  `std::function`.
- Add the thirdparty library directory to the rpath of libraries in that directory. Without this, glog fails to find gflags. It is not clear why this only happens on Ubuntu 17.10 and not on CentOS with Linuxbrew.
- Fixing the clean_thirdparty.sh script that apparently has been incorrect since the Python rewrite
  of the thirdparty framework.
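
A hedged sketch of how a FALLTHROUGH_INTENDED-style macro is typically extended for gcc 7 (which enables -Wimplicit-fallthrough); this is an illustration, not the exact macro from the YugaByte tree:

```
// Illustrative only: map FALLTHROUGH_INTENDED onto the compiler-specific
// fallthrough attribute so gcc 7's -Wimplicit-fallthrough stays quiet.
#if defined(__clang__)
#define FALLTHROUGH_INTENDED [[clang::fallthrough]]
#elif defined(__GNUC__) && __GNUC__ >= 7
#define FALLTHROUGH_INTENDED [[gnu::fallthrough]]
#else
#define FALLTHROUGH_INTENDED do {} while (0)
#endif

int CategorizeDigit(int d) {
  switch (d) {
    case 0:
      FALLTHROUGH_INTENDED;  // deliberate fallthrough, no warning on gcc 7
    case 1:
      return 1;
    default:
      return 2;
  }
}
```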

This will fix yugabyte/yugabyte-db#1.

Test Plan: Jenkins

Reviewers: hector, bharat, bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4395
mbautin pushed a commit that referenced this issue Jun 20, 2019
Restyle docs to match web site
mbautin pushed a commit that referenced this issue Jun 20, 2019
updated the architecture section
yugabyte-ci pushed a commit that referenced this issue Jun 27, 2019
…data

Summary:
The issue was originally discovered as a random `RaftConsensusITest.TestAddRemoveVoter` test failure in TSAN mode, caused by a data race.
```
WARNING: ThreadSanitizer: data race (pid=11050)
1663	[ts-2]	   Write of size 8 at 0x7b4c000603a8 by thread T51 (mutexes: write M3613):
...
1674	[ts-2]	     #10 yb::tablet::KvStoreInfo::LoadTablesFromPB(google::protobuf::RepeatedPtrField<yb::tablet::TableInfoPB>, string) src/yb/tablet/tablet_metadata.cc:170
1675	[ts-2]	     #11 yb::tablet::KvStoreInfo::LoadFromPB(yb::tablet::KvStoreInfoPB const&, string) src/yb/tablet/tablet_metadata.cc:189:10
1676	[ts-2]	     #12 yb::tablet::RaftGroupMetadata::LoadFromSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:508:5
1677	[ts-2]	     #13 yb::tablet::RaftGroupMetadata::ReplaceSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:545:3
1678	[ts-2]	     #14 yb::tserver::RemoteBootstrapClient::Finish() src/yb/tserver/remote_bootstrap_client.cc:486:3
...
   Previous read of size 4 at 0x7b4c000603a8 by thread T16:
1697	[ts-2]	     #0 yb::tablet::RaftGroupMetadata::schema_version() const src/yb/tablet/tablet_metadata.h:251:34
1698	[ts-2]	     #1 yb::tserver::TSTabletManager::CreateReportedTabletPB(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::master::ReportedTabletPB*) src/yb/tserver/ts_tablet_manager.cc:1323:71
1699	[ts-2]	     #2 yb::tserver::TSTabletManager::GenerateIncrementalTabletReport(yb::master::TabletReportPB*) src/yb/tserver/ts_tablet_manager.cc:1359:5
1700	[ts-2]	     #3 yb::tserver::Heartbeater::Thread::TryHeartbeat() src/yb/tserver/heartbeater.cc:371:32
1701	[ts-2]	     #4 yb::tserver::Heartbeater::Thread::DoHeartbeat() src/yb/tserver/heartbeater.cc:531:19
```

The reason is that although `RaftGroupMetadata::schema_version()` gets the `TableInfo` pointer from `primary_table_info()` under a mutex lock, it then accesses the object's fields without holding the lock.

Added a private `RaftGroupMetadata::primary_table_info_guarded()` method, which returns a pair of `TableInfo*` and `std::unique_lock`, and used it in `RaftGroupMetadata::schema_version()` and other `RaftGroupMetadata` functions that access primary table info fields.
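
A minimal sketch of the guarded-accessor pattern described above (member names other than `primary_table_info_guarded()` and `schema_version()` are assumptions, and the real class is more involved):

```
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>

struct TableInfo {
  uint32_t schema_version = 0;
};

class RaftGroupMetadata {
 public:
  uint32_t schema_version() const {
    // Hold the lock for the whole access, not just while fetching the pointer.
    auto [table_info, lock] = primary_table_info_guarded();
    return table_info->schema_version;
  }

 private:
  // Returns the primary table info together with a lock that keeps it valid
  // while the caller reads its fields.
  std::pair<const TableInfo*, std::unique_lock<std::mutex>>
  primary_table_info_guarded() const {
    std::unique_lock<std::mutex> lock(data_mutex_);
    return {&primary_table_info_, std::move(lock)};
  }

  mutable std::mutex data_mutex_;
  TableInfo primary_table_info_;
};
```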

Test Plan: `ybd tsan --sj --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestAddRemoveVoter -n 1000`

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6813
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 864e72b

Original commit message:

ENG-2793 Do not fail when deciding if we can flush an empty immutable memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
mbautin pushed a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 566d6d2

Original commit message:

ENG-4240: #613: Fix checking of tablet presence during transaction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
yugabyte#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
yugabyte#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
yugabyte#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
yugabyte#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
yugabyte#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
yugabyte#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
yugabyte#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
yugabyte#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
yugabyte#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
yugabyte#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@864e72b
and supersedes the commit yugabyte@2932b0a
mbautin pushed a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
…ction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
yugabyte#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
yugabyte#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
yugabyte#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@566d6d2
and supersedes the commit yugabyte@63bae60
kai-franz added a commit that referenced this issue Oct 13, 2023
Summary:
Adds MyDatabaseId to the T-server cache key to address an issue caused by different databases sharing T-server cache entries.

When a Postgres backend starts, it prefetches these 3 shared tables:
```
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_AUTH_MEMBERS);
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DATABASE);
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DB_ROLE_SETTINGS);
```
These tables are then cached in the T-server cache, which is keyed by the database OID and the OIDs of these tables. Because these are shared tables, the OID of the template1 database is used. As a result, when another backend process starts up, it will issue the same prefetch request, which will result in a hit in the T-server cache (assuming the catalog version has not changed).

Here is how the issue manifests in detail, as described by @tverona (requires D28071, ysql_enable_read_request_caching=true):

### 1. Start with just `yugabytedb`. Connect to `yugabytedb` for the first time.
   * **a.** Relcacheinit file is built. So we preload a bunch of tables from master.
   * **b.** Create tserver cache entry for those tables (which includes `pg_database`). Key contains `yugabytedb` oid (since that’s part of the request).
   * **c.** Create `db1`.

### 2. Connect to `db1` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db1`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db1` oid.

### 3. Connect to `db1` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We fetch the 3 tables (including `pg_database`) from master and create a new tserver cache entry, with a key including the `template0` db OID (?). Values include `db1`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db1`. We match on tserver cache entry from 2.b. We do not hit master.
   * **c.** Create `db2`.

### 4. Connect to `db2` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db2`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db2` oid.

### 5. Connect to `db2` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We request the 3 tables (including `pg_database`) and match on the cache key from 3.a.ii, so we do not hit master. We get back a `pg_database` containing entries for `db1` but not `db2`.
     - **iii.** We fail later in `CheckMyDatabase`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db2`. We match on tserver cache entry from 4.b. We do not hit master.

By always including MyDatabaseId in the cache key, we avoid serving stale versions of shared relations to different databases.
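
A rough sketch of the kind of cache key described above, with hypothetical field and type names rather than the actual pggate structures:

```
// Hypothetical sketch of including the connecting database's OID in the
// T-server read-request cache key, so two databases prefetching the same
// shared catalog tables (owned by template1) get separate cache entries.
#include <cstdint>
#include <tuple>
#include <vector>

using Oid = uint32_t;

struct ReadRequestCacheKey {
  Oid my_database_id;          // MyDatabaseId of the connecting backend (the fix)
  Oid request_database_id;     // database OID in the request (template1 for shared tables)
  std::vector<Oid> table_ids;  // OIDs of the prefetched tables
  uint64_t catalog_version;    // entries are only reused while this matches

  bool operator<(const ReadRequestCacheKey& other) const {
    return std::tie(my_database_id, request_database_id, table_ids, catalog_version) <
           std::tie(other.my_database_id, other.request_database_id, other.table_ids,
                    other.catalog_version);
  }
};
```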

**Upgrade/Rollback safety:**
Only PG to T-Server RPCs are changed.
Jira: DB-8163

Test Plan:
  # Connect to yugabyte
  # Connect to yugabyte
    # Create db1
  # Connect to db1
  # Connect to db1 <-- fails before this change with D28071, ysql_enable_read_request_caching=true

Reviewers: myang, dmitry

Reviewed By: dmitry

Subscribers: ybase, yql, tverona

Differential Revision: https://phorge.dev.yugabyte.com/D28945
Vars-07 added a commit that referenced this issue Nov 7, 2023
…nnections

Summary:
This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Test Plan:
**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in #1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29985
amannijhawan added a commit that referenced this issue Nov 13, 2023
Summary:
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified disk size was increased

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped.

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin

Reviewed By: sanketh, nsingh, dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29938
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…CA having cert chain in YBA trust's store

Summary:
Currently YBA assumes that the CA certs added to YBA's trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a fidelity environment where our migration V274 failed for the same reason.

Some minor other improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the actual cert content, which causes the subsequent deletion attempt to fail - this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
yugabyte@863ae72
yugabyte@4c8978b

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a certificate chain (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that the PEM trust store now contains 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared in case the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in yugabyte#1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30055
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…CA having cert chain in YBA trust's store

Summary:
Currently YBA assumes that the CA certs added to YBA's trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a fidelity environment where our migration V274 failed for the same reason.

Some minor other improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the actual cert content, which causes the subsequent deletion attempt to fail - this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
yugabyte@863ae72
yugabyte@4c8978b

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a certificate chain (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that the PEM trust store now contains 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared in case the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in yugabyte#1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30054
amannijhawan added a commit that referenced this issue Dec 9, 2023
…rt -1

Summary:
Original commit: 651a2e8 / D29938
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified disk size was increased

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped:

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: cwang, nbhatia

Subscribers: cwang, nbhatia, yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30901
amannijhawan added a commit that referenced this issue Dec 14, 2023
…part -1

Summary:
Original commit: 98de5da / D30901
One of the first tasks kicked off during an edit universe is DiskResizing.
This change makes the createResizeDiskTask function idempotent:
it only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.
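
As a rough sketch of that idempotency check (hypothetical names, standard JDK only; the real change lives in YBA's createResizeDiskTask and is not reproduced here), the decision reduces to comparing the currently provisioned PVC sizes against the requested size:

```
import java.util.Map;

public class DiskResizeIdempotencyCheck {

  // Hypothetical helper: currentPvcSizesGi maps PVC name -> provisioned size in Gi
  // (as read from `kubectl get pvc -o json`); requestedSizeGi is the new target size.
  static boolean resizeNeeded(Map<String, Integer> currentPvcSizesGi, int requestedSizeGi) {
    // Create resize subtasks only if at least one PVC differs from the requested size.
    return currentPvcSizesGi.values().stream()
        .anyMatch(currentGi -> currentGi != requestedSizeGi);
  }

  public static void main(String[] args) {
    // First run: PVCs are still 100Gi, so the resize task is created.
    System.out.println(resizeNeeded(Map.of("datadir0", 100), 200)); // true
    // Retry after the expansion already succeeded: sizes match, so it is skipped.
    System.out.println(resizeNeeded(Map.of("datadir0", 200), 200)); // false
  }
}
```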

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified that the disk size was increased:

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped:

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: nsingh, dshubin

Subscribers: yugaware, nbhatia, cwang

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31095
jasonyb pushed a commit that referenced this issue Dec 15, 2023
Summary:
The YB Seq Scan code path is not hit because Foreign Scan is the default and
pg_hint_plan does not work.  An upcoming merge with YB master will bring in
master commit 465ee2c, which changes the
default to YB Seq Scan.

To test YB Seq Scan, a temporary patch is needed (see the test plan).
With that, two bugs are encountered: fix them.

1. FailedAssertion("TTS_IS_VIRTUAL(slot)"

   On simple test case

       create table t (i int primary key, j int);
       select * from t;

   get

       TRAP: FailedAssertion("TTS_IS_VIRTUAL(slot)", File: "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", Line: 3473, PID: 2774450)

   Details:

       #0  0x00007fd52616eacf in raise () from /lib64/libc.so.6
       #1  0x00007fd526141ea5 in abort () from /lib64/libc.so.6
       #2  0x0000000000af33ad in ExceptionalCondition (conditionName=conditionName@entry=0xc2938d "TTS_IS_VIRTUAL(slot)", errorType=errorType@entry=0xc01498 "FailedAssertion",
           fileName=fileName@entry=0xc28f18 "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", lineNumber=lineNumber@entry=3473)
           at ../../../../../../../src/postgres/src/backend/utils/error/assert.c:69
       #3  0x00000000005c26bd in ybFetchNext (handle=0x2600ffc43680, slot=slot@entry=0x2600ff6c2980, relid=16384)
           at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3473
       #4  0x00000000007de444 in YbSeqNext (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:156
       #5  0x000000000078b3c6 in ExecScanFetch (node=node@entry=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:133
       #6  0x000000000078b44e in ExecScan (node=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:182
       #7  0x00000000007de298 in ExecYbSeqScan (pstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:191
       #8  0x00000000007871ef in ExecProcNodeFirst (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:480
       #9  0x000000000077db0e in ExecProcNode (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/include/executor/executor.h:285
       #10 ExecutePlan (execute_once=<optimized out>, dest=0x2600ff6b1a10, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT,
           use_parallel_mode=<optimized out>, planstate=0x2600ff6c2778, estate=0x2600ff6c2128) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1650
       #11 standard_ExecutorRun (queryDesc=0x2600ff675128, direction=<optimized out>, count=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:367
       #12 0x000000000077dbfe in ExecutorRun (queryDesc=queryDesc@entry=0x2600ff675128, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:308
       #13 0x0000000000982617 in PortalRunSelect (portal=portal@entry=0x2600ff90e128, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x2600ff6b1a10)
           at ../../../../../../src/postgres/src/backend/tcop/pquery.c:954
       #14 0x000000000098433c in PortalRun (portal=portal@entry=0x2600ff90e128, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
           dest=dest@entry=0x2600ff6b1a10, altdest=altdest@entry=0x2600ff6b1a10, qc=0x7fffc14a13c0) at ../../../../../../src/postgres/src/backend/tcop/pquery.c:786
       #15 0x000000000097e65b in exec_simple_query (query_string=0x2600ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1321
       #16 yb_exec_simple_query_impl (query_string=query_string@entry=0x2600ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #17 0x000000000097b7a5 in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7fffc14a15ff)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #18 0x000000000097d077 in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #19 0x000000000097d0ca in yb_exec_simple_query (query_string=query_string@entry=0x2600ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2600ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #20 0x000000000097fe8a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #21 0x00000000008c8354 in BackendRun (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #22 BackendStartup (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #23 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #24 0x00000000008caa55 in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #25 0x0000000000804ba8 in PostgresServerProcessMain (argc=25, argv=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #26 0x0000000000804bc8 in main ()

       3469    ybFetchNext(YBCPgStatement handle,
       3470                            TupleTableSlot *slot, Oid relid)
       3471    {
       3472            Assert(slot != NULL);
       3473            Assert(TTS_IS_VIRTUAL(slot));

       (gdb) p *slot
       $2 = {type = T_TupleTableSlot, tts_flags = 18, tts_nvalid = 0, tts_ops = 0xeaf5e0 <TTSOpsHeapTuple>, tts_tupleDescriptor = 0x2600ff6416c0, tts_values = 0x2600ff6c2a00, tts_isnull = 0x2600ff6c2a10, tts_mcxt = 0x2600ff6c2000, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0, yb_item = {ybctid = 0}}, tts_tableOid = 0, tts_yb_insert_oid = 0}

   Fix by making YB Seq Scan always use a virtual slot.  This is similar
   to what is done for YB Foreign Scan.

2. segfault in ending scan

   Same simple test case gives segfault at a later stage.

   Details:

       #0  0x00000000007de762 in table_endscan (scan=0x3debfe3ab88) at ../../../../../../src/postgres/src/include/access/tableam.h:997
       #1  ExecEndYbSeqScan (node=node@entry=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:298
       #2  0x0000000000787a75 in ExecEndNode (node=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:649
       #3  0x000000000077ffaf in ExecEndPlan (estate=0x3debfe3a128, planstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1489
       #4  standard_ExecutorEnd (queryDesc=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:503
       #5  0x00000000007800f8 in ExecutorEnd (queryDesc=queryDesc@entry=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:474
       #6  0x00000000006f140c in PortalCleanup (portal=0x2582ff900128) at ../../../../../../src/postgres/src/backend/commands/portalcmds.c:305
       #7  0x0000000000b3c36a in PortalDrop (portal=portal@entry=0x2582ff900128, isTopCommit=isTopCommit@entry=false)
           at ../../../../../../../src/postgres/src/backend/utils/mmgr/portalmem.c:514
       #8  0x000000000097e667 in exec_simple_query (query_string=0x2582ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1331
       #9  yb_exec_simple_query_impl (query_string=query_string@entry=0x2582ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #10 0x000000000097b79a in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7ffc81c0e78f)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #11 0x000000000097d06c in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #12 0x000000000097d0bf in yb_exec_simple_query (query_string=query_string@entry=0x2582ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2582ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #13 0x000000000097fe7f in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #14 0x00000000008c8349 in BackendRun (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #15 BackendStartup (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #16 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #17 0x00000000008caa4a in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #18 0x0000000000804b9d in PostgresServerProcessMain (argc=25, argv=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #19 0x0000000000804bbd in main ()

       294             /*
       295              * close heap scan
       296              */
       297             if (tsdesc != NULL)
       298                     table_endscan(tsdesc);

   The reason is that the initial merge 55782d5
   incorrectly merged the end of ExecEndYbSeqScan.  Upstream PG commit
   9ddef36278a9f676c07d0b4d9f33fa22e48ce3b5 removes this code, but the initial
   merge duplicated the lines.  Remove those lines.

Test Plan:
Apply the following patch to activate YB Seq Scan:

    diff --git a/src/postgres/src/backend/optimizer/path/allpaths.c b/src/postgres/src/backend/optimizer/path/allpaths.c
    index 8a4c38a965..854d84a648 100644
    --- a/src/postgres/src/backend/optimizer/path/allpaths.c
    +++ b/src/postgres/src/backend/optimizer/path/allpaths.c
    @@ -576,7 +576,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     else
                     {
                         /* Plain relation */
    -                    if (IsYBRelationById(rte->relid))
    +                    if (false)
                         {
                             /*
                              * Using a foreign scan which will use the YB FDW by

On almalinux 8,

    ./yb_build.sh fastdebug --gcc11
    pg15_tests/run_all_tests.sh fastdebug --gcc11 --sj --sp --scb

fails the following tests:

- test_D29546
- test_pg15_regress: yb_pg15
- test_types_geo: yb_pg_box
- test_hash_in_queries: yb_hash_in_queries

Manually check to see that they are due to YB Seq Scan explain output
differences.

Reviewers: aagrawal, tfoucher

Reviewed By: tfoucher

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31139
d-uspenskiy added a commit that referenced this issue Jan 12, 2024
…ction

Summary:
There are several unit tests which suffer from a TSAN data race warning with the following stack:

```
WARNING: ThreadSanitizer: data race (pid=38656)
  Read of size 8 at 0x7f6f2a44b038 by thread T21:
    #0 memcpy /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (pg_ddl_concurrency-test+0x9e197)
    #1 <null> <null> (libnss_sss.so.2+0x72ef) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Previous write of size 8 at 0x7f6f2a44b038 by thread T20 (mutexes: write M0):
    #0 mmap64 /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:7485:3 (pg_ddl_concurrency-test+0xda204)
    #1 <null> <null> (libnss_sss.so.2+0x7169) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Location is global '??' at 0x7f6f2a44b000 (passwd+0x38)

  Mutex M0 (0x7f6f2af29380) created at:
    #0 pthread_mutex_lock /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (pg_ddl_concurrency-test+0xa464b)
    #1 <null> <null> (libnss_sss.so.2+0x70d6) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
...
```

All failing tests have a common feature: all of them create connections to postgres from multiple threads at the same time.
When creating a new connection, the `libpq` library calls the standard `getpwuid_r` function internally. This function is thread-safe, so a TSAN warning is not expected there.

The solution is to suppress the warning in the `getpwuid_r` function.
**Note:** because the `getpwuid_r` function name does not appear in the TSAN warning stack, the warning is suppressed for the caller function `pqGetpwuid` instead.
Jira: DB-9523

Test Plan: Jenkins

Reviewers: sergei, bogdan

Reviewed By: sergei

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31646
arpang added a commit that referenced this issue Jan 30, 2024
Summary:
The function `yb_single_row_update_or_delete_path` was reworked for PG15 in D27692. There are a few more issues in the function that this revision fixes:

  # With PG commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35,
  ## the target list returned by `build_path_tlist(root, subpath)` only contains the modified and junk columns. So, there is no need to ignore the unspecified columns.
  ## "tableoid" junk column is added for partitioned tables. Ignore it, along with other junk cols, when iterating over `build_path_tlist(root, subpath)`.
  ## `RelOptInfo` entries in `simple_rel_array` corresponding to the non-leaf relations of a partitioned table are NOT NULL. Ignore these when checking the number of relations being updated.
  ## When updating a partitioned table, the child of the UPDATE node is an APPEND node. This append node is skipped in the final plan if it has only one child. Take this into account when applying conditions to the `ModifyTablePath.subpath` by passing the subpath through `get_singleton_append_subpath`.
  # D27692 added an assertion ` Assert(root->update_colnos->length > update_col_index)`, which is incorrect. The pre-existing code comment clearly stated: `.. it is possible that planner adds extra expressions. In particular, we've seen a RowExpr when a view was updated`. As expected, this incorrect assertion fails when updating a view with a trigger (see the added test). Remove the assertion. Instead, move the expression-out-of-range check before reading `root->update_colnos`.

Test Plan:
Jenkins: rebase: pg15

Added two tests in yb_pg15:
- one to test whether single-row optimization is invoked when only one partition is updated (fix #1)
- the other to test UPDATE of a view with an INSTEAD OF UPDATE trigger (fix #2). This test is taken from yb_pg_triggers (yb_pg_triggers still fails with unrelated errors)

./yb_build.sh --java-test org.yb.pgsql.TestPg15Regress#testPg15Regress

Reviewers: jason, tnayak, amartsinchyk

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31709
Sahith02 added a commit that referenced this issue Mar 13, 2024
Summary:
Restore YBC flow currently has preflight checks for:
1. DB version comparison
2. Autoflags check

This diff modifies #1 to verify that the version number is greater (stable is compared to stable, preview to preview; other combinations result in an error).
The autoflags check remains the same.
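
A hedged sketch of what such a track-aware comparison could look like. The class and method names, the even/odd-minor convention for stable vs. preview builds, and the greater-or-equal semantics are assumptions for illustration, not taken from this diff:

```
public final class DbVersionCheck {

  enum Track { STABLE, PREVIEW }

  // Assumption: an even second component (2.18, 2.20) means a stable series,
  // an odd one (2.19, 2.21) means a preview series.
  static Track trackOf(String version) {
    String[] parts = version.split("[.-]");   // e.g. "2.19.3.0-b80"
    return Integer.parseInt(parts[1]) % 2 == 0 ? Track.STABLE : Track.PREVIEW;
  }

  // Returns true if target is at least as new as source; rejects cross-track comparisons.
  static boolean isAtLeast(String source, String target) {
    if (trackOf(source) != trackOf(target)) {
      throw new IllegalArgumentException("Cannot compare stable and preview versions");
    }
    String[] s = source.split("[.-]");
    String[] t = target.split("[.-]");
    for (int i = 0; i < 4; i++) {             // compare the four numeric components
      int cmp = Integer.compare(Integer.parseInt(t[i]), Integer.parseInt(s[i]));
      if (cmp != 0) {
        return cmp > 0;
      }
    }
    return true;                              // identical versions pass
  }

  public static void main(String[] args) {
    System.out.println(isAtLeast("2.18.3.0-b80", "2.20.1.0-b10")); // true: both stable
    // isAtLeast("2.19.3.0-b80", "2.20.1.0-b10") would throw: preview vs. stable
  }
}
```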

Test Plan:
Manually test all existing flows work as usual.
Run UTs.
Run itests.

Reviewers: sanketh, vbansal

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32944
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
Summary:
Restore YBC flow currently has preflight checks for:
1. DB version comparison
2. Autoflags check

This diff modifies yugabyte#1 to verify that the version number is greater (stable is compared to stable, preview to preview; other combinations result in an error).
The autoflags check remains the same.

Test Plan:
Manually test all existing flows work as usual.
Run UTs.
Run itests.

Reviewers: sanketh, vbansal

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32944
yusong-yan added a commit that referenced this issue Apr 18, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault which was  identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
karthik-ramanathan-3006 added a commit that referenced this issue May 17, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
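
The fix itself is in the C++ webserver, but the request-handling pattern is easy to illustrate standalone. Below is a minimal JDK-based sketch (not YugabyteDB code; the port and endpoint are chosen only to mirror the example above) of handlers answering 503 once a shutdown flag is set, with the server stopped explicitly instead of relying on exit-time cleanup:

```
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulMetricsServer {
  private static final AtomicBoolean terminating = new AtomicBoolean(false);

  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(13000), 0);

    server.createContext("/prometheus-metrics", exchange -> {
      if (terminating.get()) {
        // Mirror the fix: refuse new work once shutdown has been initiated.
        exchange.sendResponseHeaders(503, -1);
        exchange.close();
        return;
      }
      byte[] body = "dummy_metric 1\n".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();

    // Analogue of handling SIGTERM: set the flag and stop the server explicitly,
    // rather than letting exit-time cleanup race against in-flight requests.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      terminating.set(true);
      server.stop(1); // allow up to 1 second for in-flight exchanges to drain
    }));
  }
}
```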

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
jharveysmith pushed a commit that referenced this issue May 24, 2024
…CA having cert chain in YBA trust's store

Summary:
Currently, YBA assumes that the CA certs added to the YBA trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a Fidelity environment where our migration V274 failed for this reason.

Some other minor improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the `certContent` itself, which causes the subsequent deletion attempt to fail; this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
43160f4
82f944a
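
For reference, supporting a chain rather than a single root amounts to parsing every certificate in the uploaded PEM bundle and registering each one in the trust store. A minimal sketch using only standard JDK APIs (the file name and alias scheme are made up; this is not the YBA implementation):

```
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class TrustStoreChainImport {
  public static void main(String[] args) throws Exception {
    CertificateFactory cf = CertificateFactory.getInstance("X.509");

    KeyStore trustStore = KeyStore.getInstance("PKCS12");
    trustStore.load(null, null); // start with an empty in-memory store

    // generateCertificates() returns every certificate found in the PEM bundle,
    // so a chain (root + intermediates) is imported as separate entries.
    try (InputStream in = new FileInputStream("ca-bundle.pem")) {
      int i = 0;
      for (Certificate cert : cf.generateCertificates(in)) {
        String alias = "custom-ca-" + i++;
        trustStore.setCertificateEntry(alias, cert);
        System.out.println(alias + " -> " + ((X509Certificate) cert).getSubjectX500Principal());
      }
    }
  }
}
```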

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a cert chain certificates (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that PEM trust store contain now 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared if the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in #1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30054
svarnau pushed a commit that referenced this issue May 25, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault which was  identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
svarnau pushed a commit that referenced this issue May 25, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
svarnau pushed a commit that referenced this issue May 29, 2024
… SIGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.
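
For reference, the following small standalone program demonstrates the `exit()` semantics listed above: the destructor of the namespace-scoped object (static storage duration) runs, while the destructor of the local object (automatic storage duration) does not.
```
// Demo of exit() cleanup semantics: static-duration destructors run,
// automatic-duration destructors do not.
#include <cstdio>
#include <cstdlib>

struct Tracer {
  const char* name;
  explicit Tracer(const char* n) : name(n) { std::printf("ctor %s\n", name); }
  ~Tracer() { std::printf("dtor %s\n", name); }
};

Tracer static_obj("namespace-scoped (static storage duration)");

int main() {
  Tracer local_obj("local (automatic storage duration)");
  std::exit(0);  // prints the dtor line for static_obj only;
                 // local_obj's destructor is never invoked
}
```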

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35169
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
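
A minimal sketch of the handler-side guard, using hypothetical types and names rather than the actual `yb::WebCallbackRegistry` signatures: once termination has been initiated, each handler returns `503 SERVICE_UNAVAILABLE` instead of touching shared state that may already be mid-teardown.
```
// Sketch only: hypothetical WebResponse type and handler signature.
#include <atomic>
#include <string>

struct WebResponse {
  int status_code = 200;
  std::string body;
};

std::atomic<bool> termination_initiated{false};  // set when SIGTERM arrives

void PrometheusMetricsHandler(WebResponse* resp) {
  if (termination_initiated.load(std::memory_order_acquire)) {
    resp->status_code = 503;  // SERVICE_UNAVAILABLE
    resp->body = "shutting down";
    return;
  }
  // ... normal path: serialize metrics into resp->body ...
}
```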

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35171
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35170
jasonyb pushed a commit that referenced this issue Jun 11, 2024
pg_stat_monitor is based on PostgreSQL 11's pg_stat_statement.
To make the subsequent changes easy to track, this commit imports the base code of
PostgreSQL's pg_stat_statement.

(commit = d898edf4f233a3ffe6a0da64179fc268a1d46200).