Update build not to be hard-coded to require brew on linux #1

Closed
hengestone opened this issue Nov 6, 2017 · 4 comments

@hengestone

Use CMake to detect library dependencies and fail if they are not found.

@mbautin
Collaborator

mbautin commented Nov 6, 2017

@hengestone Great point, we'll definitely look into this! The reason to introduce Linuxbrew was that CentOS is our primary target production platform, but the default toolchain on CentOS 7 is quite outdated. And it turns out that Linuxbrew has a pretty recent Boost version as well. Building without Linuxbrew on CentOS might still be challenging, but we might be able to build without it e.g. on a recent version of Ubuntu. What platform are you trying to build on, and what does your toolchain look like, by the way?

@hengestone
Author

hengestone commented Nov 7, 2017

Totally understood - there should just be an alternative for environments that are more up to date and able to install newer packages. I'm on Ubuntu (17.10), which is used by a large majority of cloud platforms. See e.g. ZDNet or G+.

I should add: congratulations on the public Beta, wish you lots of success!

yugabyte-ci pushed a commit that referenced this issue Feb 2, 2018
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```
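For illustration only, here is a minimal sketch of the fix described in the summary, using simplified stand-in types rather than the actual yb/rocksdb classes (the real filter is the lambda in TabletPeer::InitTabletPeer seen at the top of the trace):

```
// Hedged sketch: when an (unexpectedly empty) immutable memtable has no
// frontiers set, log an error and allow it to be flushed instead of crashing.
// FrontierInfo, MemTable, and MakeFlushFilter are simplified stand-ins.
#include <functional>
#include <iostream>
#include <optional>

struct FrontierInfo {};  // stand-in for the per-memtable frontier metadata

struct MemTable {
  std::optional<FrontierInfo> frontiers;  // expected to be unset only when empty
  const std::optional<FrontierInfo>& Frontiers() const { return frontiers; }
};

// Filter handed to PickMemtablesToFlush(): returns true if the memtable may be flushed.
std::function<bool(const MemTable&)> MakeFlushFilter() {
  return [](const MemTable& memtable) {
    if (!memtable.Frontiers().has_value()) {
      // Previously this case crashed; now we log and flush the memtable anyway.
      std::cerr << "Memtable has no frontiers set; flushing it anyway" << std::endl;
      return true;
    }
    // Normal case (details elided): decide based on the memtable's frontiers.
    return true;
  };
}
```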

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
@mbautin
Collaborator

mbautin commented Mar 19, 2018

@hengestone We have made changes to our build script and YugaByte DB should now build on Ubuntu 17.10. Please give it a try when you have a chance and let us know how it goes! By the way, at this point, we're still doing all of our official testing and releases on CentOS, but the resulting Linux package works on both CentOS and Ubuntu.

@hengestone
Author

Thanks!

The build process gets a lot further now. I'll file a separate bug for the compile failure.

yugabyte-ci pushed a commit that referenced this issue Nov 30, 2018
…EANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.
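
As a rough illustration of this change (names and types below are simplified stand-ins, not the actual YugabyteDB signatures), returning a complete result forces the caller to handle a lookup failure before using the tablet peer:

```
#include <iostream>
#include <memory>
#include <string>
#include <variant>

struct TabletPeer { std::string id; };
using TabletPeerPtr = std::shared_ptr<TabletPeer>;
struct LookupError { std::string message; };          // stand-in for yb::Status
using LookupResult = std::variant<TabletPeerPtr, LookupError>;

// Before: the helper responded on error and returned a value the CLEANUP path
// never checked. After: return the complete result and let the caller decide.
LookupResult LookupTabletPeer(const std::string& tablet_id) {
  if (tablet_id.empty()) {
    return LookupError{"tablet not found"};
  }
  return std::make_shared<TabletPeer>(TabletPeer{tablet_id});
}

void HandleCleanup(const std::string& tablet_id) {
  LookupResult result = LookupTabletPeer(tablet_id);
  if (auto* error = std::get_if<LookupError>(&result)) {
    // Respond with the error instead of continuing with a null peer, which is
    // what previously led to the FATAL in CheckMemoryPressure.
    std::cerr << "CLEANUP failed: " << error->message << std::endl;
    return;
  }
  TabletPeerPtr peer = std::get<TabletPeerPtr>(result);
  std::cout << "Running CLEANUP on tablet " << peer->id << std::endl;
}
```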

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
ajcaldera1 added a commit that referenced this issue Apr 19, 2019
mbautin added a commit to yugabyte/yugabyte-db-thirdparty that referenced this issue Apr 29, 2019
Summary:
Enable building on gcc 7 on Ubuntu 17.10
- Extending the FALLTHROUGH_INTENDED macro appropriately (a sketch follows after this list)
- Removing some unused code that gcc 7 complains about
- Adding "#include <functional>" to a bunch of files because otherwise gcc 7 does not find
  `std::function`.
- Add the thirdparty library directory to the rpath of libraries in that directory. Without this, glog fails to find gflags. It is not clear why this only happens on Ubuntu 17.10 and not on CentOS with Linuxbrew.
- Fixing the clean_thirdparty.sh script that apparently has been incorrect since the Python rewrite
  of the thirdparty framework.
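
A hedged sketch of how a FALLTHROUGH_INTENDED-style macro is typically extended for gcc 7 (which enables -Wimplicit-fallthrough); this is an illustration, not the exact macro from the YugaByte tree:

```
// Illustrative only: map FALLTHROUGH_INTENDED onto the compiler-specific
// fallthrough attribute so gcc 7's -Wimplicit-fallthrough stays quiet.
#if defined(__clang__)
#define FALLTHROUGH_INTENDED [[clang::fallthrough]]
#elif defined(__GNUC__) && __GNUC__ >= 7
#define FALLTHROUGH_INTENDED [[gnu::fallthrough]]
#else
#define FALLTHROUGH_INTENDED do {} while (0)
#endif

int CategorizeDigit(int d) {
  switch (d) {
    case 0:
      FALLTHROUGH_INTENDED;  // deliberate fallthrough, no warning on gcc 7
    case 1:
      return 1;
    default:
      return 2;
  }
}
```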

This will fix yugabyte/yugabyte-db#1.

Test Plan: Jenkins

Reviewers: hector, bharat, bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4395
mbautin pushed a commit that referenced this issue Jun 20, 2019
Restyle docs to match web site
mbautin pushed a commit that referenced this issue Jun 20, 2019
updated the architecture section
yugabyte-ci pushed a commit that referenced this issue Jun 27, 2019
…data

Summary:
The issue was originally discovered as a random `RaftConsensusITest.TestAddRemoveVoter` test failure in TSAN mode, caused by a data race.
```
WARNING: ThreadSanitizer: data race (pid=11050)
1663	[ts-2]	   Write of size 8 at 0x7b4c000603a8 by thread T51 (mutexes: write M3613):
...
1674	[ts-2]	     #10 yb::tablet::KvStoreInfo::LoadTablesFromPB(google::protobuf::RepeatedPtrField<yb::tablet::TableInfoPB>, string) src/yb/tablet/tablet_metadata.cc:170
1675	[ts-2]	     #11 yb::tablet::KvStoreInfo::LoadFromPB(yb::tablet::KvStoreInfoPB const&, string) src/yb/tablet/tablet_metadata.cc:189:10
1676	[ts-2]	     #12 yb::tablet::RaftGroupMetadata::LoadFromSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:508:5
1677	[ts-2]	     #13 yb::tablet::RaftGroupMetadata::ReplaceSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:545:3
1678	[ts-2]	     #14 yb::tserver::RemoteBootstrapClient::Finish() src/yb/tserver/remote_bootstrap_client.cc:486:3
...
   Previous read of size 4 at 0x7b4c000603a8 by thread T16:
1697	[ts-2]	     #0 yb::tablet::RaftGroupMetadata::schema_version() const src/yb/tablet/tablet_metadata.h:251:34
1698	[ts-2]	     #1 yb::tserver::TSTabletManager::CreateReportedTabletPB(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::master::ReportedTabletPB*) src/yb/tserver/ts_tablet_manager.cc:1323:71
1699	[ts-2]	     #2 yb::tserver::TSTabletManager::GenerateIncrementalTabletReport(yb::master::TabletReportPB*) src/yb/tserver/ts_tablet_manager.cc:1359:5
1700	[ts-2]	     #3 yb::tserver::Heartbeater::Thread::TryHeartbeat() src/yb/tserver/heartbeater.cc:371:32
1701	[ts-2]	     #4 yb::tserver::Heartbeater::Thread::DoHeartbeat() src/yb/tserver/heartbeater.cc:531:19
```

The reason is that although `RaftGroupMetadata::schema_version()` gets the `TableInfo` pointer from `primary_table_info()` under a mutex lock, it then accesses the object's fields without holding the lock.

Added a private `RaftGroupMetadata::primary_table_info_guarded()` method, which returns a pair of `TableInfo*` and `std::unique_lock`, and used it in `RaftGroupMetadata::schema_version()` and other `RaftGroupMetadata` functions that access primary table info fields.
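
A minimal sketch of the guarded-accessor pattern described above (member names other than `primary_table_info_guarded()` and `schema_version()` are assumptions, and the real class is more involved):

```
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>

struct TableInfo {
  uint32_t schema_version = 0;
};

class RaftGroupMetadata {
 public:
  uint32_t schema_version() const {
    // Hold the lock for the whole access, not just while fetching the pointer.
    auto [table_info, lock] = primary_table_info_guarded();
    return table_info->schema_version;
  }

 private:
  // Returns the primary table info together with a lock that keeps it valid
  // while the caller reads its fields.
  std::pair<const TableInfo*, std::unique_lock<std::mutex>>
  primary_table_info_guarded() const {
    std::unique_lock<std::mutex> lock(data_mutex_);
    return {&primary_table_info_, std::move(lock)};
  }

  mutable std::mutex data_mutex_;
  TableInfo primary_table_info_;
};
```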

Test Plan: `ybd tsan --sj --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestAddRemoveVoter -n 1000`

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6813
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 864e72b

Original commit message:

ENG-2793 Do not fail when deciding if we can flush an empty immutable memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
mbautin pushed a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 566d6d2

Original commit message:

ENG-4240: #613: Fix checking of tablet presence during transaction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
yugabyte#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
yugabyte#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
yugabyte#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
yugabyte#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
yugabyte#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
yugabyte#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
yugabyte#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
yugabyte#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
yugabyte#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
yugabyte#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@864e72b
and supersedes the commit yugabyte@2932b0a
mbautin pushed a commit to mbautin/yugabyte-db that referenced this issue Jul 16, 2019
…ction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request is received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
yugabyte#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
yugabyte#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
yugabyte#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772

Note:
This commit provides additional functionality that is logically related to
the earlier commit yugabyte@566d6d2
and supersedes the commit yugabyte@63bae60
kai-franz added a commit that referenced this issue Oct 13, 2023
Summary:
Adds MyDatabaseId to the T-server cache key to address an issue caused by different databases sharing T-server cache entries.

When a Postgres backend starts, it prefetches these 3 shared tables:
```
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_AUTH_MEMBERS);
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DATABASE);
        YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DB_ROLE_SETTINGS);
```
These tables are then cached in the T-server cache, which is keyed by the database OID and the OIDs of these tables. Because these are shared tables, the OID of the template1 database is used. As a result, when another backend process starts up, it will issue the same prefetch request, which will result in a hit in the T-server cache (assuming the catalog version has not changed).

Here is how the issue manifests in detail, as described by @tverona (requires D28071, ysql_enable_read_request_caching=true):

### 1. Start with just `yugabytedb`. Connect to `yugabytedb` for the first time.
   * **a.** Relcacheinit file is built. So we preload a bunch of tables from master.
   * **b.** Create tserver cache entry for those tables (which includes `pg_database`). Key contains `yugabytedb` oid (since that’s part of the request).
   * **c.** Create `db1`.

### 2. Connect to `db1` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db1`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db1` oid.

### 3. Connect to `db1` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We fetch the 3 tables (including `pg_database`) from master and create a new tserver cache entry, with a key including the `template0` db OID (?). Values include `db1`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db1`. We match on tserver cache entry from 2.b. We do not hit master.
   * **c.** Create `db2`.

### 4. Connect to `db2` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db2`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db2` oid.

### 5. Connect to `db2` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We request the 3 tables (including `pg_database`) and match on the cache key from 3.a.ii, so we do not hit master. We get back a `pg_database` containing entries for `db1` but not `db2`.
     - **iii.** We fail later in `CheckMyDatabase`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db2`. We match on tserver cache entry from 4.b. We do not hit master.

By always including MyDatabaseId in the cache key, we avoid serving stale versions of shared relations to different databases.
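
A rough sketch of the kind of cache key described above, with hypothetical field and type names rather than the actual pggate structures:

```
// Hypothetical sketch of including the connecting database's OID in the
// T-server read-request cache key, so two databases prefetching the same
// shared catalog tables (owned by template1) get separate cache entries.
#include <cstdint>
#include <tuple>
#include <vector>

using Oid = uint32_t;

struct ReadRequestCacheKey {
  Oid my_database_id;          // MyDatabaseId of the connecting backend (the fix)
  Oid request_database_id;     // database OID in the request (template1 for shared tables)
  std::vector<Oid> table_ids;  // OIDs of the prefetched tables
  uint64_t catalog_version;    // entries are only reused while this matches

  bool operator<(const ReadRequestCacheKey& other) const {
    return std::tie(my_database_id, request_database_id, table_ids, catalog_version) <
           std::tie(other.my_database_id, other.request_database_id, other.table_ids,
                    other.catalog_version);
  }
};
```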

**Upgrade/Rollback safety:**
Only PG to T-Server RPCs are changed.
Jira: DB-8163

Test Plan:
  # Connect to yugabyte
  # Connect to yugabyte
    # Create db1
  # Connect to db1
  # Connect to db1 <-- fails before this change with D28071, ysql_enable_read_request_caching=true

Reviewers: myang, dmitry

Reviewed By: dmitry

Subscribers: ybase, yql, tverona

Differential Revision: https://phorge.dev.yugabyte.com/D28945
Vars-07 added a commit that referenced this issue Nov 7, 2023
…nnections

Summary:
This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Test Plan:
**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in #1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29985
amannijhawan added a commit that referenced this issue Nov 13, 2023
Summary:
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified disk size was increased

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped.

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin

Reviewed By: sanketh, nsingh, dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29938
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…CA having cert chain in YBA trust's store

Summary:
Currently YBA assumes that the CA certs added to YBA's trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a fidelity environment where our migration V274 failed for the same reason.

Some minor other improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the actual cert content, which causes the subsequent deletion attempt to fail - this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
yugabyte@863ae72
yugabyte@4c8978b

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a certificate chain (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that the PEM trust store now contains 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared in case the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in yugabyte#1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30055
Spikatrix pushed a commit to Spikatrix/yugabyte-db that referenced this issue Nov 29, 2023
…CA having cert chain in YBA trust's store

Summary:
Currently YBA assumes that the CA certs added to YBA's trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a fidelity environment where our migration V274 failed for the same reason.

Some minor other improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the actual cert content, which causes the subsequent deletion attempt to fail - this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
yugabyte@863ae72
yugabyte@4c8978b

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a certificate chain (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that the PEM trust store now contains 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared in case the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in yugabyte#1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30054
amannijhawan added a commit that referenced this issue Dec 9, 2023
…rt -1

Summary:
Original commit: 651a2e8 / D29938
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified disk size was increased

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped:

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: cwang, nbhatia

Subscribers: cwang, nbhatia, yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30901
amannijhawan added a commit that referenced this issue Dec 14, 2023
…part -1

Summary:
Original commit: 98de5da / D30901
One of the first tasks kicked off during an edit universe is DiskResizing.
This change makes the createResizeDiskTask function idempotent:
it only creates the disk resize tasks if the specified size is different from the current
volume size on the pod.
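
As a rough sketch of that idempotency check (hypothetical names, standard JDK only; the real change lives in YBA's createResizeDiskTask and is not reproduced here), the decision reduces to comparing the currently provisioned PVC sizes against the requested size:

```
import java.util.Map;

public class DiskResizeIdempotencyCheck {

  // Hypothetical helper: currentPvcSizesGi maps PVC name -> provisioned size in Gi
  // (as read from `kubectl get pvc -o json`); requestedSizeGi is the new target size.
  static boolean resizeNeeded(Map<String, Integer> currentPvcSizesGi, int requestedSizeGi) {
    // Create resize subtasks only if at least one PVC differs from the requested size.
    return currentPvcSizesGi.values().stream()
        .anyMatch(currentGi -> currentGi != requestedSizeGi);
  }

  public static void main(String[] args) {
    // First run: PVCs are still 100Gi, so the resize task is created.
    System.out.println(resizeNeeded(Map.of("datadir0", 100), 200)); // true
    // Retry after the expansion already succeeded: sizes match, so it is skipped.
    System.out.println(resizeNeeded(Map.of("datadir0", 200), 200)); // false
  }
}
```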

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified that the disk size was increased:

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped:

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: nsingh, dshubin

Subscribers: yugaware, nbhatia, cwang

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31095
jasonyb pushed a commit that referenced this issue Dec 15, 2023
Summary:
The YB Seq Scan code path is not hit because Foreign Scan is the default and
pg_hint_plan does not work.  An upcoming merge with YB master will bring in
master commit 465ee2c, which changes the
default to YB Seq Scan.

To test YB Seq Scan, a temporary patch is needed (see the test plan).
With that, two bugs are encountered: fix them.

1. FailedAssertion("TTS_IS_VIRTUAL(slot)"

   On simple test case

       create table t (i int primary key, j int);
       select * from t;

   get

       TRAP: FailedAssertion("TTS_IS_VIRTUAL(slot)", File: "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", Line: 3473, PID: 2774450)

   Details:

       #0  0x00007fd52616eacf in raise () from /lib64/libc.so.6
       #1  0x00007fd526141ea5 in abort () from /lib64/libc.so.6
       #2  0x0000000000af33ad in ExceptionalCondition (conditionName=conditionName@entry=0xc2938d "TTS_IS_VIRTUAL(slot)", errorType=errorType@entry=0xc01498 "FailedAssertion",
           fileName=fileName@entry=0xc28f18 "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", lineNumber=lineNumber@entry=3473)
           at ../../../../../../../src/postgres/src/backend/utils/error/assert.c:69
       #3  0x00000000005c26bd in ybFetchNext (handle=0x2600ffc43680, slot=slot@entry=0x2600ff6c2980, relid=16384)
           at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3473
       #4  0x00000000007de444 in YbSeqNext (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:156
       #5  0x000000000078b3c6 in ExecScanFetch (node=node@entry=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:133
       #6  0x000000000078b44e in ExecScan (node=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:182
       #7  0x00000000007de298 in ExecYbSeqScan (pstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:191
       #8  0x00000000007871ef in ExecProcNodeFirst (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:480
       #9  0x000000000077db0e in ExecProcNode (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/include/executor/executor.h:285
       #10 ExecutePlan (execute_once=<optimized out>, dest=0x2600ff6b1a10, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT,
           use_parallel_mode=<optimized out>, planstate=0x2600ff6c2778, estate=0x2600ff6c2128) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1650
       #11 standard_ExecutorRun (queryDesc=0x2600ff675128, direction=<optimized out>, count=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:367
       #12 0x000000000077dbfe in ExecutorRun (queryDesc=queryDesc@entry=0x2600ff675128, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:308
       #13 0x0000000000982617 in PortalRunSelect (portal=portal@entry=0x2600ff90e128, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x2600ff6b1a10)
           at ../../../../../../src/postgres/src/backend/tcop/pquery.c:954
       #14 0x000000000098433c in PortalRun (portal=portal@entry=0x2600ff90e128, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
           dest=dest@entry=0x2600ff6b1a10, altdest=altdest@entry=0x2600ff6b1a10, qc=0x7fffc14a13c0) at ../../../../../../src/postgres/src/backend/tcop/pquery.c:786
       #15 0x000000000097e65b in exec_simple_query (query_string=0x2600ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1321
       #16 yb_exec_simple_query_impl (query_string=query_string@entry=0x2600ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #17 0x000000000097b7a5 in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7fffc14a15ff)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #18 0x000000000097d077 in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #19 0x000000000097d0ca in yb_exec_simple_query (query_string=query_string@entry=0x2600ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2600ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #20 0x000000000097fe8a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #21 0x00000000008c8354 in BackendRun (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #22 BackendStartup (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #23 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #24 0x00000000008caa55 in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #25 0x0000000000804ba8 in PostgresServerProcessMain (argc=25, argv=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #26 0x0000000000804bc8 in main ()

       3469    ybFetchNext(YBCPgStatement handle,
       3470                            TupleTableSlot *slot, Oid relid)
       3471    {
       3472            Assert(slot != NULL);
       3473            Assert(TTS_IS_VIRTUAL(slot));

       (gdb) p *slot
       $2 = {type = T_TupleTableSlot, tts_flags = 18, tts_nvalid = 0, tts_ops = 0xeaf5e0 <TTSOpsHeapTuple>, tts_tupleDescriptor = 0x2600ff6416c0, tts_values = 0x2600ff6c2a00, tts_isnull = 0x2600ff6c2a10, tts_mcxt = 0x2600ff6c2000, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0, yb_item = {ybctid = 0}}, tts_tableOid = 0, tts_yb_insert_oid = 0}

   Fix by making YB Seq Scan always use a virtual slot.  This is similar
   to what is done for YB Foreign Scan.

2. segfault in ending scan

   Same simple test case gives segfault at a later stage.

   Details:

       #0  0x00000000007de762 in table_endscan (scan=0x3debfe3ab88) at ../../../../../../src/postgres/src/include/access/tableam.h:997
       #1  ExecEndYbSeqScan (node=node@entry=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:298
       #2  0x0000000000787a75 in ExecEndNode (node=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:649
       #3  0x000000000077ffaf in ExecEndPlan (estate=0x3debfe3a128, planstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1489
       #4  standard_ExecutorEnd (queryDesc=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:503
       #5  0x00000000007800f8 in ExecutorEnd (queryDesc=queryDesc@entry=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:474
       #6  0x00000000006f140c in PortalCleanup (portal=0x2582ff900128) at ../../../../../../src/postgres/src/backend/commands/portalcmds.c:305
       #7  0x0000000000b3c36a in PortalDrop (portal=portal@entry=0x2582ff900128, isTopCommit=isTopCommit@entry=false)
           at ../../../../../../../src/postgres/src/backend/utils/mmgr/portalmem.c:514
       #8  0x000000000097e667 in exec_simple_query (query_string=0x2582ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1331
       #9  yb_exec_simple_query_impl (query_string=query_string@entry=0x2582ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #10 0x000000000097b79a in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7ffc81c0e78f)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #11 0x000000000097d06c in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #12 0x000000000097d0bf in yb_exec_simple_query (query_string=query_string@entry=0x2582ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2582ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #13 0x000000000097fe7f in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #14 0x00000000008c8349 in BackendRun (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #15 BackendStartup (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #16 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #17 0x00000000008caa4a in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #18 0x0000000000804b9d in PostgresServerProcessMain (argc=25, argv=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #19 0x0000000000804bbd in main ()

       294             /*
       295              * close heap scan
       296              */
       297             if (tsdesc != NULL)
       298                     table_endscan(tsdesc);

   The reason is that the initial merge 55782d5
   incorrectly merged the end of ExecEndYbSeqScan.  Upstream PG commit
   9ddef36278a9f676c07d0b4d9f33fa22e48ce3b5 removes this code, but the initial
   merge duplicated the lines.  Remove those lines.

Test Plan:
Apply the following patch to activate YB Seq Scan:

    diff --git a/src/postgres/src/backend/optimizer/path/allpaths.c b/src/postgres/src/backend/optimizer/path/allpaths.c
    index 8a4c38a965..854d84a648 100644
    --- a/src/postgres/src/backend/optimizer/path/allpaths.c
    +++ b/src/postgres/src/backend/optimizer/path/allpaths.c
    @@ -576,7 +576,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     else
                     {
                         /* Plain relation */
    -                    if (IsYBRelationById(rte->relid))
    +                    if (false)
                         {
                             /*
                              * Using a foreign scan which will use the YB FDW by

On almalinux 8,

    ./yb_build.sh fastdebug --gcc11
    pg15_tests/run_all_tests.sh fastdebug --gcc11 --sj --sp --scb

fails the following tests:

- test_D29546
- test_pg15_regress: yb_pg15
- test_types_geo: yb_pg_box
- test_hash_in_queries: yb_hash_in_queries

Manually check to see that they are due to YB Seq Scan explain output
differences.

Reviewers: aagrawal, tfoucher

Reviewed By: tfoucher

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31139
d-uspenskiy added a commit that referenced this issue Jan 12, 2024
…ction

Summary:
There are several unit tests which suffer from a TSAN data race warning with the following stack:

```
WARNING: ThreadSanitizer: data race (pid=38656)
  Read of size 8 at 0x7f6f2a44b038 by thread T21:
    #0 memcpy /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (pg_ddl_concurrency-test+0x9e197)
    #1 <null> <null> (libnss_sss.so.2+0x72ef) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Previous write of size 8 at 0x7f6f2a44b038 by thread T20 (mutexes: write M0):
    #0 mmap64 /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:7485:3 (pg_ddl_concurrency-test+0xda204)
    #1 <null> <null> (libnss_sss.so.2+0x7169) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Location is global '??' at 0x7f6f2a44b000 (passwd+0x38)

  Mutex M0 (0x7f6f2af29380) created at:
    #0 pthread_mutex_lock /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (pg_ddl_concurrency-test+0xa464b)
    #1 <null> <null> (libnss_sss.so.2+0x70d6) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
...
```

All failing tests have a common feature: all of them create connections to postgres from multiple threads at the same time.
When creating a new connection, the `libpq` library calls the standard `getpwuid_r` function internally. This function is thread-safe, so a TSAN warning is not expected there.

The solution is to suppress the warning in the `getpwuid_r` function.
**Note:** because the `getpwuid_r` function name does not appear in the TSAN warning stack, the warning is suppressed for the caller function `pqGetpwuid` instead.
Jira: DB-9523

Test Plan: Jenkins

Reviewers: sergei, bogdan

Reviewed By: sergei

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31646
arpang added a commit that referenced this issue Jan 30, 2024
Summary:
The function `yb_single_row_update_or_delete_path` was reworked for PG15 in D27692. There are a few more issues in the function that this revision fixes:

  # With PG commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35,
  ## the target list returned by `build_path_tlist(root, subpath)` only contains the modified and junk columns. So, there is no need to ignore the unspecified columns.
  ## "tableoid" junk column is added for partitioned tables. Ignore it, along with other junk cols, when iterating over `build_path_tlist(root, subpath)`.
  ## `RelOptInfo` entries in `simple_rel_array` corresponding to the non-leaf relations of a partitioned table are NOT NULL. Ignore these when checking the number of relations being updated.
  ## When updating a partitioned table, the child of the UPDATE node is an APPEND node. This append node is skipped in the final plan if it has only one child. Take this into account when applying conditions to the `ModifyTablePath.subpath` by passing the subpath through `get_singleton_append_subpath`.
  # D27692 added an assertion ` Assert(root->update_colnos->length > update_col_index)`, which is incorrect. The pre-existing code comment clearly stated: `.. it is possible that planner adds extra expressions. In particular, we've seen a RowExpr when a view was updated`. As expected, this incorrect assertion fails when updating a view with a trigger (see the added test). Remove the assertion. Instead, move the expression-out-of-range check before reading `root->update_colnos`.

Test Plan:
Jenkins: rebase: pg15

Added two tests in yb_pg15:
- one to test whether single-row optimization is invoked when only one partition is updated (fix #1)
- the other to test UPDATE of a view with an INSTEAD OF UPDATE trigger (fix #2). This test is taken from yb_pg_triggers (yb_pg_triggers still fails with unrelated errors)

./yb_build.sh --java-test org.yb.pgsql.TestPg15Regress#testPg15Regress

Reviewers: jason, tnayak, amartsinchyk

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31709
Sahith02 added a commit that referenced this issue Mar 13, 2024
Summary:
Restore YBC flow currently has preflight checks for:
1. DB version comparison
2. Autoflags check

This diff modifies #1 to verify that the version number is greater (stable is compared to stable, preview to preview; other combinations result in an error).
The autoflags check remains the same.
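
A hedged sketch of what such a track-aware comparison could look like. The class and method names, the even/odd-minor convention for stable vs. preview builds, and the greater-or-equal semantics are assumptions for illustration, not taken from this diff:

```
public final class DbVersionCheck {

  enum Track { STABLE, PREVIEW }

  // Assumption: an even second component (2.18, 2.20) means a stable series,
  // an odd one (2.19, 2.21) means a preview series.
  static Track trackOf(String version) {
    String[] parts = version.split("[.-]");   // e.g. "2.19.3.0-b80"
    return Integer.parseInt(parts[1]) % 2 == 0 ? Track.STABLE : Track.PREVIEW;
  }

  // Returns true if target is at least as new as source; rejects cross-track comparisons.
  static boolean isAtLeast(String source, String target) {
    if (trackOf(source) != trackOf(target)) {
      throw new IllegalArgumentException("Cannot compare stable and preview versions");
    }
    String[] s = source.split("[.-]");
    String[] t = target.split("[.-]");
    for (int i = 0; i < 4; i++) {             // compare the four numeric components
      int cmp = Integer.compare(Integer.parseInt(t[i]), Integer.parseInt(s[i]));
      if (cmp != 0) {
        return cmp > 0;
      }
    }
    return true;                              // identical versions pass
  }

  public static void main(String[] args) {
    System.out.println(isAtLeast("2.18.3.0-b80", "2.20.1.0-b10")); // true: both stable
    // isAtLeast("2.19.3.0-b80", "2.20.1.0-b10") would throw: preview vs. stable
  }
}
```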

Test Plan:
Manually test all existing flows work as usual.
Run UTs.
Run itests.

Reviewers: sanketh, vbansal

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32944
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
Summary:
Restore YBC flow currently has preflight checks for:
1. DB version comparison
2. Autoflags check

This diff modifies yugabyte#1 to verify that the version number is greater (stable is compared to stable, preview to preview; other combinations result in an error).
The autoflags check remains the same.

Test Plan:
Manually test all existing flows work as usual.
Run UTs.
Run itests.

Reviewers: sanketh, vbansal

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32944
yusong-yan added a commit that referenced this issue Apr 18, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault which was  identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
karthik-ramanathan-3006 added a commit that referenced this issue May 17, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
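
The fix itself is in the C++ webserver, but the request-handling pattern is easy to illustrate standalone. Below is a minimal JDK-based sketch (not YugabyteDB code; the port and endpoint are chosen only to mirror the example above) of handlers answering 503 once a shutdown flag is set, with the server stopped explicitly instead of relying on exit-time cleanup:

```
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulMetricsServer {
  private static final AtomicBoolean terminating = new AtomicBoolean(false);

  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(13000), 0);

    server.createContext("/prometheus-metrics", exchange -> {
      if (terminating.get()) {
        // Mirror the fix: refuse new work once shutdown has been initiated.
        exchange.sendResponseHeaders(503, -1);
        exchange.close();
        return;
      }
      byte[] body = "dummy_metric 1\n".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();

    // Analogue of handling SIGTERM: set the flag and stop the server explicitly,
    // rather than letting exit-time cleanup race against in-flight requests.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      terminating.set(true);
      server.stop(1); // allow up to 1 second for in-flight exchanges to drain
    }));
  }
}
```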

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
jharveysmith pushed a commit that referenced this issue May 24, 2024
…CA having cert chain in YBA trust's store

Summary:
Currently, YBA assumes that the CA certs added to the YBA trust store will be a single root
cert.
With this diff we enable support for cert chains as well.
This was observed in a Fidelity environment where our migration V274 failed for this reason.

Some other minor improvements/fixes -
  - Fix the deletion of CA certs from YBA's trust store. If the deletion fails on the first attempt, the `certContent` field that stores the filePath starts storing the `certContent` itself, which causes the subsequent deletion attempt to fail; this diff fixes it.

[PLAT-11176][PLAT-11170] Pass Java PKCS TrustStore for play.ws.ssl connections

This diff fixes two issues -
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Depends on - D29985, D29143
Original Commit -
43160f4
82f944a
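
For reference, supporting a chain rather than a single root amounts to parsing every certificate in the uploaded PEM bundle and registering each one in the trust store. A minimal sketch using only standard JDK APIs (the file name and alias scheme are made up; this is not the YBA implementation):

```
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class TrustStoreChainImport {
  public static void main(String[] args) throws Exception {
    CertificateFactory cf = CertificateFactory.getInstance("X.509");

    KeyStore trustStore = KeyStore.getInstance("PKCS12");
    trustStore.load(null, null); // start with an empty in-memory store

    // generateCertificates() returns every certificate found in the PEM bundle,
    // so a chain (root + intermediates) is imported as separate entries.
    try (InputStream in = new FileInputStream("ca-bundle.pem")) {
      int i = 0;
      for (Certificate cert : cf.generateCertificates(in)) {
        String alias = "custom-ca-" + i++;
        trustStore.setCertificateEntry(alias, cert);
        System.out.println(alias + " -> " + ((X509Certificate) cert).getSubjectX500Principal());
      }
    }
  }
}
```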

Test Plan:
**Case1**
  - Ran the migration with the fidelity postgres dump.
  - Ensured that the certs are correctly imported in both YBA's PKCS12 and PEM trust stores.

**Case2**
  - Deployed a keycloak server (OIDC server) - [[ https://10.23.16.17:8443 | https://10.23.16.17/ ]] that supports custom certs.
  - Created a cert chain certificates (root -> intermediate -> client).
  - Deployed the above server with client certificate.
  - Added the root/intermediate certs in YBA's trust store.
  - Ensured authentication is successful.
  - Deleted the certs from YBA trust store.
  - Now ensured SSO login is broken.
  - Uploaded partial, i.e., root only cert to YBA trust store.
  - Ensured that SSO login is broken.

**Case3**
 - Verified crud for the custom CA trust store.

**Case4**
 - Added a cert chain with root (r1) & intermediate (i1) -> (cert1)
 - Added another cert chain with root(r1) & intermediate (i2) -> (cert2)
 - Ensured our PEM store contains 3 entries now.
 - Removed cert1 from the trust store.
 - Verified that r1 & i2 are present in the YBA's PEM store.
 - Added back cert1 in trust store.
 - Replaced cert1 with some other cert chain -> (cert3) [root (r2) & intermediate i3]
 - Verified that PEM trust store contain now 4 certs -> [r1, i2, r2, i3].
 - For PKCS12 store, we add/remove/delete based on the alias (cert name). So we don't need any special handling for that.

**Case5**
  - Ensured that the migration V274 is idempotent, i.e., the directories created are cleared if the migration fails, so that we remain in the same state from YBA's perspective.

iTest pipeline
UT's

CA trust store related iTests

**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of cert uploaded in #1 was successful.

**PLAT-11176**
  - Created HA setup with two standup portals.
  - Each portal is using its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: #yba-api-review, nbhatia, cwang, amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30054
svarnau pushed a commit that referenced this issue May 25, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault which was  identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
svarnau pushed a commit that referenced this issue May 25, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
svarnau pushed a commit that referenced this issue May 29, 2024
… SIGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.
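
For reference, the following small standalone program demonstrates the `exit()` semantics listed above: the destructor of the namespace-scoped object (static storage duration) runs, while the destructor of the local object (automatic storage duration) does not.
```
// Demo of exit() cleanup semantics: static-duration destructors run,
// automatic-duration destructors do not.
#include <cstdio>
#include <cstdlib>

struct Tracer {
  const char* name;
  explicit Tracer(const char* n) : name(n) { std::printf("ctor %s\n", name); }
  ~Tracer() { std::printf("dtor %s\n", name); }
};

Tracer static_obj("namespace-scoped (static storage duration)");

int main() {
  Tracer local_obj("local (automatic storage duration)");
  std::exit(0);  // prints the dtor line for static_obj only;
                 // local_obj's destructor is never invoked
}
```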

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35169
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796
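
A minimal sketch of the handler-side guard, using hypothetical types and names rather than the actual `yb::WebCallbackRegistry` signatures: once termination has been initiated, each handler returns `503 SERVICE_UNAVAILABLE` instead of touching shared state that may already be mid-teardown.
```
// Sketch only: hypothetical WebResponse type and handler signature.
#include <atomic>
#include <string>

struct WebResponse {
  int status_code = 200;
  std::string body;
};

std::atomic<bool> termination_initiated{false};  // set when SIGTERM arrives

void PrometheusMetricsHandler(WebResponse* resp) {
  if (termination_initiated.load(std::memory_order_acquire)) {
    resp->status_code = 503;  // SERVICE_UNAVAILABLE
    resp->body = "shutting down";
    return;
  }
  // ... normal path: serialize metrics into resp->body ...
}
```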

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35171
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35170
jasonyb pushed a commit that referenced this issue Jun 11, 2024
pg_stat_monitor is based on PostgreSQL 11's pg_stat_statement.
To make the subsequent changes easy to track, this commit imports the base code of
PostgreSQL's pg_stat_statement.

(commit = d898edf4f233a3ffe6a0da64179fc268a1d46200).