Implement the SCAN Redis command #2

Closed
hengestone opened this issue Nov 7, 2017 · 4 comments
Labels
kind/enhancement This is an enhancement of an existing feature priority/low Low priority
Comments

@hengestone

Implementing SCAN would make searching more useful.

@kmuthukk kmuthukk added the kind/enhancement This is an enhancement of an existing feature label Nov 7, 2017
@hectorgcr hectorgcr self-assigned this Nov 7, 2017
@hectorgcr
Contributor

Thanks for the input @hengestone

I will be writing a design document for this feature and will share it here once it's ready.

@rkarthik007 rkarthik007 added this to To Do in YCQL via automation Jan 26, 2018
rkarthik007 added a commit that referenced this issue Jan 31, 2018
* Created a docs directory (#2)

Docs directory for design docs, discussions, etc. Also added a basic README.

* Removed extra underline in the README doc.
yugabyte-ci pushed a commit that referenced this issue Feb 2, 2018
… memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
@rahuldesirazu
Contributor

rahuldesirazu commented Apr 6, 2018

SCAN cursor [MATCH pattern] [COUNT count]

https://redis.io/commands/scan

Overall Notes

In a nutshell, SCAN is a cursor-based command that returns all Redis keys in a cluster. Here are a few important pieces from the spec:

A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point SCAN returned it to the user.

A full iteration only returns elements that were present at some point during the iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, SCAN ensures that this element will never be returned.

Calling SCAN with a broken, negative, out of range, or otherwise invalid cursor, will result into undefined behavior but never into a crash. What will be undefined is that the guarantees about the returned elements can no longer be ensured by the SCAN implementation.

It is important to note that the Match filter is applied after elements are retrieved from the collection, just before returning data to the client.

Count is just a hint for the implementation, however generally speaking this is what you could expect most of the times from the implementation.

Count

The Redis default is 10, and we will default to the same value. The specified count cannot be too large, or else the proxy might end up processing too many keys at once. Therefore, we should have a threshold gflag such that if count is larger than that number, we clamp it to the threshold.
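
As an illustration only, here is a minimal sketch of that clamping logic; the constant below is a hypothetical stand-in for the threshold gflag, which is not named in this note:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the threshold gflag described above.
constexpr int64_t kScanCountThreshold = 1000;
constexpr int64_t kDefaultScanCount = 10;  // Matches the Redis default.

int64_t EffectiveScanCount(int64_t requested_count) {
  if (requested_count <= 0) {
    return kDefaultScanCount;  // COUNT not specified (or invalid): use the default.
  }
  // Clamp overly large values so the proxy never processes too many keys at once.
  return std::min(requested_count, kScanCountThreshold);
}
```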

Match

Most importantly, the match filter is applied after all the elements have been retrieved and right before the set is returned to the client. With this in mind, we will have the tserver apply the filter and return, as part of the response to the proxy, both the total number of keys seen and the list of keys that match the filter. This results in less memory used on the proxy for processing, especially if the filter matches very few keys and the count is high.
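
For illustration, a rough sketch of the response shape this implies (field names are hypothetical, not the actual protobuf):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: the tserver applies the MATCH filter and reports both how
// many keys it examined (so the proxy can advance the cursor) and the keys
// that actually matched.
struct ScanPartialResponse {
  size_t total_keys_seen = 0;             // All keys examined in this call.
  std::vector<std::string> matched_keys;  // Only the keys that pass the MATCH filter.
};
```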

Cursor

The return value of the cursor is a string representing an unsigned 64-bit number. Some Redis clients take a string parameter for the cursor, while others take an unsigned long. For the current implementation, we plan to designate the first 2 bytes for the hash code and the next 6 bytes for the first 6 characters of the next key to start at. When we iterate using the cursor, we use the first 2 bytes to find the appropriate hash partition and then seek to the smallest key starting with those 6 bytes. This could potentially return repeated keys if multiple keys share the same 6-byte prefix.
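
A minimal sketch of how such a 64-bit cursor could be packed and unpacked (purely illustrative; the actual byte layout may differ):

```cpp
#include <cstdint>
#include <string>

// Top 2 bytes: hash code. Low 6 bytes: first 6 characters of the key to resume from.
uint64_t EncodeScanCursor(uint16_t hash_code, const std::string& next_key) {
  uint64_t cursor = static_cast<uint64_t>(hash_code) << 48;
  for (size_t i = 0; i < 6 && i < next_key.size(); ++i) {
    cursor |= static_cast<uint64_t>(static_cast<uint8_t>(next_key[i])) << (8 * (5 - i));
  }
  return cursor;
}

void DecodeScanCursor(uint64_t cursor, uint16_t* hash_code, std::string* key_prefix) {
  *hash_code = static_cast<uint16_t>(cursor >> 48);
  key_prefix->clear();
  for (int i = 5; i >= 0; --i) {
    const char c = static_cast<char>((cursor >> (8 * i)) & 0xFF);
    if (c != '\0') {
      key_prefix->push_back(c);  // Skip padding bytes for keys shorter than 6 characters.
    }
  }
}
```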

In a future implementation, we could have a string cursor that returns the hash code and the last key seen, separated by “:”, so that we can seek directly to the key just larger than the last key and not repeat any elements. Some clients only support long cursors, so depending on the format received by the Redis server, the server will either do the long SCAN as described above or the string SCAN, which would directly seek to the next largest key. The string cursor for the start of the collection would be represented as “0:” instead of “0”.

In the case of a corrupted cursor being used, we should error out to the client.

We need to worry about SCAN not terminating if one hash value has more keys with a certain prefix than the count specified in the query, since it can keep repeating the same keys on each iteration without moving on to the next hash value. Since count is an implementation hint and not a requirement, we can work around this by processing more keys than count until we are at a prefix greater than the one we started at. As long as the cursor changes between calls, we are making progress and SCAN will eventually return to 0.
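
A self-contained sketch of that progress rule, using a sorted vector of keys as a stand-in for the tablet iterator (names are illustrative):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Keep consuming keys past COUNT until the 6-character prefix moves beyond the
// one the cursor started at, so the cursor returned to the client always changes.
std::vector<std::string> ScanFromPrefix(const std::vector<std::string>& sorted_keys,
                                        const std::string& start_prefix,
                                        size_t count) {
  std::vector<std::string> results;
  for (const auto& key : sorted_keys) {
    const std::string prefix = key.substr(0, 6);
    if (prefix < start_prefix) {
      continue;  // Already returned on a previous SCAN call.
    }
    if (results.size() >= count && prefix > start_prefix) {
      break;  // COUNT satisfied and we are past the starting prefix: safe to stop.
    }
    results.push_back(key);
  }
  return results;
}
```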

@mbautin
Collaborator

mbautin commented Apr 6, 2018

@rahuldesirazu: I think we could do some optimizations for the pattern case. We could restrict the range of primary keys to scan in case the pattern does not start with a *. For each tablet, we can start scanning, find an existing hash value, and immediately seek to the earliest key with that hash value that could match the pattern. When we scan all the keys with this hash value that match the pattern, we will seek to the incremented hash value.
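
For illustration, a small helper of the kind this optimization would need (hypothetical name, not existing code), extracting the literal prefix of a MATCH pattern so the scan can seek directly to the earliest possibly-matching key within a hash value:

```cpp
#include <string>

// Characters before the first glob metacharacter form a literal prefix; when the
// pattern does not start with '*', this prefix bounds the range of keys to scan.
std::string LiteralPatternPrefix(const std::string& pattern) {
  std::string prefix;
  for (const char c : pattern) {
    if (c == '*' || c == '?' || c == '[' || c == '\\') {
      break;  // A glob metacharacter (or escape) ends the literal prefix.
    }
    prefix.push_back(c);
  }
  return prefix;
}
```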

@rkarthik007 rkarthik007 modified the milestone: v1.0 Apr 11, 2018
yugabyte-ci pushed a commit that referenced this issue Nov 30, 2018
…EANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request was received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
@rahuldesirazu
Contributor

@hengestone Thanks for your patience. For now, we're no longer adding new Redis functionality to YugaByte. However, we do have support for the KEYS command.

https://docs.yugabyte.com/latest/yedis/api/keys/

@rahuldesirazu rahuldesirazu removed their assignment Jun 5, 2019
@rahuldesirazu rahuldesirazu added the priority/low Low priority label Jun 5, 2019
mbautin pushed a commit that referenced this issue Jun 20, 2019
* Client drivers for YSQL

* Client drivers for YSQL

* Client Drivers for YSQL / Java

* Client driver for YSQL / Go

* Addressed review comments
yugabyte-ci pushed a commit that referenced this issue Jun 24, 2019
Summary:
New commit includes fix for https://github.com/YugaByte/yugabyte-installation/issues/9

Dupe of https://phabricator.dev.yugabyte.com/D6792 due to commit getting pulled back in https://phabricator.dev.yugabyte.com/D6795

Test Plan: Jenkins: skip

Reviewers: mikhail, bogdan

Reviewed By: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D6803
yugabyte-ci pushed a commit that referenced this issue Jun 27, 2019
…data

Summary:
Originally, the issue was discovered as a `RaftConsensusITest.TestAddRemoveVoter` random test failure in TSAN mode due to a data race.
```
WARNING: ThreadSanitizer: data race (pid=11050)
1663	[ts-2]	   Write of size 8 at 0x7b4c000603a8 by thread T51 (mutexes: write M3613):
...
1674	[ts-2]	     #10 yb::tablet::KvStoreInfo::LoadTablesFromPB(google::protobuf::RepeatedPtrField<yb::tablet::TableInfoPB>, string) src/yb/tablet/tablet_metadata.cc:170
1675	[ts-2]	     #11 yb::tablet::KvStoreInfo::LoadFromPB(yb::tablet::KvStoreInfoPB const&, string) src/yb/tablet/tablet_metadata.cc:189:10
1676	[ts-2]	     #12 yb::tablet::RaftGroupMetadata::LoadFromSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:508:5
1677	[ts-2]	     #13 yb::tablet::RaftGroupMetadata::ReplaceSuperBlock(yb::tablet::RaftGroupReplicaSuperBlockPB const&) src/yb/tablet/tablet_metadata.cc:545:3
1678	[ts-2]	     #14 yb::tserver::RemoteBootstrapClient::Finish() src/yb/tserver/remote_bootstrap_client.cc:486:3
...
   Previous read of size 4 at 0x7b4c000603a8 by thread T16:
1697	[ts-2]	     #0 yb::tablet::RaftGroupMetadata::schema_version() const src/yb/tablet/tablet_metadata.h:251:34
1698	[ts-2]	     #1 yb::tserver::TSTabletManager::CreateReportedTabletPB(std::__1::shared_ptr<yb::tablet::TabletPeer> const&, yb::master::ReportedTabletPB*) src/yb/tserver/ts_tablet_manager.cc:1323:71
1699	[ts-2]	     #2 yb::tserver::TSTabletManager::GenerateIncrementalTabletReport(yb::master::TabletReportPB*) src/yb/tserver/ts_tablet_manager.cc:1359:5
1700	[ts-2]	     #3 yb::tserver::Heartbeater::Thread::TryHeartbeat() src/yb/tserver/heartbeater.cc:371:32
1701	[ts-2]	     #4 yb::tserver::Heartbeater::Thread::DoHeartbeat() src/yb/tserver/heartbeater.cc:531:19
```

The reason is that although `RaftGroupMetadata::schema_version()` gets the `TableInfo` pointer from `primary_table_info()` under the mutex lock, it then accesses the pointer's field without holding the lock.

Added a private method `RaftGroupMetadata::primary_table_info_guarded()` that returns a pair of `TableInfo*` and `std::unique_lock`, and used it in `RaftGroupMetadata::schema_version()` and the other `RaftGroupMetadata` functions that access primary table info fields.
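
A generic sketch of that pattern (illustrative names only, not the actual YugabyteDB classes): return the pointer together with the lock, so the caller reads fields while the mutex is still held.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>

struct TableInfo {
  uint32_t schema_version = 0;
};

class Metadata {
 public:
  // Returns the guarded pointer paired with the lock that protects it.
  std::pair<const TableInfo*, std::unique_lock<std::mutex>> primary_table_info_guarded() const {
    std::unique_lock<std::mutex> lock(mutex_);
    return {&primary_table_info_, std::move(lock)};
  }

  uint32_t schema_version() const {
    auto [info, lock] = primary_table_info_guarded();
    return info->schema_version;  // Field read happens while the lock is held.
  }

 private:
  mutable std::mutex mutex_;
  TableInfo primary_table_info_;
};
```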

Test Plan: `ybd tsan --sj --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.TestAddRemoveVoter -n 1000`

Reviewers: bogdan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6813
mbautin added a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 864e72b

Original commit message:

ENG-2793 Do not fail when deciding if we can flush an empty immutable memtable

Summary:
There was a crash during one of our performance integration tests that was caused by Frontiers() not being set on a memtable. That could only possibly happen if the memtable is empty, and it is still not clear how an empty memtable could get into the list of immutable memtables. Regardless of that, instead of crashing, we should just flush that memtable and log an error message.

```
#0  operator() (memtable=..., __closure=0x7f2e454b67b0) at ../../../../../src/yb/tablet/tablet_peer.cc:178
#1  std::_Function_handler<bool(const rocksdb::MemTable&), yb::tablet::TabletPeer::InitTabletPeer(const std::shared_ptr<yb::tablet::enterprise::Tablet>&, const std::shared_future<std::shared_ptr<yb::client::YBClient> >&, const scoped_refptr<yb::server::Clock>&, const std::shared_ptr<yb::rpc::Messenger>&, const scoped_refptr<yb::log::Log>&, const scoped_refptr<yb::MetricEntity>&, yb::ThreadPool*)::<lambda()>::<lambda(const rocksdb::MemTable&)> >::_M_invoke(const std::_Any_data &, const rocksdb::MemTable &) (__functor=..., __args#0=...)  at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:1857
#2  0x00007f2f7346a70e in operator() (__args#0=..., this=0x7f2e454b67b0) at /n/jenkins/linuxbrew/linuxbrew_2018-01-09T08_28_02/Cellar/gcc/5.5.0/include/c++/5.5.0/functional:2267
#3  rocksdb::MemTableList::PickMemtablesToFlush(rocksdb::autovector<rocksdb::MemTable*, 8ul>*, std::function<bool (rocksdb::MemTable const&)> const&) (this=0x7d02978, ret=ret@entry=0x7f2e454b6370, filter=...)
    at ../../../../../src/yb/rocksdb/db/memtable_list.cc:259
#4  0x00007f2f7345517f in rocksdb::FlushJob::Run (this=this@entry=0x7f2e454b6750, file_meta=file_meta@entry=0x7f2e454b68d0) at ../../../../../src/yb/rocksdb/db/flush_job.cc:143
#5  0x00007f2f7341b7c3 in rocksdb::DBImpl::FlushMemTableToOutputFile (this=this@entry=0x89d2400, cfd=cfd@entry=0x7d02300, mutable_cf_options=..., made_progress=made_progress@entry=0x7f2e454b709e,
    job_context=job_context@entry=0x7f2e454b70b0, log_buffer=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:1586
#6  0x00007f2f7341c19f in rocksdb::DBImpl::BackgroundFlush (this=this@entry=0x89d2400, made_progress=made_progress@entry=0x7f2e454b709e, job_context=job_context@entry=0x7f2e454b70b0,
    log_buffer=log_buffer@entry=0x7f2e454b7280) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2816
#7  0x00007f2f7342539b in rocksdb::DBImpl::BackgroundCallFlush (this=0x89d2400) at ../../../../../src/yb/rocksdb/db/db_impl.cc:2838
#8  0x00007f2f735154c3 in rocksdb::ThreadPool::BGThread (this=0x3b0bb20, thread_id=0) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:133
#9  0x00007f2f73515558 in rocksdb::BGThreadWrapper (arg=0xd970a20) at ../../../../../src/yb/rocksdb/util/thread_posix.cc:157
#10 0x00007f2f6c964694 in start_thread (arg=0x7f2e454b8700) at pthread_create.c:333
```

Test Plan: Jenkins

Reviewers: hector, sergei

Reviewed By: hector, sergei

Subscribers: sergei, bogdan, bharat, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D4044
mbautin pushed a commit that referenced this issue Jul 11, 2019
…ed to the

earlier commit 566d6d2

Original commit message:

ENG-4240: #613: Fix checking of tablet presence during transaction CLEANUP

Summary:
We were failing to check the return code of the function `LookupTablePeerOrRespond` when a CLEANUP request was received by the tablet service.
This was causing the following FATAL right after restart during a software upgrade on a cluster with a SecondaryIndex workload.

```
#0  yb::tserver::TabletServiceImpl::CheckMemoryPressure<yb::tserver::UpdateTransactionResponsePB> (this=this@entry=0x24c2e00, tablet=tablet@entry=0x0,
    resp=resp@entry=0x14d3d410, context=context@entry=0x7f55b1eb5600) at ../../src/yb/tserver/tablet_service.cc:222
#1  0x00007f55d4c8a881 in yb::tserver::TabletServiceImpl::UpdateTransaction (this=this@entry=0x24c2e00, req=req@entry=0x1057aa90, resp=resp@entry=0x14d3d410, context=...)
    at ../../src/yb/tserver/tablet_service.cc:431
#2  0x00007f55d273f28a in yb::tserver::TabletServerServiceIf::Handle (this=0x24c2e00, call=...) at src/yb/tserver/tserver_service.service.cc:267
#3  0x00007f55cff0a3ea in yb::rpc::ServicePoolImpl::Handle (this=0x27ca540, incoming=...) at ../../src/yb/rpc/service_pool.cc:214
```

Changed `LookupTablePeerOrRespond` to return the complete result via its return value.

Test Plan: Update xdc-user-identity and check that it does not crash and the workload is stable.

Reviewers: robert, hector, mikhail, kannan

Reviewed By: mikhail, kannan

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5772
amannijhawan added a commit that referenced this issue Dec 9, 2023
…rt -1

Summary:
Original commit: 651a2e8 / D29938
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It will only create the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified that the disk size was increased:

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped.

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: cwang, nbhatia

Subscribers: cwang, nbhatia, yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D30901
amannijhawan added a commit that referenced this issue Dec 14, 2023
…part -1

Summary:
Original commit: 98de5da / D30901
One of the first tasks kicked off during an edit-universe operation is disk resizing.
This change makes the createResizeDiskTask function idempotent.
It will only create the disk resize tasks if the specified size is different from the current
volume size on the pod.

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes task after aborting the disk resize in the middle.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified that the disk size was increased:

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs we can see that the function was invoked but task creation was skipped.

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin, cwang, nbhatia

Reviewed By: nsingh, dshubin

Subscribers: yugaware, nbhatia, cwang

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31095
jasonyb pushed a commit that referenced this issue Dec 15, 2023
Summary:
The YB Seq Scan code path is not hit because Foreign Scan is the default and
pg_hint_plan does not work.  An upcoming merge with YB master will bring in
master commit 465ee2c, which changes the
default to YB Seq Scan.

To test YB Seq Scan, a temporary patch is needed (see the test plan).
With that, two bugs are encountered: fix them.

1. FailedAssertion("TTS_IS_VIRTUAL(slot)"

   On simple test case

       create table t (i int primary key, j int);
       select * from t;

   get

       TRAP: FailedAssertion("TTS_IS_VIRTUAL(slot)", File: "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", Line: 3473, PID: 2774450)

   Details:

       #0  0x00007fd52616eacf in raise () from /lib64/libc.so.6
       #1  0x00007fd526141ea5 in abort () from /lib64/libc.so.6
       #2  0x0000000000af33ad in ExceptionalCondition (conditionName=conditionName@entry=0xc2938d "TTS_IS_VIRTUAL(slot)", errorType=errorType@entry=0xc01498 "FailedAssertion",
           fileName=fileName@entry=0xc28f18 "../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c", lineNumber=lineNumber@entry=3473)
           at ../../../../../../../src/postgres/src/backend/utils/error/assert.c:69
       #3  0x00000000005c26bd in ybFetchNext (handle=0x2600ffc43680, slot=slot@entry=0x2600ff6c2980, relid=16384)
           at ../../../../../../../src/postgres/src/backend/access/yb_access/yb_scan.c:3473
       #4  0x00000000007de444 in YbSeqNext (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:156
       #5  0x000000000078b3c6 in ExecScanFetch (node=node@entry=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:133
       #6  0x000000000078b44e in ExecScan (node=0x2600ff6c2778, accessMtd=accessMtd@entry=0x7de2b9 <YbSeqNext>, recheckMtd=recheckMtd@entry=0x7de26e <YbSeqRecheck>)
           at ../../../../../../src/postgres/src/backend/executor/execScan.c:182
       #7  0x00000000007de298 in ExecYbSeqScan (pstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:191
       #8  0x00000000007871ef in ExecProcNodeFirst (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:480
       #9  0x000000000077db0e in ExecProcNode (node=0x2600ff6c2778) at ../../../../../../src/postgres/src/include/executor/executor.h:285
       #10 ExecutePlan (execute_once=<optimized out>, dest=0x2600ff6b1a10, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT,
           use_parallel_mode=<optimized out>, planstate=0x2600ff6c2778, estate=0x2600ff6c2128) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1650
       #11 standard_ExecutorRun (queryDesc=0x2600ff675128, direction=<optimized out>, count=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:367
       #12 0x000000000077dbfe in ExecutorRun (queryDesc=queryDesc@entry=0x2600ff675128, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>)
           at ../../../../../../src/postgres/src/backend/executor/execMain.c:308
       #13 0x0000000000982617 in PortalRunSelect (portal=portal@entry=0x2600ff90e128, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x2600ff6b1a10)
           at ../../../../../../src/postgres/src/backend/tcop/pquery.c:954
       #14 0x000000000098433c in PortalRun (portal=portal@entry=0x2600ff90e128, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
           dest=dest@entry=0x2600ff6b1a10, altdest=altdest@entry=0x2600ff6b1a10, qc=0x7fffc14a13c0) at ../../../../../../src/postgres/src/backend/tcop/pquery.c:786
       #15 0x000000000097e65b in exec_simple_query (query_string=0x2600ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1321
       #16 yb_exec_simple_query_impl (query_string=query_string@entry=0x2600ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #17 0x000000000097b7a5 in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7fffc14a15ff)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #18 0x000000000097d077 in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2600ffdc6000, restart_data=restart_data@entry=0x7fffc14a1640,
           functor=functor@entry=0x97e033 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2600ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #19 0x000000000097d0ca in yb_exec_simple_query (query_string=query_string@entry=0x2600ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2600ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #20 0x000000000097fe8a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #21 0x00000000008c8354 in BackendRun (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #22 BackendStartup (port=0x2600ff8423c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #23 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #24 0x00000000008caa55 in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #25 0x0000000000804ba8 in PostgresServerProcessMain (argc=25, argv=0x2600ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #26 0x0000000000804bc8 in main ()

       3469    ybFetchNext(YBCPgStatement handle,
       3470                            TupleTableSlot *slot, Oid relid)
       3471    {
       3472            Assert(slot != NULL);
       3473            Assert(TTS_IS_VIRTUAL(slot));

       (gdb) p *slot
       $2 = {type = T_TupleTableSlot, tts_flags = 18, tts_nvalid = 0, tts_ops = 0xeaf5e0 <TTSOpsHeapTuple>, tts_tupleDescriptor = 0x2600ff6416c0, tts_values = 0x2600ff6c2a00, tts_isnull = 0x2600ff6c2a10, tts_mcxt = 0x2600ff6c2000, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0, yb_item = {ybctid = 0}}, tts_tableOid = 0, tts_yb_insert_oid = 0}

   Fix by making YB Seq Scan always use a virtual slot.  This is similar
   to what is done for YB Foreign Scan.

2. segfault in ending scan

   Same simple test case gives segfault at a later stage.

   Details:

       #0  0x00000000007de762 in table_endscan (scan=0x3debfe3ab88) at ../../../../../../src/postgres/src/include/access/tableam.h:997
       #1  ExecEndYbSeqScan (node=node@entry=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/nodeYbSeqscan.c:298
       #2  0x0000000000787a75 in ExecEndNode (node=0x3debfe3a778) at ../../../../../../src/postgres/src/backend/executor/execProcnode.c:649
       #3  0x000000000077ffaf in ExecEndPlan (estate=0x3debfe3a128, planstate=<optimized out>) at ../../../../../../src/postgres/src/backend/executor/execMain.c:1489
       #4  standard_ExecutorEnd (queryDesc=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:503
       #5  0x00000000007800f8 in ExecutorEnd (queryDesc=queryDesc@entry=0x2582fdc88928) at ../../../../../../src/postgres/src/backend/executor/execMain.c:474
       #6  0x00000000006f140c in PortalCleanup (portal=0x2582ff900128) at ../../../../../../src/postgres/src/backend/commands/portalcmds.c:305
       #7  0x0000000000b3c36a in PortalDrop (portal=portal@entry=0x2582ff900128, isTopCommit=isTopCommit@entry=false)
           at ../../../../../../../src/postgres/src/backend/utils/mmgr/portalmem.c:514
       #8  0x000000000097e667 in exec_simple_query (query_string=0x2582ffdc6128 "select * from t;") at ../../../../../../src/postgres/src/backend/tcop/postgres.c:1331
       #9  yb_exec_simple_query_impl (query_string=query_string@entry=0x2582ffdc6128) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5060
       #10 0x000000000097b79a in yb_exec_query_wrapper_one_attempt (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128, attempt=attempt@entry=0, retry=retry@entry=0x7ffc81c0e78f)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5028
       #11 0x000000000097d06c in yb_exec_query_wrapper (exec_context=exec_context@entry=0x2582ffdc6000, restart_data=restart_data@entry=0x7ffc81c0e7d0,
           functor=functor@entry=0x97e028 <yb_exec_simple_query_impl>, functor_context=functor_context@entry=0x2582ffdc6128)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5052
       #12 0x000000000097d0bf in yb_exec_simple_query (query_string=query_string@entry=0x2582ffdc6128 "select * from t;", exec_context=exec_context@entry=0x2582ffdc6000)
           at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5075
       #13 0x000000000097fe7f in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ../../../../../../src/postgres/src/backend/tcop/postgres.c:5794
       #14 0x00000000008c8349 in BackendRun (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4791
       #15 BackendStartup (port=0x2582ff8403c0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4491
       #16 ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1878
       #17 0x00000000008caa4a in PostmasterMain (argc=argc@entry=25, argv=argv@entry=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1533
       #18 0x0000000000804b9d in PostgresServerProcessMain (argc=25, argv=0x2582ffdc01a0) at ../../../../../../src/postgres/src/backend/main/main.c:208
       #19 0x0000000000804bbd in main ()

       294             /*
       295              * close heap scan
       296              */
       297             if (tsdesc != NULL)
       298                     table_endscan(tsdesc);

   The reason is that the initial merge 55782d5 incorrectly merged the end of
   ExecEndYbSeqScan.  Upstream PG commit 9ddef36278a9f676c07d0b4d9f33fa22e48ce3b5
   removed this code, but the initial merge duplicated those lines.  Remove them.
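
   In essence, the fix removes the duplicated block quoted above from
   ExecEndYbSeqScan, roughly:

        -       /*
        -        * close heap scan
        -        */
        -       if (tsdesc != NULL)
        -               table_endscan(tsdesc);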

Test Plan:
Apply the following patch to activate YB Seq Scan:

    diff --git a/src/postgres/src/backend/optimizer/path/allpaths.c b/src/postgres/src/backend/optimizer/path/allpaths.c
    index 8a4c38a965..854d84a648 100644
    --- a/src/postgres/src/backend/optimizer/path/allpaths.c
    +++ b/src/postgres/src/backend/optimizer/path/allpaths.c
    @@ -576,7 +576,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     else
                     {
                         /* Plain relation */
    -                    if (IsYBRelationById(rte->relid))
    +                    if (false)
                         {
                             /*
                              * Using a foreign scan which will use the YB FDW by

On almalinux 8,

    ./yb_build.sh fastdebug --gcc11
    pg15_tests/run_all_tests.sh fastdebug --gcc11 --sj --sp --scb

fails the following tests:

- test_D29546
- test_pg15_regress: yb_pg15
- test_types_geo: yb_pg_box
- test_hash_in_queries: yb_hash_in_queries

Manually check that these failures are due to YB Seq Scan explain output
differences.

Reviewers: aagrawal, tfoucher

Reviewed By: tfoucher

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31139
charleswang234 added a commit that referenced this issue Jan 9, 2024
…on releaseInstance and before AnsibleDestroyServer subtask

Summary:
Add a new pre-check subtask to `ReleaseInstanceFromUniverse` and as a subtask right before any ansibleDestroyServer subtask called `CheckNodeSafeToDelete`.

`CheckNodeSafeToDelete` checks that there are no tserver tablets assigned to this node (if applicable) and that no master on this node is part of the universe quorum. We base these checks on the IP of the node.

We will fail the `ReleaseInstanceFromUniverse` task if this pre-check fails. We also fail if we are taking down the server while its tserver still has tablets or its master process is still part of the quorum. This is an extra precaution and a short-term fix until we have a cluster whitelist implemented on the DB side.

Test Plan:
Create a 4 node rf3 universe (2 1 1 for the azs):

Perform a remove node -> release node for one of the nodes in the az that has 2 nodes. Make sure that on the release node, the task succeeds.

Test #2
Perform a remove node. Manually start up the tserver on the node and remove the blacklist using the yb-admin command. Perform a release node from the universe. Check that the release node task fails.

Test #3
Perform a remove node. Manually start up the master process on the node we just removed. Add the node back to the quorum using yb-admin. Perform a release node from the universe. Check that the release node task fails.

(Can also just change the node state of the node to `Removed` and then run the ReleaseNodeFromUniverse task)
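
For reference, the manual steps above can be driven with yb-admin along these lines (a sketch; addresses are placeholders using the default master/tserver RPC ports):

```
# Remove the node from the tserver blacklist
yb-admin -master_addresses <m1:7100,m2:7100,m3:7100> change_blacklist REMOVE <node_ip>:9100

# Add the node's master back to the quorum
yb-admin -master_addresses <m1:7100,m2:7100,m3:7100> change_master_config ADD_SERVER <node_ip> 7100
```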

Other testing:
- Check that we are able to do a full move, delete a universe.

Reviewers: sanketh, nsingh, hzare

Reviewed By: nsingh

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D30103
d-uspenskiy added a commit that referenced this issue Jan 12, 2024
…ction

Summary:
There are several unit tests that suffer from a TSAN data race warning with the following stack:

```
WARNING: ThreadSanitizer: data race (pid=38656)
  Read of size 8 at 0x7f6f2a44b038 by thread T21:
    #0 memcpy /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (pg_ddl_concurrency-test+0x9e197)
    #1 <null> <null> (libnss_sss.so.2+0x72ef) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Previous write of size 8 at 0x7f6f2a44b038 by thread T20 (mutexes: write M0):
    #0 mmap64 /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:7485:3 (pg_ddl_concurrency-test+0xda204)
    #1 <null> <null> (libnss_sss.so.2+0x7169) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Location is global '??' at 0x7f6f2a44b000 (passwd+0x38)

  Mutex M0 (0x7f6f2af29380) created at:
    #0 pthread_mutex_lock /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (pg_ddl_concurrency-test+0xa464b)
    #1 <null> <null> (libnss_sss.so.2+0x70d6) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
...
```

All failing tests have a common feature: all of them create connections to postgres from multiple threads at the same time.
When creating a new connection, the `libpq` library internally calls the standard `getpwuid_r` function. This function is thread-safe, so a TSAN warning is not expected there.

The solution is to suppress the warning in the `getpwuid_r` function.
**Note:** because the `getpwuid_r` function name does not appear in the TSAN warning stack, the warning is suppressed for the caller function `pqGetpwuid` instead.
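
A suppression entry of roughly this shape (in a TSAN suppressions file consumed via `TSAN_OPTIONS=suppressions=<file>`; the exact wiring in the YB build is not shown here) matches that caller frame:

```
# The data race is reported inside getpwuid_r (via libnss_sss); the visible
# libpq caller frame is pqGetpwuid, so suppress on that symbol.
race:pqGetpwuid
```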
Jira: DB-9523

Test Plan: Jenkins

Reviewers: sergei, bogdan

Reviewed By: sergei

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31646
arpang added a commit that referenced this issue Jan 30, 2024
Summary:
The function `yb_single_row_update_or_delete_path` was reworked for PG15 in D27692. There are a few more issues in the function that this revision fixes:

  # With PG commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35,
  ## the target list returned by `build_path_tlist(root, subpath)` only contains the modified and junk columns. So, there is no need to ignore the unspecified columns.
  ## "tableoid" junk column is added for partitioned tables. Ignore it, along with other junk cols, when iterating over `build_path_tlist(root, subpath)`.
  ## `RelOptInfo` entries in `simple_rel_array` corresponding to the non-leaf relations of a partitioned table are NOT NULL. Ignore these when checking the number of relations being updated.
  ## When updating a partitioned table, the child of the UPDATE node is an APPEND node. This append node is skipped in the final plan if it has only one child. Take this into account when applying conditions to the `ModifyTablePath.subpath` by passing the subpath through `get_singleton_append_subpath` (see the sketch after this list).
  # D27692 added an assertion `Assert(root->update_colnos->length > update_col_index)`, which is incorrect. The pre-existing code comment clearly stated: `.. it is possible that planner adds extra expressions. In particular, we've seen a RowExpr when a view was updated`. As expected, this incorrect assertion fails when updating a view with a trigger (see the added test). Remove the assertion. Instead, move the expression-out-of-range check before reading `root->update_colnos`.
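
For the Append-skipping point above, the shape of the change is roughly as follows (a sketch only; it assumes `get_singleton_append_subpath` returns the lone child of a single-child Append/MergeAppend and the path itself otherwise):

```
/* Sketch: when an UPDATE of a partitioned table has a single-child Append
 * below ModifyTable, look through it before applying the single-row checks. */
Path *subpath = ((ModifyTablePath *) path)->subpath;

if (IsA(subpath, AppendPath) || IsA(subpath, MergeAppendPath))
	subpath = get_singleton_append_subpath(subpath);
```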

Test Plan:
Jenkins: rebase: pg15

Added two tests in yb_pg15:
- one, to test whether single-row optimization is invoked when only one partition is updated (fix #1)
- other, to test UPDATE view with INSTEAD OF UPDATE trigger (fix #2). This test is taken from yb_pg_triggers (yb_pg_triggers still fails with unrelated errors)

./yb_build.sh --java-test org.yb.pgsql.TestPg15Regress#testPg15Regress

Reviewers: jason, tnayak, amartsinchyk

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D31709
charleswang234 added a commit that referenced this issue Mar 18, 2024
… master config on releaseInstance and before AnsibleDestroyServer subtask

Summary:
Original commit:

252ef97 / D30103

Add a new pre-check subtask to `ReleaseInstanceFromUniverse` and as a subtask right before any ansibleDestroyServer subtask called `CheckNodeSafeToDelete`.

`CheckNodeSafeToDelete` checks that there are no tserver tablets assigned to this node (if applicable) and that no master on this node is part of the universe quorum. We base these checks on the IP of the node.

We will fail the `ReleaseInstanceFromUniverse` task if this pre-check fails. We also fail if we are taking down the server while its tserver still has tablets or its master process is still part of the quorum. This is an extra precaution and a short-term fix until we have a cluster whitelist implemented on the DB side.

Test Plan:
Create a 4 node rf3 universe (2 1 1 for the azs):

Perform a remove node -> release node for one of the nodes in the az that has 2 nodes. Make sure that on the release node, the task succeeds.

Test #2
Perform a remove node. Manually start up the tserver on the node and remove the blacklist using the yb-admin command. Perform a release node from the universe. Check that the release node task fails.

Test #3
Perform a remove node. Manually start up the master process on the node we just removed. Add the node back to the quorum using yb-admin. Perform a release node from the universe. Check that the release node task fails.

(Can also just change the node state of the node to `Removed` and then run the ReleaseNodeFromUniverse task)

Other testing:
- Check that we are able to do a full move, delete a universe.

Reviewers: sanketh, nsingh, yshchetinin

Reviewed By: nsingh

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33265
charleswang234 added a commit that referenced this issue Mar 18, 2024
… master config on releaseInstance and before AnsibleDestroyServer subtask

Summary:
Original commit:

252ef97 / D30103

Add a new pre-check subtask to `ReleaseInstanceFromUniverse` and as a subtask right before any ansibleDestroyServer subtask called `CheckNodeSafeToDelete`.

`CheckNodeSafeToDelete` checks that there are no tserver tablets assigned to this node (if applicable) and that no master on this node is part of the universe quorum. We base these checks on the IP of the node.

We will fail the `ReleaseInstanceFromUniverse` task if this pre-check fails. We also fail if we are taking down the server while its tserver still has tablets or its master process is still part of the quorum. This is an extra precaution and a short-term fix until we have a cluster whitelist implemented on the DB side.

Test Plan:
Create a 4 node rf3 universe (2 1 1 for the azs):

Perform a remove node -> release node for one of the nodes in the az that has 2 nodes. Make sure that on the release node, the task succeeds.

Test #2
Perform a remove node. Manually start up the tserver on the node and remove the blacklist using the yb-admin command. Perform a release node from the universe. Check that the release node task fails.

Test #3
Perform a remove node. Manually start up the master process on the node we just removed. Add the node back to the quorum using yb-admin. Perform a release node from the universe. Check that the release node task fails.

(Can also just change the node state of the node to `Removed` and then run the ReleaseNodeFromUniverse task)

Other testing:
- Check that we are able to do a full move, delete a universe.

Reviewers: sanketh, nsingh, yshchetinin

Reviewed By: nsingh

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33260
yusong-yan added a commit that referenced this issue Apr 18, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault that was identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
aishwarya24 added a commit that referenced this issue Apr 30, 2024
…-examples application (#21900)

* Upgrade the gorm docs to use latest go version and gorm v2

* Mentioned the use of smart drivers

* Harsh daryani896 patch 1 (#2)

* Update pgx driver version from v4 to v5 in docs.

* review comments and copied to preview

---------

Co-authored-by: Harsh Daryani <82017686+HarshDaryani896@users.noreply.github.com>
Co-authored-by: aishwarya24 <ashchakravarthy@gmail.com>
Sahith02 added a commit that referenced this issue May 17, 2024
…nal when creating telemetry provider

Summary:
This diff fixes the following 4 tickets:
1. [PLAT-13995] Tags should be optional when creating telemetry provider.
2. [PLAT-12270] Rename logRow to logRows + pgaudit.log_row to pgaudit.log_rows in YBA code
3. [PLAT-14002] Should not allow deleting a Telemetry Provider when it's in use
4. [PLAT-14011] Add AuditLog Payload in ModifyAuditLogging task type details

For #2, it is okay to change the request body param `logRow` -> `logRows` since this is an internal API and no one uses this DB audit logs feature yet including UI, YBM, other clients, etc.

Test Plan:
Manually tested all 4 of the above test cases.
Run UTs.
Run itests.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35124
Sahith02 added a commit that referenced this issue May 17, 2024
…s should be optional when creating telemetry provider

Summary:
Original commit: 7f76a34 / D35124
This diff fixes the following 4 tickets:
1. [PLAT-13995] Tags should be optional when creating telemetry provider.
2. [PLAT-12270] Rename logRow to logRows + pgaudit.log_row to pgaudit.log_rows in YBA code
3. [PLAT-14002] Should not allow deleting a Telemetry Provider when it's in use
4. [PLAT-14011] Add AuditLog Payload in ModifyAuditLogging task type details

For #2, it is okay to change the request body param `logRow` -> `logRows` since this is an internal API and no one uses this DB audit logs feature yet including UI, YBM, other clients, etc.

Test Plan:
Manually tested all 4 of the above test cases.
Run UTs.
Run itests.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35151
karthik-ramanathan-3006 added a commit that referenced this issue May 17, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior: the webserver may continue running for a short time while the static variables used to serve requests have already been destroyed (illustrated by the sketch below).
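
The hazard can be reproduced outside the webserver with a small standalone C++ sketch (hypothetical code, not YB code): a detached thread keeps touching a namespace-scoped map while the main thread calls exit(), which destroys the map.

```
#include <chrono>
#include <cstdlib>
#include <string>
#include <thread>
#include <unordered_map>

// Namespace-scoped variable: static storage duration, destroyed during exit().
static std::unordered_map<std::string, int> metrics = {{"requests", 0}};

int main() {
  // Simulates a server thread that keeps serving requests.
  std::thread worker([] {
    for (;;) {
      ++metrics["requests"];  // May touch the already-destroyed map once exit() runs.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  });
  worker.detach();

  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  std::exit(0);  // Destroys 'metrics' while 'worker' may still be using it: undefined behavior.
}
```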

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
svarnau pushed a commit that referenced this issue May 25, 2024
…retained for CDC"

Summary:
D33131 introduced a segmentation fault that was identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
svarnau pushed a commit that referenced this issue May 25, 2024
…-examples application (#21900)

* Upgrade the gorm docs to use latest go version and gorm v2

* Mentioned the use of smart drivers

* Harsh daryani896 patch 1 (#2)

* Update pgx driver version from v4 to v5 in docs.

* review comments and copied to preview

---------

Co-authored-by: Harsh Daryani <82017686+HarshDaryani896@users.noreply.github.com>
Co-authored-by: aishwarya24 <ashchakravarthy@gmail.com>
svarnau pushed a commit that referenced this issue May 25, 2024
…nal when creating telemetry provider

Summary:
This diff fixes the following 4 tickets:
1. [PLAT-13995] Tags should be optional when creating telemetry provider.
2. [PLAT-12270] Rename logRow to logRows + pgaudit.log_row to pgaudit.log_rows in YBA code
3. [PLAT-14002] Should not allow deleting a Telemetry Provider when it's in use
4. [PLAT-14011] Add AuditLog Payload in ModifyAuditLogging task type details

For #2, it is okay to change the request body param `logRow` -> `logRows` since this is an internal API and no one uses this DB audit logs feature yet including UI, YBM, other clients, etc.

Test Plan:
Manually tested all 4 of the above test cases.
Run UTs.
Run itests.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D35124
svarnau pushed a commit that referenced this issue May 25, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
svarnau pushed a commit that referenced this issue May 29, 2024
…s should be optional when creating telemetry provider

Summary:
Original commit: 7f76a34 / D35124
This diff fixes the following 4 tickets:
1. [PLAT-13995] Tags should be optional when creating telemetry provider.
2. [PLAT-12270] Rename logRow to logRows + pgaudit.log_row to pgaudit.log_rows in YBA code
3. [PLAT-14002] Should not allow deleting a Telemetry Provider when it's in use
4. [PLAT-14011] Add AuditLog Payload in ModifyAuditLogging task type details

For #2, it is okay to change the request body param `logRow` -> `logRows` since this is an internal API and no one uses this DB audit logs feature yet including UI, YBM, other clients, etc.

Test Plan:
Manually tested all 4 of the above test cases.
Run UTs.
Run itests.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35151
svarnau pushed a commit that referenced this issue May 29, 2024
… SIGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35169
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: yql, smishra, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35171
karthik-ramanathan-3006 added a commit that referenced this issue Jun 6, 2024
…IGTERM

Summary:
Original commit: 5862233 / D35116
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to a possibility of undefined behavior where the webserver may continue running for a short duration of time, while the static variables used to serve requests may have been GC'ed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
Jira: DB-7796

Test Plan:
To test this manually, use a HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35170
jasonyb pushed a commit that referenced this issue Jun 11, 2024
Support for database/user/client-based aggregates was added, with three new
views to access these statistics. Some new counters were added, including
min/max/mean time histograms. We are saving the parameters of slow queries,
which can be inspected later. Did some refactoring of the code, including
renaming the whole extension from pg_stat_statement to pg_stat_monitor.
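
A quick way to sanity-check the renamed extension (a sketch; only the `pg_stat_monitor` view and its `query` and `calls` columns are assumed, and the extension must be preloaded via shared_preload_libraries):

```
-- Install the renamed extension and read basic per-query statistics.
CREATE EXTENSION IF NOT EXISTS pg_stat_monitor;

SELECT query, calls
FROM pg_stat_monitor
ORDER BY calls DESC
LIMIT 5;
```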