2.27.0.0-b480

@abhinab-yb abhinab-yb tagged this 22 Aug 10:37
Summary:
This diff instruments the xCluster poller with ASH metadata and wait events.

New event added --
- XCluster_WaitingForGetChanges: XCluster Poller on target universe is waiting for changes from source universe.

The metadata is generated when the poller thread is created and is set to --
- root_request_id: Set to the producer stream ID
- top_level_node_id: Set to the tserver UUID of the target where the poller is running
- query_id: Set to 12, a new query id to distinguish the events of the poller
- database_id: Set to the database name of the target
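
As a rough illustration of the fields above, the metadata can be modeled as an immutable record built once per poller. This is a Python sketch, not the actual C++ code: `PollerAshMetadata`, `make_poller_metadata`, and `XCLUSTER_POLLER_QUERY_ID` are hypothetical names, and only the four field names and the value 12 come from this summary.

```python
from dataclasses import dataclass

# Hypothetical constant name; the summary only states that the poller uses query id 12.
XCLUSTER_POLLER_QUERY_ID = 12


@dataclass(frozen=True)
class PollerAshMetadata:
    """Illustrative model of the ASH metadata attached to one poller."""
    root_request_id: str    # producer stream ID
    top_level_node_id: str  # UUID of the target tserver running the poller
    query_id: int           # fixed id that marks poller events
    database_id: str        # target database name, as described in the summary


def make_poller_metadata(producer_stream_id: str,
                         target_tserver_uuid: str,
                         target_database: str) -> PollerAshMetadata:
    # Built once, when the poller is created, and never mutated afterwards.
    return PollerAshMetadata(
        root_request_id=producer_stream_id,
        top_level_node_id=target_tserver_uuid,
        query_id=XCLUSTER_POLLER_QUERY_ID,
        database_id=target_database,
    )
```

Because `root_request_id` carries the producer stream ID, grouping ASH samples by that column separates the activity of individual pollers.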

Each instance of a poller gets a WaitStateInfo object and the wait event is initialized to
Idle. The poller can schedule tasks / RPCs on different threads during its lifetime. We
only capture the wait due to the GetChanges RPC to the source universe. All other
scheduling waits are marked as Idle to filter them out while collecting the events.
After getting the changes from the source, the poller does ApplyChanges that goes
through the regular write path, which is already instrumented.
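
The wait-state handling described above can be sketched as a scoped state change: the event is Idle by default, switches to the GetChanges wait only for the duration of the RPC, and Idle samples are dropped at collection time. This is a simplified Python model under assumed names (`WaitStateInfo.scoped`, `poll_once`, and `collect_sample` are hypothetical); the real implementation is in C++ and may be structured differently.

```python
from contextlib import contextmanager
from enum import Enum


class WaitEvent(Enum):
    IDLE = "Idle"
    XCLUSTER_WAITING_FOR_GET_CHANGES = "XCluster_WaitingForGetChanges"


class WaitStateInfo:
    """Illustrative per-poller wait state; starts out Idle."""

    def __init__(self):
        self.event = WaitEvent.IDLE

    @contextmanager
    def scoped(self, event):
        # Set the event for the duration of the block, then restore the previous one,
        # even if the block raises.
        previous = self.event
        self.event = event
        try:
            yield
        finally:
            self.event = previous


def poll_once(wait_state, get_changes):
    # Only the GetChanges call to the source universe is marked as a wait.
    with wait_state.scoped(WaitEvent.XCLUSTER_WAITING_FOR_GET_CHANGES):
        return get_changes()


def collect_sample(wait_state):
    # Idle pollers are filtered out, so scheduling gaps never show up as samples.
    if wait_state.event is WaitEvent.IDLE:
        return None
    return wait_state.event.value
```

In this model, a sampler that fires while `get_changes` is in flight records `XCluster_WaitingForGetChanges`, and one that fires between polls records nothing.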

MetadataSerializerFactory is set on XClusterRemoteClientHolder to take advantage of
the automatic ASH metadata serializing framework, so all the RPCs going from the
target to the source universe get automatically tagged with the metadata. This lets us
look at what the GetChanges RPC of a particular poller was doing in the source universe.
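
The tagging idea can be pictured as a factory that the client holder calls for every outgoing request. The sketch below is a Python model with hypothetical names (`RemoteClientHolder`, `build_request`, and the `ash_metadata` key); it does not reflect the real MetadataSerializerFactory interface, only the effect described above.

```python
class RemoteClientHolder:
    """Hypothetical stand-in for XClusterRemoteClientHolder."""

    def __init__(self, metadata_serializer_factory):
        # Callable that returns the current poller's ASH metadata as a dict.
        self._metadata_serializer_factory = metadata_serializer_factory

    def build_request(self, rpc_name, payload):
        # Every outgoing RPC gets a copy of the metadata attached automatically,
        # so call sites do not need to pass it explicitly.
        return {
            "rpc": rpc_name,
            "payload": payload,
            "ash_metadata": dict(self._metadata_serializer_factory()),
        }


def poller_metadata_factory(producer_stream_id):
    # Returns a factory bound to one poller; query_id 12 is taken from the summary.
    def factory():
        return {"root_request_id": producer_stream_id, "query_id": 12}
    return factory
```

Since the source universe receives the same `root_request_id` and `query_id`, filtering its ASH view on `query_id = 12` shows the work done on behalf of each poller.
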
Jira: DB-12145

Test Plan:
Manual testing

Ran the SqlInserts workload from yb-sample-apps

```
java -jar yb-sample-apps/target/yb-sample-apps.jar --workload SqlInserts --nodes 10.150.0.16:5433
```

Then ran the following query on both the source and target universes

```
SELECT
    query_id,
    root_request_id,
    wait_event_component,
    wait_event,
    wait_event_type,
    wait_event_aux,
    COUNT(*)
FROM
    yb_active_session_history
WHERE
    query_id = 12
GROUP BY
    query_id,
    root_request_id,
    wait_event_component,
    wait_event,
    wait_event_type,
    wait_event_aux
ORDER BY
    query_id,
    root_request_id,
    wait_event_component,
    wait_event_type;
```
Output of source universe --
```
 query_id |           root_request_id            | wait_event_component |         wait_event          | wait_event_type | wait_event_aux  | count
----------+--------------------------------------+----------------------+-----------------------------+-----------------+-----------------+-------
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Active                | Cpu             | 2682cb943a564be |   398
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Active                | Cpu             |                 |    25
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Passive               | Cpu             | 2682cb943a564be |    73
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Passive               | Cpu             |                 |   192
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | Raft_ApplyingEdits          | Cpu             | 4a3d9b1ac53443a |     1
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | YBClient_WaitingOnDocDB     | RPCWait         | 2682cb943a564be |     3
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | MVCC_WaitForSafeTime        | WaitOnCondition | 2682cb943a564be |     1
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | ReplicaState_TakeUpdateLock | WaitOnCondition | 2682cb943a564be |   201
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | RocksDB_RateLimiter         | WaitOnCondition | 2682cb943a564be |     4
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | WaitForReadTime             | WaitOnCondition | 2682cb943a564be |     2
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | OnCpu_Active                | Cpu             | 964be410df57402 |    19
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | OnCpu_Active                | Cpu             |                 |     1
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | OnCpu_Passive               | Cpu             | 964be410df57402 |    11
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | OnCpu_Passive               | Cpu             |                 |    14
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | YBClient_WaitingOnDocDB     | RPCWait         | 964be410df57402 |     1
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | WaitForReadTime             | WaitOnCondition | 964be410df57402 |     1
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Active                | Cpu             | 0acd37b6f5ff465 |    32
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Active                | Cpu             |                 |     4
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Passive               | Cpu             | 0acd37b6f5ff465 |     9
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Passive               | Cpu             |                 |    17
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | YBClient_WaitingOnDocDB     | RPCWait         | 0acd37b6f5ff465 |     2
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | WaitForReadTime             | WaitOnCondition | 0acd37b6f5ff465 |     2
(22 rows)
```
Output of target universe --
```
 query_id |           root_request_id            | wait_event_component |          wait_event           | wait_event_type | wait_event_aux  | count
----------+--------------------------------------+----------------------+-------------------------------+-----------------+-----------------+-------
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Active                  | Cpu             | 2682cb943a564be |   156
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Active                  | Cpu             | 53675d0dac7c4b1 |    89
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Active                  | Cpu             |                 |     6
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Passive                 | Cpu             | 2682cb943a564be |     1
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Passive                 | Cpu             | 53675d0dac7c4b1 |    55
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | OnCpu_Passive                 | Cpu             |                 |    53
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | Raft_ApplyingEdits            | Cpu             | 53675d0dac7c4b1 |    59
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | Raft_WaitingForReplication    | RPCWait         | 53675d0dac7c4b1 |   606
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | XCluster_WaitingForGetChanges | RPCWait         | 2682cb943a564be |  1285
       12 | a2e8aca3-024d-4318-8dbf-7d7d10578b7d | TServer              | Rpc_Done                      | WaitOnCondition | 53675d0dac7c4b1 |     4
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | OnCpu_Active                  | Cpu             | 964be410df57402 |    13
       12 | c05d17c3-dbde-4c44-b3a8-b929d9f3f30f | TServer              | XCluster_WaitingForGetChanges | RPCWait         | 964be410df57402 |    85
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Active                  | Cpu             | 0acd37b6f5ff465 |    13
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | OnCpu_Passive                 | Cpu             | 0acd37b6f5ff465 |     6
       12 | da5989f5-2ae8-4daf-a17b-230d94f09a5f | TServer              | XCluster_WaitingForGetChanges | RPCWait         | 0acd37b6f5ff465 |    95
(15 rows)
```

Reviewers: hsunder, amitanand

Reviewed By: hsunder, amitanand

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D45882