2.25.2.0-b90

@myang2021 myang2021 tagged this 04 Mar 02:57
Summary:
In part 3, I extended the tserver-to-master heartbeat response message to
include the contents of `pg_yb_invalidation_messages` along with the contents
of `pg_yb_catalog_version` whenever `pg_yb_catalog_version` changes (detected
via the existing fingerprint mechanism). The invalidation messages are set in
the `db_catalog_inval_messages_data` proto field of the heartbeat response
message.

This diff reads `db_catalog_inval_messages_data` from the heartbeat response
message and stores it in tserver private memory. A new map
`ysql_db_invalidation_messages_map_` is added: for each database, it stores a
double-ended queue. Each element of the queue is a pair: (current_version,
messages). The current_version is the catalog version, and messages is a blob
representing the list of catalog cache invalidation messages generated by PG for
the current_version.
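
As a rough illustration of the shape of this map, here is a minimal sketch. The
type aliases and the `AppendMessages` helper are hypothetical names chosen for
this example and are not taken from the diff; only the idea of a per-database
double-ended queue of (current_version, messages) pairs comes from the
description above.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical aliases; the real tserver types may differ.
using DbOid = std::uint32_t;
using CatalogVersion = std::uint64_t;
using MessageBlob = std::string;  // opaque blob of invalidation messages
using MessageQueue = std::deque<std::pair<CatalogVersion, MessageBlob>>;

// Stand-in for ysql_db_invalidation_messages_map_: one queue per database.
using InvalidationMessagesMap = std::unordered_map<DbOid, MessageQueue>;

// Appends (version, messages) to the queue of db_oid if the version is newer
// than everything already stored there. Returns true if an entry was added.
bool AppendMessages(InvalidationMessagesMap& map, DbOid db_oid,
                    CatalogVersion version, MessageBlob messages) {
  MessageQueue& queue = map[db_oid];
  if (!queue.empty() && queue.back().first >= version) {
    return false;  // version already known (or older than what we have)
  }
  queue.emplace_back(version, std::move(messages));
  return true;
}
```

Keeping the entries ordered by version means the oldest history is always at
the front of the queue and the newest at the back.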

The maximum size of the queue is controlled by
--ysql_max_invalidation_message_queue_size, which defaults to 1024. This means
that for each database, we can store a history of
up to 1024 catalog versions. Logically this map simply extends the history of
`pg_yb_invalidation_messages`. Because by default we only store a history
of 10 seconds of invalidation messages for each database, it is likely that
the heartbeat response contains far fewer than 1024 versions. In that case, the
new versions and their associated messages are merged into the in-memory map
`ysql_db_invalidation_messages_map_` while old entries are removed from the front
of the queue. In this way, each tserver keeps a more extended history of
catalog versions and their invalidation messages, so that we can tolerate a
longer-running PG transaction block: a PG transaction block cannot do a catalog
cache refresh (whether incremental or full) until the transaction completes. As
a result, by the time the PG transaction block completes, many DDL statements
may already have been executed. For example, suppose the PG local catalog
version is 1 when it starts a transaction block and, by the time the
transaction block completes, 100 DDLs have been executed and 50 of them have
incremented the catalog version. The latest catalog version that PG reads from
shared memory is now 51. PG will need to read the entire sequence of versions
2, 3, ..., 51 and their invalidation messages in order to do a valid
incremental cache refresh. By keeping a history of up to 1024 catalog versions,
we allow a longer-running PG transaction block to still do an incremental
refresh.
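
The merge-and-trim behavior and the "entire sequence" requirement can be
sketched as follows. This is an illustration under stated assumptions, not the
code in this diff: `MergeAndTrim` and `CollectForIncrementalRefresh` are
hypothetical names, and treating a gap in the history as "incremental refresh
not possible" is an assumption about how a caller would react.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// (current_version, messages) pairs, ordered by version.
using MessageQueue = std::deque<std::pair<std::uint64_t, std::string>>;

// Merges entries from a heartbeat response that are newer than what the queue
// already holds, then drops the oldest entries until the cap is respected.
void MergeAndTrim(MessageQueue& queue, const MessageQueue& incoming,
                  std::size_t max_size) {
  for (const auto& entry : incoming) {
    if (queue.empty() || entry.first > queue.back().first) {
      queue.push_back(entry);
    }
  }
  while (queue.size() > max_size) {
    queue.pop_front();
  }
}

// Collects the messages for versions local_version + 1 .. latest_version, in
// order. Returns false if any version in that range is missing from the queue,
// in which case the caller cannot do an incremental refresh (assumption: it
// would have to fall back to some other refresh path).
bool CollectForIncrementalRefresh(const MessageQueue& queue,
                                  std::uint64_t local_version,
                                  std::uint64_t latest_version,
                                  std::vector<std::string>* out) {
  out->clear();
  std::uint64_t expected = local_version + 1;
  for (const auto& entry : queue) {
    if (entry.first < expected) continue;  // older than what PG already has
    if (entry.first != expected || entry.first > latest_version) break;
    out->push_back(entry.second);
    ++expected;
  }
  return expected == latest_version + 1;
}
```

With the numbers from the example above, a queue holding versions 2 through 51
lets a backend at local version 1 collect all 50 message blobs. If the cap were
15, only versions 37 through 51 would remain and the same request would fail.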

Note that even though `pg_yb_catalog_version` and `pg_yb_invalidation_messages`
are written transactionally, they are not read transactionally at the master
side when it prepares for the heartbeat response. Therefore we do not try to
process the `db_catalog_version_data` and `db_catalog_inval_messages_data`
atomically at tserver side since they could be out of sync in rare cases when
the following sequence of events happens at master side:
(1) pg_yb_catalog_version is read and put in response
(2) pg_yb_catalog_version and pg_yb_invalidation_messages are updated
transactionally (current_version = current_version + 1, with the messages)
(3) pg_yb_invalidation_messages is read and put in response

Test Plan:
(1) Run
YB_EXTRA_MASTER_FLAGS="--TEST_yb_enable_invalidation_messages=true --log_ysql_catalog_versions=true --vmodule=catalog_manager=2,heartbeater=2,master_heartbeat_service=2,pg_catversions=2 --TEST_simulate_catalog_message_read_failure=0.5" YB_EXTRA_TSERVER_FLAGS="--TEST_yb_enable_invalidation_messages=true --log_ysql_catalog_versions=true --vmodule=heartbeater=2,tablet_server=2,pg_catversions=2 --ysql_max_invalidation_message_queue_size=15" ./yb_build.sh --cxx-test pg_catalog_version-test

Also check the test logs, which show that the relevant code paths were exercised:

```
[m-1] W0228 00:36:14.667902 4069245 master_heartbeat_service.cc:358] Could not get YSQL invalidation messages for heartbeat response: Internal error (yb/master/sys_catalog.cc:1695): Injected pg_yb_invalidation_messages read failure for testing.
```

```
[ts-3] I0228 00:36:54.320386 4069668 tablet_server.cc:1204] reset catalog_versions_fingerprint_

```

```
[ts-2] W0228 00:29:16.253911 4061495 tablet_server.cc:1265] db_oid 16384 not found in ysql_db_invalidation_messages_map_
```

```
[ts-3] I0228 00:32:06.696022 4064671 tablet_server.cc:1234] vlog2: db_oid 1 message queue size: 4
```

(2) Manual test:
```
yugabyte=# \i /tmp/t1.sql
create table foo (id int);
CREATE TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
alter table foo add column id2 text;
ALTER TABLE
alter table foo drop column id2;
ALTER TABLE
yugabyte=# alter table foo add column id2 text;
ALTER TABLE
yugabyte=# alter table foo drop column id2;
ALTER TABLE
```
The last 2 alter table commands were typed manually (not from /tmp/t1.sql).
Look at the yb-tserver log:
```
I0228 02:52:11.464120 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 2
I0228 02:52:12.469295 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 7
I0228 02:52:13.474404 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 11
I0228 02:52:14.480101 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 15
I0228 02:52:15.485495 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 20
I0228 02:52:49.621912 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 16
I0228 02:52:52.636390 4124142 tablet_server.cc:1297] vlog2: db_oid 13515 message queue size: 16
```
We can see that about 5 back-to-back alter DDLs were executed per heartbeat interval.
When the DDLs were executed manually one by one, the message queue size remained at 16
(the size is logged before pop_front is called, so it is the trimmed size of 15 plus one new version).
So the message queue max size of 15 is checked and respected.

Reviewers: kfranz, sanketh, mihnea

Reviewed By: kfranz

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D42226