2.25.1.0-b239

tagged this 25 Jan 17:48
Summary:
This revision allows transactions to synchronize their snapshots. To that end, support the following PostgreSQL compatible syntax:
 - `SELECT pg_export_snapshot()`
 - `SET TRANSACTION SNAPSHOT <snapshot_name>`

To synchronize the snapshots of two active transactions, first obtain the snapshot ID from the output of `SELECT pg_export_snapshot()` in the first transaction. Then run `SET TRANSACTION SNAPSHOT <snapshot_name>` in the second transaction so that it uses the same database snapshot as the first. Both transactions now see identical database content, apart from their own writes.

Semantics:
 1. SELECT pg_export_snapshot() and SET TRANSACTION SNAPSHOT are only valid from within REPEATABLE READ transactions. Other isolation levels return an error.
 2. SELECT pg_export_snapshot() returns an ID that can then be used to set the transaction snapshot of a transaction on any node of the universe.
 3. The importing transaction retains its transaction semantics.
  1. A snapshot cannot be imported into a transaction that already has a transaction snapshot. This is a fundamental property of a REPEATABLE READ transaction.
  2. Avoids stale read anomalies as expected of a YugabyteDB transaction.
 4. Currently, in YugabyteDB, the stored metadata is not cleaned up after a transaction ends. As a result, it is possible to import a snapshot even after the exporting transaction has concluded. However, this practice is not recommended and may lead to unpredictable behavior. Future revisions will introduce support for metadata cleanup, ensuring such imports are no longer allowed.

**Design Overview**:
The snapshot export and set functionality is enabled through the use of `read_time`. The `read_time` is associated with a snapshot ID and is stored in the exporting tserver's memory. When setting the transaction snapshot, this `read_time` is retrieved and set in tserver. The subsequent statements operate based on that specific read point.

During the export of a snapshot, certain metadata (same as PG) such as database OID, isolation level, and read-only status is gathered. Here, YugabyteDB and PG diverge in how snapshots are managed:

PG maintains snapshot boundaries using metadata fields including `xmin` and `xmax`, which track the visibility of transactions for consistency.
YugabyteDB simplifies this with a single field, `read_time`, which serves as an equivalent to PG's `xmin` and `xmax`, encapsulating the read point for the snapshot.

**Implementation Details of `pg_export_snapshot`**
On the tserver, the `read_time` is determined by the `PgClientSession::SetupSession` function, which is called with `read_time_manipulation` set to `ReadTimeManipulation::ENSURE_READ_TIME_IS_SET`. This ensures that a `read_time` is picked at this point. If the transaction is new (`pg_export_snapshot` is its first statement), the current time is picked as the `read_time`.

The following metadata is stored in the tserver's memory:

- Database OID
- Isolation level
- Read-only status
- `read_time`

**Snapshot Id**
The snapshot ID returned by `pg_export_snapshot` has the following structure: `<exporting tserver UUID>-<random UUID>`, for example:
```
yugabyte=*# SELECT pg_export_snapshot();
                         pg_export_snapshot
---------------------------------------------------------------------
 4b6c7bfb62c6405db65b311d6516543e-c56d045007440191d34d983c5dbe8ab6
(1 row)
```
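
The two-part structure makes it straightforward to recover the exporting tserver from an ID. The following is a minimal sketch, not the code from this revision: `SnapshotIdParts` and `SplitSnapshotId` are hypothetical names, and the sketch assumes both UUIDs are printed without hyphens, as in the example output above.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical type for illustration; not taken from the YugabyteDB source.
struct SnapshotIdParts {
  std::string tserver_uuid;   // identifies the exporting tserver
  std::string snapshot_uuid;  // random key under which the metadata is stored
};

// Splits "<exporting tserver UUID>-<random UUID>" at its single separator.
// Assumption: neither UUID contains a hyphen of its own.
std::optional<SnapshotIdParts> SplitSnapshotId(std::string_view id) {
  const auto pos = id.find('-');
  // Reject a missing separator and an empty part on either side.
  if (pos == std::string_view::npos || pos == 0 || pos + 1 == id.size()) {
    return std::nullopt;
  }
  // Reject a second separator, which would make the split ambiguous.
  if (id.find('-', pos + 1) != std::string_view::npos) {
    return std::nullopt;
  }
  return SnapshotIdParts{std::string(id.substr(0, pos)),
                         std::string(id.substr(pos + 1))};
}
```

Returning `std::optional` lets the caller turn a malformed ID into a user-facing error instead of crashing.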

**Storage and Retrieval of Metadata**
Upon receiving metadata in `PgTxnSnapshotManager::Register`, a random UUID that is not already in use is generated, and the metadata is stored against this UUID in the tserver's memory only.
To retrieve the metadata, the UUID of the tserver that stores it is obtained from the snapshot ID.
 1. If the importing tserver is the same as the exporting tserver, it fetches the metadata stored against the random UUID part of the snapshot ID.
 2. If the importing tserver is different from the exporting tserver, it makes an RPC call to the exporting tserver, which returns the requested metadata to the importing tserver for use.

Metadata is not persisted to disk: if a tserver crashes, the associated exporting session ends, which aligns with PG semantics that disallow importing snapshots from ended transactions.
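
The register and lookup flow described above can be sketched as follows. This is an illustrative model only, not the `PgTxnSnapshotManager` implementation: `SnapshotRegistry`, `SnapshotMetadata`, its field types, and the `RemoteFetch` callback (standing in for the RPC to the exporting tserver) are all assumptions.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <random>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical metadata record; field types are assumptions.
struct SnapshotMetadata {
  uint32_t database_oid;
  int isolation_level;
  bool read_only;
  uint64_t read_time;
};

// Illustrative in-memory registry, one instance per tserver.
class SnapshotRegistry {
 public:
  // Stand-in for the RPC to the exporting tserver.
  using RemoteFetch = std::function<std::optional<SnapshotMetadata>(
      const std::string& tserver_uuid, const std::string& snapshot_uuid)>;

  SnapshotRegistry(std::string local_uuid, RemoteFetch remote_fetch)
      : local_uuid_(std::move(local_uuid)),
        remote_fetch_(std::move(remote_fetch)),
        rng_(std::random_device{}()) {}

  // Stores metadata under a fresh random key and returns the snapshot ID.
  std::string Register(const SnapshotMetadata& metadata) {
    std::string key;
    do {
      key = RandomHex32();
    } while (store_.count(key) != 0);  // retry until the key is unused
    store_.emplace(key, metadata);
    return local_uuid_ + "-" + key;
  }

  // Looks up metadata held by this tserver only.
  std::optional<SnapshotMetadata> LookupLocal(const std::string& key) const {
    const auto it = store_.find(key);
    if (it == store_.end()) return std::nullopt;
    return it->second;
  }

  // Resolves a snapshot ID locally, or through the remote callback.
  std::optional<SnapshotMetadata> Lookup(const std::string& snapshot_id) const {
    const auto pos = snapshot_id.find('-');
    if (pos == std::string::npos) return std::nullopt;
    const std::string owner = snapshot_id.substr(0, pos);
    const std::string key = snapshot_id.substr(pos + 1);
    if (owner == local_uuid_) return LookupLocal(key);
    if (!remote_fetch_) return std::nullopt;
    return remote_fetch_(owner, key);
  }

 private:
  // Produces 32 lowercase hex characters from two 64-bit random values.
  std::string RandomHex32() {
    static constexpr char kHex[] = "0123456789abcdef";
    std::string out;
    out.reserve(32);
    for (int i = 0; i < 2; ++i) {
      uint64_t value = rng_();
      for (int j = 0; j < 16; ++j) {
        out.push_back(kHex[value & 0xF]);
        value >>= 4;
      }
    }
    return out;
  }

  std::string local_uuid_;
  RemoteFetch remote_fetch_;
  std::mt19937_64 rng_;
  std::unordered_map<std::string, SnapshotMetadata> store_;
};
```

Because the map lives only in process memory, a restart of the exporting tserver drops every entry, which matches the decision not to persist metadata. The sketch omits locking; a real tserver would need to guard the map against concurrent sessions.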

**Implementation Details of SET TRANSACTION SNAPSHOT**
When the `SET TRANSACTION SNAPSHOT` statement is executed, the previously stored metadata is retrieved from the exporting tserver as described above and used to set the `read_time`.

**Limitations**:
Currently, this functionality is only supported in the `REPEATABLE READ` isolation level. The stored metadata is not yet deleted after the exporting transaction ends; this will be addressed in future commits. Additionally, importing a snapshot is not permitted if `yb_read_time`, `read_time_for_follower_reads_`, or `yb_read_after_commit_visibility` is set.
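
The import-side restrictions listed in the semantics and limitations can be summarized as a single precondition check. The sketch below is only a model of those rules: `ImportingTxnState`, `CheckImportAllowed`, the boolean flags, and the error strings are hypothetical and do not reflect the actual checks or messages in the code.

```cpp
#include <optional>
#include <string>

enum class IsolationLevel { kReadCommitted, kRepeatableRead, kSerializable };

// Hypothetical view of the importing transaction's state.
struct ImportingTxnState {
  IsolationLevel isolation;
  bool has_snapshot;                      // a transaction snapshot already exists
  bool yb_read_time_set;                  // models `yb_read_time` being set
  bool follower_read_time_set;            // models `read_time_for_follower_reads_`
  bool read_after_commit_visibility_set;  // models `yb_read_after_commit_visibility`
};

// Returns an illustrative error message, or std::nullopt if the import may proceed.
std::optional<std::string> CheckImportAllowed(const ImportingTxnState& txn) {
  if (txn.isolation != IsolationLevel::kRepeatableRead) {
    return "snapshot import requires REPEATABLE READ";
  }
  if (txn.has_snapshot) {
    return "transaction already has a snapshot";
  }
  if (txn.yb_read_time_set || txn.follower_read_time_set ||
      txn.read_after_commit_visibility_set) {
    return "snapshot import conflicts with an explicit read time setting";
  }
  return std::nullopt;
}
```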

**GFlag**
This feature is guarded by a **PREVIEW** flag `ysql_enable_pg_export_snapshot`, with default value false.

**Upgrade/Rollback safety:**
No existing proto message has been modified. New messages and RPCs are only used when the user executes the new SQL syntax, `SELECT pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT`. The user is only allowed to use the new syntax after the upgrade has completed. Using the syntax in an older version fails, and using it incorrectly in the middle of an upgrade fails, in the worst case, with "RPC Not implemented" errors. This is acceptable and does not cause any correctness issues. No AutoFlag is used, since only the brand new SQL syntax can trigger any of this code path.

JIRA: DB-13042

Test Plan:
./yb_build.sh --cxx-test pgwrapper_pg_export_snapshot-test --gtest_filter "PgExportSnapshotTest.*"
./yb_build.sh --java-test 'TestPgBatch#testImportTxnSnapshot'
./yb_build.sh --java-test 'TestPgBatch#testExportTxnSnapshot'

Reviewers: skumar, pjain, stiwary, patnaik.balivada, mhaddad, aagrawal, dmitry, hsunder

Reviewed By: patnaik.balivada, dmitry, hsunder

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38542