Skip to content

2025.2.4.0-b116

@pao214 pao214 tagged this 09 Jun 20:08
Summary:
Original commit: 6b94be9cbed701a8422ce3193aca5ea8362c013d / D53813
### Summary

#### Problem

Commit 796a12e955 (#29701) stopped the query layer from retrying read restart
errors for non-DML statements, which includes CALL and DO. As a result a read
restart hit inside a procedure surfaces to the client instead of being retried
automatically.

Repro: a procedure that reads a table while another session inserts into it
concurrently, under READ COMMITTED:

    CREATE TABLE test_rr (id SERIAL PRIMARY KEY, t TEXT, i INT) SPLIT INTO 24 TABLETS;

    CREATE PROCEDURE test_rr_count_proc() LANGUAGE plpgsql AS $$
      DECLARE cnt bigint;
      BEGIN SELECT count(*) INTO cnt FROM test_rr; END;
    $$;

    -- session 1 (READ COMMITTED), in a loop:
    CALL test_rr_count_proc();

    -- session 2, concurrently in a loop:
    INSERT INTO test_rr (t, i) VALUES ('s', 1);

Session 1 fails with: ERROR: Restart read required

#### Fix

Allow query layer retries of read-restart errors for CALL and DO under
READ COMMITTED, matching the behavior of SELECT and DML. Outside READ
COMMITTED, CALL/DO read-restart retry is intentionally left as-is
(users can opt in via `yb_extra_commands_to_retry` if needed).

The asymmetry between read-restart and conflict / deadlock / aborted errors
for DDL (and other non-DML tags) is historical: conflicts on DDLs have
always been retried under READ COMMITTED ("retry any tag on non-read-
restart") and that path is preserved unchanged; read-restart errors on
DDLs have never been retried. So at the top level a DDL hitting a
conflict / deadlock / aborted error retries under RC just like it always
did, while a DDL hitting a read restart still surfaces unless opted in
via the GUC below. Outside READ COMMITTED, conflict / deadlock / aborted
errors are retried only for the default retriable set
(SELECT/INSERT/UPDATE/DELETE).

Anything outside the default retriable set can be opted in via
`yb_extra_commands_to_retry` (see below).

#### Proc-retriability gate

A retry of CALL/DO re-runs the entire procedure body, which is only safe when
the body ran statements whose effects are undone by the savepoint/transaction
rollback that precedes the retry. `spi.c` now tracks per-top-level-CALL/DO
whether the body ran anything outside the retriable set, exposed via
`YbProcRetryBlocked()`. The flag is reset on the outermost SPI connect so it
is scoped to one top-level invocation and accumulates across nested calls
within it. `yb_is_retry_possible` consults `YbProcRetryBlocked()` and refuses
to retry when the body ran anything outside that validated set. Statements like
ALTER TABLE may well be idempotent on re-execution, but the retry path
hasn't been validated for them, so we conservatively block. Users who know
their procedure body is safe can opt the offending tag in via
`yb_extra_commands_to_retry_in_proc` (see below).

Nested CALL/DO are tracked recursively through this same SPI path: a nested
read-only call stays retriable, while a nested call that runs e.g. DDL is
caught individually. EXECUTE of a prepared statement resolves to the prepared
statement's own command tag (so EXECUTE of a SELECT in a proc body stays
retriable); unresolved EXECUTE conservatively blocks.

#### Opt-in escape hatches: two list GUCs

Two opt-in list GUCs let users widen what's retriable when they know

- **`yb_extra_commands_to_retry`** -- widen the top-level retriable set.
- **`yb_extra_commands_to_retry_in_proc`** -- mark a tag as safe to
  re-execute inside a CALL/DO body, so the enclosing proc stays retriable.

Listing COPY, COPY FROM, or ANALYZE in either GUC is rejected at SET time
with an error -- these statements are never safe to retry (re-executing
COPY can double-apply rows; ANALYZE errors out on retry).

Test Plan:
Jenkins.

New tests in `TestPgTransparentRestarts` (concurrent inserts on `test_rr`):

    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#callProcReadRestartReadCommitted'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#doBlockReadRestartReadCommitted'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#nonRetriableProcReadRestartNotRetried'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#forceProcRetryRetriesNonRetriableProc'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#nestedNonRetriableProcReadRestartNotRetried'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#executePreparedStatementInProcReadRestartReadCommitted'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#retryTrackerResetAcrossProcsNoTxnBlock'
    ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgTransparentRestarts#retryTrackerResetAcrossProcsInTxnBlock'

New pg_regress coverage in `sql/yb.orig.guc.sql` (under
`yb_misc_serial_schedule`) for the parser of
`yb_extra_commands_to_retry[_in_proc]`: case-insensitivity, whitespace
tolerance, multi-word tags, unknown-tag ERROR, and the COPY/COPY FROM/
ANALYZE WARNING.

Coverage:

- `callProcReadRestartReadCommitted` / `doBlockReadRestartReadCommitted`:
  read restart inside CALL/DO retried transparently under READ COMMITTED
  (the original bug).
- `nonRetriableProcReadRestartNotRetried`: a CALL whose body runs
  `LOCK TABLE` blocks retry; the read restart surfaces.
- `forceProcRetryRetriesNonRetriableProc`: adding `LOCK TABLE` to
  `yb_extra_commands_to_retry_in_proc` unblocks the retry.
- `nestedNonRetriableProcReadRestartNotRetried`: a CALL whose body calls
  a nested non-retriable proc inherits the block; the outer CALL is not
  retried.
- `executePreparedStatementInProcReadRestartReadCommitted`: a proc body
  that issues SQL EXECUTE of a prepared SELECT statement is retriable
  (the EXECUTE tag is resolved to the prepared statement's underlying
  tag in `YbSpiStmtBlocksProcRetry`).
- `retryTrackerResetAcrossProcsNoTxnBlock` /
  `retryTrackerResetAcrossProcsInTxnBlock`: the proc-retriability flag
  is per top-level CALL; a retriable proc invoked right after a non-
  retriable one still retries transparently, both in autocommit and
  inside an explicit BEGIN/COMMIT block.

Backport-through: 2025.2

Reviewers: pjain, smishra

Reviewed By: pjain

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D54363
Assets 2
Loading