[YSQL] Txn timeouts during large txns such as table rewrites #27688

Open
@yugabyte-ci


Jira Link: DB-17288

During large txns, such as those that result from table rewrites on large tables or large partition hierarchies, we run into errors such as:

ysqlsh:alter_table.sql:1: ERROR: could not serialize access due to concurrent update (query layer retry isn't possible, READ COMMITTED transaction was aborted and some data was already sent to the user) DETAIL: Heartbeat: Transaction 73389530-f34a-49e0-82f4-c408cfc6f770 expired or aborted by a conflict: YB001: . Errors from tablet servers: [Operation expired (yb/tablet/transaction_coordinator.cc:1766): Heartbeat: Transaction 73389530-f34a-49e0-82f4-c408cfc6f770 expired or aborted by a conflict: YB001 (pgsql error YB001) (transaction error 1)]

One way to reproduce this is to trigger Raft leader failures while a table rewrite is running. It can be worked around by increasing the txn timeout via --transaction_max_missed_heartbeat_periods=60, but it would be better to increase this timeout automatically for such txns.
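For a rough sense of what the workaround buys, here is a minimal sketch of the arithmetic. It assumes the coordinator expires a txn once no heartbeat has arrived for about `transaction_max_missed_heartbeat_periods` heartbeat intervals, and that the interval comes from `transaction_heartbeat_usec` with a default of 500000 (500 ms). Both the formula and the defaults are assumptions to verify against your build; the helper name `txn_expiry_seconds` is hypothetical.

```python
# Sketch only: estimates how long a txn can go without a successful
# heartbeat before the coordinator treats it as expired.
# ASSUMPTION: expiry window = missed periods * heartbeat interval.
# ASSUMPTION: heartbeat interval defaults to 500000 usec (500 ms).

ASSUMED_HEARTBEAT_USEC = 500_000


def txn_expiry_seconds(max_missed_heartbeat_periods: float,
                       heartbeat_usec: int = ASSUMED_HEARTBEAT_USEC) -> float:
    """Return the approximate expiry window in seconds."""
    if max_missed_heartbeat_periods <= 0 or heartbeat_usec <= 0:
        raise ValueError("both arguments must be positive")
    return max_missed_heartbeat_periods * heartbeat_usec / 1_000_000


if __name__ == "__main__":
    # Compare an assumed default of 10 periods with the workaround value of 60.
    for periods in (10, 60):
        print(f"{periods} periods -> ~{txn_expiry_seconds(periods)} s")
```

Under these assumptions, raising the flag to 60 widens the window from about 5 s to about 30 s, which would give a txn more room to ride out a leader election during a long rewrite.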
