Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Server crash with transparent retries when executing multiple statements in ysqlsh command mode #21361

Closed
1 task done
pao214 opened this issue Mar 8, 2024 · 2 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@pao214
Copy link
Contributor

pao214 commented Mar 8, 2024

Jira Link: DB-10258

Description

The query layer supports multiple statements in one query message when using the simple query message protocol. See the https://www.postgresql.org/docs/11/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT for more information.

We can trigger this mode using ./bin/ysqlsh -c "<... specify your ; separated commands ...>"

These commands are executed within an implicit transactional block, i.e. the expectation is that the commands are executed atomically. However, the commands may contain transactional statements such as COMMIT. The implementation details do not play well with READ COMMITTED internal subtransaction mechanism.

Issue

Let's take a look at an example scenario where this is problematic

Setup

CREATE TABLE kv(k INT, v INT, PRIMARY KEY(k ASC));
INSERT INTO kv (SELECT GENERATE_SERIES(1, 10), 0);

Now, we execute concurrent conflicting transactions

session 1                                           session 2
----------                                          ----------
$ ./bin/ysqlsh
# BEGIN;
# UPDATE kv SET v = v + 1 WHERE k <= 5;
                                               $ ./bin/ysqlsh -c "BEGIN; UPDATE kv SET v = v+1 where k > 5; COMMIT; SELECT v FROM kv FOR UPDATE;"
                                               <... waiting on SELECT FOR UPDATE ...>
# COMMIT;
                                               server closed the connection unexpectedly
                                                     This probably means the server terminated abnormally
                                                     before or while processing the request.

Expected Behavior

Notice that there is a transaction conflict at SELECT FOR UPDATE when session 1 commits. We hope to retry this statement transparently and return ten 1s.

Potential Cause

The server crash happens in debug mode because of an assert statement in our code that checks whether or not we are within the internal subtransaction started by READ COMMITTED (for the purposes of transparent restarts). Naturally, we suspect that the COMMIT command pops out the internal subtxn. Moreover, we suspect that a new subtxn is not created for each statement in the batch but only at the start of the batch. This leaves us without an internal subtransaction when trying to rollback the statement for a subsequent retry.

Other Considerations

Please lookout for issue #21297 since the READ COMMITTED mode there fails for a similar reason. Our issue executes batches using the simple query protocol while the related issue #21297 uses the extended query protocol. So, instead of missed updates, we should instead lookout for duplicate updates. Finally, good to verify transparent retries in REPEATABLE READ isolation as well.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@pao214 pao214 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Mar 8, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Mar 8, 2024
@pkj415
Copy link
Contributor

pkj415 commented Apr 3, 2024

Thanks @pao214 for raising this! There are 2 bugs here:

First, the SELECT v FROM kv FOR UPDATE; faces this FATAL:
2024-04-03 01:59:33.432 UTC [76855] FATAL: RollbackAndReleaseCurrentSubTransaction: unexpected state IMPLICIT_INPROGRESS

The fatal isn't expected because of the following reasoning: we don't register an internal savepoint for single statement transactions i.e., those outside a transaction block in RC. RC retries such single-statement transactions by just restarting and retrying the whole transaction instead of rolling back that statement and redoing it. In other words, single statement transactions are retried by the else part to if (IsYBReadCommitted() && IsInTransactionBlock(true /* isTopLevel */)) in the function yb_attempt_to_restart_on_error. However, in this case, given the FATAL, it seems like it went into the if part and hence IsInTransactionBlock was true, which is unexpected for the SELECT v FROM kv FOR UPDATE;. I will check this.

Even if the above mentioned issue with IsInTransactionBlock was fixed, we shouldn't retry the whole query because it would end up redoing the BEGIN; UPDATE kv SET v = v+1 where k > 5; COMMIT;. Currently our query layer retries, redo the whole query either by rolling back to the previous savepoint if in a RC txn block, or by restarting the whole txn if not in a txn block. RC's reliance on these full query retries will result in this shortcoming, which can be solved via #11573.

@pkj415
Copy link
Contributor

pkj415 commented Apr 3, 2024

RCA:

A few facts:

  1. Issuing multi-statement queries in ysqlsh manually will result in each statement to be sent separately to YSQL. To really issue a multi-statement query, use the "-c" option.
  2. For a multi-statement query, Pg treats it as a single implicit transaction unless there is an intervening "commit;". For example, consider the following:
create table test (k int primary key, v int);
insert into test values (1, 1);
create table test2 (k int primary key, v int);
insert into test2 values (1, 1);

./bin/ysqlsh -c "insert into test2 values (6, 6); insert into test values (1, 1);"
ERROR:  duplicate key value violates unique constraint "test_pkey"

After the above, a select on test2 doesn't return (6, 6), because both statements were considered part of the same txn.

However, the following results in (6, 6) to be inserted even though the 2nd insert faces a duplicate key error:

./bin/ysqlsh -c "begin; insert into test2 values (6, 6); commit; insert into test values (1, 1);"
ERROR:  duplicate key value violates unique constraint "test_pkey"

Given this, we have the following issues in RC: Top-level query layer retries for multi-statement queries should not be performed because it will result in redoing all the queries from the start. If there are no intervening "commit;" statements this is okay since it would be one implicit transaction. Otherwise, it can result in wrong results because the previous statements which have committed will also be re-executed. So, we should block query layer retries for multi-statement queries.

In future, we can try to implement retries for multi-statement queries. This will require implementing the retry logic per statement within exec_simple_query similar to yb_attempt_to_restart_on_error, but with some minor changes.

@rthallamko3 rthallamko3 removed the status/awaiting-triage Issue awaiting triage label Apr 22, 2024
pkj415 added a commit that referenced this issue Apr 22, 2024
Summary:
Query layer retries should be blocked for multi-statement queries.
This is because they might have transaction blocks within them that
had committed before a statement faces a kConflict, and a
query layer retry would redo the whole query and hence the
transaction block too, which might not be idempotent. Something
like the below where the second update faces a kConflict:

"begin ... repeatable read; update ...; commit; update ...;"

This diff also does cosmetic changes to log messages related to
query layer retries and error messages that are sent back to the
external client.
Jira: DB-10258

Test Plan:
Jenkins

Added yb_query_layer_retries_for_a_multi_statement_query to the yb_pg_isolation_schedule

Reviewers: tfoucher, patnaik.balivada, ishan.chhangani, shubhankar.shastri

Reviewed By: tfoucher

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D33860
pkj415 added a commit that referenced this issue Apr 30, 2024
…tatement queries

Summary:
Query layer retries should be blocked for multi-statement queries.
This is because they might have transaction blocks within them that
had committed before a statement faces a kConflict, and a
query layer retry would redo the whole query and hence the
transaction block too, which might not be idempotent. Something
like the below where the second update faces a kConflict:

"begin ... repeatable read; update ...; commit; update ...;"

This diff also does cosmetic changes to log messages related to
query layer retries and error messages that are sent back to the
external client.
Jira: DB-10258

Original commit: b72433e / D33860

Test Plan:
Jenkins

Added yb_query_layer_retries_for_a_multi_statement_query to the yb_pg_isolation_schedule

Reviewers: tfoucher, patnaik.balivada

Reviewed By: patnaik.balivada

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D34633
@pkj415 pkj415 closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue
Projects
Status: Done
Development

No branches or pull requests

4 participants