Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] PgClient session has a default timeout of 120 secs #16157

Closed
aabraham-yb opened this issue Feb 20, 2023 · 5 comments
Closed

[YSQL] PgClient session has a default timeout of 120 secs #16157

aabraham-yb opened this issue Feb 20, 2023 · 5 comments

Comments

@aabraham-yb
Copy link

aabraham-yb commented Feb 20, 2023

Jira Link: DB-5591

Description

The timeout value for PgClient is not being set and a default of 120 secs is being set always.

So the value passed when the cluster is being created is ignored. The below will still result in the client timing out only after 120 secs.

 --tserver_flags 'ysql_pg_conf_csv="statement_timeout=4000"

e7cc94a modified the way we handle session timeout and fbcb98e removed the introduced logic.

Previous Description

A session blocked by another session times out with the message below after waiting for 120s. This is not in line with the expected behavior of waiting until the lock is resolved (by either commit or rollback by the blocking session. This is inspite of setting statement_timeout to 0 which mean there is no defined timeout.

ERROR: Timed out: Perform RPC (request call id 71) to 172.151.28.116:9100 timed out after 120.000s

This behavior is only observed when read committed isolation is used with "wait queues" implemented pessimistic locking. When read committed isolation is used with pessimistic locking without wait queues, the blocked session does not time out which is the desired behaviour.

Steps to reproduce:

YB Build & G-flags used:


YB Build: 2.17.2.0-b90

ysql_enable_packed_row = true 
enable_automatic_tablet_splitting = true
yb_enable_read_committed_isolation = true
enable_deadlock_detection = true
enable_wait_queues = true

a) create table

create table test (k int primary key, v int);
insert into test values (1, 1);

b) In session 1,

show statement_timeout;
SELECT yb_get_effective_transaction_isolation_level();

begin;
update test set v=2 where k=1;
< do NOT commit >

c) In session 2,

show statement_timeout;
SELECT yb_get_effective_transaction_isolation_level();

begin;
update test set v=3 where k=1;
< starts waiting and then times out at 120s >
@aabraham-yb aabraham-yb added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Feb 20, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue area/docdb YugabyteDB core features and removed area/ysql Yugabyte SQL (YSQL) labels Feb 20, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Feb 27, 2023
@bmatican
Copy link
Contributor

@Huqicheng can you confirm if this is basically the same packed full row update issue?

@rthallamko3
Copy link
Contributor

@basavaraj29 / @robertsami , Can we check if this fails in master too? The full row update optimization in PackedRow has been disabled for now. So would be good to understand if this is still an issue and a blocker for 2.18.0 timeframe.

@basavaraj29
Copy link
Contributor

yes, the statements timeout on current master. here's the log.

2023-04-17 16:24:58.800 UTC [4185959] ERROR:  Wrong number of responses: 0, while 1 expected: 0 vs 1
2023-04-17 16:24:58.800 UTC [4185959] STATEMENT:  UPDATE test1_1 SET other_value=other_value+10 WHERE value=1
[ts-1] W0417 16:24:58.830046 4185614 long_operation_tracker.cc:150] ScopedRWOperation took a long time: 120.037s
[ts-1] W0417 16:24:58.830142 4185614 long_operation_tracker.cc:150] ScopedRWOperation took a long time: 120.037s
[ts-1] W0417 16:24:58.830420 4186277 tablet_rpc.cc:460] Internal error (yb/docdb/wait_queue.cc:1249): Failed Read(tablet: 53f3d82ab45542efbd99b9b68dfc1623, num_ops: 1, num_attempts: 1, txn: efc2eba8-33ea-4b73-b168-3c3cb316ea65, subtxn: [none]) to tablet 53f3d82ab45542efbd99b9b68dfc1623 on tablet server { uuid: 6929c496d3d94238ad6938124282b2fd private: [host: "127.0.0.2" port: 28928] public: [host: "127.0.0.3.ip.yugabyte" port: 28928] cloud_info: placement_cloud: "cloud0" placement_region: "rack1" placement_zone: "zone" after 1 attempt(s): Transaction efc2eba8-33ea-4b73-b168-3c3cb316ea65 was aborted while waiting for locks
[ts-1] W0417 16:24:58.831029 4186277 yb_rpc.cc:340] Call yb.tserver.PgClientService.Perform 127.0.0.1:41959 => 127.0.0.2:28928 (request call id 80) took 120038ms (client timeout 120000ms).
[ts-1] W0417 16:24:58.831115 4186277 yb_rpc.cc:344] Trace:
..  0417 16:22:58.792151 (+     0us) inbound_call.cc:80] Created InboundCall
..  0417 16:22:58.792151 (+     0us) service_pool.cc:184] Inserting onto call queue
..  0417 16:22:58.792151 (+     0us) service_pool.cc:262] Handling call Perform
..  0417 16:24:58.831524 (+120039373us) inbound_call.cc:202] Queueing success response
[m-1] I0417 16:25:22.078035 4185263 cluster_balance_util.cc:448] tablet server 4d454f633e1844f3ad7f2e57c53a80c6 has placement info incompatible with tablet 23a91cc6fcb5452991656f8dfc25a30f. Not allowing it to host this tablet.
2023-04-17 16:25:28.883 UTC [4185451] LOG:  received fast shutdown request

@robertsami robertsami added this to Needs Triage in Wait-Queue Based Locking via automation Apr 17, 2023
@robertsami robertsami moved this from Needs Triage to GA Blocking in Wait-Queue Based Locking Apr 17, 2023
@robertsami robertsami moved this from EA Blocking to In progress in Wait-Queue Based Locking May 8, 2023
@robertsami robertsami assigned basavaraj29 and unassigned robertsami May 8, 2023
@basavaraj29 basavaraj29 changed the title [YSQL] Blocked session times out even with statement_timeout set to 0 with wait queues enabled pessimistic locking [YSQL] PgClient session has a default timeout of 120 secs May 9, 2023
basavaraj29 added a commit that referenced this issue May 17, 2023
Summary:
PgClient session times out after a default value of 120 secs as the timeout value is not being set. Setting the timeout value explicitly after initializing a Pg session.

Note: We still don't fully support the feature of a pg statement not timing out when `statement_timeout=0` is set. Instead, the rpc times out after the 600 secs. If required, we can set the timeout to `MonoDelta::kMax` which will give us a greater timeout for the rpc.

Test Plan: ./yb_build.sh --gtest_filter PgBackendsTest.TestStatementTimeout

Reviewers: zyu, pjain, myang, dmitry

Reviewed By: myang, dmitry

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D25195
jasonyb pushed a commit that referenced this issue May 18, 2023
Summary:
Commit 0c45e0c, titled

    [#16157] YSQL: Set PgClient session timeout value

adds test TestStatementTimeout in pg_backends-test.cc.  It does not
belong there.

PgBackendsTest is for tests related to wait on backends catalog version.
The class itself sets flags related to that.  It does not make sense for
the newly added test to subclass that test since it doesn't make use of
anything there.

Move the test to PgLibPqTest, which is where general YSQL C++ external
mini cluster tests go.

Close: #17433

Test Plan:
    ./yb_build.sh fastdebug --gcc11 \
      --cxx-test pgwrapper_pg_libpq-test \
      --gtest_filter PgLibPqTest.TestStatementTimeout

Reviewers: bkolagani, dmitry, myang

Reviewed By: bkolagani, myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D25480
@rthallamko3
Copy link
Contributor

@m-iancu , Do you think this needs to be backported all the way to 2.14? Currently we are tracking the backports. But wanted to check if you think it is not needed etc.

@robertsami robertsami moved this from In progress to Needs Backport in Wait-Queue Based Locking Jun 1, 2023
@rthallamko3
Copy link
Contributor

@basavaraj29 , to add the link to Fiza's fix for honoring client side timeout issue. As such, we may not need to backport this all the way back to 2.14.

basavaraj29 added a commit that referenced this issue Jun 7, 2023
Summary:
PgClient session times out after a default value of 120 secs as the timeout value is not being set. Setting the timeout value explicitly after initializing a Pg session.

Note: We still don't fully support the feature of a pg statement not timing out when `statement_timeout=0` is set. Instead, the rpc times out after the 600 secs. If required, we can set the timeout to `MonoDelta::kMax` which will give us a greater timeout for the rpc.

Original commit: 0c45e0c/D25195
Jira: DB-5591

Test Plan: Skipping the unit test as it has a few more dependencies which aren't present in 2.16

Reviewers: rthallam, pjain, myang, dmitry

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26025
basavaraj29 added a commit that referenced this issue Jun 7, 2023
Summary:
PgClient session times out after a default value of 120 secs as the timeout value is not being set. Setting the timeout value explicitly after initializing a Pg session.

Note: We still don't fully support the feature of a pg statement not timing out when `statement_timeout=0` is set. Instead, the rpc times out after the 600 secs. If required, we can set the timeout to `MonoDelta::kMax` which will give us a greater timeout for the rpc.

Original commit: 0c45e0c/D25195
Jira: DB-5591

Test Plan: Skipping the unit test as it has a few more dependencies which aren't present in 2.16

Reviewers: rthallam, pjain, myang, dmitry

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26040
basavaraj29 added a commit that referenced this issue Jun 9, 2023
Summary:
PgClient session times out after a default value of 120 secs as the timeout value is not being set. Setting the timeout value explicitly after initializing a Pg session.

Note: We still don't fully support the feature of a pg statement not timing out when `statement_timeout=0` is set. Instead, the rpc times out after the 600 secs. If required, we can set the timeout to `MonoDelta::kMax` which will give us a greater timeout for the rpc.

Original commits: 0c45e0c/D25195 and 7968bd5/D25480
Jira: DB-5591

Test Plan: ./yb_build.sh --gtest_filter PgLibPqTest.TestStatementTimeout

Reviewers: rthallam, pjain, myang, dmitry

Reviewed By: myang

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26022
Wait-Queue Based Locking automation moved this from Needs Backport to Done Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

7 participants