[YSQL][LST] FATAL: Check failed: _s.ok() Bad status: Network error (yb/rpc/yb_rpc.cc:556): Rpc timeout, passed: 15.101s, timeout: 15.000s, now: 557308.689s, last_read_time_: 557293.588s #13128

Closed
def- opened this issue Jul 1, 2022 · 4 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/high High Priority qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures

Comments

@def-
Contributor

def- commented Jul 1, 2022

Jira Link: DB-2832

Description

$ cd ~/code/yugabyte-db
$ git checkout 226465305d7db859424e06178723b445b6bf964c
$ ./yb_build.sh release
$ bin/yb-ctl --replication_factor 1 create --tserver_flags=yb_enable_read_committed_isolation=true,ysql_enable_packed_row=true,ysql_num_shards_per_tserver=1,enable_stream_compression=true,stream_compression_algo=1,yb_num_shards_per_tserver=1 --master_flags=yb_enable_read_committed_isolation=true,ysql_enable_packed_row=true,enable_stream_compression=true,stream_compression_algo=1,enable_automatic_tablet_splitting=true,tablet_split_low_phase_shard_count_per_node=40,tablet_split_high_phase_shard_count_per_node=50
$ cd ~/code/yb-long-system-test
$ git checkout 4111e0ef6454788c154799abff72d40faa5be06c
2022-06-30 22:02:48,940 MainThread INFO     Reproduce with: git checkout 4111e0ef && ./long_system_test.py --nodes=127.0.0.1:5433 --threads=10 --complexity=full --runtime=60 --max-columns=10 --seed=207194
2022-06-30 22:02:49,341 MainThread INFO     Database version: PostgreSQL 11.2-YB-2.15.1.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit
2022-06-30 22:02:49,343 MainThread INFO     Creating tables for database db_lst_207194
2022-06-30 22:02:57,583 MainThread INFO     Starting worker_0: RandomSelectAction, SetConfigAction
2022-06-30 22:02:57,583 MainThread INFO     Starting worker_1: RandomSelectAction, SetConfigAction
2022-06-30 22:02:57,584 MainThread INFO     Starting worker_2: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-30 22:02:57,584 MainThread INFO     Starting worker_3: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-30 22:02:57,586 MainThread INFO     Starting worker_4: RandomSelectAction, SetConfigAction
2022-06-30 22:02:57,587 MainThread INFO     Starting worker_5: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-30 22:02:57,588 MainThread INFO     Starting worker_6: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-30 22:02:57,588 MainThread INFO     Starting worker_7: CreateIndexAction, DropIndexAction, SetConfigAction, AddColumnAction
2022-06-30 22:02:57,589 MainThread INFO     Starting worker_8: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-30 22:02:57,590 MainThread INFO     Starting worker_9: RandomSelectAction, SetConfigAction
2022-06-30 22:03:07,600 MainThread INFO     Worker queries/s: [002.1][001.4][004.3][005.5][001.8][001.5][004.4][000.9][003.8][002.0]
[...]

FATAL file:

[deen@devp ~]$ cat yugabyte-data/node-1/disk-1/yb-data/tserver/logs/yb-tserver.FATAL.details.2022-06-30T22_54_31.pid3588947.txt
F20220630 22:54:31 ../../../../../../src/yb/yql/pggate/pggate.cc:1712] Check failed: _s.ok() Bad status: Network error (yb/rpc/yb_rpc.cc:556): Rpc timeout, passed: 15.101s, timeout: 15.000s, now: 557308.689s, last_read_time_: 557293.588s
    @     0x7f493ec0551c  google::LogDestination::LogToSinks()
    @     0x7f493ebffc4f  google::LogMessage::SendToLog()
    @     0x7f493ec00558  google::LogMessage::Flush()
    @     0x7f493ec02e5f  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f49435c5d88  yb::pggate::PgApiImpl::ClearSeparateDdlTxnMode()
    @           0x9c5a46  YBTxnDdlProcessUtility
    @           0x84326a  PortalRunUtility
    @           0x842927  PortalRunMulti
    @           0x8420c6  PortalRun
    @           0x83f8c6  yb_exec_simple_query_impl
    @           0x83fe66  yb_exec_query_wrapper_one_attempt
    @           0x83cee8  PostgresMain
    @           0x7acffc  BackendRun
    @           0x7ac450  ServerLoop
    @           0x7a8abb  PostmasterMain
    @           0x7127fd  PostgresServerProcessMain
    @           0x712ca2  main
    @     0x7f4942ba0825  __libc_start_main
    @           0x4b8569  _start
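
The timing fields in the FATAL message can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original report: the helper name `parse_rpc_timeout` is hypothetical, and reading `passed` as the time elapsed since `last_read_time_` is an assumption based on the field names and on the arithmetic working out.

```python
import re

# Message text copied verbatim from the FATAL line above.
MSG = ("Rpc timeout, passed: 15.101s, timeout: 15.000s, "
       "now: 557308.689s, last_read_time_: 557293.588s")


def parse_rpc_timeout(msg):
    """Hypothetical helper: return the 'name: <seconds>s' fields as floats."""
    pairs = re.findall(r"(\w+): ([\d.]+)s", msg)
    return {name: float(value) for name, value in pairs}


fields = parse_rpc_timeout(MSG)

# Assumption: 'passed' is now - last_read_time_ (idle time on the connection).
idle = fields["now"] - fields["last_read_time_"]
overshoot = fields["passed"] - fields["timeout"]

print(f"idle since last read: {idle:.3f}s")
print(f"over the timeout by:  {overshoot:.3f}s")
```

Under that reading, the connection had been idle for just over the 15-second limit when the check fired.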

lst_2022-06-30_22:02:48_207194.zip

@def- def- added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 1, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 1, 2022
@def- def- added priority/high High Priority and removed priority/medium Medium priority issue labels Jul 7, 2022
@def-
Contributor Author

def- commented Jul 7, 2022

Can't connect anymore after this happened:

2022-07-07 14:13:26,153 worker_1   ERROR    connection to server at "127.0.0.2", port 5433 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?

@def-
Contributor Author

def- commented Jul 10, 2022

Happens without packed rows too.
Edit: https://drive.google.com/file/d/1noUxPN_kPYvJr2pMYxoYXpMEadfZ8L6m/view?usp=sharing (available only from within the Yugabyte organization)
Edit2: Still happens on 054d25b

@yugabyte-ci yugabyte-ci assigned tverona1 and unassigned m-iancu Jul 11, 2022
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jul 11, 2022
@def- def- changed the title from "[YSQL][LST] Check failed: _s.ok() Bad status: Network error (yb/rpc/yb_rpc.cc:556): Rpc timeout, passed: 15.101s, timeout: 15.000s, now: 557308.689s, last_read_time_: 557293.588s" to "[YSQL][LST] FATAL: Check failed: _s.ok() Bad status: Network error (yb/rpc/yb_rpc.cc:556): Rpc timeout, passed: 15.101s, timeout: 15.000s, now: 557308.689s, last_read_time_: 557293.588s" Jul 12, 2022
@kripasreenivasan kripasreenivasan added the qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures label Sep 13, 2022
@yugabyte-ci yugabyte-ci added the kind/failing-test Tests and testing infra label Oct 12, 2022
@def-
Contributor Author

def- commented Oct 19, 2022

Also happens without concurrent DDLs and tablegroups.

@yugabyte-ci yugabyte-ci added priority/medium Medium priority issue and removed priority/high High Priority labels Mar 2, 2023
@Karvy-yb Karvy-yb added priority/high High Priority and removed priority/medium Medium priority issue labels Mar 15, 2023
@yugabyte-ci yugabyte-ci assigned fizaaluthra and unassigned tverona1 Apr 5, 2023
fizaaluthra added a commit that referenced this issue Apr 26, 2023
Summary:
This diff fixes the error handling in `ClearSeparateDdlTxnMode`. Presently, we don't gracefully handle errors in the function -- we issue a `FATAL` if `ExitSeparateDdlTxnMode` fails.

Original diff that introduced the code: https://phabricator.dev.yugabyte.com/D13244

Test Plan: Manually tested.

Reviewers: dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D24574
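
The difference between the two behaviours described in the summary above can be modelled with a toy example. This is only an illustrative sketch in Python, not the actual C++ change (the commit text does not show the new code): every name below is a hypothetical stand-in, and `SystemExit` merely models the whole process aborting on a failed check.

```python
class RpcTimeout(Exception):
    """Stand-in for a non-OK status returned by an RPC."""


def exit_separate_ddl_txn_mode(fail):
    # Hypothetical stand-in for the call that can fail with a timeout.
    if fail:
        raise RpcTimeout("Rpc timeout, passed: 15.101s, timeout: 15.000s")


def clear_ddl_mode_fatal(fail):
    # Old behaviour (modelled): any failure takes the whole process down.
    try:
        exit_separate_ddl_txn_mode(fail)
    except RpcTimeout as exc:
        raise SystemExit(f"FATAL: Check failed: {exc}")


def clear_ddl_mode_graceful(fail):
    # Graceful behaviour (modelled): report the failure to the caller,
    # which can fail the current statement and keep serving.
    try:
        exit_separate_ddl_txn_mode(fail)
    except RpcTimeout as exc:
        return f"ERROR: {exc}"
    return None


print(clear_ddl_mode_graceful(fail=False))
print(clear_ddl_mode_graceful(fail=True))
```

In the first variant a single timed-out RPC ends the process, which is consistent with the "Connection refused" errors reported earlier in this thread. In the second, only the failing statement is affected.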
@rthallamko3
Contributor

@fizaaluthra, should we backport this fix to stable branches such as 2.16 and 2.18? cc @m-iancu, @tverona1

fizaaluthra added a commit that referenced this issue Sep 18, 2023
Summary:
This diff fixes the error handling in `ClearSeparateDdlTxnMode`. Presently, we don't gracefully handle errors in the function -- we issue a `FATAL` if `ExitSeparateDdlTxnMode` fails.

Original diff that introduced the code: https://phabricator.dev.yugabyte.com/D13244

Original commit: 5a6d54a / D24574
Jira: DB-2832

Test Plan: Jenkins

Reviewers: dmitry, tverona

Reviewed By: tverona

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D28574
fizaaluthra added a commit that referenced this issue Sep 18, 2023
Summary:
This diff fixes the error handling in `ClearSeparateDdlTxnMode`. Presently, we don't gracefully handle errors in the function -- we issue a `FATAL` if `ExitSeparateDdlTxnMode` fails.

Original diff that introduced the code: https://phabricator.dev.yugabyte.com/D13244

Original commit: 5a6d54a / D24574
Jira: DB-2832

Test Plan: Jenkins

Reviewers: dmitry, tverona

Reviewed By: tverona

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D28572
fizaaluthra added a commit that referenced this issue Sep 18, 2023
Summary:
This diff fixes the error handling in `ClearSeparateDdlTxnMode`. Presently, we don't gracefully handle errors in the function -- we issue a `FATAL` if `ExitSeparateDdlTxnMode` fails.

Original diff that introduced the code: https://phabricator.dev.yugabyte.com/D13244

Original commit: 5a6d54a / D24574
Jira: DB-2832

Test Plan: Jenkins

Reviewers: dmitry, tverona

Reviewed By: tverona

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D28573