[YSQL][LST] Upgrade: Postgres Crash in FinishTransaction #12651

Open
def- opened this issue May 25, 2022 · 1 comment
Assignees
Labels
area/ysql: Yugabyte SQL (YSQL)
kind/bug: This issue is a bug
kind/failing-test: Tests and testing infra
priority/medium: Medium priority issue
qa_automation: Bugs identified via itest-system, LST, Stress automation or causing automation failures
qa_lst: Bugs identified using lst automation

Comments

@def-
Contributor

def- commented May 25, 2022

Jira Link: DB-542

Description

LST crashes postgres during upgrade from 2.8.5.0-b22 to 2.13.3.0-b41:

[New LWP 17365]
[New LWP 25009]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/home/yugabyte/yb-software/yugabyte-2.13.3.0-b41-centos-x86_64/linuxbrew/lib/libthread_db.so.1".
Core was generated by `postgres: yugabyte db_lst_272012 10.150.1.16(56472) idle in transaction       '.
Program terminated with signal 11, Segmentation fault.
#0  FinishTransaction (this=0x0, commit=..., ddl_mode=...) at ../../src/yb/yql/pggate/pg_client.cc:199
199	../../src/yb/yql/pggate/pg_client.cc: No such file or directory.

Thread 2 (Thread 0x7fb2dccda700 (LWP 25009)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
No locals.
#1  0x00007fb2eed627eb in std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) () from /home/yugabyte/yb-software/yugabyte-2.13.3.0-b41-centos-x86_64/postgres/../lib/yb-thirdparty/libc++.so.1
No symbol table info available.
#2  0x00007fb2ea18ecc0 in wait_for<long long, std::ratio<1, 1000000000> > (this=0x7fb2ea293e18 <yb::(anonymous namespace)::LongOperationTrackerHelper::Instance()::result+72>, __lk=..., __d=...) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/__mutex_base:465
        __c_now = {__d_ = {__rep_ = 771427310323}}
        __now_count_ns = <optimized out>
        __d_ns_count = <optimized out>
#3  yb::(anonymous namespace)::LongOperationTrackerHelper::Execute() (this=<optimized out>) at ../../src/yb/util/debug/long_operation_tracker.cc:104
        first_entry_time = <optimized out>
        now = <optimized out>
        operation = <optimized out>
        lock = {__m_ = 0x7fb2ea293df0 <yb::(anonymous namespace)::LongOperationTrackerHelper::Instance()::result+32>, __owns_ = true}
#4  0x00007fb2ea18f22d in __invoke<void (yb::(anonymous namespace)::LongOperationTrackerHelper::*&)(), yb::(anonymous namespace)::LongOperationTrackerHelper *&, void> (__f=<optimized out>, __a0=@0x31d7b38: 0x7fb2ea293dd0 <yb::(anonymous namespace)::LongOperationTrackerHelper::Instance()::result>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/type_traits:3635
No locals.
#5  __apply_functor<void (yb::(anonymous namespace)::LongOperationTrackerHelper::*)(), std::tuple<yb::(anonymous namespace)::LongOperationTrackerHelper *>, 0, std::tuple<> > (__f=<optimized out>, __bound_args=..., __args=<optimized out>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/functional:2857
No locals.
#6  operator()<> (this=0x31d7b28) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/functional:2890
No locals.
#7  __invoke<std::__bind<void (yb::(anonymous namespace)::LongOperationTrackerHelper::*)(), yb::(anonymous namespace)::LongOperationTrackerHelper *>> (__f=<unknown type in /home/yugabyte/yb-software/yugabyte-2.13.3.0-b41-centos-x86_64/lib/yb/libyb_util.so, CU 0x14fc6c, DIE 0x1543c6>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/type_traits:3694
No locals.
#8  __thread_execute<std::unique_ptr<std::__thread_struct>, std::__bind<void (yb::(anonymous namespace)::LongOperationTrackerHelper::*)(), yb::(anonymous namespace)::LongOperationTrackerHelper *>> (__t=...) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/thread:280
No locals.
#9  std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::__bind<void (yb::(anonymous namespace)::LongOperationTrackerHelper::*)(), yb::(anonymous namespace)::LongOperationTrackerHelper*> > >(void*) (__vp=0x31d7b20) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/thread:291
        __p = {__ptr_ = {<> = {__value_ = 0x31d7b20}, <> = {<> = {<No data fields>}, <No data fields>}, <No data fields>}}
#10 0x00007fb2ee8cb694 in start_thread (arg=0x7fb2dccda700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fb2dccda700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140406185371392, 7342040687536282378, 0, 140724752391903, 26, 140406185371392, -7313403235877741814, -7313293829549155574}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#11 0x00007fb2ee00841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread 1 (Thread 0x7fb2e994a1c0 (LWP 17365)):
#0  FinishTransaction (this=0x0, commit=..., ddl_mode=...) at ../../src/yb/yql/pggate/pg_client.cc:199
        req = <incomplete type>
        resp = <incomplete type>
#1  yb::pggate::PgClient::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>, yb::StronglyTypedBool<yb::pggate::DdlMode_Tag>) (this=<optimized out>, commit=..., ddl_mode=...) at ../../src/yb/yql/pggate/pg_client.cc:564
No locals.
#2  0x00007fb2eef9ff2c in yb::pggate::PgTxnManager::FinishTransaction(yb::StronglyTypedBool<yb::pggate::Commit_Tag>) (this=0x32d6960, commit=...) at ../../src/yb/yql/pggate/pg_txn_manager.cc:356
        vlocal__ = 0x7fb2e9f99970 <google::kLogSiteUninitialized>
        vlocal__ = 0x7fb2e9fa9ef4 <fLI::FLAGS_v>
        vlocal__ = 0x7fb2e9fa9ef4 <fLI::FLAGS_v>
        vlocal__ = 0x7fb2e9fa9ef4 <fLI::FLAGS_v>
        status = {state_ = {px = 0x0}}
#3  0x00007fb2eef9f854 in yb::pggate::PgTxnManager::~PgTxnManager() (this=0x32d6960) at ../../src/yb/yql/pggate/pg_txn_manager.cc:363
No locals.
#4  0x00007fb2eef9fa5e in yb::pggate::PgTxnManager::~PgTxnManager() (this=0x32d6960) at ../../src/yb/yql/pggate/pg_txn_manager.cc:148
No locals.
#5  0x00007fb2eef797ad in yb::pggate::PgSession::~PgSession() (this=0x3090f00) at ../../src/yb/gutil/ref_counted.h:226
No locals.
#6  0x00007fb2eef797fe in yb::pggate::PgSession::~PgSession() (this=0x3090f00) at ../../src/yb/yql/pggate/pg_session.cc:347
No locals.
#7  0x00007fb2eef8f47c in ~PgDmlRead (this=0x332e1a0) at ../../src/yb/yql/pggate/pg_dml_read.cc:105
No locals.
#8  ~PgSelect (this=0x332e1a0) at ../../src/yb/yql/pggate/pg_select.cc:38
No locals.
#9  yb::pggate::PgSelect::~PgSelect() (this=0x332e1a0) at ../../src/yb/yql/pggate/pg_select.cc:37
No locals.
#10 0x00007fb2eefa6094 in clear_and_dispose<std::default_delete<yb::pggate::PgMemctx::Registrable> > (this=0x30bdf08, disposer=...) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/memory:1423
        to_erase = <optimized out>
        itend = {static stateful_value_traits = false, members_ = {nodeptr_ = 0x30bdf10}}
        it = {static stateful_value_traits = false, members_ = {nodeptr_ = 0x30bdf10}}
#11 yb::pggate::PgMemctx::Clear() (this=0x30bded8) at ../../src/yb/yql/pggate/pg_memctx.cc:80
No locals.
#12 0x00007fb2eefa5ebc in yb::pggate::PgMemctx::~PgMemctx() (this=0x30bded8) at ../../src/yb/yql/pggate/pg_memctx.cc:29
No locals.
#13 0x00007fb2eefa7747 in __deallocate_node (__np=0x32ed0e0, this=<optimized out>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/memory:2501
No locals.
#14 clear (this=<optimized out>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/__hash_table:1826
        __bc = <optimized out>
#15 clear (this=<optimized out>) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20220505181632-c1907426fc-almalinux8-x86_64-clang12-linuxbrew/installed/uninstrumented/libcxx/include/c++/v1/unordered_map:1274
No locals.
#16 yb::pggate::ClearGlobalPgMemctxMap() () at ../../src/yb/yql/pggate/pg_memctx.cc:99
No locals.
#17 0x00007fb2eef4c5b5 in YBCDestroyPgGate () at ../../src/yb/yql/pggate/ybc_pggate.cc:144
        local_pgapi = 0x31b7180
        vlocal__ = 0x7fb2e9f99970 <google::kLogSiteUninitialized>
#18 0x000000000091d25f in YBOnPostgresBackendShutdown () at /nfusr/alma8-gcp-cloud/jenkins-worker-gphr1f/jenkins/jenkins-github-yugabyte-db-alma8-master-clang12-release-482/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:550
No locals.
#19 proc_exit (code=1) at ../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:152
No locals.
#20 0x0000000000aeca26 in errfinish (dummy=<optimized out>) at ../../../../../../../src/postgres/src/backend/utils/error/elog.c:579
        elevel = 21
        oldcontext = 0x3358000
        econtext = 0x0
#21 0x0000000000b1fd68 in HandleYBStatusAtErrorLevel (status=<optimized out>, error_level=<optimized out>) at ../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:459
        msg_buf = <optimized out>
        pg_err_code = <optimized out>
        txn_err_code = <optimized out>
#22 0x0000000000b20de3 in HandleYBStatus (status=0x2) at ../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:420
No locals.
#23 YBCRollbackSubTransaction (id=<optimized out>) at ../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:599
No locals.
#24 0x000000000059968c in AbortSubTransaction () at ../../../../../../../src/postgres/src/backend/access/transam/xact.c:5146
        s = 0x36b2608
#25 0x000000000059bf35 in AbortOutOfAnyTransaction () at ../../../../../../../src/postgres/src/backend/access/transam/xact.c:4737
        s = 0x36b2608
#26 0x0000000000b04169 in ShutdownPostgres (code=148938712, arg=52130824) at ../../../../../../../src/postgres/src/backend/utils/init/postinit.c:1268
No locals.
#27 0x000000000091d418 in shmem_exit (code=1) at ../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:243
No locals.
#28 0x000000000091d303 in proc_exit_prepare (code=1) at ../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:198
No locals.
#29 0x000000000091d24d in proc_exit (code=1) at ../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:108
No locals.
#30 0x0000000000aeca26 in errfinish (dummy=<optimized out>) at ../../../../../../../src/postgres/src/backend/utils/error/elog.c:579
        elevel = 21
        oldcontext = 0x31e0000
        econtext = 0x0
#31 0x000000000094ba64 in ProcessInterrupts () at ../../../../../../src/postgres/src/backend/tcop/postgres.c:2989
No locals.
#32 0x00000000007ac892 in secure_read (port=0x30501e0, ptr=0xb70440 <PqRecvBuffer>, len=8192) at /nfusr/alma8-gcp-cloud/jenkins-worker-gphr1f/jenkins/jenkins-github-yugabyte-db-alma8-master-clang12-release-482/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:553
        waitfor = <optimized out>
        n = <optimized out>
#33 0x00000000007be8f2 in pq_recvbuf () at ../../../../../../src/postgres/src/backend/libpq/pqcomm.c:1005
No locals.
#34 0x000000000094e38a in PostgresMain (argc=1, argv=<optimized out>, dbname=<optimized out>, username=0x31c7378 "yugabyte") at /nfusr/alma8-gcp-cloud/jenkins-worker-gphr1f/jenkins/jenkins-github-yugabyte-db-alma8-master-clang12-release-482/yugabyte-db/src/postgres/src/backend/libpq/../../../../../../src/postgres/src/backend/libpq/pqcomm.c:1048
        input_message = {data = 0x31e0118 "", len = 0, maxlen = 1024, cursor = 0}
        local_sigjmp_buf = {{__jmpbuf = {140724752393712, -7341365633198548214, 1, 52196520, 8192, 8388608, -7341365633345348854, 7342039689919896330}, __mask_was_saved = 1, __saved_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 0, 140402749341697, 18374686479671623680, 0, 0, 0, 0, 0, 0}}}}
        send_ready_for_query = false
        disable_idle_in_transaction_timeout = false
        firstchar = <optimized out>
#35 0x00000000008ab34b in BackendRun (port=0x30501e0) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4555
        maxac = <optimized out>
        ac = <optimized out>
        av = 0x31c74a8
        i = <optimized out>
#36 0x00000000008aaa5a in ServerLoop () at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:4198
        readmask = {fds_bits = {40, 0 <repeats 15 times>}}
        last_lockfile_recheck_time = <optimized out>
        last_touch_time = <optimized out>
        nSockets = 6
#37 0x00000000008a70b1 in PostmasterMain (argc=<optimized out>, argv=0x3066780) at ../../../../../../src/postgres/src/backend/postmaster/postmaster.c:1423
        output_config_variable = 0x31e0468 "17465912)'::NUMRANGE)), (44.85425264836053)))) FROM t2 ORDER BY 3 ASC LIMIT 15 OFFSET 18;"
        listen_addr_saved = <optimized out>
        userDoption = 0x3066780 "@Z\f\003"
        opt = <optimized out>
        i = <optimized out>
        status = <optimized out>
#38 0x00000000007c79e3 in PostgresServerProcessMain (argc=23, argv=0x3066780) at ../../../../../../src/postgres/src/backend/main/main.c:234
        do_check_root = <optimized out>
#39 0x00000000004f5302 in main ()
No symbol table info available.

Related bug involving materialized views: #11832 (no materialized views were used here). I will add logs to the Jira ticket.
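For readers of the trace: frame #0 shows FinishTransaction running with this=0x0, reached from ~PgTxnManager (frames #2 to #4) while YBCDestroyPgGate was tearing down pggate state during a nested proc_exit (frames #19 and #29). One possible reading, not confirmed by the maintainers, is that the client's internal implementation pointer was already null when the destructor tried to roll back. The sketch below only illustrates that general pattern and one way to guard against it. Client, TxnManager, Shutdown and RollbackReachedClient are hypothetical stand-ins, not YugabyteDB's actual classes or fix.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a pimpl-style client; NOT the real yb::pggate::PgClient.
class Client {
 public:
  Client() : impl_(std::make_unique<Impl>()) {}

  // Simulates the client being torn down before the objects that use it.
  void Shutdown() { impl_.reset(); }

  // Guarded forwarder: reports failure instead of calling through a null impl_.
  bool FinishTransaction(bool commit) {
    if (!impl_) return false;
    impl_->FinishTransaction(commit);
    return true;
  }

 private:
  struct Impl {
    void FinishTransaction(bool /*commit*/) { ++calls; }
    int calls = 0;
  };
  std::unique_ptr<Impl> impl_;
};

// Hypothetical stand-in for a transaction manager whose destructor rolls back.
class TxnManager {
 public:
  TxnManager(Client* client, bool* rolled_back)
      : client_(client), rolled_back_(rolled_back) {
    assert(client != nullptr && rolled_back != nullptr);
  }

  // Same shape as frames #2-#3: the destructor finishes the transaction with commit=false.
  ~TxnManager() { *rolled_back_ = client_->FinishTransaction(/*commit=*/false); }

 private:
  Client* client_;
  bool* rolled_back_;
};

// Returns whether the destructor's rollback reached a live client implementation.
bool RollbackReachedClient(bool shutdown_client_first) {
  Client client;
  bool rolled_back = false;
  {
    TxnManager manager(&client, &rolled_back);
    if (shutdown_client_first) client.Shutdown();
  }  // ~TxnManager runs here
  return rolled_back;
}
```

Without the null check in Client::FinishTransaction, the shutdown-first ordering would call a member function through a null pointer, which is undefined behavior and commonly surfaces as a segmentation fault like the one reported above.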

@def- def- added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels May 25, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels May 25, 2022
@def-
Contributor Author

def- commented May 25, 2022

This also happened with the master flag default_memory_limit_to_ram_ratio set to 0.2.

@def- def- changed the title [YSQL][LST] Upgrade: Segmentation fault in FinishTransaction [YSQL][LST] Upgrade: Crash in FinishTransaction Jun 1, 2022
@def- def- changed the title [YSQL][LST] Upgrade: Crash in FinishTransaction [YSQL][LST] Upgrade: Postgres Crash in FinishTransaction Jul 12, 2022
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jul 27, 2022
@kripasreenivasan kripasreenivasan added the qa_automation Bugs identified via itest-system, LST, Stress automation or causing automation failures label Sep 13, 2022
@yugabyte-ci yugabyte-ci added the kind/failing-test Tests and testing infra label Oct 12, 2022
@Arjun-yb Arjun-yb added the qa_lst Bugs identified using lst automation label Dec 11, 2023