#7028 introduced a feature that makes the PD client reconnect to the PD cluster periodically on the PD client thread. However, if another thread issues a PD sync request, it takes a read lock and waits for the PD client thread to complete the request; a deadlock occurs because the PD client thread is meanwhile trying to acquire the write lock.
Backtraces of the relevant threads:
addr-resolver
Thread 49 (Thread 0x7f2f30dff700 (LWP 19448)):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f2f304020ec) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f2f30402090, cond=0x7f2f304020c0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7f2f304020c0, mutex=0x7f2f30402090) at pthread_cond_wait.c:638
#3 0x0000562317f28bb2 in futures::task_impl::std::ThreadNotify::park::hb9c6d2336011224b (self=0x7f2f30441010) at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/libstd/sys/unix/condvar.rs:69
#4 0x000056231800dc04 in kvproto::protos::pdpb_grpc::PdClient::get_store_opt::h1d3ede4757ceaee5 (self=<optimized out>, req=<optimized out>, opt=...) at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/std/mod.rs:237
#5 0x000056231805f47e in _$LT$pd_client..client..RpcClient$u20$as$u20$pd_client..PdClient$GT$::get_store::h1116aaaf5646e934 (self=<optimized out>, store_id=<optimized out>) at components/pd_client/src/client.rs:268
#6 0x0000562317d6992c in std::sys_common::backtrace::__rust_begin_short_backtrace::h911d4b4d33daad2e (f=...) at /home/jenkins/agent/workspace/tikv_ghpr_build_release/tikv/src/server/resolve.rs:76
#7 0x0000562317d68cfc in core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h8aa8753cfbf47b27 () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/libstd/thread/mod.rs:475
#8 0x000056231849b03a in _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h13d34828db364579 () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/liballoc/boxed.rs:1078
#9 _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hd51b619e0f884abf () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/liballoc/boxed.rs:1078
#10 std::sys::unix::thread::Thread::new::thread_start::h02c6e34c2c73f344 () at src/libstd/sys/unix/thread.rs:87
#11 0x00007f2f4ac03ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f2f4ab08daf in clone () from /lib/x86_64-linux-gnu/libc.so.6
pd-0
Thread 46 (Thread 0x7f2f333fc700 (LWP 18989)):
#0 futex_abstimed_wait (private=0, abstime=0x0, clockid=0, expected=2, futex_word=<optimized out>) at ../sysdeps/nptl/futex-internal.h:284
#1 __pthread_rwlock_wrlock_full (abstime=0x0, clockid=0, rwlock=0x7f2f3b899580) at pthread_rwlock_common.c:830
#2 __GI___pthread_rwlock_wrlock (rwlock=0x7f2f3b899580) at pthread_rwlock_wrlock.c:27
#3 0x000056231804f5b6 in _$LT$std..sync..rwlock..RwLock$LT$T$GT$$u20$as$u20$tikv_util..HandyRwLock$LT$T$GT$$GT$::wl::h2609cd093c320029 (self=0x7f2f32c09010) at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/libstd/sys/unix/rwlock.rs:73
#4 0x000056231804bee2 in _$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hc8b2c9fb4410b36c (self=..., cx=0x7f2f333db538) at components/pd_client/src/util.rs:191
#5 0x000056231804ac2e in _$LT$futures_util..future..future..unit_error..UnitError$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hd9b722a5e3118f92 (self=..., cx=0x7f2f333db538) at components/pd_client/src/client.rs:84
#6 0x000056231804a93f in _$LT$futures_util..compat..compat03as01..Compat$LT$Fut$GT$$u20$as$u20$futures..future..Future$GT$::poll::hf842b363148bae94 (self=0x7f2f4a43d0c0) at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/libcore/future/future.rs:119
#7 0x0000562317f3ebe8 in grpcio::task::executor::poll::h72a0e0c492811446 (cq=<optimized out>, task=..., woken=<optimized out>) at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/future/mod.rs:113
#8 0x0000562317f3d9e8 in std::sys_common::backtrace::__rust_begin_short_backtrace::h02fc2e0ce18af3bb (f=...) at /rust/registry/src/github.com-1ecc6299db9ec823/grpcio-0.5.3/src/task/executor.rs:136
#9 0x0000562317f3b765 in core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hd6ffd35e8771791c () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/libstd/thread/mod.rs:475
#10 0x000056231849b03a in _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::h13d34828db364579 () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/liballoc/boxed.rs:1078
#11 _$LT$alloc..boxed..Box$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$A$GT$$GT$::call_once::hd51b619e0f884abf () at /rustc/16957bd4d3a5377263f76ed74c572aad8e4b7e59/src/liballoc/boxed.rs:1078
#12 std::sys::unix::thread::Thread::new::thread_start::h02c6e34c2c73f344 () at src/libstd/sys/unix/thread.rs:87
#13 0x00007f2f4ac03ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f2f4ab08daf in clone () from /lib/x86_64-linux-gnu/libc.so.6
What did happen?
Deadlock between the PD client thread and other threads calling PD sync requests.
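The lock pattern behind this can be sketched with std::sync::RwLock: one thread holds a read guard while it blocks waiting on another thread, and that other thread cannot make progress because it needs the write lock on the same state. The sketch below is illustrative only (the names and structure are not TiKV's actual code); it uses try_write so it demonstrates the blocked writer without actually hanging:

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;
use std::time::Duration;

fn main() {
    // Stand-in for the shared PD client inner state.
    let client = Arc::new(RwLock::new(0u64));
    let (tx, rx) = mpsc::channel();

    let c = Arc::clone(&client);
    // "Sync request" thread: holds the read lock while waiting on the PD
    // client thread (modeled here as a sleep).
    let requester = thread::spawn(move || {
        let _guard = c.read().unwrap();
        tx.send(()).unwrap(); // signal: read lock is now held
        thread::sleep(Duration::from_millis(200));
    });

    rx.recv().unwrap(); // wait until the sync request holds the read lock
    // "PD client" thread: the periodic reconnect needs the write lock.
    // A blocking write() here would never return while the reader in turn
    // waits on this thread; try_write shows the writer is blocked.
    let blocked = client.try_write().is_err();
    println!("reconnect blocked by reader: {}", blocked); // prints: true
    requester.join().unwrap();
}
```

In the real bug the reader never releases its guard, because it is waiting for the very thread that wants the write lock, so both sides block forever.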
To confirm whether an instance is affected, search the log file that covers the last_heartbeat_ts and find the most recent connected to PD leader entry. If connected to PD leader is the last log line emitted by util.rs, i.e., no further PD client logs follow it, the deadlock has occurred.
Bug Report
What version of TiKV are you using?
v4.0.3, v4.0.4, v4.0.5, and master.