[Dynamic Regions] after br full restored to multirocksdb cluster failed and tikv restarted, tikv panic #14946

Closed
AkiraXie opened this issue Jun 14, 2023 · 5 comments

Comments

@AkiraXie

Bug Report

What version of TiKV are you using?

sh-5.1# /tikv-server -V
TiKV
Release Version: 7.2.0-alpha
Edition: Community
Git Commit Hash: a24d9d6
Git Commit Branch: heads/refs/tags/v7.2.0-alpha
UTC Build Time: 2023-05-29 11:55:46
Rust Version: rustc 1.67.0-nightly (96ddd32c4 2022-11-14)
Enable Features: pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure
Profile: dist_release

What operating system and CPU are you using?

Steps to reproduce

  1. Run a BR full restore to a multirocksdb cluster.
  2. The restore fails because of a checksum timeout.
  3. Edit tikv.end-point-request-max-handle-duration and restart TiKV.

What did you expect?

What happened?

"[lib.rs:497] ["corrupted sst [error=EngineTraits(Engine(Status { code: IoError, sub_code: None, sev: NoError, state: \"IO error: No such file or directory: while stat a file for size: /var/lib/tikv/data/import/5b8a2cd4-a37d-45fd-8773-875fb10a1e57_94143_55_679_write.sst: No such file or directory\" }))] [sst=uuid: 5B8A2CD4A37D45FD8773875FB10A1E57 range { start: 7480000000000113FF845F698000000000FF0000020400000000FF05F5E10007800000FF00000000000419ABFF7400000000000000FD end: 7480000000000113FF845F698000000000FF0000020400000000FF05F5E1030780004EFF869907A8000419ABFF7400000000000000FD } length: 108035 cf_name: \"write\" region_id: 94143 region_epoch { conf_ver: 55 version: 679 } cipher_iv: 37920A9CC24F03EA487BBCEAAF5A8C0D] [peer_id=94147] [region_id=94143]"] [backtrace=" 0: tikv_util::set_panic_hook::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:496:18\n 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2032:9\n std::panicking::rust_panic_with_hook\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:692:13\n 2: std::panicking::begin_panic_handler::{{closure}}\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:579:13\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18\n 4: rust_begin_unwind\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:575:5\n 5: core::panicking::panic_fmt\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:65:14\n 6: 
raftstore_v2::operation::command::write::ingest::<impl raftstore_v2::raft::apply::Apply<EK,R>>::apply_ingest\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/operation/command/write/ingest.rs:136:21\n raftstore_v2::operation::command::<impl raftstore_v2::raft::apply::Apply<EK,R>>::apply_entry::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/operation/command/mod.rs:642:33\n 7: <core::future::from_generator::GenFuture as core::future::future::Future>::poll\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19\n raftstore_v2::operation::command::<impl raftstore_v2::raft::apply::Apply<EK,R>>::apply_committed_entries::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/operation/command/mod.rs:567:61\n <core::future::from_generator::GenFuture as core::future::future::Future>::poll\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19\n raftstore_v2::fsm::apply::ApplyFsm<EK,R>::handle_all_tasks::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/fsm/apply.rs:143:94\n 8: <core::future::from_generator::GenFuture as core::future::future::Future>::poll\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19\n raftstore_v2::operation::command::<impl raftstore_v2::raft::peer::Peer<EK,ER>>::schedule_apply_fsm::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/operation/command/mod.rs:161:61\n <core::future::from_generator::GenFuture as core::future::future::Future>::poll\n at 
rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19\n <tracker::tls::TrackedFuture as core::future::future::Future>::poll::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tracker/src/tls.rs:64:23\n std::thread::local::LocalKey::try_with\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:446:16\n std::thread::local::LocalKey::with\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:422:9\n <tracker::tls::TrackedFuture as core::future::future::Future>::poll\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tracker/src/tls.rs:62:9\n tikv_util::yatp_pool::future_pool::PoolInner::spawn::{{closure}}\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/yatp_pool/future_pool.rs:167:27\n <core::future::from_generator::GenFuture as core::future::future::Future>::poll\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19\n yatp::task::future::RawTask::poll\n at rust/git/checkouts/yatp-e704b73c3ee279b6/5523a9a/src/task/future.rs:59:9\n 9: yatp::task::future::TaskCell::poll\n at rust/git/checkouts/yatp-e704b73c3ee279b6/5523a9a/src/task/future.rs:103:9\n <yatp::task::future::Runner as yatp::pool::runner::Runner>::handle\n at rust/git/checkouts/yatp-e704b73c3ee279b6/5523a9a/src/task/future.rs:387:20\n 10: <tikv_util::yatp_pool::YatpPoolRunner as yatp::pool::runner::Runner>::handle\n at home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/tikv_util/src/yatp_pool/mod.rs:193:24\n yatp::pool::worker::WorkerThread<T,R>::run\n at rust/git/checkouts/yatp-e704b73c3ee279b6/5523a9a/src/pool/worker.rs:48:13\n 
yatp::pool::builder::LazyBuilder::build::{{closure}}\n at rust/git/checkouts/yatp-e704b73c3ee279b6/5523a9a/src/pool/builder.rs:114:25\n std::sys_common::backtrace::rust_begin_short_backtrace\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121:18\n 11: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:551:17\n <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9\n std::panicking::try::do_call\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:483:40\n std::panicking::try\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:447:19\n std::panic::catch_unwind\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:137:14\n std::thread::Builder::spawn_unchecked::{{closure}}\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:550:30\n core::ops::function::FnOnce::call_once{{vtable.shim}}\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513:5\n 12: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9\n <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9\n std::sys::unix::thread::Thread::new::thread_start\n at 
rust/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108:17\n 13: start_thread\n 14: __GI___clone\n"] [location=/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tikv/components/raftstore-v2/src/operation/command/write/ingest.rs:136] [thread_name=apply-0]"
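
For readers following the log: the panic message says that a stat on the SST file under /var/lib/tikv/data/import/ returned "No such file or directory" while apply_ingest was running. The snippet below is a minimal sketch, not TiKV's actual code, of how a missing file can be told apart from other I/O errors with the standard library. `SstCheck` and `check_sst` are hypothetical names introduced only for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of checking an SST file on disk.
/// Hypothetical type for illustration; it is not part of TiKV.
#[derive(Debug, PartialEq)]
pub enum SstCheck {
    /// The file exists; carries its size in bytes.
    Present(u64),
    /// The file does not exist at the given path.
    Missing,
}

/// Stat `path` and classify the outcome.
/// A NotFound error is reported as `SstCheck::Missing`;
/// every other I/O error is passed through to the caller.
pub fn check_sst(path: &Path) -> io::Result<SstCheck> {
    match fs::metadata(path) {
        Ok(meta) => Ok(SstCheck::Present(meta.len())),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(SstCheck::Missing),
        Err(e) => Err(e),
    }
}
```

A caller could branch on `SstCheck::Missing` and decide what to do about it. What the correct behavior is for an ingest command whose file is gone is a design question for the maintainers and is not answered by this sketch.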

@AkiraXie
Author

AkiraXie commented Jun 14, 2023

/severity critical

@tonyxuqqi
Contributor

The checksum timeout threshold cannot simply be increased to a very large value.
It should depend on the region size.
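
One way to read "dependent on region size" is a timeout that grows with the amount of data and is still bounded. The following is only a sketch under that assumption; the function name, its parameters, and the per-GiB scaling are all hypothetical and do not come from TiKV or BR.

```rust
use std::time::Duration;

/// Derive a checksum timeout from an approximate region size.
/// Hypothetical helper for illustration; not a TiKV or BR API.
///
/// The result is `base + per_gib * ceil(size in GiB)`, clamped to `cap`.
pub fn checksum_timeout(
    region_size_bytes: u64,
    base: Duration,
    per_gib: Duration,
    cap: Duration,
) -> Duration {
    const GIB: u64 = 1 << 30;
    // Round a partial GiB up to a whole one.
    let gib = region_size_bytes / GIB + u64::from(region_size_bytes % GIB != 0);
    // Duration multiplication takes a u32; saturate for absurdly large sizes.
    let gib = u32::try_from(gib).unwrap_or(u32::MAX);
    per_gib
        .checked_mul(gib)
        .and_then(|scaled| base.checked_add(scaled))
        // On arithmetic overflow fall back to the cap.
        .map_or(cap, |t| t.min(cap))
}
```

With a base of 60 s, 30 s per GiB, and a cap of 600 s, a 1 GiB region gets 90 s and a 100 GiB region is clamped to 600 s. The cap keeps the threshold from growing without limit.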

@AkiraXie
Author

/feature developing

@3pointer
Contributor

#15032 may fix it

@jebter jebter added the type/bug Type: Issue - Confirmed a bug label Jul 7, 2023
@tonyxuqqi
Contributor

#15064 will fix it.

@AkiraXie AkiraXie changed the title after br full restored to multirocksdb cluster failed and tikv restarted, tikv panic [Dynamic Regions] after br full restored to multirocksdb cluster failed and tikv restarted, tikv panic Jul 18, 2023