diagnostic: non-stop cpu expansion causes index out of bounds error #15006

Closed
mornyx opened this issue Jun 26, 2023 · 4 comments · Fixed by #15007
Labels
affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 affects-6.1 affects-6.5 affects-7.1 affects-7.2 severity/minor type/bug The issue is confirmed as a bug.

Comments

@mornyx
Contributor

mornyx commented Jun 26, 2023

Bug Report

What version of TiKV are you using?

master

What operating system and CPU are you using?

Linux x86_64/aarch64

Steps to reproduce

  1. Boot a TiDB cluster.

  2. Execute the following query:

SELECT * FROM information_schema.cluster_hardware;
  3. Expand the number of CPU cores on the TiKV host without restarting the TiKV process.

  4. Re-execute the query from step 2.

TiKV then crashes.

If step 3 is hard to achieve, it can be simulated as follows:

  1. Copy /proc/stat:
$ mkdir -p /tmp/proc
$ cp /proc/stat /tmp/proc/stat
  2. Edit sysinfo/src/linux/cpu.rs to temporarily read /tmp/proc/stat instead of /proc/stat:
pub(crate) fn refresh(&mut self, only_update_global_cpu: bool, refresh_kind: CpuRefreshKind) {
  // ...
  if need_cpu_usage_update {
      self.last_update = Some(Instant::now());
      let f = match File::open("/proc/stat") { // "/proc/stat" -> "/tmp/proc/stat"
          Ok(f) => f,
          Err(_e) => {
              sysinfo_debug!("failed to retrieve CPU information: {:?}", _e);
              return;
          }
      };
  // ...
  3. Rebuild and restart TiKV, then re-run SELECT * FROM information_schema.cluster_hardware;.

  4. Edit /tmp/proc/stat and add one CPU core:

cpu  888202 0 64366 33777873 6620 0 9253 0 0 0
cpu0 84111 0 6925 3369197 309 0 7126 0 0 0
cpu1 99737 0 8030 3364200 1975 0 1161 0 0 0
cpu2 102944 0 6564 3365370 338 0 254 0 0 0
cpu3 80441 0 6131 3387710 943 0 145 0 0 0
cpu4 79461 0 6371 3388871 531 0 94 0 0 0   <===== added line
...
  5. Re-run SELECT * FROM information_schema.cluster_hardware;.
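
The manual edit of /tmp/proc/stat can also be scripted. The sketch below is only an illustration of that step: `is_core_line` and `add_cpu_core` are hypothetical helper names, not part of sysinfo or TiKV. It inserts one extra `cpuN` line after the last per-core line and reuses that core's counters, leaving the aggregate `cpu` line untouched, as in the hand-edited example above.

```rust
/// True for per-core lines such as "cpu0 ..." but not for the aggregate "cpu  ..." line.
fn is_core_line(line: &str) -> bool {
    line.strip_prefix("cpu")
        .and_then(|rest| rest.chars().next())
        .map_or(false, |c| c.is_ascii_digit())
}

/// Returns `stat` with one additional per-core line inserted after the last
/// existing `cpuN` line. The new line copies the counters of that last core.
/// If no per-core line is found, the input is returned unchanged.
fn add_cpu_core(stat: &str) -> String {
    let mut lines: Vec<String> = stat.lines().map(str::to_owned).collect();
    if let Some(last) = lines.iter().rposition(|l| is_core_line(l)) {
        let new_line = {
            // Split "cpuN <counters>" into the name and the counter list.
            let (name, counters) = lines[last]
                .split_once(' ')
                .unwrap_or((lines[last].as_str(), ""));
            // `name` starts with "cpu", so the index begins at byte 3.
            let next = name[3..].parse::<usize>().map_or(0, |n| n + 1);
            format!("cpu{} {}", next, counters)
        };
        lines.insert(last + 1, new_line);
    }
    let mut out = lines.join("\n");
    out.push('\n');
    out
}
```

One way to use it is to read the copy with `std::fs::read_to_string("/tmp/proc/stat")`, pass the contents through `add_cpu_core`, and write the result back with `std::fs::write`.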

What did you expect?

The query succeeds and TiKV keeps running.

What happened?

TiKV crashed:

[2023/06/27 00:18:15.154 +08:00] [FATAL] [lib.rs:497] ["index out of bounds: the len is 4 but the index is 4"] [backtrace="   0: tikv_util::set_panic_hook::{{closure}}\n   1: std::panicking::rust_panic_with_hook\n   2: std::panicking::begin_panic_handler::{{closure}}\n   3: std::sys_common::backtrace::__rust_end_short_backtrace\n   4: rust_begin_unwind\n   5: core::panicking::panic_fmt\n   6: core::panicking::panic_bounds_check\n   7: sysinfo::linux::cpu::CpusWrapper::refresh\n   8: tikv::server::service::diagnostics::sys::hardware_info\n   9: tokio::runtime::task::core::CoreStage<T>::poll\n  10: tokio::runtime::task::raw::poll\n  11: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\n  12: tokio::runtime::scheduler::multi_thread::worker::run\n  13: tokio::runtime::task::raw::poll\n  14: std::sys_common::backtrace::__rust_begin_short_backtrace\n  15: core::ops::function::FnOnce::call_once{{vtable.shim}}\n  16: std::sys::unix::thread::Thread::new::thread_start\n  17: <unknown>\n  18: <unknown>\n"]
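
The backtrace ends in a bounds check inside `sysinfo::linux::cpu::CpusWrapper::refresh`. The message "the len is 4 but the index is 4" is consistent with a per-core list that was sized on the first refresh and is later indexed by the position of each `cpuN` line. The following is a simplified sketch of that failure mode and of a bounds-safe update. It is not sysinfo's actual code: `CoreTimes`, `parse_core_line` and `refresh` are hypothetical names used only for illustration.

```rust
/// A few of the per-core counters from a /proc/stat "cpuN" line.
#[derive(Debug, Clone, PartialEq)]
struct CoreTimes {
    user: u64,
    system: u64,
    idle: u64,
}

/// Parses a per-core line ("cpu0 user nice system idle ...").
/// Returns None for the aggregate "cpu" line and for unrelated lines.
fn parse_core_line(line: &str) -> Option<CoreTimes> {
    let mut fields = line.split_whitespace();
    let name = fields.next()?;
    if !name
        .strip_prefix("cpu")?
        .starts_with(|c: char| c.is_ascii_digit())
    {
        return None;
    }
    let mut nums = fields.map(|f| f.parse::<u64>().ok());
    let user = nums.next()??;
    let _nice = nums.next()??;
    let system = nums.next()??;
    let idle = nums.next()??;
    Some(CoreTimes { user, system, idle })
}

/// Updates `cores` from the contents of a stat file.
/// Writing `cores[seen] = times` unconditionally would panic with
/// "index out of bounds" as soon as a core appears after the first refresh;
/// `get_mut` plus `push` grows the list instead.
fn refresh(cores: &mut Vec<CoreTimes>, stat: &str) {
    let mut seen = 0;
    for times in stat.lines().filter_map(parse_core_line) {
        match cores.get_mut(seen) {
            Some(slot) => *slot = times,
            None => cores.push(times),
        }
        seen += 1;
    }
    // Assumption of this sketch: cores that are no longer listed are dropped.
    cores.truncate(seen);
}
```

With this shape, calling `refresh` first on a four-core file and then on a five-core file leaves five entries in `cores` instead of panicking. The actual fix referenced in this thread updates the sysinfo dependency; its diff is not shown here.
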
@mornyx
Contributor Author

mornyx commented Jun 26, 2023

/assign

@mornyx
Contributor Author

mornyx commented Jun 27, 2023

/cc @kaixu120811

@aytrack

aytrack commented Jun 30, 2023

/label type/bug
/label severity/minor

@ti-chi-bot
Contributor

ti-chi-bot bot commented Jun 30, 2023

@aytrack: The label(s) type/bug, severity/minor cannot be applied. These labels are supported: challenge-program, compatibility-breaker, high-performance, hptc, wontfix, do-not-merge/cherry-pick-not-approved, needs-cherry-pick-release-5.2, needs-cherry-pick-release-5.3, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.0, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.2, needs-cherry-pick-release-6.3, needs-cherry-pick-release-6.4, needs-cherry-pick-release-6.5, needs-cherry-pick-release-6.6, needs-cherry-pick-release-7.0, needs-cherry-pick-release-7.1, needs-cherry-pick-release-7.2, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, affects-6.3, affects-6.4, affects-6.5, affects-6.6, affects-7.0, affects-7.1, affects-7.2, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.0, may-affects-6.1, may-affects-6.2, may-affects-6.3, may-affects-6.4, may-affects-6.5, may-affects-6.6, may-affects-7.0, may-affects-7.1, may-affects-7.2.

In response to this:

/label type/bug
/label severity/minor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@jebter jebter added type/bug The issue is confirmed as a bug. severity/minor labels Jun 30, 2023
ti-chi-bot bot added a commit that referenced this issue Jun 30, 2023
close #15006

fix index out of bounds error in sysinfo.

* update sysinfo version to the personal branch

Signed-off-by: mornyx <mornyx.z@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Jun 30, 2023
close tikv#15006

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue Aug 14, 2023
close #15006

fix index out of bounds error in sysinfo.

* update sysinfo version to the personal branch

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: mornyx <mornyx.z@gmail.com>

Co-authored-by: Yexiang Zhang <mornyx.z@gmail.com>
Co-authored-by: mornyx <mornyx.z@gmail.com>
ti-chi-bot bot pushed a commit that referenced this issue Oct 13, 2023
close #15006

fix index out of bounds error in sysinfo.

* update sysinfo version to the personal branch

Co-authored-by: tonyxuqqi <tonyxuqi@outlook.com>
Co-authored-by: Yexiang Zhang <mornyx.z@gmail.com>