[Bug]: task manager panic #919

Closed
lxl66566 opened this issue Jul 26, 2024 · 4 comments · May be fixed by #925
Assignees
Labels
bug Something isn't working kind/flake Categorizes issue or PR as related to a flaky test. Stale

Comments

@lxl66566
Collaborator

After #907 was fixed in #918, repeated runs of the shutdown_rpc_should_shutdown_the_cluster() test still occasionally fail with a task manager panic. The panic was not introduced by #918: it already occurred before that change.

This bug cannot be reproduced reliably. The panic probability is roughly 1/525 per round, based on three repeat runs that failed at rounds 231, 134, and 1211 (mean of about 525).
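For reference, the arithmetic behind that figure can be sketched as below. This is only a rough estimate from three samples, and the function names are made up for illustration; they are not part of the project.

```rust
/// Mean number of rounds until the first failure across several repeat runs.
/// Returns `None` when no runs were recorded.
pub fn mean_rounds_to_failure(rounds: &[u64]) -> Option<f64> {
    if rounds.is_empty() {
        return None;
    }
    let total: u64 = rounds.iter().sum();
    Some(total as f64 / rounds.len() as f64)
}

/// Rough per-round failure probability, taken as the reciprocal of the mean.
/// With only a handful of samples this is a ballpark figure, nothing more.
pub fn estimated_failure_probability(rounds: &[u64]) -> Option<f64> {
    mean_rounds_to_failure(rounds)
        .filter(|mean| *mean > 0.0)
        .map(|mean| 1.0 / mean)
}
```

Calling `mean_rounds_to_failure(&[231, 134, 1211])` gives a mean of about 525.3 rounds, which is where the 1/525 figure comes from.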

Log

  1. failed3.log is the panic log from before the change in fix: shutdown cluster test timeout #918.
  2. panic8.log, panic9.log, and panic10.log are the three panic logs recorded with fix: shutdown cluster test timeout #918 applied (commit 1607307).

👋 Thanks for opening this issue!

Reply with the following command on its own line to get help or engage:

  • /contributing-agreement : to print Contributing Agreements.
  • /assignme : to assign this issue to you.

@lxl66566
Collaborator Author

lxl66566 commented Aug 8, 2024

While the cluster is running, all tasks should exist, so calling unwrap when getting a task from the task manager is safe. During cluster shutdown, however, tasks are removed from the task manager top-down.
The removal is performed by cmd_worker worker_as(), which runs in parallel on another thread, so its timing is unspecified. Getting a task may therefore return None, which indicates that the task has already been removed by worker_as.
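
The race can be illustrated with a minimal sketch. It assumes the task manager is essentially a shared map of live tasks; `TaskManager`, `TaskName`, `register`, `remove`, and `bump` here are hypothetical stand-ins and not the real Xline types or methods. The point is only that the lookup returns an `Option` to the caller instead of unwrapping.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical task names; not the real task name enum of the project.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum TaskName {
    CompactBg,
    SyncFollower,
}

/// Hypothetical stand-in for the task manager: a shared map from task name
/// to a per-task counter.
#[derive(Clone, Default)]
pub struct TaskManager {
    tasks: Arc<Mutex<HashMap<TaskName, u32>>>,
}

impl TaskManager {
    pub fn register(&self, name: TaskName) {
        self.tasks.lock().unwrap().insert(name, 0);
    }

    /// Removal path, as the shutdown worker would run it on another thread.
    pub fn remove(&self, name: TaskName) {
        self.tasks.lock().unwrap().remove(&name);
    }

    /// Increments the task's counter. Returns `None` instead of panicking
    /// when the task has already been removed.
    pub fn bump(&self, name: TaskName) -> Option<u32> {
        let mut tasks = self.tasks.lock().unwrap();
        let count = tasks.get_mut(&name)?;
        *count += 1;
        Some(*count)
    }
}

/// Races a remover thread against a lookup and returns what the lookup saw.
pub fn race_lookup_against_shutdown() -> Option<u32> {
    let tm = TaskManager::default();
    tm.register(TaskName::CompactBg);
    let remover = {
        let tm = tm.clone();
        thread::spawn(move || tm.remove(TaskName::CompactBg))
    };
    // Either Some(1) or None, depending on which thread takes the lock first.
    let seen = tm.bump(TaskName::CompactBg);
    remover.join().unwrap();
    seen
}
```

With an `unwrap` in place of the `?` inside `bump`, the `None` outcome of `race_lookup_against_shutdown()` would be a panic, which would match the low and timing-dependent failure rate seen in the repeat test.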

@liangyuanpeng liangyuanpeng added the kind/flake Categorizes issue or PR as related to a flaky test. label Aug 29, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 29, 2024

This issue was closed because it has been stalled for 14 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 14, 2024