Prevent tikv-server from repeatedly restarting and panicking #5564

Open
disksing opened this issue Sep 29, 2019 · 3 comments

@disksing disksing commented Sep 29, 2019

Feature Request

Is your feature request related to a problem? Please describe:

In some cases (such as data corruption), TiKV panics at startup, and systemd, as configured by the ansible deployment, keeps restarting it, which produces a large volume of error logs.

Describe the feature you'd like:

Currently, TiKV supports creating a mark file to prevent itself from starting again when a critical error happens (see #3725). I think we can take advantage of this feature: when tikv-server panics many times within a short period, or panics repeatedly for the same reason, write the mark file.
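
To make the idea concrete, here is a minimal sketch of how "many panics in a short time" could be detected across restarts. It is not TiKV's actual implementation: the file names `panic_history` and `panic_mark_file`, the functions, and the window/threshold parameters are all hypothetical, and the real marker file from #3725 may be named and handled differently. The sketch assumes panic timestamps are persisted in the data directory, since in-memory counters are lost on every restart.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical file names; the real marker file from #3725 may differ.
const PANIC_LOG: &str = "panic_history";
const MARK_FILE: &str = "panic_mark_file";

/// Returns true when at least `threshold` of the recorded panic times
/// fall within `window_secs` of `now` (all values are Unix seconds).
fn should_block_restart(panic_times: &[u64], now: u64, window_secs: u64, threshold: usize) -> bool {
    let recent = panic_times
        .iter()
        .filter(|&&t| t <= now && now - t <= window_secs)
        .count();
    recent >= threshold
}

/// Appends `now` to the panic history in `data_dir` and writes the mark
/// file if panics have become too frequent. Returns whether it was written.
fn record_panic(data_dir: &Path, now: u64, window_secs: u64, threshold: usize) -> io::Result<bool> {
    let log_path = data_dir.join(PANIC_LOG);
    let mut log = OpenOptions::new().create(true).append(true).open(&log_path)?;
    writeln!(log, "{}", now)?;

    // Re-read the whole history, skipping any line that is not a number.
    let times: Vec<u64> = fs::read_to_string(&log_path)?
        .lines()
        .filter_map(|line| line.trim().parse().ok())
        .collect();

    if should_block_restart(&times, now, window_secs, threshold) {
        fs::write(data_dir.join(MARK_FILE), "too many panics in a short time\n")?;
        return Ok(true);
    }
    Ok(false)
}

/// Startup check: refuse to start while the mark file exists.
fn startup_allowed(data_dir: &Path) -> bool {
    !data_dir.join(MARK_FILE).exists()
}
```

In this sketch, a panic hook would call `record_panic` with the current time, and the server would call `startup_allowed` before opening the store. An operator would delete the mark file by hand after investigating. Detecting "the same reason" would additionally require recording the panic message, which is left out here.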

Describe alternatives you've considered:

There may be other approaches by configuring ansible or systemd, but I'm not sure.

Teachability, Documentation, Adoption, Migration Strategy:

@Hoverbear Hoverbear commented Sep 30, 2019

Typically the init system which is causing rapid restarts is responsible for doing this, no? In systemd's case there are a number of options to control restart behavior: https://www.freedesktop.org/software/systemd/man/systemd.service.html

We could use something like this to control behavior by exit code:
[image]

I think this is not our responsibility.

@disksing disksing commented Oct 8, 2019

@Hoverbear Absolutely! But the introduction of the mark file makes it look as though TiKV wants to solve this kind of problem internally. Maybe we need to provide a good description of the exit codes, then adjust tidb-ansible to better support the restart strategy.
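
As a rough illustration of what "a good description of the exit code" might look like, the sketch below maps failure classes to distinct process exit codes. The enum, the numeric values, and the descriptions are all hypothetical and are not codes that TiKV defines today.

```rust
/// Hypothetical exit reasons; TiKV does not necessarily define these values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExitReason {
    Clean,
    TransientFailure,
    DataCorruption,
}

impl ExitReason {
    /// Process exit code reported to the init system (values are assumptions).
    fn code(self) -> i32 {
        match self {
            ExitReason::Clean => 0,
            ExitReason::TransientFailure => 1,
            ExitReason::DataCorruption => 100,
        }
    }

    /// Human-readable description, suitable for documentation or logs.
    fn description(self) -> &'static str {
        match self {
            ExitReason::Clean => "normal shutdown",
            ExitReason::TransientFailure => "recoverable error, restart is safe",
            ExitReason::DataCorruption => "data corruption, manual intervention required",
        }
    }

    /// Whether a supervisor should restart the process after this exit.
    fn restart_recommended(self) -> bool {
        matches!(self, ExitReason::TransientFailure)
    }
}

/// Logs the reason and terminates the process with the matching code.
fn exit_with(reason: ExitReason) -> ! {
    eprintln!("exiting: {} (code {})", reason.description(), reason.code());
    std::process::exit(reason.code())
}
```

With codes documented along these lines, a deployment could list the non-restartable ones in systemd's `RestartPreventExitStatus=` setting, so that systemd stops restarting the service after such an exit while still restarting it after transient failures.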

@Hoverbear Hoverbear commented Oct 8, 2019

I think that would be an excellent way to solve this problem. :)
