Is your feature request related to a problem? Please describe:
In some cases (such as data corruption), TiKV panics at startup, and the systemd service set up by the Ansible deployment keeps restarting it, which produces a large volume of error logs.
Describe the feature you'd like:
Currently, TiKV can create a mark file that prevents it from starting again after a critical error (see #3725). I think we can take advantage of this feature: when tikv-server panics many times within a short period, or panics repeatedly for the same reason, it could write the mark file.
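A minimal sketch of the "many panics in a short period" variant, assuming a panic hook can call a helper before the process exits. Because every panic ends the process, the panic history has to be kept on disk rather than in memory. The function names, the history file, the mark file path and its contents are all hypothetical; they are not TiKV's actual #3725 implementation, and the "same reason" variant is not covered.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Keeps only panic timestamps (in seconds) that fall inside the last
/// `window_secs`, appends `now_secs`, and returns true once the number of
/// panics in the window reaches `max_panics`.
pub fn record_and_check(
    history: &mut Vec<u64>,
    now_secs: u64,
    window_secs: u64,
    max_panics: usize,
) -> bool {
    history.retain(|&t| now_secs.saturating_sub(t) < window_secs);
    history.push(now_secs);
    history.len() >= max_panics
}

/// Hypothetical hook body: loads the panic history from `history_path`,
/// records the current panic, saves the history back, and writes the mark
/// file at `mark_path` if a crash loop is detected.
pub fn on_panic(
    history_path: &Path,
    mark_path: &Path,
    now_secs: u64,
    window_secs: u64,
    max_panics: usize,
) -> io::Result<bool> {
    // A missing or unreadable history file is treated as "no earlier panics".
    let mut history: Vec<u64> = fs::read_to_string(history_path)
        .unwrap_or_default()
        .lines()
        .filter_map(|l| l.trim().parse::<u64>().ok())
        .collect();

    let crash_loop = record_and_check(&mut history, now_secs, window_secs, max_panics);

    // Persist the pruned history so the next process sees it.
    let text: Vec<String> = history.iter().map(|t| t.to_string()).collect();
    fs::write(history_path, text.join("\n"))?;

    if crash_loop {
        // Placeholder contents; the real mark file format is defined by TiKV.
        fs::write(
            mark_path,
            format!("crash loop: {} panics within {}s\n", history.len(), window_secs),
        )?;
    }
    Ok(crash_loop)
}
```

In a real hook, `now_secs` would come from `SystemTime::now().duration_since(UNIX_EPOCH)`. Passing it in as a parameter keeps the window logic deterministic and easy to test.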
Describe alternatives you've considered:
There may be other approaches, such as configuring Ansible or systemd, but I'm not sure.
Teachability, Documentation, Adoption, Migration Strategy:
Typically the init system that is causing the rapid restarts is responsible for limiting them, no? In systemd's case there are a number of options to control restart behavior: https://www.freedesktop.org/software/systemd/man/systemd.service.html
I think this is not our responsibility.
@Hoverbear Absolutely! But introducing the mark file suggests that TiKV wants to solve this kind of problem internally. Perhaps we should clearly document TiKV's exit codes and then adjust tidb-ansible to better support the restart strategy.
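A sketch of what "documented exit codes" could look like, assuming TiKV exits with a distinct status per class of fatal startup error. The enum, the constant values, and the helper names are hypothetical and are not TiKV's current behavior. The point is that systemd can then act on them: with `Restart=on-failure`, any status listed in `RestartPreventExitStatus=` does not trigger a restart, and `StartLimitIntervalSec=`/`StartLimitBurst=` cap how often the unit may be started.

```rust
use std::process;

/// Hypothetical classification of fatal startup failures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FatalKind {
    /// Possibly transient (e.g. a dependency is briefly unreachable); a restart may help.
    Retryable,
    /// Data corruption: restarting will only panic again.
    DataCorruption,
    /// Invalid configuration: needs operator action before a restart can succeed.
    BadConfig,
}

/// Hypothetical exit codes; real values would have to be chosen and documented by TiKV.
pub const EXIT_RETRYABLE: i32 = 1;
pub const EXIT_DATA_CORRUPTION: i32 = 10;
pub const EXIT_BAD_CONFIG: i32 = 11;

/// Maps a failure class to its documented exit status.
pub fn exit_code(kind: FatalKind) -> i32 {
    match kind {
        FatalKind::Retryable => EXIT_RETRYABLE,
        FatalKind::DataCorruption => EXIT_DATA_CORRUPTION,
        FatalKind::BadConfig => EXIT_BAD_CONFIG,
    }
}

/// Mirrors the decision systemd would make with `Restart=on-failure`
/// and `RestartPreventExitStatus=10 11`: these codes are never restarted.
pub fn should_restart(code: i32) -> bool {
    !matches!(code, EXIT_DATA_CORRUPTION | EXIT_BAD_CONFIG)
}

/// Logs the error and exits with the documented status instead of panicking.
pub fn exit_fatal(kind: FatalKind, msg: &str) -> ! {
    eprintln!("fatal startup error ({:?}): {}", kind, msg);
    process::exit(exit_code(kind))
}
```

With a scheme like this, tidb-ansible would only need to list the non-retryable codes in the unit's `RestartPreventExitStatus=` setting, and the restart policy would stay in the init system.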