Ex-master cannot come back after net-split #99
Comments
It looks like the old master has advanced its WAL position past the promotion point of the new master. I think this can happen during the network outage, when new data has been written to the master's WAL but has not been propagated to the replicas (this is also possible in synchronous mode). You can configure Patroni to call pg_rewind in order to bring the former master up-to-date.
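To make the divergence described above concrete, here is a minimal sketch of how two WAL positions can be compared. It assumes positions are given in PostgreSQL's textual `X/Y` hexadecimal form; the helper names `parse_lsn` and `needs_rewind` are hypothetical and are not part of Patroni or PostgreSQL.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual WAL position such as '0/3000060' to an absolute byte offset."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def needs_rewind(old_master_lsn: str, promotion_lsn: str) -> bool:
    """Hypothetical check: True if the old master wrote WAL past the promotion point."""
    return parse_lsn(old_master_lsn) > parse_lsn(promotion_lsn)
```

If `needs_rewind` returns True, the old master holds WAL that the new master never received, so it cannot simply resume streaming; it has to be rewound (for example with pg_rewind) or rebuilt.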
Yep, you are right. But this broken node can become a master if there are no competitors.
Well, if all other nodes in the cluster have died, then promoting a single leftover node to a master is a sane thing to do, isn't it?
I'm not sure, because this node can be outdated and can contain inconsistent data.
It's not Patroni's task to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and it should be a human decision to shut them down.
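As an illustration of the kind of check a monitoring system might run, here is a rough sketch that flags replicas whose replayed position trails the master by more than a threshold. It assumes positions have already been converted to byte offsets; the function name `find_lagging_replicas` and the threshold are hypothetical, not something Patroni provides.

```python
from typing import Dict, List


def find_lagging_replicas(master_pos: int,
                          replica_pos: Dict[str, int],
                          max_lag_bytes: int) -> List[str]:
    """Return the names of replicas that are more than max_lag_bytes behind the master."""
    lagging = []
    for name, pos in sorted(replica_pos.items()):
        # Lag is the number of WAL bytes the replica has not yet replayed.
        if master_pos - pos > max_lag_bytes:
            lagging.append(name)
    return lagging
```

What to do with a flagged replica (alert, shut it down, rebuild it) is left to the operator, in line with the comment above.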
I'd like it to be the task of Patroni to automate and make these decisions. On Wed, Nov 18, 2015 at 6:05 AM, Oleksii Kliukin notifications@github.com
Is it entirely out of scope for Patroni cells to self-administer this? Can the orchestration only be external? On Wed, Nov 18, 2015 at 5:20 AM, katoquro notifications@github.com
You can use pg_rewind and avoid this problem altogether.
There are other cases in which a node is unable to join the cluster (for instance, if the replication username/password is incorrect). It's not possible, and does not make much sense, for Patroni to detect every issue like this. It should be the task of the monitoring system to realize that some replicas are potentially unhealthy, and then human intervention is needed to fix them.
@alexeyklyukin thanks for pointing me to pg_rewind
Hello, I am testing different failover cases with Patroni and I have run into the issue described in the subject.
The ex-master has a different xlog_location and cannot catch up with the new master.
Additional info:
From ZooKeeper:
ex-master:
new-master:
There is also no record of the ex-master as a replica of the new master:
select * from pg_stat_replication; (on new-master)
And finally, some logs from the ex-master after the net split: