Standby Cluster questions #1151
Hello,
@RafiaSabih thanks. Yes, I did the same, but by making the changes via DCS.
The standby cluster doesn't know anything about the primary.
My configuration is:
where 10.105.32.128 is the IP of the primary cluster. When I turn off the primary cluster, the following errors occur:
Yeah, since you specified the host, such errors are understandable. With just the restore command there won't be such errors. Regarding auto promotion, there is no automatic promotion of a standby cluster yet, since there is no way to know whether the primary has died or there is a network issue.
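For illustration, a standby_cluster section that replays WAL only from an archive (no host, so no streaming connection to the primary) could be expressed as the fragment below. This is a sketch: the archive path in the restore command is a hypothetical placeholder, and in a Patroni YAML file the equivalent keys would sit under the standby_cluster section of the dynamic (bootstrap.dcs) configuration.

```python
import json


def restore_only_standby_section(restore_command: str) -> dict:
    """Build a standby_cluster section that relies only on WAL archive restore.

    There is deliberately no "host" (or "port") key, so the standby leader
    has no primary configured to stream from and does not try to connect.
    """
    return {"standby_cluster": {"restore_command": restore_command}}


# Hypothetical archive mount; %f and %p are the usual PostgreSQL
# placeholders for the WAL file name and the target path.
section = restore_only_standby_section("cp /mnt/wal_archive/%f %p")
print(json.dumps(section, indent=2))
```

Because nothing points at the primary's address, shutting the primary down only means no new WAL appears in the archive, rather than repeated connection errors.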
So maybe it's worth adding some kind of configurable timeout after which the standby cluster can be promoted?
As far as I know, for automatic promotion there is this pull request to decide whether promotion is possible using quorum commit.
No way. Do it right or don't do it at all. To do it right, there must be a global quorum across multiple data centers (two is not enough). @RafiaSabih #672 has nothing to do with it; it is about making use of the quorum commit feature in Postgres.
The problem is that you also need to tell the former "primary" cluster to demote (otherwise, hello split-brain), and how are you going to do that while it's unavailable? So this has to be done via some fencing mechanism tied to the cluster's participation in a consistency layer (i.e. Raft) spanning multiple DCs. The current standby cluster implementation in Patroni does none of this. One reason is that running Etcd over multiple DCs with high-latency links may have a significant performance impact and requires tuning, and other similar systems likely behave the same way. Another is that such a failover would force the DB clients to connect over high-latency links, which may break critical architectural assumptions of your applications. Yet another is that it would not provide any reliability guarantees beyond what you can achieve with a single Patroni cluster spread over multiple DCs. Which leads to the point: if you need failover between DCs with reasonable latencies (say, less than 50-100 ms), you should simply run a single Patroni cluster spread over those DCs. Make sure you have at least 3 independent datacenters (given that you also run Etcd or a similar supported system there); otherwise you simply lose the majority once a DC goes down, rendering the whole setup pointless.
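The majority argument can be checked with a little arithmetic: a DCS cluster of n members needs n // 2 + 1 of them to keep a quorum. The sketch below uses hypothetical helpers (not part of Patroni or Etcd) that take the number of DCS members in each datacenter and report whether a majority survives the loss of any single DC.

```python
from typing import Sequence


def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def survives_any_single_dc_loss(members_per_dc: Sequence[int]) -> bool:
    """True if the remaining members still form a majority after losing
    any one datacenter (and all DCS members running in it)."""
    total = sum(members_per_dc)
    needed = majority(total)
    return all(total - lost >= needed for lost in members_per_dc)
```

With two DCs, e.g. [2, 1], losing the larger site leaves 1 of 3 members, below the majority of 2, so the remaining site cannot act on its own. With three DCs holding one member each, [1, 1, 1], any single loss still leaves 2 of 3, which is why at least three independent datacenters are needed.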
@CyberDem0n @alexeyklyukin thanks for your answers, now everything is clear to me.
Guys, hi!
I was playing around with the standby cluster feature and have some questions about it.
What is the best way to "promote" a standby cluster in case the primary DC is not available?
What I did was remove the standby cluster keys from DCS. Is that correct?
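The same change (dropping the standby_cluster section from the dynamic configuration in DCS) can also be made through Patroni's REST API, where setting a key to null in a PATCH /config body removes it. Below is a minimal sketch using only the Python standard library, assuming a member's REST API listens on the default port 8008 without authentication and that the response body is the resulting configuration as JSON; the address is a placeholder. It does nothing to fence the old primary cluster, so it should only be used once that cluster is known to be down or isolated.

```python
import json
import urllib.request

# Placeholder address of one Patroni member in the standby cluster;
# 8008 is Patroni's default REST API port.
PATRONI_URL = "http://127.0.0.1:8008"


def build_promote_request(base_url: str) -> urllib.request.Request:
    """Build a PATCH /config request that removes the standby_cluster section.

    A null (None) value in the PATCH body deletes that key from the
    dynamic configuration stored in DCS.
    """
    body = json.dumps({"standby_cluster": None}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/config",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def promote_standby_cluster(base_url: str = PATRONI_URL) -> dict:
    """Send the request and return the parsed response body."""
    req = build_promote_request(base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling promote_standby_cluster("http://10.0.0.5:8008") from a host that can reach the standby cluster sends the request; `patronictl edit-config` achieves the same thing interactively by deleting the section. If the REST API is protected, the request would additionally need the configured credentials.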