Inconsistent Behavior with Native Replication When a Data Node Fails and is Re-added #3502
So, first, thank you for the work putting together a really helpful reproducible test case and script using Docker Compose. However, upon closer examination, I've concluded that there's nothing wrong with the current behavior. I submitted a PR to your test repo to illustrate: TroyDey/timescale-multinode-test#1

The reason the second SELECT succeeds while the node is down is that the node is no longer a "primary replica node" for any chunks (given that it was removed and then added again). The "primary replica node" is the node responsible for serving queries. When you delete a data node, it is no longer the primary for any chunks, so that role is reassigned to another node. We do want to expose a way to reassign the designated primary replica node for a chunk, which would allow you to manually "fix" the first query.

Note, however, that for INSERTs we always want to fail if a node is down (and it holds chunks), because we need to insert into all replica chunks to keep them consistent.

I am therefore closing this issue. Please reopen if I missed something.
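To make the lifecycle concrete, here is a rough SQL sketch of the delete/re-add flow being described, based on the TimescaleDB 2.x multinode API as I understand it. The table name `conditions` and node names `dn1`–`dn3` are placeholders, not taken from the issue:

```sql
-- Run on the access node. Assumes a distributed hypertable "conditions"
-- created with replication_factor => 2 across data nodes dn1..dn3.

-- Inspect which data nodes hold replicas of each chunk.
SELECT chunk_name, data_nodes
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions';

-- Deleting a data node removes it from every chunk's replica set; any
-- chunk for which it was the primary replica node gets a new primary
-- among the surviving replicas. force is needed here because the
-- deletion leaves those chunks under-replicated.
SELECT delete_data_node('dn3', force => true);

-- Re-adding the node later does NOT make it primary for the old chunks
-- again; it only becomes eligible to hold newly created chunks.
SELECT add_data_node('dn3', host => 'dn3.example.com');
SELECT attach_data_node('dn3', hypertable => 'conditions');
```

This is why the second SELECT works: the re-added node holds no primary chunk assignments, so queries never route to it.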
Thank you very much for looking into this; your explanation makes sense and was along the lines of what I was expecting. To make sure I got it right, my understanding is:
Based on the above, my two options appear to be:
Any other suggestions? I know it is a very challenging problem to solve, but I'm curious whether you have plans to support automatic failover of the primary to another viable replica/data node. Thanks again for taking the time to look into this and provide an explanation!
Correct.
Correct.
Correct. We've had ideas for providing a function to reassign primary status to other nodes without having to fully delete a data node (e.g., during restarts or short downtimes), but I imagine the detection of failed nodes would still be left to an external monitoring system. Such a system is probably better at hooking into the lifecycle events of nodes, e.g., in cloud providers or on-premise systems, than any DB-internal monitoring.
Yes, those are the options so far. Note that deleting the data node also immediately restores write capabilities. In the future we'd like to support writes even though nodes are down, but that would either entail (1) syncing up chunks that have been written to during downtime, or (2) deleting the chunks written to during downtime, followed by re-replication.
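For reference, a minimal sketch of the recovery path described here, assuming the same placeholder names as above and the `delete_data_node` signature from the TimescaleDB 2.x multinode API:

```sql
-- Hedged sketch: restoring INSERT capability after data node dn2 fails.
-- While dn2 still holds chunk replicas, INSERTs fail because every
-- replica of a chunk must be written. Deleting the node removes it from
-- all replica sets, so writes go only to the surviving replicas.
SELECT delete_data_node('dn2', force => true);

-- Writes succeed again immediately:
INSERT INTO conditions VALUES (now(), 'office', 21.3);

-- The affected chunks remain under-replicated until a node is added
-- back; as discussed above, re-replicating existing chunks onto a new
-- node is not yet automatic.
```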
I think you covered most of it.
As stated above, we believe this is better handled by an external monitoring system. We do, however, aim to provide APIs that such a monitoring system can hook into for actions during failure events (e.g., to reassign primary status or delete a data node). It is not entirely unlikely that we will provide an internal option for monitoring data nodes if we can figure out how to do it well, but this is not a priority for us right now given that external monitoring systems already exist and do a good job (e.g., you get this for free in orchestration systems like Kubernetes).
My pleasure!
Relevant system information:
Describe the bug
Native replication has unexpected behavior when a data node fails.
To Reproduce
I have created a GitHub repo with a script that will launch a Docker environment and run the high-level scenario sketched below.
To run the test:
clone timescale-multinode-test
I have provided more details in the README and comments of run-test.sh
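My reading of the scenario, pieced together from the discussion above; the schema and node names are placeholders, and run-test.sh is the authoritative version:

```sql
-- 1. Create a distributed hypertable replicated across two data nodes.
CREATE TABLE conditions (
    time        timestamptz NOT NULL,
    location    text,
    temperature double precision
);
SELECT create_distributed_hypertable('conditions', 'time',
                                     replication_factor => 2);
INSERT INTO conditions VALUES (now(), 'office', 21.3);

-- 2. Stop one data node (docker stop), then:
SELECT count(*) FROM conditions;  -- fails: the stopped node is still
                                  -- the primary replica for a chunk
INSERT INTO conditions
VALUES (now(), 'office', 21.4);   -- fails: all replicas must be written

-- 3. Delete the node, re-add it, stop it again, and repeat: this time
--    the SELECT succeeds (the re-added node is no longer primary for
--    any existing chunk) while the INSERT still fails.
```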
Expected behavior
Actual behavior