Quorum based failover #2668

Open: wants to merge 73 commits into base: master
Changes from 15 commits
73 commits
2223553
Introduce quorum field in the /sync key
CyberDem0n May 11, 2023
ea019ba
Adapt SyncHandler interfaces for quorum commit
CyberDem0n May 11, 2023
f5f0adb
Compatibility with future synchronous_mode=quorum
CyberDem0n May 11, 2023
e97d2f0
Implement synchronous_mode=quorum
CyberDem0n May 11, 2023
d799be9
update REST API
CyberDem0n May 11, 2023
7284416
Behave tests
CyberDem0n May 11, 2023
dbfe844
Update documentation
CyberDem0n May 11, 2023
f298921
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n May 23, 2023
8f60b18
Delay _process_quorum_replication by loop_wait seconds after promote
CyberDem0n May 23, 2023
a5e1c53
Fix citus tests. Metadata sync could be slow after coordinator switch
CyberDem0n May 23, 2023
db8061a
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n May 30, 2023
2dafb37
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 7, 2023
b0d8b21
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 13, 2023
e6d251b
Limit time spent in _process_quorum_replication by loop_wait seconds
CyberDem0n Jul 18, 2023
300740c
Please codacy
CyberDem0n Jul 18, 2023
ad4bea7
Apply suggestions from code review
CyberDem0n Jul 20, 2023
666a483
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 20, 2023
893e460
Address review feedback
CyberDem0n Jul 20, 2023
2e9b6b2
Apply suggestions from code review
CyberDem0n Jul 21, 2023
0ef094f
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 21, 2023
7114a07
remove unrelated change
CyberDem0n Jul 21, 2023
7d11d9d
Merge branch 'feature/quorum-commit' of github.com:zalando/patroni in…
CyberDem0n Jul 21, 2023
768c9eb
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 25, 2023
5dfe5e0
Revert unwanted change
CyberDem0n Jul 25, 2023
bf7f076
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 25, 2023
3ee0238
Add more examples of sync and quorum modes
CyberDem0n Jul 25, 2023
aa0c321
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 26, 2023
bea97d1
Apply suggestions from code review
CyberDem0n Jul 26, 2023
7aca74e
Merge branch 'feature/quorum-commit' of github.com:zalando/patroni in…
CyberDem0n Jul 26, 2023
3bf7095
Address review feedback
CyberDem0n Jul 26, 2023
a48ef03
Please pyright
CyberDem0n Jul 26, 2023
e2805fd
Apply suggestions from code review
CyberDem0n Jul 31, 2023
538d621
Address review feedback
CyberDem0n Jul 31, 2023
1a0549d
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jul 31, 2023
d6e3f25
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 8, 2023
74b89d7
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 8, 2023
f5cb888
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 14, 2023
c7fbd35
Apply suggestions from code review
CyberDem0n Aug 17, 2023
735a9ee
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 17, 2023
ef8aa21
Address code review feedback
CyberDem0n Aug 17, 2023
8f3c6d2
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 17, 2023
ce7fce3
Please sphinx
CyberDem0n Aug 17, 2023
f51309d
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 18, 2023
8e24d72
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Aug 22, 2023
4d26435
Address review feedback
CyberDem0n Aug 23, 2023
79b4098
Apply suggestions from code review
CyberDem0n Aug 24, 2023
3a602f0
Address review feedback
CyberDem0n Aug 24, 2023
1ea5d6b
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 11, 2023
5f65b56
more f-strings
CyberDem0n Sep 11, 2023
a95d59c
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 11, 2023
61c3d7c
Fix citus.rst
CyberDem0n Sep 11, 2023
67612f5
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 12, 2023
d1dff78
Rename methods in unit tests to match names of methods we are testing
CyberDem0n Sep 12, 2023
9fccc05
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 14, 2023
73a5c9a
Apply suggestions from code review
CyberDem0n Sep 14, 2023
0f6e069
Address feedback
CyberDem0n Sep 15, 2023
8612d55
Merge branch 'feature/quorum-commit' of github.com:zalando/patroni in…
CyberDem0n Sep 15, 2023
b4c783d
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 25, 2023
2f6678e
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Sep 26, 2023
5c6b34a
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Oct 10, 2023
f329891
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Oct 23, 2023
7794f9c
Apply suggestions from code review
CyberDem0n Oct 24, 2023
a9e1d67
Update sync.py
CyberDem0n Oct 24, 2023
94e128c
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Oct 25, 2023
ebdc197
Apply suggestions from code review
CyberDem0n Oct 25, 2023
13cc86f
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Nov 24, 2023
193f5f1
Merge branch 'feature/quorum-commit' of github.com:zalando/patroni in…
CyberDem0n Nov 24, 2023
91a6059
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Nov 29, 2023
3b367d6
Revert unexpected change
CyberDem0n Nov 29, 2023
59ecfb1
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jan 5, 2024
bda07fa
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Apr 2, 2024
0da448b
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Apr 3, 2024
4055d66
Merge branch 'master' of github.com:zalando/patroni into feature/quor…
CyberDem0n Jun 14, 2024
4 changes: 2 additions & 2 deletions docs/dynamic_configuration.rst
@@ -16,7 +16,7 @@ In order to change the dynamic configuration you can use either ``patronictl edi
- **max\_timelines\_history**: maximum number of timeline history items kept in DCS. Default value: 0. When set to 0, it keeps the full history in DCS.
- **primary\_start\_timeout**: the amount of time a primary is allowed to recover from failures before failover is triggered (in seconds). Default is 300 seconds. When set to 0 failover is done immediately after a crash is detected if possible. When using asynchronous replication a failover can cause lost transactions. Worst case failover time for primary failure is: loop\_wait + primary\_start\_timeout + loop\_wait, unless primary\_start\_timeout is zero, in which case it's just loop\_wait. Set the value according to your durability/availability tradeoff.
- **primary\_stop\_timeout**: The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled. When set to > 0 and the synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation is running for more than the value set by primary\_stop\_timeout. Set the value according to your durability/availability tradeoff. If the parameter is not set or set <= 0, primary\_stop\_timeout does not apply.
- **synchronous\_mode**: turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election. Synchronous mode makes sure that successfully committed transactions will not be lost at failover, at the cost of losing availability for writes when Patroni cannot ensure transaction durability. See :ref:`replication modes documentation <replication_modes>` for details.
- **synchronous\_mode**: turns on synchronous replication mode. Possible values: ``off``, ``on``, ``quorum``. In this mode the leader takes care of managing ``synchronous_standby_names``, and only the last known leader or one of the synchronous replicas is allowed to participate in the leader race. Synchronous mode makes sure that successfully committed transactions will not be lost at failover, at the cost of losing availability for writes when Patroni cannot ensure transaction durability. See :ref:`replication modes documentation <replication_modes>` for details.
- **synchronous\_mode\_strict**: prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the primary. See :ref:`replication modes documentation <replication_modes>` for details.
- **failsafe\_mode**: Enables :ref:`DCS Failsafe Mode <dcs_failsafe_mode>`. Defaults to `false`.
- **postgresql**:
@@ -80,4 +80,4 @@ Note: **slots** is a hashmap while **ignore_slots** is an array. For example:
plugin: test_decoding
- name: ignored_physical_slot_name
type: physical
...
...
48 changes: 40 additions & 8 deletions docs/replication_modes.rst
@@ -6,19 +6,21 @@ Replication modes

Patroni uses PostgreSQL streaming replication. For more information about streaming replication, see the `Postgres documentation <http://www.postgresql.org/docs/current/static/warm-standby.html#STREAMING-REPLICATION>`__. By default Patroni configures PostgreSQL for asynchronous replication. Choosing your replication schema is dependent on your business considerations. Investigate both async and sync replication, as well as other HA solutions, to determine which solution is best for you.


Asynchronous mode durability
----------------------------
============================

In asynchronous mode the cluster is allowed to lose some committed transactions to ensure availability. When the primary server fails or becomes unavailable for any other reason Patroni will automatically promote a sufficiently healthy standby to primary. Any transactions that have not been replicated to that standby remain in a "forked timeline" on the primary, and are effectively unrecoverable [1]_.

The amount of transactions that can be lost is controlled via ``maximum_lag_on_failover`` parameter. Because the primary transaction log position is not sampled in real time, in reality the amount of lost data on failover is worst case bounded by ``maximum_lag_on_failover`` bytes of transaction log plus the amount that is written in the last ``ttl`` seconds (``loop_wait``/2 seconds in the average case). However typical steady state replication delay is well under a second.
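The bound described above can be worked through numerically. The following is a minimal sketch of that arithmetic only; the function names and the `wal_bytes_per_second` write rate are hypothetical illustration values and are not part of Patroni.

```python
def worst_case_loss_bytes(maximum_lag_on_failover: int, ttl: int,
                          wal_bytes_per_second: int) -> int:
    """Upper bound: allowed lag plus everything written in the last ``ttl`` seconds."""
    return maximum_lag_on_failover + ttl * wal_bytes_per_second


def average_case_loss_bytes(maximum_lag_on_failover: int, loop_wait: int,
                            wal_bytes_per_second: int) -> float:
    """Average case: the unsampled window is ``loop_wait / 2`` seconds instead of ``ttl``."""
    return maximum_lag_on_failover + (loop_wait / 2) * wal_bytes_per_second


# Example values (assumptions): 1 MiB of allowed lag, ttl=30, loop_wait=10,
# and a steady WAL write rate of 1,000,000 bytes per second.
worst = worst_case_loss_bytes(1048576, ttl=30, wal_bytes_per_second=1_000_000)
average = average_case_loss_bytes(1048576, loop_wait=10, wal_bytes_per_second=1_000_000)
```

With these example numbers the write rate dominates the bound, which suggests that lowering ``ttl`` may matter more than lowering ``maximum_lag_on_failover`` on a busy primary.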

By default, when running leader elections, Patroni does not take into account the current timeline of replicas, which in some cases could be undesirable behavior. You can prevent a node that does not have the same timeline as the former primary from becoming the new leader by changing the value of the ``check_timeline`` parameter to ``true``.


PostgreSQL synchronous replication
----------------------------------
==================================

You can use Postgres's `synchronous replication <http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION>`__ with Patroni. Synchronous replication ensures consistency across a cluster by confirming that writes are written to a secondary before returning to the connecting client with a success. The cost of synchronous replication: reduced throughput on writes. This throughput will be entirely based on network performance.
You can use Postgres's `synchronous replication <http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION>`__ with Patroni. Synchronous replication ensures consistency across a cluster by confirming that writes are written to a secondary before returning to the connecting client with a success. The cost of synchronous replication: increased latency and reduced throughput on writes. This throughput will be entirely based on network performance.

In hosted datacenter environments (like AWS, Rackspace, or any network you do not control), synchronous replication significantly increases the variability of write performance. If followers become inaccessible from the leader, the leader effectively becomes read-only.

@@ -33,10 +35,11 @@ When using PostgreSQL synchronous replication, use at least three Postgres data

Using PostgreSQL synchronous replication does not guarantee zero lost transactions under all circumstances. When the primary and the secondary that is currently acting as a synchronous replica fail simultaneously a third node that might not contain all transactions will be promoted.


.. _synchronous_mode:

Synchronous mode
----------------
================

For use cases where losing committed transactions is not permissible you can turn on Patroni's ``synchronous_mode``. When ``synchronous_mode`` is turned on Patroni will not promote a standby unless it is certain that the standby contains all transactions that may have returned a successful commit status to client [2]_. This means that the system may be unavailable for writes even though some servers are available. System administrators can still use manual failover commands to promote a standby even if it results in transaction loss.

@@ -55,16 +58,27 @@ up.

You can ensure that a standby never becomes the synchronous standby by setting ``nosync`` tag to true. This is recommended to set for standbys that are behind slow network connections and would cause performance degradation when becoming a synchronous standby.

Synchronous mode can be switched on and off via Patroni REST interface. See :ref:`dynamic configuration <dynamic_configuration>` for instructions.
Synchronous mode can be switched on and off using ``patronictl edit-config`` command or via Patroni REST interface. See :ref:`dynamic configuration <dynamic_configuration>` for instructions.

Note: Because of the way synchronous replication is implemented in PostgreSQL it is still possible to lose transactions even when using ``synchronous_mode_strict``. If the PostgreSQL backend is cancelled while waiting to acknowledge replication (as a result of packet cancellation due to client timeout or backend failure) transaction changes become visible for other backends. Such changes are not yet replicated and may be lost in case of standby promotion.


Synchronous Replication Factor
------------------------------
The parameter ``synchronous_node_count`` is used by Patroni to manage number of synchronous standby databases. It is set to 1 by default. It has no effect when ``synchronous_mode`` is set to off. When enabled, Patroni manages precise number of synchronous standby databases based on parameter ``synchronous_node_count`` and adjusts the state in DCS & synchronous_standby_names as members join and leave.
==============================

The parameter ``synchronous_node_count`` is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default. It has no effect when ``synchronous_mode`` is set to off. When enabled, Patroni manages the precise number of synchronous standby databases based on the parameter ``synchronous_node_count`` and adjusts the state in DCS and ``synchronous_standby_names`` as members join and leave. If the parameter is set to a value higher than the number of eligible nodes, it will be automatically reduced by Patroni down to 1.


Maximum lag on synchronous node
===============================

By default Patroni sticks to a node that is declared as ``synchronous`` according to ``pg_stat_replication``, even when there are other nodes ahead of it. This is done to minimize the number of changes to ``synchronous_standby_names``. To change this behavior one may use the ``maximum_lag_on_syncnode`` parameter. It controls how far a replica can lag and still be allowed to be chosen as synchronous.

Patroni utilizes the max replica LSN if there is more than one standby, otherwise it will use the leader's current WAL LSN. The default is ``-1``, and Patroni will not take action to swap an unhealthy synchronous standby when the value is set to 0 or below. Please set the value high enough so that Patroni won't swap synchronous standbys frequently during high transaction volume.


Synchronous mode implementation
-------------------------------
===============================

When in synchronous mode Patroni maintains synchronization state in the DCS, containing the latest primary and current synchronous standby databases. This state is updated with strict ordering constraints to ensure the following invariants:

@@ -79,6 +93,24 @@ Patroni will only assign one or more synchronous standby nodes based on ``synchr
On each HA loop iteration Patroni re-evaluates synchronous standby nodes choice. If the current list of synchronous standby nodes are connected and has not requested its synchronous status to be removed it remains picked. Otherwise the cluster member available for sync that is furthest ahead in replication is picked.


.. _quorum_mode:

Quorum commit mode
==================

Starting from PostgreSQL v10 Patroni supports quorum-based synchronous replication.

In this mode Patroni maintains synchronization state in the DCS, containing the latest known primary, the number of nodes required for quorum, and the nodes currently eligible to vote on quorum. In steady state the nodes voting on quorum are the leader and all synchronous standbys. This state is updated with strict ordering constraints with regard to node promotion and ``synchronous_standby_names`` to ensure that at all times any subset of voters that can achieve quorum is guaranteed to contain at least one node that has the latest successful commit.

On each iteration of the HA loop Patroni re-evaluates synchronous standby choices and quorum based on node availability and the requested cluster configuration. In PostgreSQL versions above 9.6 all eligible nodes are added as synchronous standbys as soon as their replication catches up to the leader.

Quorum commit helps to reduce worst case latencies even during normal operation as a higher latency of replicating to one standby can be compensated by other standbys.

The quorum-based synchronous mode can be enabled by setting ``synchronous_mode`` to ``quorum`` using the ``patronictl edit-config`` command or via the Patroni REST interface. See :ref:`dynamic configuration <dynamic_configuration>` for instructions.

Other parameters, like ``synchronous_node_count``, ``maximum_lag_on_syncnode``, and ``synchronous_mode_strict`` continue to work the same way as with ``synchronous_mode=on``.
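
As an illustration of the REST route, the sketch below builds the same kind of ``PATCH /config`` request that the behave scenarios in this pull request issue. The helper name ``build_quorum_patch`` is hypothetical, and the address ``http://127.0.0.1:8008`` is simply the one used in those scenarios; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_quorum_patch(base_url: str, node_count: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PATCH /config request that enables quorum commit."""
    payload = {
        "synchronous_mode": "quorum",
        "synchronous_node_count": node_count,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Address taken from the behave scenarios; adjust for a real cluster.
request = build_quorum_patch("http://127.0.0.1:8008", node_count=1)
```

Passing ``request`` to ``urllib.request.urlopen`` would send it to a running Patroni node; the scenarios expect a response code of 200 for such a call.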


.. [1] The data is still there, but recovering it requires a manual recovery effort by data recovery specialists. When Patroni is allowed to rewind with ``use_pg_rewind`` the forked timeline will be automatically erased to rejoin the failed primary with the cluster.

.. [2] Clients can change the behavior per transaction using PostgreSQL's ``synchronous_commit`` setting. Transactions with ``synchronous_commit`` values of ``off`` and ``local`` may be lost on fail over, but will not be blocked by replication delays.
5 changes: 5 additions & 0 deletions docs/rest_api.rst
@@ -45,6 +45,8 @@ For all health check ``GET`` requests Patroni returns a JSON document with the s

- ``GET /read-only-sync``: like the above endpoint, but also includes the primary.

- ``GET /quorum``: returns HTTP status code **200** only when this Patroni node is listed as a quorum node in ``synchronous_standby_names`` on the primary.

- ``GET /asynchronous`` or ``GET /async``: returns HTTP status code **200** only when the Patroni node is running as an asynchronous standby.
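
A health check against the new ``GET /quorum`` endpoint could be sketched as follows. This is a minimal example under stated assumptions: the helper names ``http_status`` and ``is_quorum_standby`` are hypothetical, and the only behavior relied on is the status code **200** described above. The status lookup is injectable so the decision logic can be exercised without a running node.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for a GET on url, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Non-2xx answers (for example 503) are raised as HTTPError by urllib.
        return error.code
    except (urllib.error.URLError, OSError):
        return None


def is_quorum_standby(base_url: str,
                      fetch_status: Callable[[str], Optional[int]] = http_status) -> bool:
    """True only when GET /quorum on the given node answers with status 200."""
    return fetch_status(base_url.rstrip("/") + "/quorum") == 200
```

A load balancer check would typically treat anything other than 200, including an unreachable node, as "not a quorum standby", which is what the sketch does.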


@@ -143,6 +145,9 @@ Retrieve the Patroni metrics in Prometheus format through the ``GET /metrics`` e
# HELP patroni_sync_standby Value is 1 if this node is a sync standby replica, 0 otherwise.
# TYPE patroni_sync_standby gauge
patroni_sync_standby{scope="batman"} 0
# HELP patroni_quorum_standby Value is 1 if this node is a quorum standby replica, 0 otherwise.
# TYPE patroni_quorum_standby gauge
patroni_quorum_standby{scope="batman"} 0
# HELP patroni_xlog_received_location Current location of the received Postgres transaction log, 0 if this node is not a replica.
# TYPE patroni_xlog_received_location counter
patroni_xlog_received_location{scope="batman"} 0
2 changes: 1 addition & 1 deletion features/citus.feature
@@ -20,7 +20,7 @@ Feature: citus
And replication works from postgres1 to postgres0 after 15 seconds
And postgres1 is registered in the postgres2 as the primary in group 0 after 5 seconds
And "sync" key in a group 0 in DCS has sync_standby=postgres0 after 15 seconds
When I run patronictl.py switchover batman --group 0 --candidate postgres0 --force
When I run patronictl.py failover batman --group 0 --candidate postgres0 --force
Then postgres0 role is the primary after 10 seconds
And replication works from postgres0 to postgres1 after 15 seconds
And postgres0 is registered in the postgres2 as the primary in group 0 after 5 seconds
68 changes: 68 additions & 0 deletions features/quorum_commit.feature
@@ -0,0 +1,68 @@
Feature: quorum commit
Check basic workflows when quorum commit is enabled

Scenario: check enable quorum commit and that the only leader promotes after restart
Given I start postgres0
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "synchronous_mode": "quorum"}
Then I receive a response code 200
And sync key in DCS has leader=postgres0 after 20 seconds
And sync key in DCS has quorum=0 after 2 seconds
And synchronous_standby_names on postgres0 is set to "_empty_str_" after 2 seconds
When I shut down postgres0
And sync key in DCS has leader=postgres0 after 2 seconds
When I start postgres0
Then postgres0 role is the primary after 10 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_mode_strict": true}
Then synchronous_standby_names on postgres0 is set to "ANY 1 (*)" after 10 seconds

Scenario: check failover with one quorum standby
Given I start postgres1
Then sync key in DCS has sync_standby=postgres1 after 10 seconds
And synchronous_standby_names on postgres0 is set to "ANY 1 (postgres1)" after 2 seconds
When I shut down postgres0
Then postgres1 role is the primary after 10 seconds
And sync key in DCS has quorum=0 after 10 seconds
Then synchronous_standby_names on postgres1 is set to "ANY 1 (*)" after 10 seconds
When I start postgres0
Then sync key in DCS has leader=postgres1 after 10 seconds
Then sync key in DCS has sync_standby=postgres0 after 10 seconds
And synchronous_standby_names on postgres1 is set to "ANY 1 (postgres0)" after 2 seconds

Scenario: check behavior with three nodes and different replication factor
Given I start postgres2
Then sync key in DCS has sync_standby=postgres0,postgres2 after 10 seconds
And sync key in DCS has quorum=1 after 2 seconds
And synchronous_standby_names on postgres1 is set to "ANY 1 (postgres0,postgres2)" after 2 seconds
When I issue a PATCH request to http://127.0.0.1:8009/config with {"synchronous_node_count": 2}
Then sync key in DCS has quorum=0 after 10 seconds
And synchronous_standby_names on postgres1 is set to "ANY 2 (postgres0,postgres2)" after 2 seconds

Scenario: switch from quorum replication to good old multisync and back
Given I issue a PATCH request to http://127.0.0.1:8009/config with {"synchronous_mode": true, "synchronous_node_count": 1}
And I shut down postgres0
Then synchronous_standby_names on postgres1 is set to "postgres2" after 10 seconds
And sync key in DCS has sync_standby=postgres2 after 10 seconds
Then sync key in DCS has quorum=0 after 2 seconds
When I issue a PATCH request to http://127.0.0.1:8009/config with {"synchronous_mode": "quorum"}
And I start postgres0
Then synchronous_standby_names on postgres1 is set to "ANY 1 (postgres0,postgres2)" after 10 seconds
And sync key in DCS has sync_standby=postgres0,postgres2 after 10 seconds
Then sync key in DCS has quorum=1 after 2 seconds

Scenario: REST API and patronictl
Given I run patronictl.py list batman
Then I receive a response returncode 0
And I receive a response output "Quorum Standby"
And Status code on GET http://127.0.0.1:8008/quorum is 200 after 3 seconds
And Status code on GET http://127.0.0.1:8010/quorum is 200 after 3 seconds

Scenario: nosync node is removed from voters and synchronous_standby_names
Given I add tag nosync true to postgres2 config
When I issue an empty POST request to http://127.0.0.1:8010/reload
Then I receive a response code 202
And sync key in DCS has quorum=0 after 10 seconds
And sync key in DCS has sync_standby=postgres0 after 10 seconds
And synchronous_standby_names on postgres1 is set to "ANY 1 (postgres0)" after 2 seconds
And Status code on GET http://127.0.0.1:8010/quorum is 503 after 10 seconds
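
The values asserted in the scenarios above follow a visible pattern: ``synchronous_standby_names`` has the form ``ANY N (names)``, and wherever the ``quorum`` field of the sync key is checked alongside a list of standbys, it equals the number of listed standbys minus ``N``. The sketch below only restates that observation in Python. It is inferred from the scenarios and is not taken from Patroni's implementation; both function names are hypothetical, and the ``ANY 1 (*)`` case with no standbys is deliberately not covered.

```python
from typing import Sequence


def _validate(standbys: Sequence[str], num_sync: int) -> None:
    """Reject inputs outside the range covered by the scenarios."""
    if not standbys:
        raise ValueError("at least one standby is required")
    if not 1 <= num_sync <= len(standbys):
        raise ValueError("num_sync must be between 1 and the number of standbys")


def format_quorum_standby_names(standbys: Sequence[str], num_sync: int) -> str:
    """Render the quorum form of synchronous_standby_names, e.g. 'ANY 1 (a,b)'."""
    _validate(standbys, num_sync)
    return f"ANY {num_sync} ({','.join(standbys)})"


def observed_quorum(standbys: Sequence[str], num_sync: int) -> int:
    """Quorum value seen in the scenarios: listed standbys minus required acknowledgements."""
    _validate(standbys, num_sync)
    return len(standbys) - num_sync
```

For example, two listed standbys with ``synchronous_node_count`` of 1 correspond to ``ANY 1 (postgres0,postgres2)`` and ``quorum=1`` in the three-node scenario, and raising the count to 2 gives ``ANY 2 (postgres0,postgres2)`` and ``quorum=0``.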
3 changes: 2 additions & 1 deletion features/steps/citus.py
@@ -61,7 +61,8 @@ def check_registration(context, name1, name2, role, group, time_limit):
except Exception:
pass
time.sleep(1)
assert False, "Node {0} is not registered in pg_dist_node on the node {1}".format(name1, name2)
assert False, "Worker {0} is not registered in pg_dist_node on the coordinator {1} after {2} seconds"\
.format(name1, name2, time_limit)


@step('I create a distributed table on {name:w}')