in sync mode select only syncStandby as switchover candidate #2278

Merged
merged 4 commits into from Apr 6, 2023

@FxKu FxKu commented Mar 31, 2023

fixes #2276

On two-node clusters, a rolling update can pick the recreated replica as the switchover target before Patroni has assigned it the synchronous standby status. The switchover then fails with the following error:

2023-03-30 16:05:24,606 INFO: Updating synchronous privilege temporarily from ['test-sync-mode-0'] to []
2023-03-30 16:05:24,653 INFO: Assigning synchronous standby status to []
server signaled
2023-03-30 16:05:24,828 INFO: no action. I am (test-sync-mode-1), the leader with the lock
2023-03-30 16:05:34,670 INFO: no action. I am (test-sync-mode-1), the leader with the lock
2023-03-30 16:05:42.106 UTC [36] LOG {ticks: 0, maint: 0, retry: 0}
2023-03-30 16:05:44,682 INFO: no action. I am (test-sync-mode-1), the leader with the lock
2023-03-30 16:05:50,665 INFO: received failover request with leader=test-sync-mode-1 candidate=test-sync-mode-0 scheduled_at=None
2023-03-30 16:05:50,679 INFO: Got response from test-sync-mode-0 http://10.2.24.81:8008/patroni: {"state": "running", "postmaster_start_time": "2023-03-30 16:05:49.540527+00:00", "role": "replica", "server_version": 140007, "xlog": {"received_location": 134368768, "replayed_location": 134368768, "replayed_timestamp": "2023-03-30 16:05:24.756947+00:00", "paused": false}, "timeline": 2, "dcs_last_seen": 1680192350, "database_system_identifier": "7216365118537531469", "patroni": {"version": "3.0.1", "scope": "test-sync-mode"}}
2023-03-30 16:05:50,730 INFO: Lock owner: test-sync-mode-1; I am test-sync-mode-1
2023-03-30 16:05:50,778 WARNING: Failover candidate=test-sync-mode-0 does not match with sync_standbys=None
2023-03-30 16:05:50,778 WARNING: manual failover: members list is empty
2023-03-30 16:05:50,778 WARNING: manual failover: no healthy members found, failover is not possible
2023-03-30 16:05:50,778 INFO: Cleaning up failover key
2023-03-30 16:05:50,829 INFO: Assigning synchronous standby status to ['test-sync-mode-0']
server signaled

The switchover would only succeed on the next sync. This PR therefore adds a check to the retry loop that calls the members endpoint: if synchronous mode is enabled and no syncStandby candidate was found, the members call is repeated.

FxKu added this to the 1.9.1 milestone on Mar 31, 2023
FxKu changed the title from "in sync mode select only syncStandby as swicthover candidate" to "in sync mode select only syncStandby as switchover candidate" on Mar 31, 2023
@idanovinda

👍


FxKu commented Apr 6, 2023

👍

FxKu merged commit 1105228 into master on Apr 6, 2023
9 checks passed
FxKu deleted the fix-sync-mode-failover branch on April 6, 2023 10:04
FxKu added a commit that referenced this pull request Apr 6, 2023
* in sync mode select only syncStandby as swicthover candidate
* do not exit retry with err
* unit test: use error from reading byte stream twice
Successfully merging this pull request may close these issues.

failover fails with synchronous_mode and 2 instances
2 participants