Demote immediately if failed to update leader lock
If an Etcd node is partitioned from the rest of the cluster, it is still
possible to read from it (though it returns stale information),
but it is not possible to write to it.
Previously, Patroni tried to fetch a new cluster view from DCS in
order to figure out whether it was still the leader. Etcd kept
returning stale info in which the node still owned the leader key, but with
a negative TTL.
This weird bug clearly shows how dangerous premature optimization is.
Alexander Kukushkin committed Sep 20, 2016
1 parent 453e686 commit e538e32
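
The failure mode described in the commit message can be illustrated with a minimal sketch. This is not Patroni's code: `PartitionedDCS`, `old_cycle`, and `new_cycle` are hypothetical names, and the logic is reduced to the one decision the message talks about, under the assumption that a partitioned node rejects writes but still serves stale reads.

```python
class PartitionedDCS:
    """Hypothetical stand-in for a partitioned Etcd node: writes fail, reads are stale."""

    def __init__(self, stale_leader):
        self.stale_leader = stale_leader

    def update_leader(self):
        # The write cannot succeed on the partitioned node, so the lock refresh fails.
        return False

    def get_leader(self):
        # Reads still succeed but return the last known (stale) leader.
        return self.stale_leader


def old_cycle(name, dcs):
    """Simplified pre-fix decision: re-read the cluster view after a failed update."""
    if dcs.update_leader():
        return 'leader'
    # The stale read still names this node as leader, so it keeps the role.
    return 'leader' if dcs.get_leader() == name else 'replica'


def new_cycle(name, dcs):
    """Simplified post-fix decision: a failed lock update means immediate demotion."""
    if dcs.update_leader():
        return 'leader'
    return 'replica'


dcs = PartitionedDCS(stale_leader='node1')
old_result = old_cycle('node1', dcs)
new_result = new_cycle('node1', dcs)
```

In this sketch the old decision trusts a read that the partitioned node cannot keep fresh, while the new one treats the failed write itself as the signal to step down.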
Showing 2 changed files with 3 additions and 2 deletions.
3 changes: 2 additions & 1 deletion patroni/ha.py
@@ -434,7 +434,8 @@ def process_healthy_cluster(self):
             else:
                 # Either there is no connection to DCS or someone else acquired the lock
                 logger.error('failed to update leader lock')
-                self.load_cluster_from_dcs()
+                self.demote(delete_leader=False)
+                return 'demoted self because failed to update leader lock in DCS'
         else:
             logger.info('does not have lock')
         return self.follow('demoting self because i do not have the lock and i was a leader',
2 changes: 1 addition & 1 deletion tests/test_ha.py
@@ -231,7 +231,7 @@ def test_demote_because_update_lock_failed(self):
         self.ha.cluster.is_unlocked = false
         self.ha.has_lock = true
         self.ha.update_lock = false
-        self.assertEquals(self.ha.run_cycle(), 'demoting self because i do not have the lock and i was a leader')
+        self.assertEquals(self.ha.run_cycle(), 'demoted self because failed to update leader lock in DCS')
 
     def test_follow(self):
         self.ha.cluster.is_unlocked = false
