Ex-master cannot come back after net-split #99

Closed
katoquro opened this issue Nov 17, 2015 · 10 comments

@katoquro

Hello, I am testing different failover cases with Patroni and I have run into the issue in the subject.

The ex-master has a different xlog_location and cannot catch up with the new master.
Additional info:
From ZooKeeper:
ex-master:

{
    "conn_url": "postgres://replicator:replicator@172.17.0.94:5432/postgres",
    "api_url": "http://172.17.0.94:8019/patroni",
    "tags": {},
    "state": "running",
    "role": "replica",
    "xlog_location": 167772304
}

new-master:

{
    "conn_url": "postgres://replicator:replicator@172.17.0.98:5432/postgres",
    "api_url": "http://172.17.0.98:8059/patroni",
    "tags": {},
    "state": "running",
    "role": "master",
    "xlog_location": 318767632
}

And there is no record for the ex-master as a replica of the new master.
select * from pg_stat_replication; (on the new master)

6465    16384   replicator  walreceiver 172.17.0.102        47561   2015-11-17 18:34:09     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   sync
2368    16384   replicator  walreceiver 172.17.0.97     36181   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential
2374    16384   replicator  walreceiver 172.17.0.95     57300   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential
2373    16384   replicator  walreceiver 172.17.0.100        53910   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential

And, finally, some logs from the ex-master after the net split:

2015-11-17 15:30:27,512 INFO: Zookeeper session lost, state: EXPIRED_SESSION
2015-11-17 15:30:28,227 INFO: Connecting to 172.17.0.93:2181
2015-11-17 15:30:28,230 INFO: Zookeeper connection established, state: CONNECTED
172.17.0.101 - - [17/Nov/2015 15:30:28] "OPTIONS / HTTP/1.0" 503 -
2015-11-17 15:30:28,916 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:28,916 INFO: does not have lock
2015-11-17 15:30:28,926 INFO: established a new patroni connection to the postgres cluster
2015-11-17 15:30:28,928 INFO: closed patroni connection to the postgresql cluster
2015-11-17 15:30:28,996 INFO: no action.  i am a secondary and i am following a leader
2015-11-17 15:30:29,014 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:29,014 INFO: changing primary_conninfo and restarting in progress
LOG:  received fast shutdown request
waiting for server to shut down...LOG:  aborting any active transactions
.FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
waiting for server to start....LOG:  database system was shut down in recovery at 2015-11-17 15:30:29 UTC
LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/A000090
LOG:  record with zero length at 0/A000090
LOG:  database system is ready to accept read only connections
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  fetching timeline history file for timeline 3 from primary server
FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/A000000 on timeline 1 is not in this server's history
    DETAIL:  This server's history forked from timeline 1 at 0/9CC0688.

LOG:  new timeline 3 forked off current database system timeline 1 before current recovery point 0/A000090
2015-11-17 15:30:30,909 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 231, in get_postgresql_status
    pg_is_in_recovery() AND pg_is_xlog_replay_paused()""", retry=retry)[0]
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 217, in query
    return self.server.query(sql, *params)
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 287, in query
    raise PostgresConnectionException('connection problems')
PostgresConnectionException: 'connection problems'
172.17.0.101 - - [17/Nov/2015 15:30:30] "OPTIONS / HTTP/1.0" 503 -
 done
server started
172.17.0.101 - - [17/Nov/2015 15:30:32] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:34] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:36] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:38] "OPTIONS / HTTP/1.0" 503 -
2015-11-17 15:30:38,997 INFO: established a new patroni connection to the postgres cluster
2015-11-17 15:30:39,009 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:39,009 INFO: does not have lock
2015-11-17 15:30:39,013 INFO: no action.  i am a secondary and i am following a leader
172.17.0.101 - - [17/Nov/2015 15:30:40] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:42] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:44] "OPTIONS / HTTP/1.0" 503 -
LOG:  new timeline 3 forked off current database system timeline 1 before current recovery point 0/A000090
FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/A000000 on timeline 1 is not in this server's history
    DETAIL:  This server's history forked from timeline 1 at 0/9CC0688.
@alexeyklyukin
Contributor

It looks like the old master has advanced its WAL position past the promotion point of the new master. I think it might happen during the network outage, when new data has been written into the master's WAL but has not been propagated to the replicas (this is also possible in synchronous mode). You can configure Patroni to call pg_rewind in order to bring the former master up to date.
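As a rough sketch (not taken from this thread), the relevant Patroni configuration would look something like the following; the exact key name (use_pg_rewind) should be checked against the Patroni version in use, and pg_rewind itself requires wal_log_hints = on or data checksums to be enabled on the node being rewound (the old master):

postgresql:
  use_pg_rewind: true
  parameters:
    wal_log_hints: "on"  # pg_rewind needs wal_log_hints or data checksums on the rewound node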

@katoquro
Author

Yep, you are right. But this broken node can become the master if there are no competitors:

2015-11-18 12:05:50,027 INFO: Lock owner: node5; I am node5
2015-11-18 12:05:50,028 INFO: no action.  i am the leader with the lock
172.17.0.101 - - [18/Nov/2015 12:05:52] "OPTIONS / HTTP/1.0" 200 -

@alexeyklyukin
Contributor

Well, if all other nodes in the cluster have died, then promoting a single leftover node to a master is a sane thing to do, isn't it?

@katoquro
Author

I'm not sure, because this node can be outdated and can contain inconsistent data.
Such cases can corrupt logic on the clients, so it would be better to shut down such nodes.

@alexeyklyukin
Contributor

It's not the task of Patroni to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and it should be a human decision to shut them down.
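As an illustration (not from this thread), a minimal lag check that a monitoring system could run on the current master might look like the query below; it assumes the 9.4-era pg_xlog_* function and column names (PostgreSQL 10+ uses pg_current_wal_lsn, pg_wal_lsn_diff and replay_lsn instead). Note that a diverged node, like the ex-master here, never appears in pg_stat_replication at all, which is itself a useful signal:

SELECT application_name,
       client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;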

@drnic
Contributor

drnic commented Nov 18, 2015

I'd like it to be the task of Patroni to automate and make these decisions.


@drnic
Contributor

drnic commented Nov 18, 2015

Is it entirely out of scope for Patroni cells to self-administer this? Can the only orchestration be external?


@alexeyklyukin
Contributor

You can use pg_rewind and avoid this problem altogether.

@alexeyklyukin
Contributor

There are other cases of a node being unable to join the cluster (for instance, if the replication username/password is incorrect). It's not possible, and does not make much sense, for Patroni to detect every issue like this - it should be the task of the monitoring system to realize that some replicas are potentially unhealthy, and then human intervention to fix it.

@drnic
Contributor

drnic commented Nov 19, 2015

@alexeyklyukin thanks for pointing me to pg_rewind
