Ex-master cannot come back after net-split #99

Closed
katoquro opened this issue Nov 17, 2015 · 10 comments

@katoquro

Hello, I am testing different failover cases with Patroni and I have run into the issue in the subject.

The ex-master has a different xlog_location and cannot catch up with the new master.
Additional info:
From ZooKeeper:
ex-master:

{
    "conn_url": "postgres://replicator:replicator@172.17.0.94:5432/postgres",
    "api_url": "http://172.17.0.94:8019/patroni",
    "tags": {},
    "state": "running",
    "role": "replica",
    "xlog_location": 167772304
}

new-master:

{
    "conn_url": "postgres://replicator:replicator@172.17.0.98:5432/postgres",
    "api_url": "http://172.17.0.98:8059/patroni",
    "tags": {},
    "state": "running",
    "role": "master",
    "xlog_location": 318767632
}

And there is no record for the ex-master as a replica of the new master.
select * from pg_stat_replication; (on the new master)

6465    16384   replicator  walreceiver 172.17.0.102        47561   2015-11-17 18:34:09     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   sync
2368    16384   replicator  walreceiver 172.17.0.97     36181   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential
2374    16384   replicator  walreceiver 172.17.0.95     57300   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential
2373    16384   replicator  walreceiver 172.17.0.100        53910   2015-11-17 18:24:23     streaming   0/13000060  0/13000060  0/13000060  0/13000060  1   potential

And, finally, some logs from the ex-master after the net split:

2015-11-17 15:30:27,512 INFO: Zookeeper session lost, state: EXPIRED_SESSION
2015-11-17 15:30:28,227 INFO: Connecting to 172.17.0.93:2181
2015-11-17 15:30:28,230 INFO: Zookeeper connection established, state: CONNECTED
172.17.0.101 - - [17/Nov/2015 15:30:28] "OPTIONS / HTTP/1.0" 503 -
2015-11-17 15:30:28,916 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:28,916 INFO: does not have lock
2015-11-17 15:30:28,926 INFO: established a new patroni connection to the postgres cluster
2015-11-17 15:30:28,928 INFO: closed patroni connection to the postgresql cluster
2015-11-17 15:30:28,996 INFO: no action.  i am a secondary and i am following a leader
2015-11-17 15:30:29,014 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:29,014 INFO: changing primary_conninfo and restarting in progress
LOG:  received fast shutdown request
waiting for server to shut down...LOG:  aborting any active transactions
.FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down
 done
server stopped
waiting for server to start....LOG:  database system was shut down in recovery at 2015-11-17 15:30:29 UTC
LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/A000090
LOG:  record with zero length at 0/A000090
LOG:  database system is ready to accept read only connections
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  fetching timeline history file for timeline 3 from primary server
FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/A000000 on timeline 1 is not in this server's history
    DETAIL:  This server's history forked from timeline 1 at 0/9CC0688.

LOG:  new timeline 3 forked off current database system timeline 1 before current recovery point 0/A000090
2015-11-17 15:30:30,909 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 231, in get_postgresql_status
    pg_is_in_recovery() AND pg_is_xlog_replay_paused()""", retry=retry)[0]
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 217, in query
    return self.server.query(sql, *params)
  File "build/bdist.linux-x86_64/egg/patroni/api.py", line 287, in query
    raise PostgresConnectionException('connection problems')
PostgresConnectionException: 'connection problems'
172.17.0.101 - - [17/Nov/2015 15:30:30] "OPTIONS / HTTP/1.0" 503 -
 done
server started
172.17.0.101 - - [17/Nov/2015 15:30:32] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:34] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:36] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:38] "OPTIONS / HTTP/1.0" 503 -
2015-11-17 15:30:38,997 INFO: established a new patroni connection to the postgres cluster
2015-11-17 15:30:39,009 INFO: Lock owner: node5; I am node1
2015-11-17 15:30:39,009 INFO: does not have lock
2015-11-17 15:30:39,013 INFO: no action.  i am a secondary and i am following a leader
172.17.0.101 - - [17/Nov/2015 15:30:40] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:42] "OPTIONS / HTTP/1.0" 503 -
172.17.0.101 - - [17/Nov/2015 15:30:44] "OPTIONS / HTTP/1.0" 503 -
LOG:  new timeline 3 forked off current database system timeline 1 before current recovery point 0/A000090
FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/A000000 on timeline 1 is not in this server's history
    DETAIL:  This server's history forked from timeline 1 at 0/9CC0688.
@alexeyklyukin
Contributor

It looks like the old master has advanced its WAL position past the promotion point of the new master. I think it might happen during the network outage, when new data has been written into the master's WAL but has not been propagated to the replicas (this is also possible in synchronous mode). You can configure Patroni to call pg_rewind in order to bring the former master up to date.
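As a rough sketch (not taken from this thread), the relevant Patroni configuration would look something like the following; the exact key name (use_pg_rewind) should be checked against the Patroni version in use, and pg_rewind itself requires wal_log_hints = on or data checksums to be enabled on the node being rewound (the old master):

postgresql:
  use_pg_rewind: true
  parameters:
    wal_log_hints: "on"  # pg_rewind needs wal_log_hints or data checksums on the rewound node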

@katoquro
Author

Yep, you are right. But this broken node can become the master if there are no competitors:

2015-11-18 12:05:50,027 INFO: Lock owner: node5; I am node5
2015-11-18 12:05:50,028 INFO: no action.  i am the leader with the lock
172.17.0.101 - - [18/Nov/2015 12:05:52] "OPTIONS / HTTP/1.0" 200 -

@alexeyklyukin
Contributor

Well, if all other nodes in the cluster have died, then promoting a single leftover node to a master is a sane thing to do, isn't it?

@katoquro
Author

I'm not sure, because this node can be outdated and can contain inconsistent data.
Such cases can corrupt logic on the clients, so it would be better to shut down such nodes.

@alexeyklyukin
Contributor

It's not the task of Patroni to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and it should be a human decision to shut them down.
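As an illustration (not from this thread), a minimal lag check that a monitoring system could run on the current master might look like the query below; it assumes the 9.4-era pg_xlog_* function and column names (PostgreSQL 10+ uses pg_current_wal_lsn, pg_wal_lsn_diff and replay_lsn instead). Note that a diverged node, like the ex-master here, never appears in pg_stat_replication at all, which is itself a useful signal:

SELECT application_name,
       client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;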

@drnic
Contributor

drnic commented Nov 18, 2015

I'd like it to be the task of Patroni to automate and make these decisions.


@drnic
Contributor

drnic commented Nov 18, 2015

Is it entirely out of scope for Patroni cells to self-administer this? Can the only orchestration be external?


@alexeyklyukin
Contributor

You can use pg_rewind and avoid this problem altogether.

@alexeyklyukin
Contributor

There are other cases of a node being unable to join the cluster (for instance, if the replication username/password is incorrect). It's not possible, and does not make much sense, for Patroni to detect every issue like this - it should be the task of the monitoring system to realize that some replicas are potentially unhealthy, and then human intervention to fix it.

@drnic
Contributor

drnic commented Nov 19, 2015

@alexeyklyukin thanks for pointing me to pg_rewind
