Maximum lag for replica access (REST API enhancement) #1249

vitabaks · 2019-10-24T08:32:26Z

Dear colleagues!

Today we have a wonderful parameter:
maximum_lag_on_failover: the maximum bytes a follower may lag to be able to participate in leader election.

Please consider implementing a parameter for the maximum lag of a replica behind the master, which would allow finer control of read access to the replicas in the cluster.

Example (something like this):
maximum_lag_on_replica: the maximum lag in bytes (default 1048576) that a replica may have while still allowing access to its databases.

maximum_lag_on_replica_delay: the time in milliseconds (default 100 ms) during which the Patroni REST API continues to return response code "200" after the lag threshold is exceeded, so that momentary lag spikes can be ignored where appropriate.

The logic is as follows:
If the lag exceeds "maximum_lag_on_replica" for longer than "maximum_lag_on_replica_delay", the Patroni REST API stops returning response code "200" for the /replica and /async endpoints.
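
To make the proposed logic concrete, here is a minimal Python sketch of the decision only. It is not Patroni code: the class name ReplicaLagGate, the injected clock, and the 503 response for an unhealthy replica are assumptions made for illustration, and the lag value is expected to be measured elsewhere.

```python
import time
from typing import Callable, Optional


class ReplicaLagGate:
    """Decide whether a replica health check should answer 200 or 503.

    Hypothetical helper, not part of Patroni. It only illustrates the
    proposed maximum_lag_on_replica / maximum_lag_on_replica_delay logic.
    """

    def __init__(self, max_lag_bytes: int = 1048576, max_delay_ms: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_lag_bytes = max_lag_bytes
        self.max_delay_s = max_delay_ms / 1000.0
        self._clock = clock
        # Moment at which the lag first exceeded the threshold, or None.
        self._exceeded_since: Optional[float] = None

    def status_code(self, lag_bytes: int) -> int:
        now = self._clock()
        if lag_bytes <= self.max_lag_bytes:
            # Lag is acceptable again: forget any earlier spike.
            self._exceeded_since = None
            return 200
        if self._exceeded_since is None:
            self._exceeded_since = now
        # Tolerate the excess lag only for the configured grace period.
        if now - self._exceeded_since <= self.max_delay_s:
            return 200
        return 503  # assumed "unhealthy" response code
```

With the defaults, a replica that stays more than 1048576 bytes behind for longer than 100 ms would start failing the check, while a shorter spike would go unnoticed by HAProxy.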

I use HAProxy to perform Patroni REST API checks and to provide read-only access for applications.
An example of schema (TypeA) and configurations that I use:
https://github.com/vitabaks/postgresql_cluster (if links are not allowed you can delete it)

Thanks!


CyberDem0n · 2019-10-24T08:54:11Z

maximum_lag_on_replica

Patroni on the master publishes its WAL position to the DCS (/optime/leader) once per loop_wait, which means that the value could be quite outdated. In most cases you will see that the replica is actually ahead of the published master position. Querying the master for every check looks like huge overkill.

Implementing it should not be hard, but you'll have to understand that it will not work as precisely as you expect.
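
As a rough illustration of the byte-lag calculation and of the staleness problem, the sketch below converts a PostgreSQL LSN string to a byte position and subtracts it from the last published leader position. The function names are hypothetical, and the leader position is assumed to be available already as an integer number of bytes; because that value may be up to loop_wait old, the result is clamped at zero rather than going negative.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replica_lag_bytes(leader_optime: int, replica_lsn: str) -> int:
    """Lag behind the last published leader position, in bytes.

    leader_optime is assumed to be the (possibly stale) value read from
    /optime/leader. A replica that has replayed past it reports a lag of 0.
    """
    return max(0, leader_optime - parse_lsn(replica_lsn))
```

A replica at 0/3000060 compared against a published position of 0/3000000 yields 0, even though the true lag is unknown until the leader publishes again.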

maximum_lag_on_replica_delay

The only way to get the time delay is to execute now()-pg_catalog.pg_last_xact_replay_timestamp(), but it doesn't work very reliably. Basically replay_timestamp will not grow if there is no activity on the master.

In any case feel free to implement it and open a PR.

vitabaks · 2019-10-24T10:05:42Z

> Implementing it should not be hard, but you'll have to understand that it will not work as precisely as you expect.

Yes, you are right! Still, this seems to me a much-needed feature, even if it is not as easy to implement as we would like.
That raises a question: what if we reduce loop_wait to 1-5 seconds?

> Basically replay_timestamp will not grow if there is no activity on the master.

Yes. I check replication lag in seconds with the following query (on replica servers):

select case pg_is_in_recovery()
  when 't'
    then (select case
            when pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
              then 0
              else extract(epoch from now() - pg_last_xact_replay_timestamp())
            end)
    else '0'
  end as lag_delay;

What I understood from our conversation:

maximum_lag_on_replica - not so difficult to implement if you read the WAL position from the DCS (/optime/leader), although its accuracy depends heavily on loop_wait.
maximum_lag_on_replica_delay - somewhat more difficult to implement, as it requires Patroni to run additional checks against PostgreSQL, although at first glance this should not put much load on Patroni or the database.
You don't have time for this :)
Well, I hope someone can take on this task, or give more recommendations ...

ants · 2019-10-24T12:18:59Z

Trying to measure delay in Patroni seems like it will not provide any useful guarantees and at best results in a system that mostly works, but fails under any kind of adverse conditions.

Perhaps you should take a look at helping to push this patch along: https://commitfest.postgresql.org/23/1589/

mszpulak · 2020-01-21T14:13:51Z

Today I had a situation where one of the replicas went down for a long period of time. When it came back online, the master had already recycled the necessary WAL (FATAL: could not receive data from WAL stream: ERROR: requested WAL segment has already been removed), but Patroni still reported this replica as Running with a lag of 5GB. Why? In this case, checking the replica lag seems to be mandatory. Or maybe better checking of the Postgres state.

mszpulak · 2020-01-21T14:59:53Z

What about: select client_addr, write_lag, flush_lag, replay_lag from pg_stat_replication;

vitabaks · 2020-09-03T09:19:02Z

Great news!

Patroni version 2.0.0 adds enhanced GET /replica and GET /async REST API health checks (#1599):
Checks now support an optional keyword ?lag=<max-lag> and will respond with 200 only if the lag is smaller than the supplied value.
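
For anyone who wants to try the new check outside HAProxy, here is a small Python sketch using only the standard library. The helper names are hypothetical, and port 8008 is assumed to be where the Patroni REST API listens; adjust both the host and the port to your own setup.

```python
import urllib.error
import urllib.request


def replica_check_url(host: str, max_lag: str, port: int = 8008,
                      endpoint: str = "replica") -> str:
    """Build a health-check URL such as http://host:8008/replica?lag=1048576."""
    return f"http://{host}:{port}/{endpoint}?lag={max_lag}"


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections and timeouts all count as unhealthy.
        return False
```

For example, is_healthy(replica_check_url("10.0.0.2", "1048576")) would report whether that node currently passes the /replica check with a 1 MiB lag limit (the address is a placeholder).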

However, it does not yet account for spikes in replication lag (see maximum_lag_on_replica_delay).

vitabaks mentioned this issue Feb 3, 2024

Add patroni_maximum_lag_on_replica variable vitabaks/postgresql_cluster#569

Merged
