"Failed to drop replication slot" after patroni upgrade #2046

Closed
kvaikla opened this issue Aug 31, 2021 · 5 comments


kvaikla commented Aug 31, 2021

OS: RH 7.9
Pg: 13.3
Patroni: 2.1.0
Barman: 2.12

After upgrading Patroni from 1.6.5 to 2.1.0, Patroni logs errors:

Aug 31 17:39:22 pgrep1.energia.sise patroni[18285]: 2021-08-31 17:39:22,412 INFO: Lock owner: pgrep1; I am pgrep1
Aug 31 17:39:22 pgrep1.energia.sise patroni[18285]: 2021-08-31 17:39:22,520 ERROR: Failed to drop replication slot 'barman'
Aug 31 17:39:22 pgrep1.energia.sise patroni[18285]: 2021-08-31 17:39:22,626 INFO: no action. I am (pgrep1) the leader with the lock

Backups, replication, failover, etc. work fine. There are no errors in the PostgreSQL server log.
Slots on the master (pgrep1) server:

postgres=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+-----------
slot_name           | pgrep2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 90559
xmin                |
catalog_xmin        |
restart_lsn         | 4/EE0001E8
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
-[ RECORD 2 ]-------+-----------
slot_name           | barman
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 94142
xmin                |
catalog_xmin        |
restart_lsn         | 4/EE000000
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |

Barman receive-wal process:

barman    94141  94136  0 17:25 ?        00:00:00 /usr/pgsql-13/bin/pg_receivewal --dbname=dbname=replication host=pgrep1.energia.sise options=-cdatestyle=iso port=5432 replication=true user=barman application_name=barman_receive_wal --verbose --no-loop --no-password --directory=/pgdata/backup/pgrep1:5432/streaming --slot=barman

Patroni config:

$ patronictl show-config
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_command: null
    archive_mode: 'off'
    hot_standby: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_temp_files: '1'
    max_replication_slots: 8
    max_wal_senders: 8
    pg_partman_bgw.dbname: kvpg
    pg_partman_bgw.interval: 300
    pg_partman_bgw.role: postgres
    shared_preload_libraries: pg_partman_bgw
    wal_keep_segments: 8
    wal_level: replica
    wal_log_hints: 'on'
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30

Is it a bug or a misconfiguration?

@CyberDem0n
Collaborator

Patroni tries to drop any replication slot it does not recognize. It has always behaved this way; the latest release is just more aggressive about it than before. To avoid it, you have to define the slot under either slots or ignore_slots. More details in the documentation: https://patroni.readthedocs.io/en/latest/SETTINGS.html
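
For the Barman case reported above, a minimal sketch of the two options might look like this. Both keys belong to Patroni's dynamic configuration (the same document that `patronictl show-config` prints), so they would typically be changed with `patronictl edit-config`, not in Barman. The slot name `barman` and the type `physical` are taken from the `pg_replication_slots` output in the report; the intent is to pick one option, not both.

```yaml
# Option 1: tell Patroni to leave the existing slot alone.
ignore_slots:
  - name: barman
    type: physical

# Option 2 (alternative): declare it as a permanent slot that Patroni manages.
# slots:
#   barman:
#     type: physical
```

With `ignore_slots`, Patroni should simply skip the matching slot. With `slots`, Patroni is also meant to recreate the slot on a new leader after a failover, which may or may not be what a given Barman setup wants. The drop attempt in the report probably failed only because the slot was in use (`active` is `t`); an idle, unrecognized slot would likely have been dropped.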

@kvaikla
Author

kvaikla commented Sep 1, 2021

@CyberDem0n thanks

@kvaikla kvaikla closed this as completed Sep 1, 2021
@bradnicholson
Contributor

@CyberDem0n - it's also doing this with the replication slot that pg_basebackup opens when Patroni calls it to create a replica. I assume it should not be doing that as this is all within Patroni's control.

This is from the initial replica bootstrap on a new cluster.

Relevant bit from patroni.yml

  create_replica_method:
     - basebackup
  basebackup:
     - max-rate: '125M'

patroni_test-m-0 db 2021-09-03T12:56:37.265466291Z 2021-09-03 12:56:37,256 INFO: Lock owner: patroni_test-m-0; I am patroni_test-m-0
patroni_test-m-0 db 2021-09-03T12:56:37.265536285Z 2021-09-03 12:56:37,264 ERROR: Failed to drop replication slot 'pg_basebackup_338'
patroni_test-m-0 db 2021-09-03T12:56:37.270047589Z 2021-09-03 12:56:37,269 INFO: no action. I am (patroni_test-m-0) the leader with the lock

CyberDem0n pushed a commit that referenced this issue Sep 10, 2021
Starting from v10 pg_basebackup creates a temporary replication slot for
WAL streaming, and Patroni was trying to drop it as an unknown slot.
Another option would be running pg_basebackup with
`--slot=current_node_name` option, but unfortunately at the moment when
pg_basebackup is executed we don't yet know the major version
(the `--slot` option was added in v9.6).

Ref #2046 (comment)
@CyberDem0n
Collaborator

@bradnicholson sorry about that, #2055 should fix the problem.
Until the fix is released, you can mitigate it by adding --slot=$current_node_name to the basebackup options.
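
As a rough illustration of that mitigation: the option goes into the `basebackup` section of the replica's Patroni configuration (patroni.yml), not into Barman. The sketch below extends the snippet posted earlier in this thread. The `postgresql:` parent key is an assumption about where that snippet lives, and `patroni_test_m_1` is a hypothetical placeholder that should be replaced with the slot name Patroni uses for the node being bootstrapped.

```yaml
postgresql:  # assumed parent section of the snippet quoted earlier
  create_replica_method:
    - basebackup
  basebackup:
    - max-rate: '125M'
    # Passed to pg_basebackup as --slot=<name>; the name below is a placeholder.
    - slot: patroni_test_m_1
```

This only concerns the temporary slot that `pg_basebackup` creates while building a replica. It does not affect slots owned by external tools such as Barman.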

CyberDem0n added a commit that referenced this issue Sep 17, 2021
Starting from v10 `pg_basebackup` creates a temporary replication slot for WAL streaming and Patroni was trying to drop it because the slot name looks unknown. In order to fix it, we skip all temporary slots when querying `pg_replication_slots` view.

Another option to solve the problem would be running `pg_basebackup` with `--slot=current_node_name` option, but unfortunately at the moment when `pg_basebackup` is executed, we don't yet know the major version (the `--slot` option was added in v9.6).

Ref: #2046 (comment)
@angelorso007

Hello all,

I am getting this error too, but Barman is working fine.

How can I add the option "--slot=$current_node_name"?
Does it go in the Barman config or the cluster config?

Could you please write up an example in this thread?

Thank you
Angelo

Please add
