Log status on change only #1800

Closed
stephan2012 opened this issue Dec 14, 2020 · 4 comments

Comments

@stephan2012

stephan2012 commented Dec 14, 2020

Describe the bug

Patroni currently floods logs with status messages every five seconds:

2020-12-14 17:11:58,430 INFO: Lock owner: my-application-pg-0; I am my-application-pg-1
2020-12-14 17:11:58,430 INFO: does not have lock
2020-12-14 17:11:58,431 INFO: no action.  i am a secondary and i am following a leader

Emitting messages only on status changes (or errors) would not only reduce the number of log events but also make the log clearer.

To Reproduce
Set up any replicated database.

Expected behavior
Emit a log message only if there is news …

Screenshots
N/A

Environment

  • Patroni version: 1.6.5
  • PostgreSQL version: Postgres 11
  • DCS (and its version): N/A

Patroni configuration file
Generated by the Postgres Operator, N/A here

patronictl show-config

loop_wait: 5
maximum_lag_on_failover: 33554432
postgresql:
  parameters:
    archive_mode: 'on'
    archive_timeout: 1800s
    autovacuum_analyze_scale_factor: 0.02
    autovacuum_max_workers: 5
    autovacuum_vacuum_scale_factor: 0.05
    checkpoint_completion_target: 0.9
    hot_standby: 'on'
    log_autovacuum_min_duration: 0
    log_checkpoints: 'on'
    log_connections: 'on'
    log_disconnections: 'on'
    log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
    log_lock_waits: 'on'
    log_min_duration_statement: 500
    log_statement: ddl
    log_temp_files: 0
    max_connections: '130'
    max_replication_slots: 10
    max_wal_senders: 10
    tcp_keepalives_idle: 900
    tcp_keepalives_interval: 100
    track_functions: all
    wal_keep_segments: 8
    wal_level: hot_standby
    wal_log_hints: 'on'
  use_pg_rewind: true
  use_slots: true
retry_timeout: 5
ttl: 10

Have you checked Patroni logs?
N/A

Have you checked PostgreSQL logs?
N/A

Have you tried to use GitHub issue search?
Yes.

Additional context
Add any other context about the problem here.

@CyberDem0n
Member

Let's use the correct words: it doesn't flood the logs, it provides an update on the heartbeat status. In case of problems such heartbeat logs are extremely useful.
In your case it happens every 5 seconds. Yes, that might be too often, but it can be changed by increasing the value of loop_wait. The loop_wait parameter doesn't affect the speed of failure detection.
Besides that, ttl=10 seconds is very small. It leaves no time to recover from minor network issues and increases the chances of false positives by a lot. I would not recommend setting ttl smaller than 20.
Last, but not least, you are violating the rule loop_wait + 2*retry_timeout <= ttl: with your values, 5 + 2*5 = 15 > 10.
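
For illustration only (these numbers are an assumption, not a tuning recommendation for your workload), one set of values that satisfies the rule and makes the heartbeat log less chatty would be:

loop_wait: 10
retry_timeout: 10
ttl: 30

Here 10 + 2*10 = 30 <= 30, and doubling loop_wait cuts the number of heartbeat messages in half compared to your current setting; such values can be applied with patronictl edit-config.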

Doing something smart with the logs, like keeping them in memory and discarding them after a certain time if everything is fine, might be an option, but I have neither the time nor the wish to work on it. In other words, if you are not planning to implement it, I don't see any reason to keep the issue open, especially since a few duplicates are already open.
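
As a rough sketch of that idea (hypothetical code, not anything that exists in Patroni; the class name and parameters are made up), a logging handler could keep routine INFO heartbeat records in a bounded in-memory buffer and only write them out when a WARNING or ERROR shows up:

import collections
import logging


class HeartbeatBufferHandler(logging.Handler):
    """Hypothetical: buffer INFO records, replay them when a WARNING+ record arrives."""

    def __init__(self, target, capacity=100):
        super().__init__()
        self.target = target                              # the real handler (file, stderr, ...)
        self.buffer = collections.deque(maxlen=capacity)  # oldest records silently drop off

    def emit(self, record):
        if record.levelno >= logging.WARNING:
            for buffered in self.buffer:                  # replay the buffered context first
                self.target.handle(buffered)
            self.buffer.clear()
            self.target.handle(record)
        else:
            self.buffer.append(record)                    # stay quiet while everything is fine


logger = logging.getLogger('patroni')
logger.setLevel(logging.INFO)
logger.addHandler(HeartbeatBufferHandler(logging.StreamHandler()))

logger.info('no action. I am a secondary and I am following a leader')  # buffered, not printed
logger.warning('failed to update leader lock')  # prints the buffered heartbeats, then this

That is only meant to show the shape of the feature; wiring it into Patroni's logging configuration and deciding how much context to keep is the actual work.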

@stephan2012
Author

@CyberDem0n Thanks for your advice, much appreciated. I will reconsider the settings.

If my issue is a duplicate, please let's just link the original and close this one. However, closing an issue for the sake of closing it because nobody can work on it at the moment is probably the wrong way.

@CyberDem0n
Member

please let’s just link the original

Currently open

Closed

There might be more.

However, closing an issue for the sake of closing it because nobody can work on it at the moment is probably the wrong way.

Well, first of all, this issue is a duplicate. "At the moment" is a very vague definition. The project has existed for more than 5 years. #621 was opened nearly 3 years ago, and I am pretty sure there are more similar issues, but they are closed. Since nobody has volunteered to work on the feature for 3 years, it doesn't look very important. Many open-source projects even have a policy of closing issues automatically if they get no activity for a couple of months, and I totally understand them.

@stephan2012
Author

Closing in favor of #621 and #1154.
