Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error messages in the db logfile during recovery based on not existing prefetch wal files. #1328

Closed
LuMrh opened this issue Aug 31, 2022 · 5 comments

Comments

@LuMrh
Copy link

LuMrh commented Aug 31, 2022

Database name

PostgreSQL 14

Issue description

Hello all,
i am testing v2.0.0 now and have noticed some unexpected behaviour to compare with the v0.2.15.
It applies to any recovery cases where prefetch of wal files was done. If the prefetch files don't exist and also are not needed for the recovery (because of they have not yet been written at all) the ERROR messages are created in the database log file, even if the recovery works well in the end.

This behaviour is misleading. Would it be possible to change it?

Test Case

I have attached the logfile of full db recovery. In my example, the following wal files are available:
000000010000000000000002 - 000000010000000000000009
00000001000000000000000A - 00000001000000000000000E

And for recovery needed files are:
00000001000000000000000A - 00000001000000000000000E

Log file:
2022-08-26 12:34:03.167 UTC [1] @/app= LOG: starting PostgreSQL 14.5 (Ubuntu 14.5-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2022-08-26 12:34:03.188 UTC [1] @/app= LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-08-26 12:34:03.188 UTC [1] @/app= LOG: listening on IPv6 address "::", port 5432
2022-08-26 12:34:03.193 UTC [1] @/app= LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-08-26 12:34:03.199 UTC [347] @/app= LOG: database system was interrupted; last known up at 2022-08-26 12:27:20 UTC
2022-08-26 12:34:03.199 UTC [347] @/app= LOG: creating missing WAL directory "pg_wal/archive_status"
ERROR: 2022/08/26 12:34:03.576308 Archive '00000002.history' does not exist.
2022-08-26 12:34:03.577 UTC [347] @/app= LOG: starting archive recovery
2022-08-26 12:34:03.741 UTC [347] @/app= LOG: restored log file "00000001000000000000000A" from archive
2022-08-26 12:34:03.758 UTC [347] @/app= LOG: redo starts at 0/A000028
2022-08-26 12:34:03.760 UTC [347] @/app= LOG: consistent recovery state reached at 0/A000100
2022-08-26 12:34:03.761 UTC [1] @/app= LOG: database system is ready to accept read-only connections
INFO: 2022/08/26 12:34:03.801336 WAL-prefetch file: 00000001000000000000000B
INFO: 2022/08/26 12:34:03.811534 WAL-prefetch file: 00000001000000000000000C
INFO: 2022/08/26 12:34:03.828519 WAL-prefetch file: 00000001000000000000000D
INFO: 2022/08/26 12:34:03.846557 WAL-prefetch file: 00000001000000000000000E
INFO: 2022/08/26 12:34:03.868861 WAL-prefetch file: 00000001000000000000000F
INFO: 2022/08/26 12:34:03.879043 WAL-prefetch file: 000000010000000000000010
INFO: 2022/08/26 12:34:03.889120 WAL-prefetch file: 000000010000000000000011
INFO: 2022/08/26 12:34:03.904654 WAL-prefetch file: 000000010000000000000012
INFO: 2022/08/26 12:34:03.910094 WAL-prefetch file: 000000010000000000000013
INFO: 2022/08/26 12:34:03.920341 WAL-prefetch file: 000000010000000000000014
2022-08-26 12:34:04.163 UTC [347] @/app= LOG: restored log file "00000001000000000000000B" from archive
INFO: 2022/08/26 12:34:04.173748 FILE PATH: 00000001000000000000000A.lz4
INFO: 2022/08/26 12:34:04.312620 WAL-prefetch file: 000000010000000000000015
ERROR: 2022/08/26 12:34:04.312966 Archive '000000010000000000000011' does not exist.
ERROR: 2022/08/26 12:34:04.314511 Archive '000000010000000000000010' does not exist.
ERROR: 2022/08/26 12:34:04.317160 Archive '000000010000000000000012' does not exist.
ERROR: 2022/08/26 12:34:04.317193 Archive '000000010000000000000013' does not exist.
ERROR: 2022/08/26 12:34:04.317225 Archive '00000001000000000000000F' does not exist.
ERROR: 2022/08/26 12:34:04.317248 Archive '000000010000000000000014' does not exist.
ERROR: 2022/08/26 12:34:04.655881 Archive '000000010000000000000015' does not exist.
2022-08-26 12:34:04.694 UTC [347] @/app= LOG: restored log file "00000001000000000000000C" from archive
INFO: 2022/08/26 12:34:04.743265 WAL-prefetch file: 00000001000000000000000F
INFO: 2022/08/26 12:34:04.762662 WAL-prefetch file: 000000010000000000000010
INFO: 2022/08/26 12:34:04.773563 WAL-prefetch file: 000000010000000000000011
INFO: 2022/08/26 12:34:04.783810 WAL-prefetch file: 000000010000000000000012
INFO: 2022/08/26 12:34:04.794095 WAL-prefetch file: 000000010000000000000013
INFO: 2022/08/26 12:34:04.804350 WAL-prefetch file: 000000010000000000000014
INFO: 2022/08/26 12:34:04.814608 WAL-prefetch file: 000000010000000000000015
INFO: 2022/08/26 12:34:04.824809 WAL-prefetch file: 000000010000000000000016
INFO: 2022/08/26 12:34:05.005484 FILE PATH: 00000001000000000000000B.lz4
ERROR: 2022/08/26 12:34:05.008761 Archive '000000010000000000000016' does not exist.
ERROR: 2022/08/26 12:34:05.022951 Archive '000000010000000000000012' does not exist.
ERROR: 2022/08/26 12:34:05.024437 Archive '000000010000000000000013' does not exist.
ERROR: 2022/08/26 12:34:05.035152 Archive '00000001000000000000000F' does not exist.
ERROR: 2022/08/26 12:34:05.039956 Archive '000000010000000000000011' does not exist.
ERROR: 2022/08/26 12:34:05.059941 Archive '000000010000000000000014' does not exist.
ERROR: 2022/08/26 12:34:05.060069 Archive '000000010000000000000015' does not exist.
ERROR: 2022/08/26 12:34:05.094050 Archive '000000010000000000000010' does not exist.
2022-08-26 12:34:05.099 UTC [347] @/app= LOG: restored log file "00000001000000000000000D" from archive
INFO: 2022/08/26 12:34:05.136627 WAL-prefetch file: 00000001000000000000000F
INFO: 2022/08/26 12:34:05.155564 WAL-prefetch file: 000000010000000000000010
INFO: 2022/08/26 12:34:05.165660 WAL-prefetch file: 000000010000000000000011
INFO: 2022/08/26 12:34:05.176534 WAL-prefetch file: 000000010000000000000012
INFO: 2022/08/26 12:34:05.186937 WAL-prefetch file: 000000010000000000000013
INFO: 2022/08/26 12:34:05.197366 WAL-prefetch file: 000000010000000000000014
INFO: 2022/08/26 12:34:05.207809 WAL-prefetch file: 000000010000000000000015
INFO: 2022/08/26 12:34:05.218187 WAL-prefetch file: 000000010000000000000016
INFO: 2022/08/26 12:34:05.228355 WAL-prefetch file: 000000010000000000000017
INFO: 2022/08/26 12:34:05.239567 FILE PATH: 00000001000000000000000C.lz4
ERROR: 2022/08/26 12:34:05.337732 Archive '000000010000000000000011' does not exist.
ERROR: 2022/08/26 12:34:05.338473 Archive '000000010000000000000010' does not exist.
ERROR: 2022/08/26 12:34:05.344805 Archive '000000010000000000000014' does not exist.
ERROR: 2022/08/26 12:34:05.349384 Archive '000000010000000000000012' does not exist.
ERROR: 2022/08/26 12:34:05.364154 Archive '000000010000000000000016' does not exist.
ERROR: 2022/08/26 12:34:05.372071 Archive '000000010000000000000017' does not exist.
ERROR: 2022/08/26 12:34:05.391991 Archive '000000010000000000000015' does not exist.
ERROR: 2022/08/26 12:34:05.404442 Archive '000000010000000000000013' does not exist.
ERROR: 2022/08/26 12:34:05.406101 Archive '00000001000000000000000F' does not exist.
2022-08-26 12:34:05.576 UTC [347] @/app= LOG: restored log file "00000001000000000000000E" from archive
INFO: 2022/08/26 12:34:05.605258 WAL-prefetch file: 00000001000000000000000F
INFO: 2022/08/26 12:34:05.622174 WAL-prefetch file: 000000010000000000000010
INFO: 2022/08/26 12:34:05.633116 WAL-prefetch file: 000000010000000000000011
INFO: 2022/08/26 12:34:05.644159 WAL-prefetch file: 000000010000000000000012
INFO: 2022/08/26 12:34:05.654501 WAL-prefetch file: 000000010000000000000013
INFO: 2022/08/26 12:34:05.665580 WAL-prefetch file: 000000010000000000000014
INFO: 2022/08/26 12:34:05.675647 WAL-prefetch file: 000000010000000000000015
INFO: 2022/08/26 12:34:05.685827 WAL-prefetch file: 000000010000000000000016
INFO: 2022/08/26 12:34:05.699023 WAL-prefetch file: 000000010000000000000017
INFO: 2022/08/26 12:34:05.709165 WAL-prefetch file: 000000010000000000000018
ERROR: 2022/08/26 12:34:05.790104 Archive '00000001000000000000000F' does not exist.
ERROR: 2022/08/26 12:34:05.792435 Archive '000000010000000000000011' does not exist.
ERROR: 2022/08/26 12:34:05.798479 Archive '000000010000000000000010' does not exist.
ERROR: 2022/08/26 12:34:05.812251 Archive '000000010000000000000013' does not exist.
ERROR: 2022/08/26 12:34:05.812881 Archive '000000010000000000000012' does not exist.
ERROR: 2022/08/26 12:34:05.819048 Archive '000000010000000000000014' does not exist.
ERROR: 2022/08/26 12:34:05.830715 Archive '000000010000000000000015' does not exist.
ERROR: 2022/08/26 12:34:05.852798 Archive '000000010000000000000016' does not exist.
ERROR: 2022/08/26 12:34:05.858599 Archive '000000010000000000000017' does not exist.
ERROR: 2022/08/26 12:34:05.864005 Archive '000000010000000000000018' does not exist.
INFO: 2022/08/26 12:34:05.868747 FILE PATH: 00000001000000000000000D.lz4
ERROR: 2022/08/26 12:34:06.125907 Archive '00000001000000000000000F' does not exist.
2022-08-26 12:34:06.127 UTC [347] @/app= LOG: redo done at 0/EDAE908 system usage: CPU: user: 0.47 s, system: 0.09 s, elapsed: 2.36 s
2022-08-26 12:34:06.127 UTC [347] @/app= LOG: last completed transaction was at log time 2022-08-26 12:28:41.862881+00
INFO: 2022/08/26 12:34:06.163207 FILE PATH: 00000001000000000000000E.lz4
2022-08-26 12:34:06.314 UTC [347] @/app= LOG: restored log file "00000001000000000000000E" from archive
INFO: 2022/08/26 12:34:06.343765 WAL-prefetch file: 00000001000000000000000F
INFO: 2022/08/26 12:34:06.362509 WAL-prefetch file: 000000010000000000000010
INFO: 2022/08/26 12:34:06.373235 WAL-prefetch file: 000000010000000000000011
INFO: 2022/08/26 12:34:06.383503 WAL-prefetch file: 000000010000000000000012
INFO: 2022/08/26 12:34:06.393769 WAL-prefetch file: 000000010000000000000013
INFO: 2022/08/26 12:34:06.404023 WAL-prefetch file: 000000010000000000000014
INFO: 2022/08/26 12:34:06.414257 WAL-prefetch file: 000000010000000000000015
INFO: 2022/08/26 12:34:06.424524 WAL-prefetch file: 000000010000000000000016
INFO: 2022/08/26 12:34:06.434973 WAL-prefetch file: 000000010000000000000017
INFO: 2022/08/26 12:34:06.445257 WAL-prefetch file: 000000010000000000000018
ERROR: 2022/08/26 12:34:06.619386 Archive '000000010000000000000012' does not exist.
ERROR: 2022/08/26 12:34:06.624435 Archive '000000010000000000000016' does not exist.
ERROR: 2022/08/26 12:34:06.629405 Archive '00000001000000000000000F' does not exist.
ERROR: 2022/08/26 12:34:06.631286 Archive '000000010000000000000011' does not exist.
ERROR: 2022/08/26 12:34:06.636221 Archive '000000010000000000000013' does not exist.
ERROR: 2022/08/26 12:34:06.638299 Archive '000000010000000000000018' does not exist.
ERROR: 2022/08/26 12:34:06.643973 Archive '000000010000000000000015' does not exist.
ERROR: 2022/08/26 12:34:06.644541 Archive '000000010000000000000014' does not exist.
ERROR: 2022/08/26 12:34:06.648378 Archive '000000010000000000000010' does not exist.
ERROR: 2022/08/26 12:34:06.676487 Archive '000000010000000000000017' does not exist.
ERROR: 2022/08/26 12:34:06.739745 Archive '00000002.history' does not exist.

2022-08-26 12:34:06.741 UTC [347] @/app= LOG: selected new timeline ID: 2
2022-08-26 12:34:06.767 UTC [347] @/app= LOG: archive recovery complete
INFO: 2022/08/26 12:34:06.910203 FILE PATH: 00000001000000000000000E.lz4
ERROR: 2022/08/26 12:34:07.009364 Archive '00000001.history' does not exist.
2022-08-26 12:34:07.016 UTC [365] @/app= LOG: checkpoint starting: end-of-recovery immediate wait
2022-08-26 12:34:07.309 UTC [365] @/app= LOG: checkpoint complete: wrote 4470 buffers (4.1%); 0 WAL file(s) added, 0 removed, 5 recycled; write=0.226 s, sync=0.055 s, total=0.295 s; sync files=30, longest=0.049 s, average=0.002 s; distance=81920 kB, estimate=81920 kB
INFO: 2022/08/26 12:34:07.530236 FILE PATH: 00000002.history.lz4
2022-08-26 12:34:07.699 UTC [1] @/app= LOG: database system is ready to accept connections
.......

@tos-ilex
Copy link

We seem to be experiencing the same issue with wal-g version v2.0.1. We take a base backup at 01:00. When restoring to 07:00 of the same day, wal-g tries to prefetch multiple archives that don't exist in storage. What's worse is that it also seems to exit with an unsuccessful code, telling us that recovery failed even though as far as I understand, everything is fine, there just weren't that many transactions between backup and recovery time.

I assume this is due to #1195 ?

@usernamedt
Copy link
Member

We seem to be experiencing the same issue with wal-g version v2.0.1. We take a base backup at 01:00. When restoring to 07:00 of the same day, wal-g tries to prefetch multiple archives that don't exist in storage. What's worse is that it also seems to exit with an unsuccessful code, telling us that recovery failed even though as far as I understand, everything is fine, there just weren't that many transactions between backup and recovery time.

I assume this is due to #1195 ?

Hi! Try to disable wal-prefetch and check if the error reproduces. I think that your problem is not related to the prefetch log messages, but to #1195.

@denisSurkov
Copy link
Contributor

denisSurkov commented May 6, 2023

@usernamedt , is it safe to say that wal-prefetch has to fetch at least 1 wal file (the one that we pass as argument to wal-fetch)?

If wal-g has download concurrency >= 2 (WALG_DOWNLOAD_CONCURRENCY, minimum 1), then it's nice to prefetch additional files? But if prefetching them fails its fine to ignore it.

If so, then to solve this problem, i can refactor HandleWALPrefetch so it ignores failures from additional files (after first loop iteration).


After some further research, I think wal-prefetch doesnt have to log errors at all (maybe change log level to debug). It's internal use only command and wal-fetch has manual downloading required file after timeout.

@usernamedt
Copy link
Member

usernamedt commented May 8, 2023

Hi!

@usernamedt , is it safe to say that wal-prefetch has to fetch at least 1 wal file (the one that we pass as argument to wal-fetch)?

No, wal-fetch handles the download of the WAL file name provided as an argument. wal-prefetch job is to try to download next N files, where N is the WALG_DOWNLOAD_CONCURRENCY setting value.

After some further research, I think wal-prefetch doesnt have to log errors at all (maybe change log level to debug). It's internal use only command and wal-fetch has manual downloading required file after timeout.

I guess this is the correct way to solve this issue. Debug log level for this error message should work fine. Furthermore, I guess that all wal-prefetch output should be redirected to DebugLogger to avoid confusion or we should add a custom prefix for all wal-prefetch-generated log messages.

@usernamedt
Copy link
Member

Fixed in #1471

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants