operator.backup: update wal_prefetch() so that failed downloads are deleted #144
Resolves the issue detailed at: https://groups.google.com/forum/#!topic/wal-e/8wwiDkrNLXQ wherein PostgreSQL recoveries with standby_mode = off were failing when --prefetch > 0. Specifically, when do_lzop_get() failed to fetch an archive file (as happens when trying to fetch the non-existent WAL file following the latest one in the archive), prefetch was creating an empty archive file and then allowing AtomicDownload to link that file into place. This causes PostgreSQL's recovery to fail. PostgreSQL then shuts down and can't be started up until you remove recovery.conf.
Logs from a successful restore, where standby_mode = off and --prefetch was left at its default (> 0).
Yay. Merged.
Hi, I have the same problem with wal-fetch when standby_mode = off. I use upstream wal-e, but I do not understand how to use prefetch: there is no documentation, and I do not understand the difference between fetch and prefetch.
@evtuhovich prefetch is a clever, and expedient, feature, and it exists because of how PostgreSQL fetches WAL archives. Basically, PostgreSQL runs the restore_command for one WAL file after another. This works, but it means that you will wait to download WAL archive 2 until after you've downloaded (and loaded?) archive 1. You thus will delay your download of all the WAL archives, in the worst case, by the slowest download. Hence, prefetch, which actually daemonizes a new process when the restore_command is executed, and that process goes out and tries to fetch future WAL files so that they'll be ready when it's time to restore them. (And of course, there's bookkeeping involved to make sure this process doesn't conflict with any other prefetch processes outstanding...)
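To make the fetch/prefetch distinction above concrete, here is a rough Python sketch, not WAL-E's actual code. The names `next_segments`, `start_prefetch`, `wal_fetch`, the `download(segment, path)` callable, and the prefetch directory layout are all hypothetical. It uses a thread pool where WAL-E daemonizes a separate process, and it assumes 16 MB segments with PostgreSQL 9.3+ file naming (256 segments per log file number). The bookkeeping that prevents conflicts between prefetchers is left out.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

# Assumption: 16 MB segments and PostgreSQL 9.3+ naming, where the last
# 8 hex digits run from 00000000 to 000000FF before the middle 8 increment.
SEGMENTS_PER_LOG = 0x100


def next_segments(name, count):
    """Return the `count` WAL segment names that follow `name`."""
    timeline = name[:8]
    number = int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)
    result = []
    for i in range(1, count + 1):
        log, seg = divmod(number + i, SEGMENTS_PER_LOG)
        result.append("%s%08X%08X" % (timeline, log, seg))
    return result


def start_prefetch(pool, name, prefetch_dir, download, count=2):
    """Queue downloads of the segments after `name`; return the futures.

    `download(segment, path)` is a hypothetical callable that writes the
    segment to `path` and returns True, or returns False if it is absent.
    """
    futures = []
    for segment in next_segments(name, count):
        path = os.path.join(prefetch_dir, segment)
        if not os.path.exists(path):
            futures.append(pool.submit(download, segment, path))
    return futures


def wal_fetch(name, dest, prefetch_dir, download):
    """Use a prefetched copy of `name` when one exists, else download now.

    A real implementation must only expose completed downloads here;
    this sketch does not guard against half-written files.
    """
    cached = os.path.join(prefetch_dir, name)
    if os.path.exists(cached):
        shutil.move(cached, dest)
        return True
    return download(name, dest)
```

In this sketch, each call made on behalf of restore_command would run `wal_fetch` for the requested segment and `start_prefetch` for the ones after it, so later requests tend to find their file already on disk.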
@hblanks, thanks for the answer, I'll try to use it.
Resolves the issue in v0.8c1 detailed at:
https://groups.google.com/forum/#!topic/wal-e/8wwiDkrNLXQ
wherein (standby_mode = off) PostgreSQL recoveries were failing when
--prefetch > 0. Specifically, in cases where do_lzop_get() failed to
fetch an archive file (as would be the case when trying to fetch the
non-existent WAL file following the latest one in the archive) prefetch
was creating an empty archive file and then allowing AtomicDownload to
link that file into place. This causes PostgreSQL's recovery to fail.
PostgreSQL then shuts down and can't be started up until you remove
recovery.conf.
Sorry to not include a test for this; I've had difficulties getting tox up and
running in my environment. My best guess would be to add a test to
test_blackbox.py along the lines of test_wal_fetch_non_existent(). I'm
not certain, however.
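
For illustration only, the behavior this change aims for can be sketched in Python as below. This is not the WAL-E implementation: `AtomicDownloadSketch`, `SegmentNotFound`, `prefetch_segment`, and the `fetch(path)` callable are hypothetical stand-ins, and `os.replace` is used here in place of whatever linking the real AtomicDownload performs. The point is that a failed fetch removes the temporary file instead of moving an empty file into place.

```python
import os
import tempfile


class SegmentNotFound(Exception):
    """Raised when the requested segment is not in the archive (hypothetical)."""


class AtomicDownloadSketch:
    """Hypothetical stand-in for an atomic download helper."""

    def __init__(self, dest):
        self.dest = dest
        self.tmp_path = None

    def __enter__(self):
        # Create the temporary file next to the destination so the final
        # rename stays on one filesystem.
        directory = os.path.dirname(self.dest) or "."
        fd, self.tmp_path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Success: publish the completed download.
            os.replace(self.tmp_path, self.dest)
        else:
            # Failure: delete the (possibly empty) file, never publish it.
            os.unlink(self.tmp_path)
        return False  # let the exception propagate


def prefetch_segment(dest, fetch):
    """Fetch one segment to `dest`; return False if it does not exist.

    `fetch(path)` is a hypothetical callable that writes the segment to
    `path` and returns True, or returns False when the segment is absent.
    """
    try:
        with AtomicDownloadSketch(dest) as ad:
            if not fetch(ad.tmp_path):
                raise SegmentNotFound(dest)
    except SegmentNotFound:
        return False
    return True
```

A test along the lines suggested above could then request a segment past the end of the archive and assert that no file, empty or otherwise, appears at the destination.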