Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returns error code 74 on WAL fetch when WAL file does not exist #1195

Merged
merged 2 commits into from Feb 17, 2022

Conversation

sebasmannem
Copy link
Contributor

@sebasmannem sebasmannem commented Jan 5, 2022

Database name

Postgres

Pull request description

Describe what this PR fix

When running wal-g in restore_command, postgres has 3 ways to handle a exit code:
0: No problem, next WAL file...
1 - 125: End of timeline? Ok, lets stop recovery and go online

=126: Ouch, big problem. Better not proceed, but error out with a FAIL instead

It would be great if wal-g would be able to distinguish between these cases with a proper exit code too.
Mostly it is so, but config errors, tcp errors, dns errors, etc have the same error code as 'archive does not exist (both 1).

This small change will allow wal-g to return exit code 74 when the wal file does not exist.

Please provide steps to reproduce (if it's a bug)

  • create a postgres backup with wal archive backups
  • restore a file that already exists and print the exit code wal-g wal-fetch 000000010000000300000099 /tmp/000000010000000300000099. It is 1.
  • restore a file with missing config: WALG_S3_PREFIX=123 wal-g wal-fetch 000000010000000300000099 /tmp/000000010000000300000099. It is also 1. We should distinguish, but this not different to Postgres either...
  • restore a file that does not exist int the backupset wal-g wal-fetch 00000001000000030000009A /tmp/00000001000000030000009Aand print the exit code. It is also 1, but should be distinguishable

Please see #1126 for more info.

Even with this fix, we still need Postgres to act differently too.
We have requested a change but it is not considered by the PG community yet.
For now we could make this work with a bash script:

#!/bin/bash
wal-g wal-fetch "$2" "$3"
RET=$?
if [ $RET -eq 1 ]; then
  # whoops, no good. exit code 254 halts postgres with a FAILURE state
  exit 254
fi
exit $RET

@sebasmannem sebasmannem requested a review from a team as a code owner January 5, 2022 18:33
@sebasmannem sebasmannem self-assigned this Jan 5, 2022
@sebasmannem sebasmannem marked this pull request as draft January 5, 2022 19:31
@sebasmannem sebasmannem changed the title Fixes: #1126 Returns error code 74 on WAL fetch when WAL file does not exist Feb 17, 2022
@sebasmannem sebasmannem force-pushed the wal_fetch_exit_codes branch 2 times, most recently from 896c5b2 to e471a50 Compare February 17, 2022 13:13
@sebasmannem sebasmannem marked this pull request as ready for review February 17, 2022 13:24
Copy link
Collaborator

@x4m x4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t like that we advise to use shell script on top of WAL-G. WAL-G should be that script. Though, let’s merge, I think it’s still for good :)

@x4m x4m merged commit 3107d83 into wal-g:master Feb 17, 2022
@sebasmannem
Copy link
Contributor Author

Yes.
One of my hesitations too.
It is just nasty that we need to set to 126+ to get the behavior we should have from postgres.
If this works as we hope, I will see if I can expand the solution.

@sebasmannem sebasmannem deleted the wal_fetch_exit_codes branch February 18, 2022 07:22
@haslersn
Copy link

haslersn commented May 2, 2022

For PostgreSQL that should be any error code between 126 and 255, which can be achieved with a simple wrapper script.

Where is this documented?

Also, why do we necessarily want to stop PostgreSQL, if the restore_command running wal-g exits with an error? I thought PostgreSQL will next try to restore WAL from the pg_wal directory, then from the primary server, and then from the archive location again. In a cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants