New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to test recovery using restore points #3024
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3024 +/- ##
=======================================
Coverage 90.28% 90.28%
=======================================
Files 213 213
Lines 35004 35004
=======================================
Hits 31602 31602
Misses 3402 3402 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldnt this be exercised somewhere in CI?
scripts/test_restore_points.sh
Outdated
NAME=$1 | ||
PORT=$2 | ||
RESTORE_POINT_NAME=$3 | ||
pg_ctl init -D "${STORAGE_DIR}/${NAME}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the script support PG tools in a non-standard path by prefixing like this:
pg_ctl init -D "${STORAGE_DIR}/${NAME}" | |
${PG_BIN}/pg_ctl init -D "${STORAGE_DIR}/${NAME}" |
Then allow overriding PG_BIN
above. This applies to all psql
calls below as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I can do this, but I believe we don't do that for our other scripts. Personally I would prefer to change PATH
to point to proper pg directory instead before running this script, if someone need to change the version.
('2018-07-01 09:11', 90, 2.7), | ||
('2018-07-01 08:01', 29, 1.5); | ||
|
||
SELECT pg_create_restore_point('${TEST_RESTORE_POINT_3}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to also insert date after the restore point to see that this is not included in the restored instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do it, but that's also checked when restoring from the restore points 1 and 2
9680b6d
to
d40e389
Compare
2ea1ee4
to
8874a85
Compare
@erimatnor @svenklemm I've added cron test here to see that it actually works, I'll remove it before merging |
.github/workflows/cron-tests.yaml
Outdated
strategy: | ||
fail-fast: false | ||
env: | ||
PG_VERSION: 12.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want a matrix here and test PG11 and PG13 too
8874a85
to
a84423c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that the regression test succeeded. However, I see some errors in the log, which I don't feel well about:
2021-03-17 14:43:27.520 GMT [110] FATAL: database "sn" does not exist
cp: cannot stat '/testdir/wal/sn/00000002.history': No such file or directory
cp: cannot stat '/testdir/wal/sn/00000001.history': No such file or directory
Not ciritical:
sh: locale: not found
2021-03-17 14:43:25.331 UTC [67] WARNING: no usable system locales were found
a84423c
to
19f02b7
Compare
I believe they are related to the way how postgresql copy files using archive/restore command. And |
@pmwkaa The error is FATAL. How can |
@pmwkaa I tried to run
What am I doing wrong? Since it is inside container, I am not sure why it shouldn't just work. The entire log:
|
PG11 used different configuration file for recovery, which is not supported by this script |
I guess I can add support for PG11, if people think it is necessary |
At least it should check and return a meaningful error if the local installation is not correct version of PG. I think it is good to test PG11 too, since PG11 is still supported and users should be able to restore. |
19f02b7
to
24a7d88
Compare
@k-rus Found out the case of the @svenklemm @erimatnor Added support for PG11 and enabled additional debug output during the execution for better readability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, although I left some comments about potential to improve code reuse and reduce the number of times we build the same Docker image in tests.
} | ||
|
||
docker rm -f timescaledb-rp 2>/dev/null || true | ||
IMAGE_NAME=rp_test TAG_NAME=latest bash ${SCRIPT_DIR}/docker-build.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are now building this Docker image over-and-over again in different workflows, which seems inefficient. We should probably see if we can build it once, cache, and reuse across different tests.
If it is difficult to do, we can do it later but just noting the duplicate work here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we build them against different PG version, not sure what we can do here
set -e | ||
set -o pipefail | ||
|
||
SCRIPT_DIR=$(dirname $0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The beginning of this script seems to be copy-pasting things from other scripts. I guess there's an opportunity to ensure better code-reuse here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you meant here, but I guess we can tell the same for other scripts in the directory, the only thing that was reused is the postgresql server startup wait logic. Maybe we should address this as a cumulative task which include other scripts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
See also my comments
.github/workflows/cron-tests.yaml
Outdated
|
||
backup_and_restore: | ||
name: Backup and restore | ||
runs-on: ubuntu-18.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to PR #3029 should the test run against 20.04
?
runs-on: ubuntu-18.04 | |
runs-on: ubuntu-20.04 |
# function | ||
status="$?" | ||
set +e # do not exit immediately on failure in cleanup handler | ||
# docker rm -vf timescaledb-valgrind 2>/dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line seems to be leftover. Can you remove it?
24a7d88
to
6db4f2b
Compare
6db4f2b
to
28dc896
Compare
Test PostgreSQL restore point recovery for single node and multi node cluster, make sure that single and multi node results match after recovery.