Add update test for repair table script #2838
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #2838      +/-  ##
==========================================
+ Coverage   90.07%   90.22%   +0.14%
==========================================
  Files         212      212
  Lines       34772    34725      -47
==========================================
+ Hits        31322    31329       +7
+ Misses       3450     3396      -54
```

Continue to review full report at Codecov.
I don't think it's a good idea to drop constraints in the update test workflow. If you really need to drop them, then your test should be completely separate from that workflow.
I am not sure what you're suggesting. We want to test the repair that is being done as part of the update, so we should run it there. It is possible to mirror the existing update test script with only this test (essentially doing it the same way) and create a bunch of new docker containers, but that would double the execution time with no obvious advantage.
Changes look good to me. I agree with Sven that it is better to have a separate test path for this outside of the usual update tests. The metadata fix is ideally a one-off task and not needed across upgrades.
```sql
DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
    -- ... (diff context elided)
    ORDER BY dimension_slice_id LIMIT 1
);
```
Suggest removing the metadata for all the slices for one of the cases (worst-case scenario).
Done.
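The worst-case setup agreed on above might look roughly like the following. This is a hedged sketch, not the actual test code: the catalog table and column names come from the diff context in this thread, but the exact subquery is an assumption.

```sql
-- Hypothetical sketch: remove the dimension_slice metadata for every
-- slice referenced from chunk_constraint, so the repair script has to
-- reconstruct all of them (the worst case).
DELETE FROM _timescaledb_catalog.dimension_slice
 WHERE id IN (SELECT dimension_slice_id
                FROM _timescaledb_catalog.chunk_constraint
               WHERE dimension_slice_id IS NOT NULL);
```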
sql/updates/1.7.1--1.7.2.sql
Outdated
```diff
@@ -16,59 +16,66 @@ WITH
   -- All dimension slices that are mentioned in the chunk_constraint
   -- table but are missing from the dimension_slice table.
   missing_slices AS (
-    SELECT dimension_slice_id,
+    SELECT ch.hypertable_id,
+           di.id as dimension_id,
```
Suggest fixing indentation (probably a tabs vs spaces issue).
Ran untabify on the files.
Didn't look too much in detail at the test itself, but I have a high-level concern/comment.
If the previous update script didn't work for all cases, do we know that the update always failed? The case I am concerned about is a successful update without a full repair. In that case we need another repair for 2.0.0--2.0.1.
```sh
# We need to run the post repair script to make sure that the
# constraint is on the clean rerun as well since the setup script can
# remove it.
if [[ "${TEST_VERSION}" > "v6" ]] || [[ "${TEST_VERSION}" = "v6" ]]; then
```
Shouldn't the repair test be tied to specific TimescaleDB versions? I.e., we want to make sure that any TimescaleDB versions that precede the repair update script will have the error, but no others. Is this handled by "v6"? I worry about the case where we also run repair tests for future upgrades where the issue no longer exists and the update script won't run (e.g., 2.0.1 -> 2.1.0).
This check just makes sure that we restore the constraint if it was dropped. The constraint is always dropped in v6 or later, hence needs to be restored in v6 and later. Inside the repair setup, there is a check that we are upgrading from a version before 2.0.0, which guarantees that the latest repair is done. If we are upgrading from 2.0.1 to 2.1.0, no dimension slices will be dropped, but the foreign key constraint still needs to be restored.
This is because there is no easy way to check what we're upgrading from and to, only what version we are currently running as.
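The version check in the snippet above amounts to a "greater than or equal to v6" string test. A small sketch of how it behaves (the `ver_ge` helper is hypothetical, not part of the scripts):

```shell
# Hypothetical helper mirroring the check in the update test script.
# Bash's [[ > ]] compares strings lexicographically, which is fine for
# single-digit test versions like v5..v8.
ver_ge() { [[ "$1" > "$2" ]] || [[ "$1" = "$2" ]]; }

ver_ge v6 v6 && echo "v6 passes"
ver_ge v7 v6 && echo "v7 passes"
# Caveat: lexicographic order means "v10" would sort before "v6".
ver_ge v10 v6 || echo "v10 would NOT pass"
```

Note that because the comparison is lexicographic, a two-digit test version such as `v10` would not satisfy the check against `v6`.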
Not sure I follow completely, but it sounds like the repair test activates only in cases we are upgrading from a pre 2.0 release.
One concern I still have is that we now do a separate CI run for the repair, which means we run all update tests from 2.0.1 to future releases multiple times, since the repair doesn't happen there. It seems somewhat unnecessary to run these update tests again. Or am I missing something?
Yes, repair tests are only activated when upgrading from pre-2.0 to 2.0. The check has to be updated when merging with the 1.7 branch, since repair tests only run from 1.7.1 onwards.
No, you got it right. Right now we run all update tests with and without repair because there are repairs that happen during the update. For branches where there is no repair, we need to disable these tests, but that is a later problem. For the time being, we will at least upgrade from the 1.7.x branch, and there will always be a repair done until 1.7 EOL.
Both of the repairs pick out all dimension slices that are missing, but handle them incorrectly and can then trigger an error while processing them (this is what the changes in the commit fix), so it does not seem likely that they can have missed some dimension slices. We do not have any "post-update" checks in the update scripts though: is this something that we need to add? Right now we only have integrity checks in the update test post checks.
Approving, but I have some concerns that we are running some update tests multiple times.
.github/workflows/update-test.yaml
Outdated
```diff
@@ -27,9 +30,9 @@ jobs:
       - name: Checkout TimescaleDB
         uses: actions/checkout@v2

-      - name: Update tests ${{ matrix.pg }}
+      - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}}
```
```diff
-      - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}}
+      - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}
```
```yaml
include:
  - pg: 11.10
    pg_major: 11
  - pg: 12.5
    pg_major: 12
  - opt: "-r"
```
Do we need to run this for both PG versions? Maybe it is enough to just run with PG12?
There are optimizations to do, but I would prefer to defer that to a separate PR.
```yaml
runs-on: 'ubuntu-18.04'
strategy:
  matrix:
    pg: ["11.10", "12.5"]
    opt: ["", "-r"]
```
I think restricting the repair tests to upgrades from 1.7.x to 2.0/2.0.1 is sufficient, and we don't need to run this for the entire matrix. I don't mind deferring that change to a later PR, but I do think it is necessary so that update tests don't unnecessarily consume too much time.
Agree. I picked this as an easy solution for now. We can tune it over time and should probably also avoid running a full test when we try the repair.
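The restriction being deferred here could look roughly like the following in the workflow matrix. This is a sketch, not the PR's actual change: the `pg` and `opt` names come from the snippet above, but the `include` structure and the choice of PG version are assumptions.

```yaml
# Hypothetical sketch: run the plain update tests on both PG versions,
# but add the repair flavor ("-r") only for a single PG version.
strategy:
  matrix:
    pg: ["11.10", "12.5"]
    opt: [""]
    include:
      - pg: "12.5"
        opt: "-r"
```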
Also note that the actual breaking of the dimension slices only happens if you upgrade from pre-2.0. There are checks in the SQL files that control this.
Changes look good. I think this should have a follow-up PR to restrict the repair tests to specific TimescaleDB version upgrades. It is not necessary to test the repair script against all TS versions to verify that it works as expected.
Since this is supposed to operate on different versions, I don't think this should piggyback on the existing versioning in the update tests. I would suggest removing the top-level matrix option and instead starting the repair tests in scripts/test_updates_pg1[12].sh for the versions where they are needed.
The repair runs on all updates from pre-2.0 versions to 2.0.x, so even if the check is moved into the shell scripts, it will execute for all tests, with the exception of 2.0.0 to 2.0.1 (all others go to 2.0.x). There are other optimizations that I think we should do to avoid duplicating tests, but I would prefer to take this as a separate PR since the extra test being executed is not a significant problem and we have changes that we want to have in 2.0.1.
The commit fixes two bugs in the repair scripts that could prevent an update in rare circumstances.

For the 1.7.1--1.7.2 repair script: if there were several missing dimension slices in different hypertables with the same column name, the repair script would be confused about which constraint had which type and generate an error.

For the 2.0.0-rc1--2.0.0-rc2 repair script: if a partition constraint was broken, it would generate an error rather than repairing the dimension slices, because BIGINT_MIN would be cast to a double float and then an attempt would be made to cast it back to bigint, causing an overflow error.

This commit also creates an update repair test that breaks a few tables for pre-2.0 versions to ensure that the repair script actually fixes them. The integrity check for the update tests already contains a check that dimension slices are valid, so there is no need to add a test for that.

This commit adds an extra dimension in the workflow to test updates with repair and run that separately. It also changes the update test scripts to run without repair tests by default and adds the option `-r` for running repair tests in addition to the normal tests.

Fixes timescale#2824
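The round-trip overflow the commit message describes can be illustrated outside SQL. This is a sketch in Python; it uses the maximum bigint value, where the rounding direction makes the overflow easy to see, rather than the BIGINT_MIN value the script deals with.

```python
# Doubles carry a 53-bit mantissa, so 64-bit integers near the extremes
# of the bigint range are not exactly representable.  Round-tripping
# such a value through a double can move it outside the bigint range,
# which is the kind of overflow the repair script ran into.
BIGINT_MAX = 2**63 - 1              # 9223372036854775807

as_double = float(BIGINT_MAX)       # rounds up to 2.0**63
round_tripped = int(as_double)

print(as_double == 2.0**63)         # True: the value was not preserved
print(round_tripped > BIGINT_MAX)   # True: now outside the bigint range
```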
This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading. In particular, the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables, and when upgrading from previous versions.

**Bugfixes**
* #2772 Always validate existing database and extension
* #2780 Fix config enum entries for remote data fetcher
* #2806 Add check for dropped chunk on update
* #2828 Improve cagg watermark caching
* #2838 Fix catalog repair in update script
* #2842 Do not mark job as started when setting next_start field
* #2845 Fix continuous aggregate privileges during upgrade
* #2851 Fix nested loop joins that involve compressed chunks
* #2860 Fix projection in ChunkAppend nodes
* #2861 Remove compression stat update from update script
* #2865 Apply volatile function quals at decompresschunk node
* #2866 Avoid partitionwise planning of partialize_agg
* #2868 Fix corruption in gapfill plan
* #2874 Fix partitionwise agg crash due to uninitialized memory

**Thanks**
* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @markatosi for reporting a segfault with partitionwise aggregates enabled
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on windows
* @WarriorOfWire for reporting the bug with gapfill queries not being able to find pathkey item to sort
This commit creates an update repair test that breaks a few tables for
pre-2.0 versions to ensure that the repair script actually fixes them.
The integrity check for the update tests already contain a check that
dimension slices are valid, so there is no need to add a test for that.
In addition, the commit fixes two bugs in the repair scripts that could
prevent an update in rare circumstances.
For the 1.7.1--1.7.2 repair script: if there were several missing
dimension slices in different hypertables with the same column name,
the repair script would be confused on what constraint had what type and
generate an error.
For the 2.0.0-rc1--2.0.0-rc2 repair script: if a partition constraint
was broken, it would generate an error rather than repairing the
dimension slices because BIGINT_MIN would be cast to a double float and
then an attempt would be made to cast it back to bigint, causing an
overflow error.
Fixes #2824