Add update test for repair table script #2838

mkindahl · 2021-01-18T12:38:34Z

This commit creates an update repair test that breaks a few tables for
pre-2.0 versions to ensure that the repair script actually fixes them.
The integrity check for the update tests already contains a check that
dimension slices are valid, so there is no need to add a test for that.

In addition, the commit fixes two bugs in the repair scripts that could
prevent an update in rare circumstances.

For the 1.7.1--1.7.2 repair script: if there were several missing
dimension slices in different hypertables with the same column name,
the repair script would be confused about which constraint had which
type and would generate an error.
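A miniature illustration of the 1.7.1--1.7.2 bug, written as a bash sketch rather than the repair script's SQL: when a dimension's type is looked up by column name alone, two hypertables that share a column name collide, and one entry silently overwrites the other. Carrying the hypertable in the key (the fix does this by selecting `ch.hypertable_id` and the dimension id in the query) keeps them apart. The rows, types, and function name below are hypothetical.

```bash
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
# Hypothetical catalog rows: (hypertable_id, column_name, type).
declare -A type_by_column=()     # keyed by column name only (ambiguous)
declare -A type_by_dimension=()  # keyed by hypertable_id and column name

add_dimension() {
  local hypertable_id=$1 column=$2 type=$3
  type_by_column[$column]=$type
  type_by_dimension["$hypertable_id:$column"]=$type
}

# Two hypertables use the same column name with different types.
add_dimension 1 time timestamptz
add_dimension 2 time bigint

key="1:time"
# Prints "bigint": the row for hypertable 2 overwrote the one for hypertable 1.
echo "hypertable 1, looked up by column only: ${type_by_column[time]}"
# Prints "timestamptz": the composite key keeps the hypertables apart.
echo "hypertable 1, looked up by hypertable and column: ${type_by_dimension[$key]}"
```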

For the 2.0.0-rc1--2.0.0-rc2 repair script: if a partition constraint
was broken, it would generate an error rather than repairing the
dimension slices, because BIGINT_MIN would be cast to a double-precision
float and then an attempt would be made to cast it back to bigint,
causing an overflow error.

Fixes #2824

codecov · 2021-01-18T13:14:59Z

Codecov Report

Merging #2838 (0ff29c3) into master (126f1c8) will increase coverage by 0.14%.
The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##           master    #2838      +/-   ##
==========================================
+ Coverage   90.07%   90.22%   +0.14%
==========================================
  Files         212      212
  Lines       34772    34725      -47
==========================================
+ Hits        31322    31329       +7
+ Misses       3450     3396      -54
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| tsl/src/nodes/gapfill/planner.c | `96.89% <ø> (-0.02%)` | ⬇️ |
| src/plan_expand_hypertable.c | `94.32% <100.00%> (+0.06%)` | ⬆️ |
| src/plan_partialize.c | `97.91% <100.00%> (+0.04%)` | ⬆️ |
| src/planner.c | `93.54% <100.00%> (ø)` | |
| tsl/src/nodes/decompress_chunk/decompress_chunk.c | `94.05% <100.00%> (+0.50%)` | ⬆️ |
| tsl/src/nodes/decompress_chunk/qual_pushdown.c | `91.33% <100.00%> (+0.72%)` | ⬆️ |
| src/import/planner.c | `70.30% <0.00%> (+11.12%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 0e86bbe...0ff29c3.

svenklemm

I don't think it's a good idea to drop constraints in the update test workflow. If you really need to drop them, then your test should be completely separate from that workflow.

mkindahl · 2021-01-21T15:08:49Z

> I don't think it's a good idea to drop constraints in the update test workflow. If you really need to drop them, then your test should be completely separate from that workflow.

I am not sure what you're suggesting.

We want to test the repair that is being done as part of the update, so we should run ALTER EXTENSION to test that all works. To delete some dimension slices, it is necessary to remove the foreign key constraint, but it is not possible to re-add it until after the update is done.
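The ordering described above (drop the foreign key, break the catalog, run the update, and only then re-add the foreign key) can be outlined as a bash function that prints the SQL instead of running it. This is a hypothetical sketch, not the contents of setup.repair.sql: `repair_test_plan` is an illustrative name, and the constraint name `chunk_constraint_dimension_slice_id_fkey` is assumed from PostgreSQL's default naming for a foreign key on `dimension_slice_id`, so it may not match the actual catalog.

```bash
#!/usr/bin/env bash
# Hypothetical outline of the repair test sequence; prints SQL, runs nothing.
# The foreign key name below is an assumption, not taken from the PR.
repair_test_plan() {
  local slice_id=$1
  cat <<SQL
-- 1. Drop the foreign key so that dimension slices can be deleted.
ALTER TABLE _timescaledb_catalog.chunk_constraint
  DROP CONSTRAINT chunk_constraint_dimension_slice_id_fkey;
-- 2. Break the catalog: delete a slice that chunk constraints still reference.
DELETE FROM _timescaledb_catalog.dimension_slice WHERE id = ${slice_id};
-- 3. Run the update, which executes the repair scripts.
ALTER EXTENSION timescaledb UPDATE;
-- 4. Only after the repair can the foreign key be re-added.
ALTER TABLE _timescaledb_catalog.chunk_constraint
  ADD CONSTRAINT chunk_constraint_dimension_slice_id_fkey
  FOREIGN KEY (dimension_slice_id)
  REFERENCES _timescaledb_catalog.dimension_slice (id);
SQL
}

repair_test_plan 42
```

Re-adding the constraint in step 4 would fail if the repair in step 3 left any dangling `dimension_slice_id`, which is what makes the sequence a test of the repair.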

It is possible to mirror the existing update test script with only this test (essentially doing it the same way) and create a bunch of new docker containers, but that would double the execution time with no obvious advantage.

gayyappan

Changes look good to me. I agree with Sven that it is better to have a separate test path for this outside of the usual update tests. The metadata fix is ideally a one-off task and not needed across upgrades.

gayyappan · 2021-01-21T21:44:03Z

test/sql/updates/setup.repair.sql

```diff
+   ORDER BY dimension_slice_id LIMIT 1
+);
+
+DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
```


Suggest removing the metadata for all the slices for one of the cases (the worst-case scenario).

erimatnor · 2021-01-22T13:02:42Z

sql/updates/1.7.1--1.7.2.sql

```diff
@@ -16,59 +16,66 @@ WITH
   -- All dimension slices that are mentioned in the chunk_constraint
   -- table but are missing from the dimension_slice table.
   missing_slices AS (
-      SELECT dimension_slice_id,
+      SELECT ch.hypertable_id,
+      	     di.id as dimension_id,
```


Suggest fixing indentation (probably a tabs vs spaces issue).

Ran untabify on the files.

erimatnor

Didn't look too much in detail at the test itself, but I have a high-level concern/comment.

If the previous update script didn't work for all cases, do we know that the update always failed? The case I am concerned about is a successful update without a full repair. In that case we need another repair for 2.0.0--2.0.1.

erimatnor · 2021-01-22T13:09:35Z

scripts/test_update_from_tag.sh

```diff
+# We need to run the post repair script to make sure that the
+# constraint is on the clean rerun as well since the setup script can
+# remove it.
+if [[ "${TEST_VERSION}" > "v6" ]] || [[ "${TEST_VERSION}" = "v6" ]]; then
```


Shouldn't the repair test be tied to specific TimescaleDB versions? I.e., we want to make sure that any TimescaleDB versions that precede the repair update script will have the error, but no others. Is this handled by "v6"? I worry about the case when we also run repair tests for future upgrades where the issue no longer exists and the update script won't run (e.g., 2.0.1 -> 2.1.0).

This check just makes sure that we restore the constraint if it was dropped. The constraint is always dropped in v6 or later, and hence needs to be restored in v6 and later. Inside the repair setup, there is a check that we are upgrading from a version before 2.0.0, which guarantees that the latest repair is done. If we are upgrading from 2.0.1 to 2.1.0, no dimension slices will be dropped, but the foreign key constraint still needs to be restored.

This is because there is no easy way to check what we're upgrading from and to, only what version we are currently running as.
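A side note on the version check in that snippet: `[[ a > b ]]` compares strings lexicographically, so it orders v1 through v9 correctly, but a hypothetical v10 tag would sort before v6. If the tags ever reach two digits, a numeric comparison avoids this. A minimal sketch, assuming the tags keep the form `v<integer>`; `tag_at_least` is an illustrative helper, not part of the script:

```bash
#!/usr/bin/env bash
# Succeeds if a test-version tag such as "v7" is at or after a minimum tag
# such as "v6". Assumes tags have the form "v<integer>".
tag_at_least() {
  local tag=${1#v} min=${2#v}
  (( tag >= min ))
}

# Compare the lexicographic check from the script with the numeric one.
for t in v5 v6 v7 v10; do
  if [[ "$t" > "v6" ]] || [[ "$t" = "v6" ]]; then lex=yes; else lex=no; fi
  if tag_at_least "$t" v6; then num=yes; else num=no; fi
  echo "$t lexicographic=$lex numeric=$num"
done
```

For v10 the loop reports `lexicographic=no numeric=yes`; for v5, v6, and v7 the two checks agree.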

Not sure I follow completely, but it sounds like the repair test activates only in cases we are upgrading from a pre 2.0 release.

One concern I still have is that we now do a separate CI run for the repair, which means we run all update tests from 2.0.1 to future releases multiple times, since the repair doesn't happen there. It seems somewhat unnecessary to run these update tests again. Or am I missing something?

Yes, repair tests are only activated when upgrading from pre-2.0 to 2.0. The check has to be updated when merging with the 1.7 branch, since repair tests only run for 1.7.1 and later.

No, you got it right. Right now we run all update tests with and without repair because there are repairs that happen during update. For branches where there is no repair, we need to disable these tests, but that is a later problem. For the time being, we will at least upgrade from 1.7.x branch, and there will always be a repair done until 1.7 EOL.

mkindahl · 2021-01-22T13:59:06Z

> Didn't look too much in detail at the test itself, but I have a high-level concern/comment.
>
> If the previous update script didn't work for all cases, do we know that the update always failed? The case I am concerned about is a successful update without a full repair. In that case we need another repair for 2.0.0--2.0.1.

Both of the repairs pick out all dimension slices that are missing, but they handle them incorrectly and can then trigger an error while processing them (that handling is what the changes in this commit fix), so it does not seem likely that any dimension slices were missed.

We do not have any "post-update" checks in the update scripts though: is this something that we need to add? Right now we only have integrity checks in the update test post checks.
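One possible shape for such a post-update check is a query for chunk constraints whose dimension slice is missing, which is the same condition the repair scripts look for. A hypothetical sketch: the helper name is illustrative, and the column names of `_timescaledb_catalog.chunk_constraint` are assumed from the catalog rather than taken from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical post-update check: list chunk constraints that reference a
# dimension slice that no longer exists. Prints the SQL so it can be piped to psql.
missing_slices_check_sql() {
  cat <<'SQL'
SELECT cc.chunk_id, cc.constraint_name, cc.dimension_slice_id
FROM _timescaledb_catalog.chunk_constraint cc
LEFT JOIN _timescaledb_catalog.dimension_slice ds
       ON ds.id = cc.dimension_slice_id
WHERE cc.dimension_slice_id IS NOT NULL
  AND ds.id IS NULL;
SQL
}

missing_slices_check_sql
```

For example, `missing_slices_check_sql | psql -X -At -d "$DB"`, where `$DB` is a placeholder; an empty result would mean no slices are missing.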

erimatnor

Approving, but have some concerns that we are running some update tests multiple times.

erimatnor · 2021-01-27T13:26:00Z

.github/workflows/update-test.yaml

```diff
@@ -27,9 +30,9 @@ jobs:
    - name: Checkout TimescaleDB
      uses: actions/checkout@v2

-    - name: Update tests ${{ matrix.pg }}
+    - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}}
```


Suggested change:

```diff
-    - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}}
+    - name: Update tests ${{ matrix.pg }} ${{ matrix.kind }}
```

erimatnor · 2021-01-27T13:26:46Z

.github/workflows/update-test.yaml

```diff
        include:
          - pg: 11.10
            pg_major: 11
          - pg: 12.5
            pg_major: 12
+          - opt: "-r"
```


Do we need to run this for both PG versions? Maybe it is enough to just run with PG12?

There are optimizations to do, but I would prefer to defer that to a separate PR.

erimatnor · 2021-01-27T13:30:48Z

scripts/test_update_from_tag.sh

```diff
+# We need to run the post repair script to make sure that the
+# constraint is on the clean rerun as well since the setup script can
+# remove it.
+if [[ "${TEST_VERSION}" > "v6" ]] || [[ "${TEST_VERSION}" = "v6" ]]; then
```


Not sure I follow completely, but it sounds like the repair test activates only in cases we are upgrading from a pre 2.0 release.

One concern I still have is that we now do a separate CI run for the repair, which means we run all update tests from 2.0.1 to future releases multiple times, since the repair doesn't happen there. It seems somewhat unnecessary to run these update tests again. Or am I missing something?

gayyappan · 2021-01-27T18:58:23Z

.github/workflows/update-test.yaml

```diff
    runs-on: 'ubuntu-18.04'
    strategy:
      matrix:
        pg: ["11.10","12.5"]
+        opt: ["", "-r"]
```


I think restricting the repair tests to upgrades from 1.7.x to 2.0 and 2.0.1 is sufficient, and we don't need to run this for the entire matrix. I don't mind deferring that change to a later PR, but I do think it is necessary so that update tests don't consume too much time unnecessarily.

Agree. I picked this as an easy solution for now. We can tune it over time and should probably also avoid running a full test when we try the repair.

Also note that the actual breaking of the dimension slices only happens if you upgrade from pre-2.0. There are checks in the SQL files that control this.
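If the repair tests are later restricted as suggested, the gate could live in the test scripts. A hypothetical bash sketch (the helper names are illustrative) that selects the repair test only for updates from a pre-2.0 release to a 2.0.x release, using GNU `sort -V` for version ordering. It assumes plain `X.Y.Z` versions, since `sort -V` may not order pre-release suffixes such as `-rc1` the way semantic versioning does.

```bash
#!/usr/bin/env bash
# Succeeds if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [[ "$1" != "$2" ]] &&
    [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$1" ]]
}

# Hypothetical gate: run the repair test only for updates from a pre-2.0
# release to a 2.0.x release.
needs_repair_test() {
  local from=$1 to=$2
  version_lt "$from" 2.0.0 && ! version_lt "$to" 2.0.0 && version_lt "$to" 2.1.0
}

for pair in "1.7.4 2.0.1" "2.0.0 2.0.1" "2.0.1 2.1.0"; do
  read -r from to <<<"$pair"
  if needs_repair_test "$from" "$to"; then
    echo "$from -> $to: repair test"
  else
    echo "$from -> $to: skip"
  fi
done
```

Only the 1.7.4 -> 2.0.1 pair selects the repair test; the other two are skipped.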

gayyappan

Changes look good. I think this should have a follow up PR to restrict the repair tests to specific timescale version upgrades. It is not necessary to test the repair script against all TS versions to verify that it works as expected.

svenklemm

Since this is supposed to operate on different versions, I don't think this should piggyback on the existing versioning in the update tests. I would suggest removing the top-level matrix option and instead starting the repair tests in scripts/test_updates_pg1[12].sh for the versions where they are needed.

mkindahl · 2021-01-27T20:13:46Z

> Since this is supposed to operate on different versions, I don't think this should piggyback on the existing versioning in the update tests. I would suggest removing the top-level matrix option and instead starting the repair tests in scripts/test_updates_pg1[12].sh for the versions where they are needed.

The repair runs on all updates from pre-2.0 versions to 2.0.x, so even if the check is moved into the shell scripts, it will execute for all tests, with the exception of 2.0.0 to 2.0.1 (all others to 2.0.x). There are other optimizations that I think we should do to avoid duplicating tests, but I would prefer to take this as a separate PR, since the extra test being executed is not a significant problem and we have changes that we want to have in 2.0.1.

The commit fixes two bugs in the repair scripts that could prevent an update in rare circumstances.

For the 1.7.1--1.7.2 repair script: if there were several missing dimension slices in different hypertables with the same column name, the repair script would be confused about which constraint had which type and would generate an error.

For the 2.0.0-rc1--2.0.0-rc2 repair script: if a partition constraint was broken, it would generate an error rather than repairing the dimension slices, because BIGINT_MIN would be cast to a double-precision float and then an attempt would be made to cast it back to bigint, causing an overflow error.

This commit also creates an update repair test that breaks a few tables for pre-2.0 versions to ensure that the repair script actually fixes them. The integrity check for the update tests already contains a check that dimension slices are valid, so there is no need to add a test for that.

This commit adds an extra dimension to the workflow matrix so that updates with repair are tested in a separate run. It also changes the update test scripts to run without repair tests by default, and adds the option `-r` for running repair tests in addition to the normal tests.

Fixes timescale#2824
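The `-r` option described above can be handled with `getopts`. A hypothetical sketch, not the actual test scripts: `parse_update_test_opts`, `RUN_REPAIR`, and `planned_suites` are illustrative names. Without options only the normal update tests are planned; with `-r` the repair tests are added on top of them.

```bash
#!/usr/bin/env bash
# Hypothetical option parsing for the update test scripts:
# without options only the normal update tests run; "-r" adds repair tests.
parse_update_test_opts() {
  RUN_REPAIR=false
  local opt OPTIND=1   # reset getopts state for each call
  while getopts "r" opt; do
    case $opt in
      r) RUN_REPAIR=true ;;
      *) echo "usage: $0 [-r]" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  REMAINING_ARGS=("$@")
}

# Lists the test suites that would run for the parsed options.
planned_suites() {
  echo normal
  if [[ $RUN_REPAIR == true ]]; then
    echo repair
  fi
}

parse_update_test_opts -r
planned_suites
```

Keeping repair as an additive flag matches the commit message: the default run is unchanged, and the repair run is a superset of it.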

This maintenance release contains bugfixes since the 2.0.0 release. We deem it high priority for upgrading.

In particular the fixes contained in this maintenance release address issues in continuous aggregates, compression, JOINs with hypertables and when upgrading from previous versions.

**Bugfixes**

* #2772 Always validate existing database and extension
* #2780 Fix config enum entries for remote data fetcher
* #2806 Add check for dropped chunk on update
* #2828 Improve cagg watermark caching
* #2838 Fix catalog repair in update script
* #2842 Do not mark job as started when setting next_start field
* #2845 Fix continuous aggregate privileges during upgrade
* #2851 Fix nested loop joins that involve compressed chunks
* #2860 Fix projection in ChunkAppend nodes
* #2861 Remove compression stat update from update script
* #2865 Apply volatile function quals at decompresschunk node
* #2866 Avoid partitionwise planning of partialize_agg
* #2868 Fix corruption in gapfill plan
* #2874 Fix partitionwise agg crash due to uninitialized memory

**Thanks**

* @alex88 for reporting an issue with joined hypertables
* @brian-from-quantrocket for reporting an issue with extension update and dropped chunks
* @dhodyn for reporting an issue when joining compressed chunks
* @markatosi for reporting a segfault with partitionwise aggregates enabled
* @PhilippJust for reporting an issue with add_job and initial_start
* @sgorsh for reporting an issue when using pgAdmin on windows
* @WarriorOfWire for reporting the bug with gapfill queries not being able to find pathkey item to sort

mkindahl force-pushed the repair_script_tests branch 2 times, most recently from 25d220f to a705c96 January 18, 2021 12:48

mkindahl force-pushed the repair_script_tests branch 7 times, most recently from d610732 to bb9de19 January 20, 2021 11:26

mkindahl marked this pull request as ready for review January 20, 2021 11:39

mkindahl requested a review from a team as a code owner January 20, 2021 11:39

mkindahl requested review from pmwkaa, k-rus, svenklemm and erimatnor and removed request for a team January 20, 2021 11:39

mkindahl force-pushed the repair_script_tests branch 3 times, most recently from 1166dbf to 9be2f04 January 21, 2021 08:55

svenklemm requested changes Jan 21, 2021

gayyappan reviewed Jan 21, 2021

erimatnor reviewed Jan 22, 2021

mkindahl force-pushed the repair_script_tests branch 3 times, most recently from 650a4e7 to fc74199 January 25, 2021 09:55

mkindahl force-pushed the repair_script_tests branch 4 times, most recently from 8cb5ea0 to d2ca58e January 27, 2021 10:52

mkindahl requested review from erimatnor, gayyappan and svenklemm January 27, 2021 11:00

erimatnor approved these changes Jan 27, 2021

gayyappan reviewed Jan 27, 2021

gayyappan approved these changes Jan 27, 2021

mkindahl force-pushed the repair_script_tests branch from d2ca58e to 2f6c8ad January 27, 2021 19:12

svenklemm requested changes Jan 27, 2021

mkindahl requested a review from svenklemm January 27, 2021 20:13

svenklemm approved these changes Jan 28, 2021

mkindahl force-pushed the repair_script_tests branch from 2f6c8ad to 0ff29c3 January 28, 2021 13:49

mkindahl merged commit c716325 into timescale:master Jan 28, 2021

mkindahl deleted the repair_script_tests branch January 28, 2021 14:04

mkindahl mentioned this pull request Jan 28, 2021

Fix repair in update scripts #2875

Closed
