
Add invalidations for incomplete aggregates #2403

Merged
merged 1 commit into master from cagg_comp_update on Sep 25, 2020

Conversation

WireBaron
Contributor

As part of the 2.0 continuous aggregate changes, we are removing the
continuous_aggs_completed_threshold table. However, this may result
in currently running aggregates being considered complete even if
their completed threshold hadn't reached the invalidation threshold.
This change fixes that by adding an entry to the invalidation log
for any such aggregates.

Fixes #2314

@WireBaron WireBaron requested review from a team, pmwkaa, k-rus and gayyappan and removed request for a team September 15, 2020 23:59
@erimatnor
Contributor

@WireBaron Is it possible to test this somehow? Perhaps in an update script test that compares refreshes in a new region on an updated instance vs a freshly installed one?

@@ -303,6 +303,14 @@ SELECT pg_catalog.pg_extension_config_dump('_timescaledb_config.bgw_job', 'WHERE
GRANT SELECT ON _timescaledb_config.bgw_job TO PUBLIC;
GRANT SELECT ON _timescaledb_config.bgw_job_id_seq TO PUBLIC;

-- capture invalidations for any caggs which are not yet complete
INSERT INTO _timescaledb_catalog.continuous_aggs_materialization_invalidation_log
SELECT materialization_id, BIGINT '-9223372036854775808', complete.watermark, invalid.watermark
Contributor

The new invalidated region is [complete.watermark, infinity]?

Contributor
@gayyappan gayyappan Sep 16, 2020

Also need a test for this in test/sql/updates.
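The effect of the migration INSERT above can be sketched in Python. This is a hypothetical model, not TimescaleDB source: the function name and dict inputs are invented for illustration, and the tuple follows the catalog's (materialization_id, modification_time, lowest_modified_value, greatest_modified_value) column order, with BIGINT min used for the modification time as in the SQL above.

```python
# Hypothetical model of the upgrade step above, not TimescaleDB source.
BIGINT_MIN = -9223372036854775808  # BIGINT '-9223372036854775808' in the SQL

def upgrade_invalidation_entries(completed_thresholds, invalidation_thresholds):
    """For each aggregate whose completed threshold lags behind the
    invalidation threshold, emit an invalidation-log row covering the
    not-yet-materialized region [completed, invalidation)."""
    entries = []
    for mat_id, completed in sorted(completed_thresholds.items()):
        invalid = invalidation_thresholds.get(mat_id)
        if invalid is not None and completed < invalid:
            # (materialization_id, modification_time,
            #  lowest_modified_value, greatest_modified_value)
            entries.append((mat_id, BIGINT_MIN, completed, invalid))
    return entries
```

An aggregate whose completed threshold has already reached the invalidation threshold gets no entry; only the incomplete region between the two watermarks is marked invalid.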

@codecov

codecov bot commented Sep 18, 2020

Codecov Report

Merging #2403 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #2403      +/-   ##
==========================================
- Coverage   90.10%   90.10%   -0.01%     
==========================================
  Files         213      213              
  Lines       34336    34356      +20     
==========================================
+ Hits        30940    30956      +16     
- Misses       3396     3400       +4     
Impacted Files Coverage Δ
src/process_utility.c 93.70% <100.00%> (+0.04%) ⬆️
tsl/src/continuous_aggs/invalidation.c 98.18% <100.00%> (+0.02%) ⬆️
src/loader/bgw_message_queue.c 84.51% <0.00%> (-3.23%) ⬇️
src/utils.h 71.42% <0.00%> (+26.98%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Comment on lines 309 to 313
INSERT INTO inval_test
SELECT tm.tval, loc.lval, temp
FROM (SELECT generate_series(now() + '30 seconds'::interval, now() + '4 hours'::interval, '30 seconds') AS tval) tm
CROSS JOIN (select 'loc_' || generate_series(1,3,1) AS lval) loc,
generate_series(70.0, 90.0, 1440) temp;
Member

Ok, this query does not look right: the now() will guarantee that this test is flaky, and generate_series(70.0, 90.0, 1440) just returns a single row, so you could replace it by 70.0.
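The single-row point can be checked with a quick Python model of PostgreSQL's numeric generate_series(start, stop, step). This is illustrative only, not the test's actual SQL:

```python
# Toy model of PostgreSQL's numeric generate_series(start, stop, step):
# emit values from start, incrementing by step, while <= stop.
def generate_series(start, stop, step):
    vals = []
    v = start
    while v <= stop:
        vals.append(v)
        v += step
    return vals

# A step of 1440 overshoots the 70..90 range after the first value,
# so the series contains only 70.0.
rows = generate_series(70.0, 90.0, 1440)
```

With a step far larger than the range, the cross join contributes exactly one temperature value per (time, location) pair, which is why the step can simply be replaced by a constant.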

Contributor Author

See my comment above about now() just being used to generate a set number of rows above the invalidation threshold. If you still think this will be flaky, I can change this to an arbitrarily far-in-the-future date. Fixed the generate_series.

Comment on lines 271 to 275
INSERT INTO inval_test
SELECT tm.tval, loc.lval, temp
FROM (SELECT generate_series(now() - '4 hours'::interval, now(), '30 seconds') AS tval) tm
CROSS JOIN (select 'loc_' || generate_series(1,3,1) AS lval) loc,
generate_series(50.0, 70.0, 1443) temp;
Member

Ok, this query does not look right: the now() will guarantee that this test is flaky, and generate_series(50.0, 70.0, 1443) just returns a single row, so you could replace it by 50.0.

Contributor Author

I'm using now() here to ensure that the data exists within the refresh interval for the continuous agg we're about to commit. I suppose I could just set the refresh window to be near infinite, but in this case it should be safe to use now(), since all we care about is the number of rows created. The generate_series is definitely wrong though; fixed it to 20.0 / 1443.

WITH ( timescaledb.continuous, timescaledb.materialized_only=true,
timescaledb.refresh_lag='-2 hour',
timescaledb.ignore_invalidation_older_than = '24 hour',
timescaledb.refresh_interval='20 minutes' )
Contributor

It would be better to set max_interval_per_job to a big value if the intent here is to get all the data processed by the REFRESH, e.g. a max_interval_per_job of 72 hours.

Contributor Author

Added this parameter.


REFRESH MATERIALIZED VIEW mat_inval;

INSERT INTO inval_test
Contributor
@gayyappan gayyappan Sep 18, 2020

This time range is not in the original data set, so I don't think you have any invalidations for the last completed threshold range.
The inserts here should be of 2 kinds:

  1. generate invalidations: a subset of the range from the insert at line 271.
  2. New data (if you are testing this case as well) for a new range like at line 311.

Contributor Author

This is actually intended to all be new data. I believe we already have other tests for existing invalidations, what this test is intended to verify is that we generate a new invalidation that covers new data when updating to 2.0.

Contributor

Are the refresh and query post upgrade done in test-rerun.v6.sql?

Contributor Author

Yes. I needed to move that from the post script to the rerun script, as the post script isn't called on the rerun container (the post was just run on the upgraded container and a dump/restore of the upgraded container, both of which had the wrong data and hence no diff).

@WireBaron WireBaron marked this pull request as draft September 22, 2020 22:36
@WireBaron WireBaron force-pushed the cagg_comp_update branch 2 times, most recently from df6120d to 84fc706 Compare September 22, 2020 23:52
@WireBaron WireBaron marked this pull request as ready for review September 23, 2020 00:00
Contributor

@erimatnor erimatnor left a comment

Approving.

The update test is a little bit hard to follow, but I assume you have tried the update test without adding the invalidation during upgrade with a failed result?


@@ -263,3 +263,49 @@ BEGIN
END $$;

REFRESH MATERIALIZED VIEW mat_ignoreinval;

-- test proper invalidations after update
Contributor

Could you modify the comment here? This test case is for inserts after a REFRESH. (We have been using the term invalidations to refer to changes to the raw data that would invalidate a previously computed materialization.) These inserts (lines 309-311) are not logged into the invalidation_log catalog tables.

Contributor Author

Right, it's confusing as the terminology has slightly changed in 2.0 (we now also have invalidations for regions that have never been refreshed). I'm going with "test new data beyond the invalidation threshold is properly handled" here; let me know if that works.

@@ -303,6 +303,11 @@ SELECT pg_catalog.pg_extension_config_dump('_timescaledb_config.bgw_job', 'WHERE
GRANT SELECT ON _timescaledb_config.bgw_job TO PUBLIC;
GRANT SELECT ON _timescaledb_config.bgw_job_id_seq TO PUBLIC;

-- set up future invalidation for current caggs
Contributor

Suggested change
-- set up future invalidation for current caggs
-- Add entry to materialization invalidation log to indicate that [watermark, +infinity) is invalid

Contributor Author

added

@@ -303,6 +303,11 @@ SELECT pg_catalog.pg_extension_config_dump('_timescaledb_config.bgw_job', 'WHERE
GRANT SELECT ON _timescaledb_config.bgw_job TO PUBLIC;
GRANT SELECT ON _timescaledb_config.bgw_job_id_seq TO PUBLIC;

-- set up future invalidation for current caggs
INSERT INTO _timescaledb_catalog.continuous_aggs_materialization_invalidation_log
SELECT materialization_id, BIGINT '-9223372036854775808', watermark, 9223372036854775807
Contributor

If we have a new cagg that has never been refreshed, do we need an entry in the cont agg materialization log that says [-infinity, +infinity) is invalid? (cc @erimatnor)

Contributor Author

It seems like that's a pretty tight window. I believe the background job gets launched as soon as we create the materialized view in 1.7.

@WireBaron WireBaron force-pushed the cagg_comp_update branch 4 times, most recently from 1c8e07e to 2d90304 Compare September 24, 2020 23:41
@WireBaron
Contributor Author

I did add a new invalidation for (-inf, inf) when updating 1.7 caggs that have no completion_threshold. Unfortunately I was unable to add a test to reliably demonstrate the correctness of this change in the update test.
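The final upgrade rule described in this thread can be summarized in a small Python sketch. The helper name is hypothetical and this is not TimescaleDB source; BIGINT min/max stand in for -infinity/+infinity, as in the migration SQL above.

```python
# Hypothetical sketch of the final 1.7 -> 2.0 upgrade rule, not
# TimescaleDB source. BIGINT min/max act as -infinity / +infinity.
BIGINT_MIN = -9223372036854775808
BIGINT_MAX = 9223372036854775807

def invalidation_entry(materialization_id, completed_watermark):
    """Return the invalidation-log row added for one cagg during upgrade:
    everything past the completed watermark is marked invalid, and a
    cagg with no completion threshold is treated as fully invalid."""
    if completed_watermark is None:
        # never refreshed: mark (-inf, +inf) invalid
        return (materialization_id, BIGINT_MIN, BIGINT_MIN, BIGINT_MAX)
    # otherwise mark [watermark, +inf) invalid
    return (materialization_id, BIGINT_MIN, completed_watermark, BIGINT_MAX)
```

Either way, the first post-upgrade refresh sees the unmaterialized region as invalid and recomputes it, which is the behavior the update test tries to verify.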

@erimatnor
Contributor

One question was whether we should add an invalidation for [-Inf, ignore_invalidation_older_than] to allow refreshing of the region where invalidations previously had been ignored. Need not be in this PR, though.

@WireBaron
Contributor Author

> One question was whether we should add an invalidation for [-Inf, ignore_invalidation_older_than] to allow refreshing of the region where invalidations previously had been ignored. Need not be in this PR, though.

Given how long this has been in review, I'm going to go ahead and push this now that it's finally cleared to go. I'll add a new issue for this older region though.

As part of the 2.0 continuous aggregate changes, we are removing the
continuous_aggs_completed_threshold table. However, this may result
in currently running aggregates being considered complete even if
their completed threshold hadn't reached the invalidation threshold.
This change fixes that by adding an entry to the invalidation log
for any such aggregates.

Fixes timescale#2314
@WireBaron WireBaron merged commit e793082 into timescale:master Sep 25, 2020
@WireBaron WireBaron deleted the cagg_comp_update branch September 25, 2020 16:17
Development

Successfully merging this pull request may close these issues.

Add invalidation from completed_threshold and onward during upgrade