Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass join related structs to the cagg rte #5441

Merged
merged 1 commit into from Apr 13, 2023

Conversation

RafiaSabih
Copy link
Contributor

@RafiaSabih RafiaSabih commented Mar 14, 2023

In case of joins in the continuous aggregates, pass the required structs to the new rte created. These values are required by the planner to finally query the materialized view.

Fixes #5433, https://github.com/timescale/Support-Dev-Collab/issues/886

@RafiaSabih RafiaSabih self-assigned this Mar 14, 2023
@github-actions
Copy link

@konskov, @svenklemm: please review this pull request.

Powered by pull-review

Copy link
Contributor

@fabriziomello fabriziomello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add an update step to rebuild caggs definitions in order to fix already stored QueryTree in pg_rewrite catalog table.

@codecov
Copy link

codecov bot commented Mar 17, 2023

Codecov Report

Merging #5441 (f11b704) into main (09565ac) will decrease coverage by 0.09%.
The diff coverage is 49.39%.

@@            Coverage Diff             @@
##             main    #5441      +/-   ##
==========================================
- Coverage   90.48%   90.40%   -0.09%     
==========================================
  Files         229      229              
  Lines       53870    53926      +56     
==========================================
+ Hits        48746    48750       +4     
- Misses       5124     5176      +52     
Impacted Files Coverage Δ
tsl/src/continuous_aggs/create.c 84.68% <49.39%> (-1.35%) ⬇️

... and 7 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@RafiaSabih RafiaSabih force-pushed the cagg_joins_fix branch 9 times, most recently from 002b73c to aa23684 Compare March 22, 2023 11:08
@RafiaSabih RafiaSabih force-pushed the cagg_joins_fix branch 2 times, most recently from a243b75 to 7cf1ba8 Compare March 24, 2023 12:27
Comment on lines 47 to 61
IF ts_version >= '2.10.0' THEN
CREATE PROCEDURE _timescaledb_internal.post_update_cagg_try_repair(
cagg_view REGCLASS, force_rebuild boolean
) AS '@MODULE_PATHNAME@', 'ts_cagg_try_repair' LANGUAGE C;
END IF;
FOR vname, materialized_only IN select format('%I.%I', cagg.user_view_schema, cagg.user_view_name)::regclass, cagg.materialized_only from _timescaledb_catalog.continuous_agg cagg
LOOP
IF ts_version >= '2.10.0' THEN
SET log_error_verbosity TO VERBOSE;
CALL _timescaledb_internal.post_update_cagg_try_repair(vname, true);
END IF;
END LOOP;
IF ts_version >= '2.10.0' THEN
DROP PROCEDURE IF EXISTS _timescaledb_internal.post_update_cagg_try_repair;
END IF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IF ts_version >= '2.10.0' THEN
CREATE PROCEDURE _timescaledb_internal.post_update_cagg_try_repair(
cagg_view REGCLASS, force_rebuild boolean
) AS '@MODULE_PATHNAME@', 'ts_cagg_try_repair' LANGUAGE C;
END IF;
FOR vname, materialized_only IN select format('%I.%I', cagg.user_view_schema, cagg.user_view_name)::regclass, cagg.materialized_only from _timescaledb_catalog.continuous_agg cagg
LOOP
IF ts_version >= '2.10.0' THEN
SET log_error_verbosity TO VERBOSE;
CALL _timescaledb_internal.post_update_cagg_try_repair(vname, true);
END IF;
END LOOP;
IF ts_version >= '2.10.0' THEN
DROP PROCEDURE IF EXISTS _timescaledb_internal.post_update_cagg_try_repair;
END IF;
IF ts_version >= '2.10.0' THEN
CREATE PROCEDURE _timescaledb_internal.post_update_cagg_try_repair(
cagg_view REGCLASS, force_rebuild BOOLEAN
) AS '@MODULE_PATHNAME@', 'ts_cagg_try_repair' LANGUAGE C;
FOR vname, materialized_only IN select format('%I.%I', cagg.user_view_schema, cagg.user_view_name)::regclass, cagg.materialized_only from _timescaledb_catalog.continuous_agg cagg
LOOP
IF ts_version >= '2.10.0' THEN
SET log_error_verbosity TO VERBOSE;
CALL _timescaledb_internal.post_update_cagg_try_repair(vname, true);
END IF;
END LOOP;
DROP PROCEDURE IF EXISTS _timescaledb_internal.post_update_cagg_try_repair(REGCLASS, BOOLEAN);
END IF;

Comment on lines 191 to 194
\if :has_cagg_joins
\ir setup.repair.cagg_joins.sql
\endif
\ir setup.repair.cagg.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
\if :has_cagg_joins
\ir setup.repair.cagg_joins.sql
\endif
\ir setup.repair.cagg.sql
\ir setup.repair.cagg.sql
\if :has_cagg_joins
\ir setup.repair.cagg_joins.sql
\endif

We're adding a new repair test so make sense to be added at the end.

Comment on lines 18 to 189
);

-- Break the first time dimension on each table. These are different
-- depending on the time type for the table and we need to check all
-- versions.
DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_int'::regclass AND column_name = 'time'
ORDER BY dimension_slice_id LIMIT 1
);

DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_timestamp'::regclass AND column_name = 'time'
ORDER BY dimension_slice_id LIMIT 1
);

DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_timestamptz'::regclass AND column_name = 'time'
ORDER BY dimension_slice_id LIMIT 1
);

DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_date'::regclass AND column_name = 'time'
ORDER BY dimension_slice_id LIMIT 1
);

-- Delete all dimension slices for one table to break it seriously. It
-- should still be repaired.
DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_extra'::regclass
);

-- Break the partition constraints on some of the tables. The
-- partition constraints look the same in all tables so we create a
-- mix of tables with no missing partition constraint slices, just one
-- missing partition constraint dimension slice, and several missing
-- partition constraint dimension slices.
DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_timestamp'::regclass AND column_name = 'tag'
ORDER BY dimension_slice_id LIMIT 1
);

DELETE FROM _timescaledb_catalog.dimension_slice WHERE id IN (
SELECT dimension_slice_id FROM slices
WHERE hypertable = 'repair_test_date'::regclass AND column_name = 'tag'
ORDER BY dimension_slice_id
);

\echo **** Expected repairs ****
WITH unparsed_slices AS (
SELECT dimension_id,
dimension_slice_id,
hypertable,
constraint_name,
column_type,
column_name,
(SELECT SUBSTRING(constraint_expr, $$>=\s*'?([\w\d\s:+-]+)'?$$)) AS range_start,
(SELECT SUBSTRING(constraint_expr, $$<\s*'?([\w\d\s:+-]+)'?$$)) AS range_end
FROM slices
)
SELECT DISTINCT
dimension_slice_id,
dimension_id,
CASE
WHEN column_type = 'timestamptz'::regtype THEN
EXTRACT(EPOCH FROM range_start::timestamptz)::bigint * 1000000
WHEN column_type = 'timestamp'::regtype THEN
EXTRACT(EPOCH FROM range_start::timestamp)::bigint * 1000000
WHEN column_type = 'date'::regtype THEN
EXTRACT(EPOCH FROM range_start::date)::bigint * 1000000
ELSE
CASE
WHEN range_start IS NULL
THEN (-9223372036854775808)::bigint
ELSE range_start::bigint
END
END AS range_start,
CASE
WHEN column_type = 'timestamptz'::regtype THEN
EXTRACT(EPOCH FROM range_end::timestamptz)::bigint * 1000000
WHEN column_type = 'timestamp'::regtype THEN
EXTRACT(EPOCH FROM range_end::timestamp)::bigint * 1000000
WHEN column_type = 'date'::regtype THEN
EXTRACT(EPOCH FROM range_end::date)::bigint * 1000000
ELSE
CASE WHEN range_end IS NULL
THEN 9223372036854775807::bigint
ELSE range_end::bigint
END
END AS range_end
FROM unparsed_slices
WHERE dimension_slice_id NOT IN (SELECT id FROM _timescaledb_catalog.dimension_slice);

DROP VIEW slices;
\endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO don't need to indent this code... it just make the review harder due to a big diff output

Copy link
Contributor

@fabriziomello fabriziomello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor reviews but LGTM. Approved!

Comment on lines 1 to 23
-- This file and its contents are licensed under the Apache License 2.0.
-- Please see the included NOTICE for copyright information and
-- LICENSE-APACHE for a copy of the license.

SELECT current_setting('server_version_num')::int >= 130000 AS has_cagg_join_using \gset
\d+ cagg_joins_upgrade_test_with_realtime
SELECT * FROM cagg_joins_upgrade_test_with_realtime ORDER BY bucket;
\d+ cagg_joins_upgrade_test
SELECT * FROM cagg_joins_upgrade_test ORDER BY bucket;
\d+ cagg_joins_where
SELECT * FROM cagg_joins_where ORDER BY bucket;
\if :has_cagg_join_using
\d+ cagg_joins_upgrade_test_with_realtime_using
SELECT * FROM cagg_joins_upgrade_test_with_realtime_using ORDER BY bucket;
\d+ cagg_joins_upgrade_test_using
SELECT * FROM cagg_joins_upgrade_test_using ORDER BY bucket;
\endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: indentation is strange. Hard to read and understand.

test/sql/updates/post.repair.sql Show resolved Hide resolved
WHERE ht_cagg_joins.city = nt_cagg_joins.location
GROUP BY 1,3;

-- Only test joins with cusing clause for postgresql versions above 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- Only test joins with cusing clause for postgresql versions above 12
-- Only test joins with using clause for postgresql versions above 12

Comment on lines 10 to 23
CREATE TABLE repair_test_int(time integer not null, temp float8, tag integer, color integer);
CREATE TABLE repair_test_timestamptz(time timestamptz not null, temp float8, tag integer, color integer);
CREATE TABLE repair_test_extra(time timestamptz not null, temp float8, tag integer, color integer);
CREATE TABLE repair_test_timestamp(time timestamp not null, temp float8, tag integer, color integer);
CREATE TABLE repair_test_date(time date not null, temp float8, tag integer, color integer);

-- We only break the dimension slice table if there is repair that is
-- going to be done, but we create the tables regardless so that we
-- can compare the databases.
SELECT create_hypertable('repair_test_int', 'time', 'tag', 2, chunk_time_interval => '3'::bigint);
SELECT create_hypertable('repair_test_timestamptz', 'time', 'tag', 2, chunk_time_interval => '1 day'::interval);
SELECT create_hypertable('repair_test_extra', 'time', 'tag', 2, chunk_time_interval => '1 day'::interval);
SELECT create_hypertable('repair_test_timestamp', 'time', 'tag', 2, chunk_time_interval => '1 day'::interval);
SELECT create_hypertable('repair_test_date', 'time', 'tag', 2, chunk_time_interval => '1 day'::interval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the indentation of the code, it just makes the review harder. The \if should be used sparingly, so readability should not be an issue in these cases. Now I have no chance of seeing if you have made any other changes to the code apart from changing the indentation.

FROM pg_extension
WHERE extname = 'timescaledb' \gset

\if :test_repair_dimension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you excluding the repair tables when you are not testing the dimension repair? The comment says that the tables are created regardless, but that they are only "broken" if repair should be tested.

tsl/src/continuous_aggs/create.c Show resolved Hide resolved
Comment on lines +3099 to +3109
if (!finalized)
mattablecolumninfo_addinternal(&mattblinfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only when the table is not finalized? Please add a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also part of existing code not something done in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the previous line did not have that check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean for mattablecomuninfo_init?
The same sequence of steps are done in cagg_create(), the explicit comments are also there.
Basically we need internal column info only when not finalized, mattbl will there in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RafiaSabih we've changed the behavior of the cagg_rebuild_view_definition because before it rebuild just then the view is in the old format (aka finalized=false). Now we enable rebuild the view definition for both finalized and not finalized caggs and also added a parameter to force the rebuild... so is good to add comments about it.

view_query = finalizequery_get_select_query(&fqi,
mattblinfo.matcollist,
&mataddress,
mat_ht->fd.table_name.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
mat_ht->fd.table_name.data);
NameStr(mat_ht->fd.table_name));

tsl/src/continuous_aggs/create.c Show resolved Hide resolved
MAX(temperature),
MIN(temperature),
name
FROM conditions JOIN devices ON conditions.device_id = devices.device_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we support joins with several conditions? If so, should you add a test for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment, this is a test for caggs with joins with more conditions. What we support is additional condition in where clause. Did you mean to add this also in repair test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking about joins with multiple ON conditions, for example, FROM conditions JOIN devices ON conditions.device_id = devices.device_id AND conditions.building_id = devices.building_id. Since this is a quite drastic change and the checks are not easy to get right, it would be good to ensure that we haven't missed the obvious cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkindahl currently we support only 1 (one) join condition... for explicit INNER JOIN extra conditions should be placed into the WHERE clause, but not for JOIN...

@RafiaSabih would be good to add a failing test like FROM conditions JOIN devices ON conditions.device_id = devices.device_id WHERE conditions.building_id = devices.building_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple ON conditions is not supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then it would be good to have a test, as @fabriziomello mention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i have added that test in regression suite.

Copy link
Contributor

@mkindahl mkindahl Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where? I do not see a cagg with several join expressions in this pull request.

Never mind, found it.

@RafiaSabih RafiaSabih force-pushed the cagg_joins_fix branch 4 times, most recently from e3bc6ff to e30d8b3 Compare April 12, 2023 12:55
Copy link
Contributor

@fabriziomello fabriziomello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary formatting changes in test/sql/updates/setup.repair.sql

@RafiaSabih RafiaSabih force-pushed the cagg_joins_fix branch 2 times, most recently from f80400c to a34375f Compare April 12, 2023 17:03
@@ -55,10 +63,10 @@ INSERT INTO repair_test_date VALUES
-- repaired properly, we will notice them when restoring the
-- constraint.
ALTER TABLE _timescaledb_catalog.chunk_constraint
DROP CONSTRAINT chunk_constraint_dimension_slice_id_fkey;
DROP CONSTRAINT chunk_constraint_dimension_slice_id_fkey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert unnecessary formating!


CREATE VIEW slices AS (
SELECT ch.hypertable_id,
SELECT ch.hypertable_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert unnecessary formating!

@@ -143,6 +152,7 @@ WITH unparsed_slices AS (
(SELECT SUBSTRING(constraint_expr, $$<\s*'?([\w\d\s:+-]+)'?$$)) AS range_end
FROM slices
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert unnecessary formating!

Comment on lines +188 to +189
WHERE conditions.city = devices.location AND
conditions.temperature > 28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should error out, no??? because you're adding another JOIN condition in the WHERE clause or am I missing something?

In case of joins in the continuous aggregates, pass the required
structs to the new rte created. These values are required by the
planner to finally query the materialized view.

Fixes timescale#5433
@RafiaSabih RafiaSabih enabled auto-merge (rebase) April 13, 2023 02:41
@RafiaSabih RafiaSabih merged commit 3f9cb3c into timescale:main Apr 13, 2023
45 of 50 checks passed
@timescale-automation
Copy link

Automated backport to 2.10.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.10.x
You are currently cherry-picking commit 3f9cb3c2.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   CHANGELOG.md
	modified:   sql/updates/post-update.sql
	new file:   test/sql/updates/post.repair.cagg_joins.sql
	modified:   test/sql/updates/post.repair.sql
	new file:   test/sql/updates/setup.repair.cagg_joins.sql
	modified:   test/sql/updates/setup.repair.sql
	modified:   tsl/test/sql/cagg_joins.sql.in

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   scripts/test_updates_pg13.sh
	both modified:   scripts/test_updates_pg14.sh
	both modified:   scripts/test_updates_pg15.sh
	both modified:   tsl/src/continuous_aggs/create.c
	both modified:   tsl/test/expected/cagg_joins-12.out
	both modified:   tsl/test/expected/cagg_joins-13.out
	both modified:   tsl/test/expected/cagg_joins-14.out
	both modified:   tsl/test/expected/cagg_joins-15.out


Job log

@timescale-automation timescale-automation added the auto-backport-not-done Automated backport of this PR has failed non-retriably (e.g. conflicts) label Apr 13, 2023
@RafiaSabih RafiaSabih deleted the cagg_joins_fix branch April 13, 2023 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport-not-done Automated backport of this PR has failed non-retriably (e.g. conflicts) backported-2.10.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: XX000 - ERROR: unrecognized lock mode: 0
5 participants