Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable joins for hierarchical continuous aggregates #5344

Merged
merged 1 commit into from Mar 28, 2023

Conversation

RafiaSabih
Copy link
Contributor

The joins could be between a continuous aggregate and hypertable, continuous aggregate and a regular Postgres table, and continuous aggregate and a regular Postgres view.

@github-actions
Copy link

@nikkhils, @akuzm: please review this pull request.

Powered by pull-review

@RafiaSabih RafiaSabih force-pushed the cagg_cagg_joins branch 2 times, most recently from c9c4631 to 377f788 Compare February 17, 2023 16:49
@codecov
Copy link

codecov bot commented Feb 17, 2023

Codecov Report

Merging #5344 (c9c4fe6) into main (72c0f5b) will decrease coverage by 0.01%.
The diff coverage is 90.90%.

@@            Coverage Diff             @@
##             main    #5344      +/-   ##
==========================================
- Coverage   90.73%   90.73%   -0.01%     
==========================================
  Files         229      229              
  Lines       53563    53584      +21     
==========================================
+ Hits        48602    48620      +18     
- Misses       4961     4964       +3     
Impacted Files Coverage Δ
tsl/src/continuous_aggs/create.c 89.15% <90.90%> (-0.02%) ⬇️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@RafiaSabih RafiaSabih force-pushed the cagg_cagg_joins branch 5 times, most recently from 88c37cc to c5c8109 Compare February 23, 2023 13:35
Comment on lines 1290 to 1293
// if (rte->relkind == RELKIND_VIEW || rte_other->relkind == RELKIND_VIEW)
// ereport(ERROR,
// (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
// errmsg("joins for hierarchical continuous aggregates are not supported")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

CHANGELOG.md Outdated
@@ -43,6 +43,7 @@ Sooner to that time, we will announce the specific version of TimescaleDB in whi
* #5343 Set PortalContext when starting job
* #5312 Add timeout support to the ping_data_node()
* #5212 Allow pushdown of reference table joins
* #5344 Enable joins for hierarchical continuous aggregates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* #5344 Enable joins for hierarchical continuous aggregates
* #5344 Enable JOINS for Hierarchical Continuous Aggregates

@@ -310,7 +310,8 @@ SELECT time_bucket(INTERVAL '1 day', day) AS bucket,
FROM conditions, devices_view
WHERE conditions.device_id = devices_view.device_id
GROUP BY name, bucket, devices_view.device_id;
ERROR: joins for hierarchical continuous aggregates are not supported
NOTICE: refreshing continuous aggregate "conditions_summary_daily_view"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that these statements need the mentioned feature - but don't we need real queries/plan-s to track that it works as expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @kgyrtkirk and perhaps we should add some JOIN tests in the caggs_on_caggs suite where we extensively test it checking plans, doing different time bucket data type tests... caggs on caggs validation tests... etc. WDYT @RafiaSabih ?

* #5343 Set PortalContext when starting job
* #5312 Add timeout support to the ping_data_node()
* #5212 Allow pushdown of reference table joins
* #5344 Enable JOINS for Hierarchical continuous aggregates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess your CHANGELOG entry is on the wrong place, it should reside on the Unreleased section at the beginning of the file.

@fabriziomello fabriziomello added continuous_aggregate core Team: Core Database enhancement An enhancement to an existing feature for functionality labels Mar 10, 2023
Comment on lines 9 to 13
\if :IS_JOIN
time_bucket(:BUCKET_WIDTH_1ST, "time") AS bucket,
SUM(temperature) AS temperature,
device_id
\else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insted of all this conditionals what about something like this in the tests that include cagg_on_cagg_common.sql

-- In the cagg_on_cagg.sql
\set EXTRA_GROUPBY_COLUMN
-- In the cagg_on_cagg_joins.sql
\set EXTRA_GROUPBY_COLUMN ,device_id

And then use it in the query in cagg_on_cagg_common.sql:

Suggested change
\if :IS_JOIN
time_bucket(:BUCKET_WIDTH_1ST, "time") AS bucket,
SUM(temperature) AS temperature,
device_id
\else
time_bucket(:BUCKET_WIDTH_1ST, "time") AS bucket,
SUM(temperature) AS temperature
:EXTRA_GROUPBY_COLUMN

Doing that you'll reduce the log output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have included in the main schema itself as per the previous comment.

@fabriziomello
Copy link
Contributor

@RafiaSabih you also should add tests for distributed hypertables like we do in cagg_on_cagg_dist_ht.sql

@RafiaSabih RafiaSabih force-pushed the cagg_cagg_joins branch 2 times, most recently from a279300 to 5ed25ac Compare March 20, 2023 09:47
@RafiaSabih RafiaSabih force-pushed the cagg_cagg_joins branch 7 times, most recently from d5db146 to d9357db Compare March 22, 2023 11:58
Comment on lines 1305 to 1323
if (rte->lateral || rte_other->lateral ||
(rte->relkind == RELKIND_VIEW && ts_is_hypertable(rte_other->relid)) ||
(rte_other->relkind == RELKIND_VIEW && ts_is_hypertable(rte->relid)) ||
(rte->relkind != RELKIND_VIEW && rte_other->relkind != RELKIND_VIEW &&
(ts_is_hypertable(rte->relid) == ts_is_hypertable(rte_other->relid))))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("invalid continuous aggregate view"),
errdetail("from clause can only have one hypertable and one normal table")));
errdetail("views, lateral, multiple hypertables or normal tables"
" are not supported in FROM clause")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is a bit hard to follow... isn't better to split it in separated checks and with specific message and test case for each of them??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, I have broken it in multiple conditions. Regarding the test, they are already covered.

The joins could be between a continuous aggregate and hypertable,
continuous aggregate and a regular Postgres table,
and continuous aggregate and a regular Postgres view.
@konskov
Copy link
Contributor

konskov commented Mar 24, 2023

looks like there is a spelling error in the commit message (heirarchical instead of hierarchical)?

Copy link
Contributor

@konskov konskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the extensive tests, please remember to fix the commit message before merging

Comment on lines +2481 to +2491
foreach (l, inp->final_userquery->jointree->fromlist)
{
Node *jtnode = (Node *) lfirst(l);
JoinExpr *join = NULL;
if (IsA(jtnode, JoinExpr))
{
join = castNode(JoinExpr, jtnode);
RangeTblEntry *jrte = rt_fetch(join->rtindex, inp->final_userquery->rtable);
rte->joinaliasvars = jrte->joinaliasvars;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is needed?? Can u add a comment here. I also asked for the same in the other PR https://github.com/timescale/timescaledb/pull/5441/files#r1150565424

Copy link
Contributor

@fabriziomello fabriziomello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RafiaSabih RafiaSabih merged commit 98218c1 into timescale:main Mar 28, 2023
48 checks passed
@RafiaSabih RafiaSabih deleted the cagg_cagg_joins branch March 28, 2023 13:13
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 12, 2023
This release includes these noteworthy features:
* compressed hypertable enhancements:
  * UPDATE/DELETE support
  * ON CONFLICT DO UPDATE
* Join support for hierarchical Continougs Aggregates
* performance improvements

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 17, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
@kgyrtkirk kgyrtkirk mentioned this pull request May 17, 2023
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* #5212 Allow pushdown of reference table joins
* #5221 Improve Realtime Continuous Aggregate performance
* #5252 Improve unique constraint support on compressed hypertables
* #5339 Support UPDATE/DELETE on compressed hypertables
* #5344 Enable JOINS for Hierarchical Continuous Aggregates
* #5361 Add parallel support for partialize_agg()
* #5417 Refactor and optimize distributed COPY
* #5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* #5547 Skip Ordered Append when only 1 child node is present
* #5510 Propagate vacuum/analyze to compressed chunks
* #5584 Reduce decompression during constraint checking
* #5530 Optimize compressed chunk resorting
* #5639 Support sending telemetry event reports

**Bugfixes**
* #5396 Fix SEGMENTBY columns predicates to be pushed down
* #5427 Handle user-defined FDW options properly
* #5442 Decompression may have lost DEFAULT values
* #5459 Fix issue creating dimensional constraints
* #5570 Improve interpolate error message on datatype mismatch
* #5573 Fix unique constraint on compressed tables
* #5615 Add permission checks to run_job()
* #5614 Enable run_job() for telemetry job
* #5578 Fix on-insert decompression after schema changes
* #5613 Quote username identifier appropriately
* #5525 Fix tablespace for compressed hypertable and corresponding toast
* #5642 Fix ALTER TABLE SET with normal tables
* #5666 Reduce memory usage for distributed analyze
* #5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants