Continuous Aggregates finals form #4269

fabriziomello · 2022-04-22T22:10:34Z

This follows the work started in #4294 to improve the performance of Continuous
Aggregates by removing the re-aggregation in the user view.

This PR gets rid of the partialize_agg and finalize_agg aggregate
functions and stores the finalized (plain) aggregated data in the
materialization hypertable.

Because we no longer store partials and have removed the
re-aggregation, it is now possible to create indexes on aggregated
columns in the materialization hypertable to improve
performance even more.
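
To make the difference concrete, here is a minimal Python sketch of the two storage styles for an `avg` aggregate. It is a simplified model, not TimescaleDB's code: the function names, the `(sum, count)` partial state, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw rows: (bucket, chunk_id, value).
ROWS = [
    ("2022-04-01", 1, 10.0),
    ("2022-04-01", 1, 20.0),
    ("2022-04-01", 2, 60.0),
    ("2022-04-02", 2, 5.0),
]


def materialize_partials(rows):
    """Old style (assumed model): one (sum, count) state per (bucket, chunk)."""
    state = defaultdict(lambda: [0.0, 0])
    for bucket, chunk_id, value in rows:
        state[(bucket, chunk_id)][0] += value
        state[(bucket, chunk_id)][1] += 1
    return {key: tuple(val) for key, val in state.items()}


def query_partials(partials):
    """Old style: every read combines the partial states, then finalizes them."""
    merged = defaultdict(lambda: [0.0, 0])
    for (bucket, _chunk_id), (total, count) in partials.items():
        merged[bucket][0] += total
        merged[bucket][1] += count
    return {bucket: total / count for bucket, (total, count) in merged.items()}


def materialize_finals(rows):
    """New style: one plain, finalized value per bucket."""
    values = defaultdict(list)
    for bucket, _chunk_id, value in rows:
        values[bucket].append(value)
    return {bucket: sum(vs) / len(vs) for bucket, vs in values.items()}


partials = materialize_partials(ROWS)
finals = materialize_finals(ROWS)
```

In this model, reading `finals` is a plain lookup, so the stored value can be filtered or indexed directly, whereas reading `partials` has to re-aggregate on every query.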

This PR also removes restrictions on the types of aggregates users can use
with Continuous Aggregates:

- aggregates with DISTINCT
- aggregates with FILTER
- aggregates with FILTER in HAVING clause
- aggregates without combine function
- ordered-set aggregates
- hypothetical-set aggregates
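
As a rough illustration of why aggregates without a combine function could not be stored as partials, the sketch below uses a median, which is an ordered-set style aggregate. The chunk split and the values are hypothetical, and the code only models the idea; it is not how TimescaleDB computes anything.

```python
from statistics import median

# Hypothetical values of one time bucket, split across two chunks.
chunk_a = [1, 2, 100]
chunk_b = [3, 4]

# Per-chunk results: a median has no partial state that can be merged later.
per_chunk = [median(chunk_a), median(chunk_b)]

# Trying to combine the per-chunk results does not recover the true median.
combined = median(per_chunk)

# Computing once over the whole bucket gives the value that can be stored.
finalized = median(chunk_a + chunk_b)
```

Here `combined` and `finalized` differ, which is why such an aggregate only fits a format that stores the finalized result for the whole bucket.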

By default, new Continuous Aggregates will be created using this new
format, but the previous format (with partials) will still be supported.

Users can create the previous style by setting the storage
parameter timescaledb.finalized to false when creating the
Continuous Aggregate.

Fixes #4233

codecov · 2022-04-25T13:40:47Z

Codecov Report

Merging #4269 (b957afd) into main (bb241ff) will increase coverage by 0.12%.
The diff coverage is 95.58%.

@@            Coverage Diff             @@
##             main    #4269      +/-   ##
==========================================
+ Coverage   90.68%   90.80%   +0.12%     
==========================================
  Files         215      217       +2     
  Lines       39704    40107     +403     
==========================================
+ Hits        36005    36421     +416     
+ Misses       3699     3686      -13

Impacted Files	Coverage Δ
src/chunk.h	`100.00% <ø> (ø)`
src/compat/compat.h	`94.73% <ø> (ø)`
src/init.c	`97.05% <ø> (+5.39%)`	⬆️
src/nodes/chunk_append/chunk_append.c	`97.94% <ø> (ø)`
src/planner/add_hashagg.c	`54.28% <ø> (ø)`
src/planner/agg_bookend.c	`93.16% <ø> (ø)`
src/planner/expand_hypertable.c	`93.94% <ø> (ø)`
src/planner/partialize.c	`97.91% <ø> (ø)`
src/planner/planner.h	`100.00% <ø> (ø)`
tsl/src/fdw/relinfo.c	`96.83% <ø> (-0.02%)`	⬇️
... and 40 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

mkindahl

Please remove the backquotes from the title line. I do not think it is a big issue, but we decided to not use those in the title line, while using in the body is fine.

@pdipesh02

Timescale 2.7 released a new version of Continuous Aggregates (timescale#4269) that stores the final aggregation state instead of the byte array of the partial aggregate state, offering multiple optimization opportunities as well as a more compact form. In 2.10.0, released in February 2023, the deprecation of the old Continuous Aggregate format was announced. This PR removes the ability to create Continuous Aggregates in the old format, but we still support migrating from the old to the new format by running the `cagg_migrate` procedure. This is a continuation of PR timescale#5977, started by @pdipesh02. References: https://docs.timescale.com/api/latest/continuous-aggregates/cagg_migrate/ https://github.com/timescale/timescaledb/releases/tag/2.10.0 https://github.com/timescale/timescaledb/releases/tag/2.7.0 timescale#5977


Historically we preserved chunk metadata because the old format of the Continuous Aggregate has the `chunk_id` column in the materialization hypertable, so, in order not to have leftover chunk ids there, we just marked the chunk as dropped when dropping chunks. In timescale#4269 we introduced a new Continuous Aggregate format that no longer stores the `chunk_id` in the materialization hypertable, so it is safe to also remove the metadata when dropping a chunk, provided all associated Continuous Aggregates are in the new format. Also added a post-update SQL script to clean up unnecessary dropped chunk metadata in our catalog.


Historically we preserved chunk metadata because the old format of the Continuous Aggregate has the `chunk_id` column in the materialization hypertable, so, in order not to have leftover chunk ids there, we just marked the chunk as dropped when dropping chunks. In #4269 we introduced a new Continuous Aggregate format that no longer stores the `chunk_id` in the materialization hypertable, so it is safe to also remove the metadata when dropping a chunk, provided all associated Continuous Aggregates are in the new format. Also added a post-update SQL script to clean up unnecessary dropped chunk metadata in our catalog. Closes #6570 (cherry picked from commit 5a359ac)

fabriziomello force-pushed the cagg_performance_finals_form branch 4 times, most recently from 6067dd1 to 04911fd Compare April 25, 2022 13:30

fabriziomello force-pushed the cagg_performance_finals_form branch 8 times, most recently from 9c2367b to 23955af Compare May 2, 2022 18:38

fabriziomello force-pushed the cagg_performance_finals_form branch from 23955af to e62566b Compare May 2, 2022 21:24

fabriziomello changed the title ~~Cagg performance finals form~~ Continuous Aggregates finals form May 2, 2022

fabriziomello self-assigned this May 2, 2022

fabriziomello added continuous_aggregate performance Team: Core Database labels May 2, 2022

fabriziomello added this to the TimescaleDB 2.7 milestone May 2, 2022

fabriziomello added the enhancement An enhancement to an existing feature for functionality label May 2, 2022

fabriziomello force-pushed the cagg_performance_finals_form branch 2 times, most recently from 9a7f0d2 to f4b1d3e Compare May 2, 2022 22:05

fabriziomello marked this pull request as ready for review May 2, 2022 22:13

fabriziomello requested a review from a team as a code owner May 2, 2022 22:13

fabriziomello requested review from afiskon and duncan-tsdb and removed request for a team May 2, 2022 22:13

mkindahl reviewed May 3, 2022

fabriziomello changed the title ~~Continuous Aggregates finals form~~ Continuous Aggregates finals form May 3, 2022

fabriziomello mentioned this pull request Feb 8, 2024

Remove metadata when dropping chunk #6621

Merged

timescale-automation mentioned this pull request Feb 28, 2024

Backport to 2.14.x: #6621: Remove metadata when dropping chunk #6711

Closed
