Continuous Aggregates finals form #4269

Merged

@fabriziomello fabriziomello commented Apr 22, 2022

Follows the work started by #4294 to improve the performance of
Continuous Aggregates by removing the re-aggregation in the user view.

This PR gets rid of the partialize_agg and finalize_agg aggregate
functions and stores the finalized (plain) aggregated data in the
materialization hypertable.
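
As a toy model of the change described above (not TimescaleDB code), the sketch below contrasts the two storage styles for an `avg` aggregate. The function names and sample rows are hypothetical; the per-chunk grouping of partials is an assumption based on the `chunk_id` column mentioned in a later commit message on this page.

```python
from collections import defaultdict

# Raw rows as (chunk_id, bucket, value). Hypothetical sample data.
ROWS = [
    (1, "2022-04-01", 10.0),
    (1, "2022-04-01", 20.0),
    (2, "2022-04-01", 30.0),
    (2, "2022-04-02", 40.0),
]


def materialize_partials(rows):
    """Old style: keep one partial avg state (sum, count) per (bucket, chunk)."""
    state = defaultdict(lambda: (0.0, 0))
    for chunk_id, bucket, value in rows:
        s, n = state[(bucket, chunk_id)]
        state[(bucket, chunk_id)] = (s + value, n + 1)
    return dict(state)


def read_partials(partials):
    """Old style read path: combine the partial states per bucket, then finalize."""
    combined = defaultdict(lambda: (0.0, 0))
    for (bucket, _chunk_id), (s, n) in partials.items():
        cs, cn = combined[bucket]
        combined[bucket] = (cs + s, cn + n)
    return {bucket: s / n for bucket, (s, n) in combined.items()}


def materialize_finals(rows):
    """New style: store the finished avg per bucket, so reads need no re-aggregation."""
    totals = defaultdict(lambda: (0.0, 0))
    for _chunk_id, bucket, value in rows:
        s, n = totals[bucket]
        totals[bucket] = (s + value, n + 1)
    return {bucket: s / n for bucket, (s, n) in totals.items()}
```

Both paths return the same averages; the difference is that the old style pays for the combine-and-finalize step on every read, while the new style pays once at materialization time.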

Because we no longer store partials and have removed the
re-aggregation, it is now possible to create indexes on aggregated
columns in the materialization hypertable to improve performance
even further.
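
To illustrate why plain values are indexable, here is a minimal sketch in which a sorted list stands in for a database index. It is only an analogy: the rows, column meaning, and function name are hypothetical. A later commit message on this page describes the old format as a byte array of partial aggregate state, which has no useful ordering by aggregate value.

```python
import bisect

# Finalized materialization rows as (bucket, avg_temp). Hypothetical data.
rows = [("d1", 21.5), ("d2", 18.0), ("d3", 25.0), ("d4", 19.5)]

# Toy "index" on the aggregated column: (value, row position), sorted by value.
index = sorted((avg, pos) for pos, (_bucket, avg) in enumerate(rows))


def buckets_with_avg_at_least(threshold):
    """Range lookup through the index instead of scanning every row."""
    # (threshold, -1) sorts before any real entry with the same value.
    start = bisect.bisect_left(index, (threshold, -1))
    return [rows[pos][0] for _avg, pos in index[start:]]
```

The lookup returns buckets in ascending order of the aggregated value, which is the order the index is kept in.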

Also removed restrictions on the types of aggregates users can use
with Continuous Aggregates:

  • aggregates with DISTINCT
  • aggregates with FILTER
  • aggregates with FILTER in HAVING clause
  • aggregates without combine function
  • ordered-set aggregates
  • hypothetical-set aggregates
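
The description does not say why these aggregates were previously restricted. One likely factor, stated here as an assumption, is that some aggregates cannot be rebuilt by merging per-chunk partial results. The sketch below uses a median (an ordered-set style aggregate) and a distinct count to show the mismatch; all names and data are hypothetical.

```python
from statistics import median

chunk_a = [1, 2, 100]
chunk_b = [3, 4]


def median_from_partials(chunks):
    """Naive merge: take the median of per-chunk medians (generally wrong)."""
    return median([median(c) for c in chunks])


def distinct_count_from_partials(chunks):
    """Naive merge: sum per-chunk distinct counts (double counts shared values)."""
    return sum(len(set(c)) for c in chunks)


def finalized(agg, chunks):
    """Finalized style: run the aggregate once over all rows of the bucket."""
    all_rows = [value for chunk in chunks for value in chunk]
    return agg(all_rows)
```

With finalized storage the aggregate is computed once over the whole bucket, so no merge step is needed and this class of mismatch does not arise.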

By default, new Continuous Aggregates will be created using this new
format, but the previous version (with partials) will still be supported.

Users can create the previous style by setting the storage parameter
timescaledb.finalized to false when creating the Continuous
Aggregate.
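
A minimal sketch of what such a statement could look like, generated from Python so the two variants can be compared side by side. The `conditions` table, its columns, and the helper name are hypothetical; `timescaledb.finalized` is the parameter named above. Note that a later commit referenced on this page removed the ability to create the old format, so the `finalized = false` variant only applies to releases that still accept it.

```python
def cagg_ddl(view_name: str, finalized: bool = True) -> str:
    """Build CREATE MATERIALIZED VIEW text for a continuous aggregate.

    The source table and columns are made up for illustration. Identifiers
    are interpolated without quoting, so pass trusted names only.
    """
    options = ["timescaledb.continuous"]
    if not finalized:
        # Opt back into the previous (partials) format.
        options.append("timescaledb.finalized = false")
    return (
        f"CREATE MATERIALIZED VIEW {view_name}\n"
        f"WITH ({', '.join(options)}) AS\n"
        "SELECT time_bucket('1 day', time) AS bucket,\n"
        "       device_id,\n"
        "       avg(temperature) AS avg_temp\n"
        "FROM conditions\n"
        "GROUP BY bucket, device_id\n"
        "WITH NO DATA;"
    )


if __name__ == "__main__":
    print(cagg_ddl("conditions_daily"))
    print(cagg_ddl("conditions_daily_old", finalized=False))
```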

Fixes #4233

@fabriziomello fabriziomello force-pushed the cagg_performance_finals_form branch 4 times, most recently from 6067dd1 to 04911fd Compare April 25, 2022 13:30

codecov bot commented Apr 25, 2022

Codecov Report

Merging #4269 (b957afd) into main (bb241ff) will increase coverage by 0.12%.
The diff coverage is 95.58%.

@@            Coverage Diff             @@
##             main    #4269      +/-   ##
==========================================
+ Coverage   90.68%   90.80%   +0.12%     
==========================================
  Files         215      217       +2     
  Lines       39704    40107     +403     
==========================================
+ Hits        36005    36421     +416     
+ Misses       3699     3686      -13     
Impacted Files Coverage Δ
src/chunk.h 100.00% <ø> (ø)
src/compat/compat.h 94.73% <ø> (ø)
src/init.c 97.05% <ø> (+5.39%) ⬆️
src/nodes/chunk_append/chunk_append.c 97.94% <ø> (ø)
src/planner/add_hashagg.c 54.28% <ø> (ø)
src/planner/agg_bookend.c 93.16% <ø> (ø)
src/planner/expand_hypertable.c 93.94% <ø> (ø)
src/planner/partialize.c 97.91% <ø> (ø)
src/planner/planner.h 100.00% <ø> (ø)
tsl/src/fdw/relinfo.c 96.83% <ø> (-0.02%) ⬇️
... and 40 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ddd0292...b957afd. Read the comment docs.

@fabriziomello fabriziomello force-pushed the cagg_performance_finals_form branch 8 times, most recently from 9c2367b to 23955af Compare May 2, 2022 18:38
@fabriziomello fabriziomello changed the title Cagg performance finals form Continuous Aggregates finals form May 2, 2022
@fabriziomello fabriziomello self-assigned this May 2, 2022
@fabriziomello fabriziomello added this to the TimescaleDB 2.7 milestone May 2, 2022
@fabriziomello fabriziomello added the enhancement An enhancement to an existing feature for functionality label May 2, 2022
@fabriziomello fabriziomello force-pushed the cagg_performance_finals_form branch 2 times, most recently from 9a7f0d2 to f4b1d3e Compare May 2, 2022 22:05
@fabriziomello fabriziomello marked this pull request as ready for review May 2, 2022 22:13
@fabriziomello fabriziomello requested a review from a team as a code owner May 2, 2022 22:13
@fabriziomello fabriziomello requested review from afiskon and duncan-tsdb and removed request for a team May 2, 2022 22:13
@mkindahl mkindahl left a comment

Please remove the backquotes from the title line. I do not think it is a big issue, but we decided not to use them in the title line; using them in the body is fine.

@fabriziomello fabriziomello changed the title Continuous Aggregates finals form Continuous Aggregates finals form May 3, 2022
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Dec 13, 2023
Timescale 2.7 released a new version of Continuous Aggregates (timescale#4269)
that stores the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple optimization
opportunities as well as a more compact form.

In 2.10.0, released in February 2023, the deprecation of the old
Continuous Aggregate format was announced.

With this PR the ability to create Continuous Aggregates in the old
format was removed, but we still support migrating from the old to the
new format by running the `cagg_migrate` procedure.

This is the continuation of the PR timescale#5977 started by @pdipesh02.

References:
https://docs.timescale.com/api/latest/continuous-aggregates/cagg_migrate/
https://github.com/timescale/timescaledb/releases/tag/2.10.0
https://github.com/timescale/timescaledb/releases/tag/2.7.0
timescale#5977
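
For readers who still have old-format aggregates, a minimal sketch of the migration call mentioned in the commit message above, built as a string from Python. The aggregate name and the helper are hypothetical. The `override` and `drop_old` options are written from memory of the linked `cagg_migrate` documentation, so check them against that page before use.

```python
def cagg_migrate_call(cagg_name: str, override: bool = False, drop_old: bool = False) -> str:
    """Build a CALL statement for the cagg_migrate procedure.

    The name is interpolated as a plain literal, so pass trusted input only.
    """
    args = [f"'{cagg_name}'"]
    if override:
        # Assumed option: replace the old aggregate with the migrated one.
        args.append("override => true")
    if drop_old:
        # Assumed option: drop the old aggregate after migration.
        args.append("drop_old => true")
    return f"CALL cagg_migrate({', '.join(args)});"


if __name__ == "__main__":
    print(cagg_migrate_call("conditions_daily"))
```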
fabriziomello added a commit that referenced this pull request Dec 13, 2023
Timescale 2.7 released a new version of Continuous Aggregates (#4269)
that stores the final aggregation state instead of the byte array of
the partial aggregate state, offering multiple optimization
opportunities as well as a more compact form.

In 2.10.0, released in February 2023, the deprecation of the old
Continuous Aggregate format was announced.

With this PR the ability to create Continuous Aggregates in the old
format was removed, but we still support migrating from the old to the
new format by running the `cagg_migrate` procedure.

This is the continuation of the PR #5977 started by @pdipesh02.

References:
https://docs.timescale.com/api/latest/continuous-aggregates/cagg_migrate/
https://github.com/timescale/timescaledb/releases/tag/2.10.0
https://github.com/timescale/timescaledb/releases/tag/2.7.0
#5977
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Feb 8, 2024
Historically we preserved chunk metadata because the old format of the
Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave dangling chunk ids there we just
marked the chunk as dropped when dropping chunks.

In timescale#4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk if all associated
Continuous Aggregates are in the new format.

Also added a post-update SQL script to clean up unnecessary dropped chunk
metadata in our catalog.
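
The rule described in this commit message can be sketched as a small decision function. This is a toy model with hypothetical types and names, not the TimescaleDB catalog code; treating a chunk with no associated aggregates as safe to delete is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Cagg:
    name: str
    finalized: bool  # True for the new format, False for the old partials format


@dataclass
class ChunkMeta:
    chunk_id: int
    dropped: bool = False


def drop_chunk(catalog: dict, chunk_id: int, caggs: list) -> None:
    """Drop a chunk, removing or merely flagging its catalog metadata."""
    if all(c.finalized for c in caggs):
        # No old-format aggregate can still reference this chunk_id.
        # all() is also True for an empty list (assumed safe to delete).
        del catalog[chunk_id]
    else:
        # An old-format materialization may hold this chunk_id: keep the row.
        catalog[chunk_id].dropped = True
```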
fabriziomello added a commit that referenced this pull request Feb 16, 2024
Historically we preserved chunk metadata because the old format of the
Continuous Aggregate has the `chunk_id` column in the materialization
hypertable, so in order not to leave dangling chunk ids there we just
marked the chunk as dropped when dropping chunks.

In #4269 we introduced a new Continuous Aggregate format that doesn't
store the `chunk_id` in the materialization hypertable anymore, so it's
safe to also remove the metadata when dropping a chunk if all associated
Continuous Aggregates are in the new format.

Also added a post-update SQL script to clean up unnecessary dropped chunk
metadata in our catalog.

Closes #6570
github-actions bot pushed a commit that referenced this pull request Feb 28, 2024

(cherry picked from commit 5a359ac)
jnidzwetzki pushed a commit that referenced this pull request Feb 28, 2024

(cherry picked from commit 5a359ac)
Development

Successfully merging this pull request may close these issues.

[Bug]: Bad Continuous Aggregate View Definition
6 participants