Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README for continuous aggregates #2405

Merged
merged 1 commit into from Sep 18, 2020

Conversation

erimatnor
Copy link
Contributor

The README that gives an overview of the implementation of continuous
aggregates is updated to reflect changes to the feature.

Fixes #2147

@erimatnor erimatnor added this to the 2.0.0 milestone Sep 16, 2020
@codecov
Copy link

codecov bot commented Sep 16, 2020

Codecov Report

Merging #2405 into master will increase coverage by 1.18%.
The diff coverage is 98.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2405      +/-   ##
==========================================
+ Coverage   88.91%   90.09%   +1.18%     
==========================================
  Files         212      213       +1     
  Lines       34788    34300     -488     
==========================================
- Hits        30932    30904      -28     
+ Misses       3856     3396     -460     
Impacted Files Coverage Δ
src/catalog.c 84.70% <ø> (ø)
src/catalog.h 100.00% <ø> (ø)
src/cross_module_fn.c 58.13% <ø> (+1.32%) ⬆️
src/hypertable.c 87.42% <ø> (-0.08%) ⬇️
tsl/src/continuous_aggs/insert.c 87.23% <ø> (+0.15%) ⬆️
tsl/src/continuous_aggs/materialize.c 66.00% <ø> (+52.70%) ⬆️
tsl/src/init.c 88.88% <ø> (ø)
src/process_utility.c 93.59% <88.88%> (-0.05%) ⬇️
src/chunk.c 95.31% <100.00%> (-0.01%) ⬇️
src/continuous_agg.c 93.50% <100.00%> (+4.18%) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7abe65d...a3a28a7. Read the comment docs.

@erimatnor erimatnor marked this pull request as ready for review September 16, 2020 09:45
@erimatnor erimatnor requested a review from a team as a code owner September 16, 2020 09:45
@erimatnor erimatnor requested review from mkindahl, k-rus, gayyappan, pmwkaa and WireBaron and removed request for a team and gayyappan September 16, 2020 09:45
Copy link
Contributor

@pmwkaa pmwkaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Maybe one suggestion is to rename Readme.md to a more common README.md

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few things that I think would make the text easier to follow:

  • Change the order of the object list so that the references refer to previously introduced concepts.
  • You talk about "below" and "above" for timestamps. I would suggest to consistently use "older" and "newer", or "before" and "after". It is not clear what "below" and "above" means for timestamps.

tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
tsl/src/continuous_aggs/Readme.md Outdated Show resolved Hide resolved
Comment on lines 84 to 86
instead. Write amplification is further reduced by never writing
invalidations that modify time ranges that are more recent the current
invalidation threshold.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear why this reduces write amplification.

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some suggestions for improvements.

Comment on lines 51 to 55
5. An invalidation threshold, which tracks how "far" the
materialization has reached and below which invalidations must be
logged.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. An invalidation threshold, which tracks how "far" the
materialization has reached and below which invalidations must be
logged.
5. An invalidation threshold, which is a timestamp that tracks the latest
materialization. Invalidations before this timestamp will be
logged, but invalidations after this point will not be added to the invalidation log.

Comment on lines 39 to 40
allows write amplification to be kept to a minimum by, e.g, never
refreshing the most recent time bucket that is still being "filled".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other things we currently do? Otherwise, just write what we do.

Suggested change
allows write amplification to be kept to a minimum by, e.g, never
refreshing the most recent time bucket that is still being "filled".
allows write amplification to be kept to a minimum by not
refreshing the most recent time bucket, which is assumed to still have data being added to it.

Comment on lines 54 to 57
6. A trigger on the source hypertable that invalidates regions of data
in every transaction, based on INSERT, UPDATE, and DELETE
statements. The trigger writes its invalidations to the hypertable
invalidation log.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't actually invalidate the regions, it just writes an invalidation record to the log. IMHO, it is better to be specific here.

Suggested change
6. A trigger on the source hypertable that invalidates regions of data
in every transaction, based on INSERT, UPDATE, and DELETE
statements. The trigger writes its invalidations to the hypertable
invalidation log.
6. A trigger on the source hypertable that writes invalidations of data
for every transaction, based on INSERT, UPDATE, and DELETE
statements. The trigger writes its invalidations to the hypertable
invalidation log.

Comment on lines +62 to +68
8. A materialization invalidation log. Once a refresh runs on a given
continuous aggregate, this log tracks how invalidations from the
hypertable invalidation log are processed against the refresh
window for the refreshed continuous aggregate. Thus, a single
invalidation in the hypertable invalidation log becomes one entry
per continuous aggregate in the materialization invalidation log.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
8. A materialization invalidation log. Once a refresh runs on a given
continuous aggregate, this log tracks how invalidations from the
hypertable invalidation log are processed against the refresh
window for the refreshed continuous aggregate. Thus, a single
invalidation in the hypertable invalidation log becomes one entry
per continuous aggregate in the materialization invalidation log.
8. A materialization invalidation log for each continuous aggregate. Once a refresh runs on a given
continuous aggregate, this log tracks how invalidations from the
hypertable invalidation log are processed against the refresh
window for the refreshed continuous aggregate. Thus, a single
invalidation in the hypertable invalidation log becomes one entry
per continuous aggregate in the materialization invalidation log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one log, and your suggestion could be interpreted as if there is one log per cagg.

Comment on lines +82 to +86
Mutating transactions must record their mutations in the invalidation
log, so that a refresh knows to re-materialize the invalidated
range. This happens by installing a trigger on the source hypertable
when the first continuous aggregate on that hypertable is created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Mutating transactions must record their mutations in the invalidation
log, so that a refresh knows to re-materialize the invalidated
range. This happens by installing a trigger on the source hypertable
when the first continuous aggregate on that hypertable is created.
Mutating transactions must record their mutations in the invalidation
log for the hypertable, so that a refresh knows to re-materialize the invalidated
range. This happens by installing a trigger on the source hypertable
when the first continuous aggregate on that hypertable is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, your suggestion could be interpreted as if there is one log per hypertable, which is not the case.

The README that gives an overview of the implementation of continuous
aggregates is updated to reflect changes to the feature.

Fixes timescale#2147
@erimatnor erimatnor merged commit e2cefd9 into timescale:master Sep 18, 2020
@erimatnor erimatnor deleted the caggs-update-readme branch September 18, 2020 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the continuous aggregate README
3 participants