Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support transparent decompression on individual chunks #3717

Merged
merged 1 commit into from
Oct 20, 2021

Conversation

svenklemm
Copy link
Member

@svenklemm svenklemm commented Oct 18, 2021

In queries on distributed hypertables with per datanode queries
disabled transparent decompression would not work because the
query sent to the data node was for individual chunks. To support
queries on distributed hypertables with per datanode queries
disabled this patch adds support for transparent decompression
in queries on individual chunks.

Fixes #3714

@svenklemm svenklemm self-assigned this Oct 18, 2021
@svenklemm svenklemm force-pushed the decomp_chunk branch 10 times, most recently from 7ad76c8 to 3282d3b Compare October 19, 2021 03:16
@svenklemm svenklemm marked this pull request as ready for review October 19, 2021 04:37
@svenklemm svenklemm requested a review from a team as a code owner October 19, 2021 04:37
@svenklemm svenklemm requested review from berkley, fabriziomello and duncan-tsdb and removed request for a team October 19, 2021 04:37
@mkindahl mkindahl removed the request for review from berkley October 19, 2021 06:14
Copy link
Contributor

@erimatnor erimatnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this broken also on single node if you queried a chunk directly? So, it doesn't seem specific to per-datanode queries being disabled, or am I missing something?

I would also expect new tests to cover transparent decompressions on individual chunks, but don't see any.

tsl/src/nodes/decompress_chunk/planner.c Outdated Show resolved Hide resolved
tsl/src/nodes/decompress_chunk/planner.c Outdated Show resolved Hide resolved
tsl/src/nodes/decompress_chunk/planner.c Outdated Show resolved Hide resolved
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about a potentially NULL pointer. Otherwise looks good.

tsl/src/nodes/decompress_chunk/decompress_chunk.c Outdated Show resolved Hide resolved
@svenklemm svenklemm force-pushed the decomp_chunk branch 3 times, most recently from f01067a to e46fcd9 Compare October 19, 2021 09:55
@svenklemm
Copy link
Member Author

Wasn't this broken also on single node if you queried a chunk directly? So, it doesn't seem specific to per-datanode queries being disabled, or am I missing something?

I would also expect new tests to cover transparent decompressions on individual chunks, but don't see any.

There wasn't really a usecase for this on singlenode but it is a required feature for querying distributed hypertable without per datanode queries. But you are correct that it didn't work at all previously independent of singlenode/multinode.

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to add tests, even if just basic ones, but approve it since @erimatnor has already mentioned this.

@codecov
Copy link

codecov bot commented Oct 19, 2021

Codecov Report

Merging #3717 (40bca0c) into master (f25e795) will increase coverage by 0.05%.
The diff coverage is 95.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3717      +/-   ##
==========================================
+ Coverage   90.20%   90.26%   +0.05%     
==========================================
  Files         212      212              
  Lines       37012    37079      +67     
==========================================
+ Hits        33387    33469      +82     
+ Misses       3625     3610      -15     
Impacted Files Coverage Δ
src/compat/compat.h 94.28% <ø> (ø)
src/planner.h 100.00% <ø> (ø)
src/utils.c 78.10% <ø> (ø)
src/utils.h 80.00% <ø> (ø)
tsl/src/continuous_aggs/create.c 95.85% <66.66%> (-0.12%) ⬇️
tsl/src/nodes/decompress_chunk/planner.c 95.83% <85.71%> (-0.52%) ⬇️
tsl/src/fdw/estimate.c 97.29% <86.66%> (-0.77%) ⬇️
tsl/src/nodes/decompress_chunk/decompress_chunk.c 94.82% <91.66%> (+0.11%) ⬆️
tsl/src/fdw/scan_plan.c 95.50% <97.87%> (+1.51%) ⬆️
src/planner.c 94.64% <100.00%> (+0.22%) ⬆️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8862081...40bca0c. Read the comment docs.

@svenklemm svenklemm force-pushed the decomp_chunk branch 2 times, most recently from 67819a3 to c4e68cb Compare October 19, 2021 16:31
src/planner.c Show resolved Hide resolved
tsl/src/planner.c Show resolved Hide resolved
Copy link
Contributor

@erimatnor erimatnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but have some comments on tests.

Also, I think it makes sense to update the commit/PR message to reflect that this is not specifically an issue with distributed hypertables, although it is fine to mention this as one use case.

Also, there are no tests on distributed hypertables with compression and no per-data node queries. I guess it would be good to test this, although it should work given current tests.

PREPARE prep AS SELECT count(time) FROM :TEST_TABLE WHERE device_id = 1;

:PREFIX EXECUTE prep;
EXECUTE prep;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of executing the prepared statement multiple times? Is it to ensure it is using a cached plan?

Assuming caching is the reason: How do we know at which point a cached plan is used? Is there a way to expose the information or make sure we set some parameter prior to these calls that clearly states the number of steps required? Now this is all very implicit and not possible to verify.

@svenklemm svenklemm force-pushed the decomp_chunk branch 12 times, most recently from 40164c7 to 1d929bd Compare October 20, 2021 17:51
This patch adds support for transparent decompression in queries
on individual chunks.
This is required for distributed hypertables with compression
when enable_per_data_node_queries is set to false. Without
this functionality queries on distributed hypertables with
compression would not return data for compressed chunks as
the generated FDW queries would target individual chunks.

Fixes timescale#3714
@svenklemm svenklemm merged commit acc6abe into timescale:master Oct 20, 2021
@fabriziomello fabriziomello mentioned this pull request Oct 26, 2021
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
Pull request timescale#3717 introduced a new template SQL test file but missed
to add the properly gitgnore entry to ignore generated test files.
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
Pull request timescale#3717 introduced a new template SQL test file but missed
to add the properly gitgnore entry to ignore generated test files.
fabriziomello added a commit that referenced this pull request Oct 27, 2021
Pull request #3717 introduced a new template SQL test file but missed
to add the properly gitgnore entry to ignore generated test files.
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`
* timescale#3598 Improve evaluation of stable functions such as now() on access node
* timescale#3717 Support transparent decompression on individual chunks

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3598 Improve evaluation of stable functions such as now() on access
node
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3717 Support transparent decompression on individual chunks
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cannot read compressed chunks from distributed hypertables
4 participants