Fix duplicates on partially compressed chunk reads #5872

jnidzwetzki · 2023-07-12T20:37:37Z

When the uncompressed part of a partially compressed chunk is read by a non-partial path and the compressed part by a partial path, the append node on top could process the uncompressed part multiple times because the path was declared as a partial path (correct is non-partial path) and the append node assumed it could be executed in all workers in parallel without producing duplicates.

This PR ensures that the append node is aware of the correct path type.

When the uncompressed part of a partially compressed chunk is read by a non-partial path and the compressed part by a partial path, the append node on top could process the uncompressed part multiple times because the path was declared as a partial path and the append node assumed it could be executed in all workers in parallel without producing duplicates. This PR fixes the declaration of the path.

codecov · 2023-07-12T20:52:34Z

Codecov Report

Merging #5872 (0979dfe) into main (4c3d64a) will increase coverage by 0.02%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##             main    #5872      +/-   ##
==========================================
+ Coverage   87.83%   87.86%   +0.02%     
==========================================
  Files         239      239              
  Lines       55792    55795       +3     
  Branches    12359    12359              
==========================================
+ Hits        49005    49023      +18     
- Misses       4870     4886      +16     
+ Partials     1917     1886      -31

Impacted Files	Coverage Δ
tsl/src/nodes/decompress_chunk/decompress_chunk.c	`90.54% <87.50%> (+0.34%)`	⬆️

... and 18 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

github-actions · 2023-07-12T21:23:57Z

@akuzm, @konskov: please review this pull request.

Powered by pull-review

timescale-automation · 2023-07-13T07:00:22Z

Automated backport to 2.11.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.11.x
You are currently cherry-picking commit 36e710001.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .unreleased/bugfix_5872

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   tsl/src/nodes/decompress_chunk/decompress_chunk.c
	both modified:   tsl/test/expected/compression.out
	both modified:   tsl/test/sql/compression.sql

Job log

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * timescale#5137 Insert into index during chunk compression * timescale#5150 MERGE support on hypertables * timescale#5515 Make hypertables support replica identity * timescale#5586 Index scan support during UPDATE/DELETE on compressed hypertables * timescale#5596 Support for partial aggregations at chunk level * timescale#5599 Enable ChunkAppend for partially compressed chunks * timescale#5655 Improve the number of parallel workers for decompression * timescale#5758 Enable altering job schedule type through `alter_job` * timescale#5805 Make logrepl markers for (partial) decompressions * timescale#5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * timescale#5839 Support CAgg names in chunk_detailed_size * timescale#5852 Make set_chunk_time_interval CAggs aware * timescale#5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * timescale#5875 Add job exit status and runtime to log * timescale#5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * timescale#5860 Fix interval calculation for hierarchical CAggs * timescale#5894 Check unique indexes when enabling compression * timescale#5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * timescale#5988 Move functions to _timescaledb_functions schema * timescale#5788 Chunk_create must add an existing table or fail * timescale#5872 Fix duplicates on partially compressed chunk reads * timescale#5918 Fix crash in COPY from program returning error * timescale#5990 Place data in first/last function in correct mctx * timescale#5991 Call eq_func correctly in time_bucket_gapfill * timescale#6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * timescale#6035 Fix server crash on UPDATE of compressed chunk * timescale#6044 Fix server crash when using duplicate segmentby column * timescale#6045 Fix segfault in set_integer_now_func * timescale#6053 Fix approximate_row_count for CAggs * timescale#6081 Improve compressed DML datatype handling * timescale#6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

@ajcanterbury

This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill

github-actions bot assigned jnidzwetzki Jul 12, 2023

jnidzwetzki added compression force-auto-backport Automatically backport this PR or fix of this issue, even if it's not marked as "bug" labels Jul 12, 2023

jnidzwetzki force-pushed the parallel_worker_partial branch from c54bbc0 to 0979dfe Compare July 12, 2023 20:39

jnidzwetzki marked this pull request as ready for review July 12, 2023 21:23

github-actions bot requested review from akuzm and konskov July 12, 2023 21:23

jnidzwetzki force-pushed the parallel_worker_partial branch 2 times, most recently from 65d1a9b to 0979dfe Compare July 12, 2023 21:49

pmwkaa approved these changes Jul 13, 2023

View reviewed changes

sb230132 approved these changes Jul 13, 2023

View reviewed changes

konskov approved these changes Jul 13, 2023

View reviewed changes

jnidzwetzki merged commit 36e7100 into timescale:main Jul 13, 2023
84 of 85 checks passed

timescale-automation added the auto-backport-not-done Automated backport of this PR has failed non-retriably (e.g. conflicts) label Jul 13, 2023

jnidzwetzki mentioned this pull request Jul 17, 2023

Ensure partial paths are always created #5855

Closed

svenklemm mentioned this pull request Sep 20, 2023

Release 2.12.0 #6086

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix duplicates on partially compressed chunk reads #5872

Fix duplicates on partially compressed chunk reads #5872

jnidzwetzki commented Jul 12, 2023 •

edited

codecov bot commented Jul 12, 2023 •

edited

github-actions bot commented Jul 12, 2023

timescale-automation commented Jul 13, 2023

Fix duplicates on partially compressed chunk reads #5872

Fix duplicates on partially compressed chunk reads #5872

Conversation

jnidzwetzki commented Jul 12, 2023 • edited

codecov bot commented Jul 12, 2023 • edited

Codecov Report

github-actions bot commented Jul 12, 2023

timescale-automation commented Jul 13, 2023

Git status

jnidzwetzki commented Jul 12, 2023 •

edited

codecov bot commented Jul 12, 2023 •

edited