Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce decompression during constraint checking #5584

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

antekresic
Copy link
Contributor

When inserting into a compressed chunk with constraints present, we need to decompress relevant tuples in order to do speculative inserting. Usually we used segment by column values to limit the amount of compressed segments to decompress. This change expands on that by also using segment metadata to further filter compressed rows that need to be decompressed.

@codecov
Copy link

codecov bot commented Apr 18, 2023

Codecov Report

Merging #5584 (39be666) into main (28d9db1) will increase coverage by 0.13%.
The diff coverage is 89.18%.

❗ Current head 39be666 differs from pull request most recent head 12f6589. Consider uploading reports for the commit 12f6589 to get more accurate results

@@            Coverage Diff             @@
##             main    #5584      +/-   ##
==========================================
+ Coverage   90.40%   90.53%   +0.13%     
==========================================
  Files         229      229              
  Lines       53999    47521    -6478     
==========================================
- Hits        48817    43025    -5792     
+ Misses       5182     4496     -686     
Impacted Files Coverage Δ
tsl/src/compression/compression.c 95.30% <89.18%> (-0.63%) ⬇️

... and 203 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@antekresic antekresic marked this pull request as ready for review April 18, 2023 13:28
@github-actions
Copy link

@nikkhils, @jnidzwetzki: please review this pull request.

Powered by pull-review

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@sb230132 sb230132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add more tests where you have many rows in a compressed chunk and ensure that the patch picks the right row.

@antekresic antekresic force-pushed the orderby_decompress branch 2 times, most recently from 0ad3fdc to bacf733 Compare April 19, 2023 12:14
When inserting into a compressed chunk with constraints present,
we need to decompress relevant tuples in order to do speculative
inserting. Usually we used segment by column values to limit the
amount of compressed segments to decompress. This change expands
on that by also using segment metadata to further filter
compressed rows that need to be decompressed.
@antekresic antekresic merged commit a49fdbc into timescale:main Apr 20, 2023
48 checks passed
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 12, 2023
This release includes these noteworthy features:
* compressed hypertable enhancements:
  * UPDATE/DELETE support
  * ON CONFLICT DO UPDATE
* Join support for hierarchical Continougs Aggregates
* performance improvements

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 17, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
@kgyrtkirk kgyrtkirk mentioned this pull request May 17, 2023
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* #5212 Allow pushdown of reference table joins
* #5221 Improve Realtime Continuous Aggregate performance
* #5252 Improve unique constraint support on compressed hypertables
* #5339 Support UPDATE/DELETE on compressed hypertables
* #5344 Enable JOINS for Hierarchical Continuous Aggregates
* #5361 Add parallel support for partialize_agg()
* #5417 Refactor and optimize distributed COPY
* #5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* #5547 Skip Ordered Append when only 1 child node is present
* #5510 Propagate vacuum/analyze to compressed chunks
* #5584 Reduce decompression during constraint checking
* #5530 Optimize compressed chunk resorting
* #5639 Support sending telemetry event reports

**Bugfixes**
* #5396 Fix SEGMENTBY columns predicates to be pushed down
* #5427 Handle user-defined FDW options properly
* #5442 Decompression may have lost DEFAULT values
* #5459 Fix issue creating dimensional constraints
* #5570 Improve interpolate error message on datatype mismatch
* #5573 Fix unique constraint on compressed tables
* #5615 Add permission checks to run_job()
* #5614 Enable run_job() for telemetry job
* #5578 Fix on-insert decompression after schema changes
* #5613 Quote username identifier appropriately
* #5525 Fix tablespace for compressed hypertable and corresponding toast
* #5642 Fix ALTER TABLE SET with normal tables
* #5666 Reduce memory usage for distributed analyze
* #5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting peformance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants