Optimize compressed chunk resorting #5530

Merged
merged 1 commit into from
May 2, 2023

Conversation

jnidzwetzki
Contributor

@jnidzwetzki jnidzwetzki commented Apr 6, 2023

This patch adds an optimization to the DecompressChunk node. If the query 'order by' and the compression 'order by' are compatible (the query 'order by' is equal to, or a prefix of, the compression 'order by'), the compressed batches of the segments are decompressed in parallel and merged using a binary heap. The merge preserves the ordering, so a separate sort of the result can be avoided. LIMIT queries benefit most from this optimization because only the first tuples of some batches have to be decompressed. Previously, all segments were completely decompressed and sorted.
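The idea described above can be illustrated with a small Python sketch. It is not the C code of the DecompressChunk node; the function names `is_order_compatible` and `merge_sorted_batches` are hypothetical, and each "batch" is modelled as a plain iterator that is already sorted ascending. The sketch only shows the prefix check and why a heap merge lets a LIMIT query stop after touching the first tuple of each batch.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

_DONE = object()  # sentinel marking an exhausted batch


def is_order_compatible(query_order_by: Sequence[str],
                        compress_order_by: Sequence[str]) -> bool:
    """True if the query ORDER BY equals, or is a prefix of, the compression ORDER BY."""
    n = len(query_order_by)
    return n <= len(compress_order_by) and list(query_order_by) == list(compress_order_by[:n])


def merge_sorted_batches(batches: Iterable[Iterable[int]]) -> Iterator[int]:
    """Lazily merge batches that are each already sorted ascending.

    Only the head tuple of every batch is materialised up front; further
    tuples are pulled from a batch only when its head has been emitted.
    """
    heap: List[Tuple[int, int, Iterator[int]]] = []
    for batch_no, batch in enumerate(batches):
        it = iter(batch)
        first = next(it, _DONE)
        if first is not _DONE:
            # batch_no is unique, so the iterator itself is never compared.
            heap.append((first, batch_no, it))
    heapq.heapify(heap)

    while heap:
        value, batch_no, it = heap[0]
        yield value
        nxt = next(it, _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)
        else:
            heapq.heapreplace(heap, (nxt, batch_no, it))


if __name__ == "__main__":
    print(is_order_compatible(["time"], ["time", "sensor_id"]))
    batches = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
    # Equivalent of "ORDER BY ... LIMIT 2": stop after two merged tuples.
    print(list(islice(merge_sorted_batches(batches), 2)))
```

Because the merge is a generator, a consumer that stops early (as LIMIT does) never pulls the remaining tuples from the batches, which is the effect the timings below measure.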

Single Query Execution Directly in PostgreSQL

--- New
test2=# SELECT * FROM sensor_data ORDER BY time DESC LIMIT 1;
             time              | sensor_id |        cpu         |   temperature    
-------------------------------+-----------+--------------------+------------------
 2023-03-22 16:22:08.119032+01 |        53 | 0.7768845568255323 | 95.6922406819352
(1 row)

Time: 10.486 ms

--- Old
test2=# SET timescaledb.enable_decompression_heap_merge = 0;
SET
Time: 2.303 ms
test2=# SELECT * FROM sensor_data ORDER BY time DESC LIMIT 1;
             time              | sensor_id |        cpu         |   temperature    
-------------------------------+-----------+--------------------+------------------
 2023-03-22 16:22:08.119032+01 |        53 | 0.7768845568255323 | 95.6922406819352
(1 row)

Time: 560.409 ms

TSbench

$  tsbench -vvvv --with-connection pgsq://localhost:5432/benchmark  --benchmarks 'ordered_append_compressed'

Report for benchmark suite 'compressed_chunk_order'
+--------------------------------------------------------------------+------------------------------------------+
| Query                                                              | 0e8177f8aadb162ebb94abd171c041761e2bbe55 |
+--------------------------------------------------------------------+------------------------------------------+
| SELECT * FROM sensor_data_compressed ORDER BY time ASC LIMIT 1;    |                                   154.23 |
| SELECT * FROM sensor_data_compressed ORDER BY time ASC LIMIT 100;  |                                   154.00 |
| SELECT * FROM sensor_data_compressed ORDER BY time ASC;            |                                  2322.28 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC LIMIT 1;   |                                     5.94 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC LIMIT 100; |                                    12.59 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC;           |                                  2184.67 |
+--------------------------------------------------------------------+------------------------------------------+

Note: For ORDER BY time DESC queries, the merge optimization is enabled. The ORDER BY time ASC queries use the regular query plans.

Comparison with execution times on 2.10.1

$  tsbench -vvvv --with-connection pgsq://localhost:5432/benchmark  --benchmarks 'ordered_append_compressed'

Report for benchmark suite 'compressed_chunk_order'
+--------------------------------------------------------------------+------------------------------------------+
| Query                                                              | 540a63e6788e8b66cd86099d5c17f50507dcd080 |
+--------------------------------------------------------------------+------------------------------------------+
| SELECT * FROM sensor_data_compressed ORDER BY time ASC LIMIT 1;    |                                   156.90 |
| SELECT * FROM sensor_data_compressed ORDER BY time ASC LIMIT 100;  |                                   156.72 |
| SELECT * FROM sensor_data_compressed ORDER BY time ASC;            |                                  2341.44 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC LIMIT 1;   |                                   160.72 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC LIMIT 100; |                                   158.97 |
| SELECT * FROM sensor_data_compressed ORDER BY time DESC;           |                                  2303.65 |
+--------------------------------------------------------------------+------------------------------------------+

So, this PR does not affect the performance of the existing code path and just adds an optimization for certain queries.
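To make the comparison between the two reports easier to read, the ratios can be computed directly from the numbers above. The following Python sketch copies the timings from both tables (the reports do not state the unit, so the values are treated as unitless) and divides the 2.10.1 value by the value measured with this PR. The dictionary keys and the `speedups` helper are illustrative names only.

```python
# Timings copied from the tsbench reports above; the unit is whatever tsbench reports.
with_patch = {
    "ASC LIMIT 1": 154.23,
    "ASC LIMIT 100": 154.00,
    "ASC": 2322.28,
    "DESC LIMIT 1": 5.94,
    "DESC LIMIT 100": 12.59,
    "DESC": 2184.67,
}
baseline_2_10_1 = {
    "ASC LIMIT 1": 156.90,
    "ASC LIMIT 100": 156.72,
    "ASC": 2341.44,
    "DESC LIMIT 1": 160.72,
    "DESC LIMIT 100": 158.97,
    "DESC": 2303.65,
}


def speedups(before: dict, after: dict) -> dict:
    """Return before/after for each query; values above 1 mean the patch is faster."""
    return {query: before[query] / after[query] for query in before}


if __name__ == "__main__":
    for query, factor in speedups(baseline_2_10_1, with_patch).items():
        print(f"ORDER BY time {query:<15} {factor:6.2f}x")
```

The DESC LIMIT queries, where the merge optimization applies, come out at roughly 27x and 13x, while the ASC queries stay within a few percent of the 2.10.1 numbers.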

Fixes: #4223

@jnidzwetzki jnidzwetzki force-pushed the compression_limit_squashed branch 12 times, most recently from 9217d8b to b5a8814 Compare April 13, 2023 07:35
@codecov

codecov bot commented Apr 13, 2023

Codecov Report

Merging #5530 (7f661df) into main (cc9c3b3) will decrease coverage by 0.29%.
The diff coverage is 96.31%.

❗ Current head 7f661df differs from pull request most recent head 8a794df. Consider uploading reports for the commit 8a794df to get more accurate results

@@            Coverage Diff             @@
##             main    #5530      +/-   ##
==========================================
- Coverage   90.92%   90.64%   -0.29%     
==========================================
  Files         229      229              
  Lines       54064    54141      +77     
==========================================
- Hits        49158    49076      -82     
- Misses       4906     5065     +159     
| Impacted Files | Coverage Δ |
|---|---|
| src/compat/compat.h | 96.61% <ø> (+6.13%) ⬆️ |
| tsl/src/nodes/decompress_chunk/decompress_chunk.c | 94.19% <87.17%> (+0.19%) ⬆️ |
| tsl/src/nodes/decompress_chunk/planner.c | 92.25% <92.75%> (-0.14%) ⬇️ |
| tsl/src/nodes/decompress_chunk/exec.c | 96.32% <99.08%> (+2.69%) ⬆️ |
| src/guc.c | 95.38% <100.00%> (+0.07%) ⬆️ |
| src/import/planner.c | 65.15% <100.00%> (ø) |
| tsl/src/compression/create.c | 96.30% <100.00%> (ø) |

... and 39 files with indirect coverage changes

@jnidzwetzki jnidzwetzki force-pushed the compression_limit_squashed branch 2 times, most recently from 0e53000 to 74b7864 Compare April 27, 2023 13:29
Member

@akuzm akuzm left a comment

I don't really like the interface we ended up with, feels a little confusing. Anyway, let's proceed with it, we can improve later. Would be good to see some tsbench benchmarks.

jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request May 2, 2023
This patch also enables the compressed merge optimization (see timescale#5530)
for partially compressed chunks.
@jnidzwetzki jnidzwetzki force-pushed the compression_limit_squashed branch 2 times, most recently from 5d32495 to 86c5729 Compare May 2, 2023 07:41
@jnidzwetzki jnidzwetzki enabled auto-merge (rebase) May 2, 2023 08:02
This patch adds an optimization to the DecompressChunk node. If the
query 'order by' and the compression 'order by' are compatible (the
query 'order by' is equal to, or a prefix of, the compression 'order
by'), the compressed batches of the segments are decompressed in
parallel and merged using a binary heap. The merge preserves the
ordering, so a separate sort of the result can be avoided. LIMIT
queries benefit most from this optimization because only the first
tuples of some batches have to be decompressed. Previously, all
segments were completely decompressed and sorted.

Fixes: timescale#4223

Co-authored-by: Sotiris Stamokostas <sotiris@timescale.com>
@jnidzwetzki jnidzwetzki merged commit df32ad4 into timescale:main May 2, 2023
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 12, 2023
This release includes these noteworthy features:
* compressed hypertable enhancements:
  * UPDATE/DELETE support
  * ON CONFLICT DO UPDATE
* Join support for hierarchical Continuous Aggregates
* performance improvements

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting a performance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this pull request May 17, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting a performance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
@kgyrtkirk kgyrtkirk mentioned this pull request May 17, 2023
kgyrtkirk added a commit that referenced this pull request May 19, 2023
This release contains new features and bug fixes since the 2.10.3 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* #5212 Allow pushdown of reference table joins
* #5221 Improve Realtime Continuous Aggregate performance
* #5252 Improve unique constraint support on compressed hypertables
* #5339 Support UPDATE/DELETE on compressed hypertables
* #5344 Enable JOINS for Hierarchical Continuous Aggregates
* #5361 Add parallel support for partialize_agg()
* #5417 Refactor and optimize distributed COPY
* #5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* #5547 Skip Ordered Append when only 1 child node is present
* #5510 Propagate vacuum/analyze to compressed chunks
* #5584 Reduce decompression during constraint checking
* #5530 Optimize compressed chunk resorting
* #5639 Support sending telemetry event reports

**Bugfixes**
* #5396 Fix SEGMENTBY columns predicates to be pushed down
* #5427 Handle user-defined FDW options properly
* #5442 Decompression may have lost DEFAULT values
* #5459 Fix issue creating dimensional constraints
* #5570 Improve interpolate error message on datatype mismatch
* #5573 Fix unique constraint on compressed tables
* #5615 Add permission checks to run_job()
* #5614 Enable run_job() for telemetry job
* #5578 Fix on-insert decompression after schema changes
* #5613 Quote username identifier appropriately
* #5525 Fix tablespace for compressed hypertable and corresponding toast
* #5642 Fix ALTER TABLE SET with normal tables
* #5666 Reduce memory usage for distributed analyze
* #5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting a performance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages
jnidzwetzki added a commit to jnidzwetzki/timescaledb that referenced this pull request Jun 2, 2023
This patch enables the compressed merge optimization (see timescale#5530) also
for partially compressed chunks.
jnidzwetzki added a commit that referenced this pull request Jun 2, 2023
This patch enables the compressed merge optimization (see #5530) also
for partially compressed chunks.
6 participants