Fix unique constraint on compressed tables #5573

kgyrtkirk · 2023-04-14T15:24:00Z

Inserting multiple rows into a compressed chunk could bypass the
unique constraint check when the table had segment_by columns.

Fixes #5553

github-actions · 2023-04-14T15:24:23Z

@nikkhils, @mkindahl: please review this pull request.

codecov · 2023-04-17T09:11:54Z

Codecov Report

Merging #5573 (7c6f09b) into main (a49fdbc) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #5573      +/-   ##
==========================================
+ Coverage   90.54%   90.60%   +0.06%     
==========================================
  Files         229      229              
  Lines       47525    53880    +6355     
==========================================
+ Hits        43031    48820    +5789     
- Misses       4494     5060     +566

Impacted Files	Coverage Δ
src/nodes/chunk_dispatch/chunk_dispatch.c	`96.57% <100.00%> (+0.44%)`	⬆️

... and 204 files with indirect coverage changes

svenklemm · 2023-04-17T15:33:50Z

test/pg_regress.sh

@@ -30,6 +30,9 @@ SKIPS=${SKIPS:-}
 PSQL=${PSQL:-psql}
 PSQL="${PSQL} -X" # Prevent any .psqlrc files from being executed during the tests

+export PSQL


This seems like an unrelated change that does not belong in this PR.

These changes let me debug tests. I've removed them from this PR.

You can put it in a separate PR or at least a separate commit.

I will, and there I'll explain how this could be useful.

svenklemm · 2023-04-17T15:36:08Z

tsl/test/expected/compression_errors.out

@@ -3,6 +3,7 @@
 -- LICENSE-TIMESCALE for a copy of the license.
 \set ON_ERROR_STOP 0
 \set VERBOSITY default
+\set ECHO none


Why are you setting this?

Without this, all the statements loaded from test_utils.sql would be listed in the out file.

See the first few lines of changes in tsl/test/sql/compression_errors.sql: echo is turned back on after the file is loaded.

Should this be part of test_utils.sql then?

Great idea! As this change will affect other tests as well, I'll open a separate PR for that.

akuzm · 2023-04-18T09:44:19Z

src/nodes/chunk_dispatch/chunk_dispatch.c

-		if (found && ts_chunk_is_compressed(chunk) && !ts_chunk_is_distributed(chunk))
+	if (found)
+	{
+		Chunk *chunk = ts_chunk_get_by_id(cis->chunk_id, true);


This lookup is expensive to do per-row. Would be good to somehow merge with the above lookups, or cache it in the chunk insert state, or maybe check cis->chunk_compressed and cis->chunk_data_nodes first.

For every new chunk, the cis code above already reads the chunk twice. The API contract of decompress_batches_for_insert is that the caller gives it a Chunk.

I think that instead of storing assorted attributes of the Chunk struct, it would be better to store a more lightweight version of it, all the more so because only a small part of the Chunk struct is actually used further down. Meanwhile, where the scankey is constructed, the relid of the hypertable was not easily available.

For now I've changed the check to use cis->... and to load the Chunk right before calling decompress_batches_for_insert.
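
The approach discussed in this thread (check cheap cached flags first, and only then load the Chunk, at most once per insert state) can be sketched as below. This is a minimal, self-contained toy model and not the TimescaleDB code: `ToyChunk`, `ToyInsertState`, `toy_chunk_get_by_id`, and `toy_chunk_for_decompression` are hypothetical names, and the two boolean flags only stand in for the kind of per-state information mentioned in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the TimescaleDB structs. */
typedef struct ToyChunk
{
	int id;
} ToyChunk;

typedef struct ToyInsertState
{
	int chunk_id;
	bool chunk_compressed;  /* cheap flag, assumed cached when the state is created */
	bool chunk_distributed; /* cheap flag, assumed cached when the state is created */
	ToyChunk *chunk;        /* loaded lazily; NULL until first needed */
} ToyInsertState;

static int toy_lookup_count = 0;
static ToyChunk toy_catalog[] = { { 1 }, { 2 }, { 3 } };

/* Stand-in for an expensive catalog lookup; counts how often it runs. */
static ToyChunk *
toy_chunk_get_by_id(int id)
{
	toy_lookup_count++;
	for (size_t i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
	{
		if (toy_catalog[i].id == id)
			return &toy_catalog[i];
	}
	return NULL;
}

/*
 * Returns the chunk when the row needs decompression handling, else NULL.
 * The cheap flags are tested first, so uncompressed chunks never pay for
 * the lookup, and compressed chunks pay for it only once per insert state.
 */
static ToyChunk *
toy_chunk_for_decompression(ToyInsertState *state)
{
	assert(state != NULL);
	if (!state->chunk_compressed || state->chunk_distributed)
		return NULL;
	if (state->chunk == NULL)
		state->chunk = toy_chunk_get_by_id(state->chunk_id);
	return state->chunk;
}
```

Calling `toy_chunk_for_decompression` once per inserted row on the same state performs the lookup only for the first row. The trade-off is that the state then holds a pointer whose lifetime has to match the state's, which is one reason a real implementation would want a comment explaining the choice.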

mkindahl

Would be good if you can split up the commit comment into a description of what the problem is and what you do to solve it. It's a little hard to read and understand as a reviewer.

You also need to add a changelog entry.

mkindahl · 2023-04-18T09:48:47Z

src/nodes/chunk_dispatch/chunk_dispatch.c

+	{
+		Chunk *chunk = ts_chunk_get_by_id(cis->chunk_id, true);


Could you add a comment explaining why you need to re-fetch the chunk using the chunk_id from the insert state? It is not very clear why this is necessary.

decompress_batches_for_insert needs it, so from now on every inserted row will need to have a Chunk struct.

The instructions above ts_set_compression_status essentially bypass the cache and load a new instance, so I'm not sure whether we can trust the cache behind ts_hypertable_find_chunk_for_point or whether that would just cause trouble. (FYI: @antekresic)

For now I've decided to:

* lazy-init the chunk field
* load it with ts_chunk_get_by_id for now, if it's not already available

Please add a comment describing the situation. It will help your future self understand the rationale behind it.

Inserting multiple rows into a compressed chunk could have bypassed the constraint check if the table had segment_by columns.

Decompression is narrowed to only consider candidates matching the actual segment_by value. Because of caching, decompression was skipped for follow-up rows of the same Chunk.

Fixes timescale#5553
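
The interaction described in this commit message can be illustrated with a small toy model. It is a sketch under stated assumptions, not the TimescaleDB implementation: all names (`ToyRow`, `ToyChunk`, `toy_decompress_segment`, `toy_insert`) are hypothetical, the unique constraint is assumed to cover `(segment, key)`, and the `decompressed_once` flag stands in for whatever per-chunk caching caused decompression to be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ROWS 16

/* Hypothetical toy types; the unique constraint is on (segment, key). */
typedef struct ToyRow
{
	int segment; /* plays the role of the segment_by column */
	int key;
} ToyRow;

typedef struct ToyChunk
{
	ToyRow compressed[TOY_MAX_ROWS];   /* invisible to the unique check */
	size_t n_compressed;
	ToyRow uncompressed[TOY_MAX_ROWS]; /* visible to the unique check */
	size_t n_uncompressed;
	bool decompressed_once;            /* per-chunk cache flag in this model */
} ToyChunk;

/* Move only the compressed rows of one segment into the uncompressed area. */
static void
toy_decompress_segment(ToyChunk *chunk, int segment)
{
	size_t kept = 0;
	for (size_t i = 0; i < chunk->n_compressed; i++)
	{
		ToyRow row = chunk->compressed[i];
		if (row.segment == segment)
		{
			assert(chunk->n_uncompressed < TOY_MAX_ROWS);
			chunk->uncompressed[chunk->n_uncompressed++] = row;
		}
		else
			chunk->compressed[kept++] = row;
	}
	chunk->n_compressed = kept;
}

/*
 * Returns false when the row violates the unique constraint.
 * With per_row_decompression == false, decompression runs only for the
 * first row that reaches the chunk, which models the bug.
 */
static bool
toy_insert(ToyChunk *chunk, ToyRow row, bool per_row_decompression)
{
	if (per_row_decompression || !chunk->decompressed_once)
	{
		toy_decompress_segment(chunk, row.segment);
		chunk->decompressed_once = true;
	}
	for (size_t i = 0; i < chunk->n_uncompressed; i++)
	{
		if (chunk->uncompressed[i].segment == row.segment &&
			chunk->uncompressed[i].key == row.key)
			return false;
	}
	assert(chunk->n_uncompressed < TOY_MAX_ROWS);
	chunk->uncompressed[chunk->n_uncompressed++] = row;
	return true;
}
```

In this model, take a chunk holding compressed rows `(1, 10)` and `(2, 20)`. Inserting `(1, 11)` and then `(2, 20)` with `per_row_decompression` set to false accepts the second row, because segment 2 was never decompressed and the duplicate stays hidden. With the flag set to true, the second insert is rejected.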

timescale-automation · 2023-04-20T20:30:02Z

Automated backport to 2.10.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.10.x
You are currently cherry-picking commit a0df8c8e.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   CHANGELOG.md
	modified:   tsl/test/expected/compression_errors.out
	modified:   tsl/test/sql/compression_errors.sql

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   src/nodes/chunk_dispatch/chunk_dispatch.c

@kovetskiy

This release includes these noteworthy features:
* compressed hypertable enhancements:
  * UPDATE/DELETE support
  * ON CONFLICT DO UPDATE
* Join support for hierarchical Continuous Aggregates
* performance improvements

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting performance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages

@kovetskiy

This release contains new features and bug fixes since the 2.10.3 release. We deem it moderate priority for upgrading.

This release includes these noteworthy features:
* Support for DML operations on compressed chunks:
  * UPDATE/DELETE support
  * Support for unique constraints on compressed chunks
  * Support for `ON CONFLICT DO UPDATE`
  * Support for `ON CONFLICT DO NOTHING`
* Join support for hierarchical Continuous Aggregates

**Features**
* timescale#5212 Allow pushdown of reference table joins
* timescale#5221 Improve Realtime Continuous Aggregate performance
* timescale#5252 Improve unique constraint support on compressed hypertables
* timescale#5339 Support UPDATE/DELETE on compressed hypertables
* timescale#5344 Enable JOINS for Hierarchical Continuous Aggregates
* timescale#5361 Add parallel support for partialize_agg()
* timescale#5417 Refactor and optimize distributed COPY
* timescale#5454 Add support for ON CONFLICT DO UPDATE for compressed hypertables
* timescale#5547 Skip Ordered Append when only 1 child node is present
* timescale#5510 Propagate vacuum/analyze to compressed chunks
* timescale#5584 Reduce decompression during constraint checking
* timescale#5530 Optimize compressed chunk resorting
* timescale#5639 Support sending telemetry event reports

**Bugfixes**
* timescale#5396 Fix SEGMENTBY columns predicates to be pushed down
* timescale#5427 Handle user-defined FDW options properly
* timescale#5442 Decompression may have lost DEFAULT values
* timescale#5459 Fix issue creating dimensional constraints
* timescale#5570 Improve interpolate error message on datatype mismatch
* timescale#5573 Fix unique constraint on compressed tables
* timescale#5615 Add permission checks to run_job()
* timescale#5614 Enable run_job() for telemetry job
* timescale#5578 Fix on-insert decompression after schema changes
* timescale#5613 Quote username identifier appropriately
* timescale#5525 Fix tablespace for compressed hypertable and corresponding toast
* timescale#5642 Fix ALTER TABLE SET with normal tables
* timescale#5666 Reduce memory usage for distributed analyze
* timescale#5668 Fix subtransaction resource owner

**Thanks**
* @kovetskiy and @DZDomi for reporting performance regression in Realtime Continuous Aggregates
* @ollz272 for reporting an issue with interpolate error messages

github-actions bot assigned kgyrtkirk Apr 14, 2023

github-actions bot requested review from mkindahl and nikkhils April 14, 2023 15:24

kgyrtkirk marked this pull request as draft April 14, 2023 15:24

kgyrtkirk force-pushed the fix-compressed-unique branch 2 times, most recently from f1b4b9c to d86d2ae Compare April 17, 2023 08:55

kgyrtkirk marked this pull request as ready for review April 17, 2023 08:58

kgyrtkirk force-pushed the fix-compressed-unique branch from d86d2ae to 098b1e2 Compare April 17, 2023 09:32

svenklemm reviewed Apr 17, 2023

kgyrtkirk force-pushed the fix-compressed-unique branch from 098b1e2 to 4fa56eb Compare April 17, 2023 18:42

akuzm reviewed Apr 18, 2023

akuzm approved these changes Apr 18, 2023

mkindahl reviewed Apr 18, 2023

kgyrtkirk force-pushed the fix-compressed-unique branch 4 times, most recently from 02bb921 to 310828f Compare April 19, 2023 10:31

mkindahl approved these changes Apr 20, 2023

kgyrtkirk force-pushed the fix-compressed-unique branch from 310828f to efc47e1 Compare April 20, 2023 06:43

kgyrtkirk enabled auto-merge (rebase) April 20, 2023 06:43

kgyrtkirk force-pushed the fix-compressed-unique branch from efc47e1 to 2570154 Compare April 20, 2023 09:14

kgyrtkirk force-pushed the fix-compressed-unique branch from 2570154 to 7c6f09b Compare April 20, 2023 10:58

kgyrtkirk merged commit a0df8c8 into timescale:main Apr 20, 2023
49 checks passed

timescale-automation added the auto-backport-not-done label (Automated backport of this PR has failed non-retriably, e.g. conflicts) Apr 20, 2023

kgyrtkirk mentioned this pull request May 17, 2023

Release 2.11.0 #5695

Merged

kgyrtkirk mentioned this pull request May 19, 2023

Release 2.11.0 - fix date #5702

Closed
