Support for inserts into compressed chunks #3230

Merged
merged 10 commits into from May 24, 2021

@gayyappan gayyappan commented May 17, 2021

Disable-check: commit-count

codecov bot commented May 17, 2021

Codecov Report

Merging #3230 (41a3a37) into master (45462c7) will increase coverage by 0.21%.
The diff coverage is 94.13%.

❗ Current head 41a3a37 differs from pull request most recent head a1e9bf5. Consider uploading reports for the commit a1e9bf5 to get more accurate results

@@            Coverage Diff             @@
##           master    #3230      +/-   ##
==========================================
+ Coverage   90.26%   90.47%   +0.21%     
==========================================
  Files         216      216              
  Lines       35720    35986     +266     
==========================================
+ Hits        32241    32557     +316     
+ Misses       3479     3429      -50     
Impacted Files Coverage Δ
src/chunk.h 100.00% <ø> (ø)
src/compat.h 100.00% <ø> (ø)
tsl/src/compression/compression.h 0.00% <ø> (ø)
tsl/src/init.c 83.33% <ø> (ø)
tsl/src/bgw_policy/compression_api.c 80.00% <63.15%> (-3.17%) ⬇️
tsl/src/compression/compress_utils.c 93.96% <87.23%> (-1.67%) ⬇️
src/indexing.c 95.34% <88.23%> (-0.79%) ⬇️
src/nodes/chunk_dispatch_state.c 95.09% <94.44%> (-0.20%) ⬇️
src/nodes/chunk_insert_state.c 98.07% <96.66%> (+0.45%) ⬆️
tsl/src/bgw_policy/job.c 97.76% <96.66%> (+0.11%) ⬆️
... and 13 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@gayyappan gayyappan requested a review from afiskon May 17, 2021 14:28
@svenklemm svenklemm force-pushed the mutable_compression branch 3 times, most recently from 470942d to 5e8c5cf Compare May 18, 2021 02:38
@svenklemm svenklemm self-requested a review May 18, 2021 20:11
@gayyappan gayyappan marked this pull request as ready for review May 18, 2021 22:28
@gayyappan gayyappan requested a review from a team as a code owner May 18, 2021 22:28

typedef enum ChunkCompressionStatus
{
CHUNK_COMPRESS_NONE = 0,

So we have this enum and then we have CHUNK_STATUS_UNORDERED type #defines as well..

Contributor

@mfundul mfundul May 21, 2021

The #defines are part of the database protocol and ideally will never change. They describe the chunk status column, and can be bitwise ORed. This is an internal in-memory state for some limited scope logic here. It's mostly to separate the cases of the return values of ts_chunk_get_compression_status().

My suggestion was to change CHUNK_DROPPED to CHUNK_COMPRESS_DROPPED for this reason.
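
To make the distinction concrete, here is a minimal sketch in C of how OR-able status flags could be collapsed into a single in-memory enum value. It is not the code from this PR. The flag values, the `CHUNK_STATUS_COMPRESSED` name, the `CHUNK_COMPRESS_ORDERED` member, and the `classify_status` helper are assumptions for illustration; only `CHUNK_STATUS_UNORDERED`, `CHUNK_COMPRESS_NONE`, `CHUNK_COMPRESS_UNORDERED`, and the proposed `CHUNK_COMPRESS_DROPPED` come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Persistent flags as stored in the chunk status column.
 * The numeric values are assumed for illustration only. */
#define CHUNK_STATUS_COMPRESSED 1 /* assumed name and value */
#define CHUNK_STATUS_UNORDERED 2  /* name from the thread, value assumed */

/* In-memory classification used by limited-scope logic. */
typedef enum ChunkCompressionStatus
{
	CHUNK_COMPRESS_NONE = 0,
	CHUNK_COMPRESS_UNORDERED,
	CHUNK_COMPRESS_ORDERED, /* assumed member name */
	CHUNK_COMPRESS_DROPPED  /* the rename suggested above */
} ChunkCompressionStatus;

/* Hypothetical helper: collapse OR-able flags into one enum value. */
ChunkCompressionStatus
classify_status(int32_t status_flags, bool dropped)
{
	assert(status_flags >= 0); /* flags are non-negative bit sets */

	if (dropped)
		return CHUNK_COMPRESS_DROPPED;
	if ((status_flags & CHUNK_STATUS_COMPRESSED) == 0)
		return CHUNK_COMPRESS_NONE;
	if (status_flags & CHUNK_STATUS_UNORDERED)
		return CHUNK_COMPRESS_UNORDERED;
	return CHUNK_COMPRESS_ORDERED;
}
```

The flags can be combined freely because they are stored, while the enum carries exactly one value at a time, which is why a distinct prefix for each naming scheme helps.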

FWIW, I was also a bit confused by this, and by the inconsistent naming between enums and flags. For instance, "unordered" has the "compress" prefix in one case (CHUNK_COMPRESS_UNORDERED) but not in the other (CHUNK_STATUS_UNORDERED).

Contributor

@nikkhils nikkhils left a comment

Had a few queries, but overall looks very good to me.

Contributor

@erimatnor erimatnor left a comment

Looks solid in terms of functionality. I have some suggestions, questions, and nits that I think would be good to look at and potentially address.

sql/maintenance_utils.sql Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved
src/chunk.c Outdated Show resolved Hide resolved

src/copy.c Show resolved Hide resolved
---------------------------------+--------+-----------+------------------
compressed_chunk_insert_blocker | 7 | O | _hyper_1_2_chunk
(1 row)
tgname | tgtype | tgenabled | relname

I guess we should now remove this whole check if the insert blocker is no longer needed?

h.table_name AS hypertable_name,
c.schema_name as chunk_schema,
c.table_name as chunk_name,
c.status as chunk_status,

It would be helpful to convert the status to a human readable format with, e.g., a CASE statement. For instance, "compressed", "not_compressed", etc.

tsl/test/expected/compression_bgw.out Outdated Show resolved Hide resolved
tsl/src/compression/compress_utils.c Outdated Show resolved Hide resolved
@@ -223,3 +223,158 @@ SELECT COUNT(*) AS dropped_chunks_count
SELECT add_compression_policy AS job_id
FROM add_compression_policy('conditions', INTERVAL '1 day') \gset
CALL run_job(:job_id);
\i include/recompress_basic.sql

I will just note that most of the added testing here seems to be manual compression/recompression and not related to background jobs. Only found one call to add_compression_policy, so the question is if this is the appropriate test file to have the majority of these tests?

@erimatnor
Contributor

I have a question about the compression policy. Now that we do either compress or recompress in the policy, together with the fact that the policy only compresses one chunk in each run, is there a risk that we end up in a state where the compression policy makes no real progress and keeps recompressing the same chunk every time it runs?

I am thinking of a situation where regular inserts into an already compressed chunk cause it to be recompressed over and over again by the policy.

@erimatnor
Contributor

Suggestion: Add CHANGELOG entry.

erimatnor added a commit to erimatnor/timescaledb that referenced this pull request May 21, 2021
This release adds major new features since the 2.2.1 release. We deem
it moderate priority for upgrading.

This release adds support for inserting data into compressed chunks
and improves performance when inserting data into distributed
hypertables. It also adds support for triggers and compression
policies on distributed hypertables.

The bug fixes in this release address issues related to the handling
of privileges on compressed hypertables, locking, and triggers with
transition tables.

**Features**
* timescale#3116 Add distributed hypertable compression policies
* timescale#3162 Use COPY when executing distributed INSERTs
* timescale#3199 Add GENERATED column support on distributed hypertables
* timescale#3210 Add trigger support on distributed hypertables
* timescale#3230 Support for inserts into compressed chunks

**Bugfixes**
* timescale#3209 Propagate grants to compressed hypertables
* timescale#3229 Use correct lock mode when updating chunk
* timescale#3241 Fix assertion failure in decompress_chunk_plan_create
* timescale#3243 Fix assertion failure in decompress_chunk_plan_create
* timescale#3250 Fix constraint triggers on hypertables
* timescale#3251 Fix segmentation fault due to incorrect call to chunk_scan_internal
* timescale#3252 Fix blocking triggers with transition tables

**Thanks**
* @yyjdelete for reporting a crash with decompress_chunk and identifying the bug in the code
@erimatnor erimatnor mentioned this pull request May 21, 2021
@gayyappan
Contributor Author

I have a question about the compression policy. Now that we do either compress or recompress in the policy, together with the fact that the policy only compresses one chunk in each run, is there a risk that we end up in a state where the compression policy makes no real progress and keeps recompressing the same chunk every time it runs?

I am thinking of a situation where regular inserts into an already compressed chunk cause it to be recompressed over and over again by the policy.

Yes, I think this is possible if inserts keep going back to the same chunk. Sven and I discussed a mitigation strategy for this: allow users to split this into two separate policies, a) a compression policy and b) a recompression policy, so that recompression does not affect compression. I plan to make these changes.
This still allows the recompression policy to repeatedly go back to the same chunk (assuming that particular pattern of inserts). We would need to keep more information (such as the last time the job was run) to prioritize the jobs in that case. I am deferring this part for later.
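
As a rough illustration of the deferred prioritization idea, the C sketch below picks the next chunk for a job to process. It is not what this PR implements. The `ChunkCandidate` struct, its `last_job_run` bookkeeping field, and the `pick_next_chunk` helper are all hypothetical names introduced here. The assumed rule is that first-time compression beats recompression and that, among equals, the least recently processed chunk goes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk bookkeeping; none of these fields exist in the PR. */
typedef struct ChunkCandidate
{
	int32_t id;
	int needs_recompress; /* 0 = never compressed, 1 = compressed but unordered */
	int64_t last_job_run; /* assumed timestamp of the last policy run on this chunk */
} ChunkCandidate;

/* Returns the index of the chunk to process next, or -1 if there is none. */
int
pick_next_chunk(const ChunkCandidate *chunks, size_t n)
{
	int best = -1;

	assert(chunks != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (best < 0)
		{
			best = (int) i;
			continue;
		}

		const ChunkCandidate *b = &chunks[best];
		const ChunkCandidate *c = &chunks[i];

		/* First-time compression takes priority over recompression. */
		if (c->needs_recompress != b->needs_recompress)
		{
			if (!c->needs_recompress)
				best = (int) i;
			continue;
		}

		/* Otherwise prefer the chunk that has waited the longest. */
		if (c->last_job_run < b->last_job_run)
			best = (int) i;
	}
	return best;
}
```

With a rule like this, a chunk that keeps receiving inserts cannot starve chunks that were never compressed, and repeated recompression of one chunk is bounded by the other unordered chunks waiting their turn.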

src/indexing.c Outdated Show resolved Hide resolved

-- test if default value for b and sequence value for id is used
INSERT INTO vessels(timec, i, t) values('2020-01-02 10:16:00-05' , 11, 'default' );
COPY vessels(timec,i,t )FROM STDIN DELIMITER ',';

Do we also have a test with COPY directly into the chunk when it is compressed?

Contributor Author

No. It needs to go through chunk_insert_state to get directed to the correct chunk.

Add CompressRowSingleState.
This has functions to compress a single row.
Support defaults, sequences and check constraints with inserts
into compressed chunks
@svenklemm svenklemm force-pushed the mutable_compression branch 2 times, most recently from 78d27b0 to 8d9e40b Compare May 24, 2021 21:15
Compressed chunks that received inserts after being compressed have
batches that are not ordered according to compress_orderby. For those
chunks we cannot set pathkeys on the DecompressChunk node, and we
need an extra sort step if we require ordered output from those
chunks.
Remove the chunk_dml_blocker trigger which was used to prevent
INSERTs into compressed chunks.
gayyappan and others added 5 commits May 24, 2021 23:45
After inserts go into a compressed chunk, the chunk is marked as
unordered. This PR adds a new function recompress_chunk that
compresses the data and sets the status back to compressed. Further
optimizations for this function are planned but not part of this PR.

This function can be invoked by calling
SELECT recompress_chunk(<chunk_name>).

recompress_chunk function is automatically invoked by the compression
policy job, when it sees that a chunk is in unordered state.
Add a test case for copy on distr. hypertables with compressed chunks.
verifies that recompress_chunk and compression policy work as expected.
Additional changes include:
Clean up commented code
Make use of BulkInsertState optional in row compressor
Add test for insert into compressed chunk by a different role
other than the owner
Two insert transactions could potentially try
to update the chunk status to unordered. This results in
one of the transactions failing with a "tuple concurrently
updated" error.
Before updating status, lock the tuple for update, thus
forcing the other transaction to wait for the tuple lock, then
check status column value and update it if needed.
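
The lock, re-check, update-if-needed sequence described in this commit message can be modeled in a few lines of C. This is a single-threaded sketch under stated assumptions, not the PR's code: `ChunkRow`, `lock_row`, `unlock_row`, and `mark_unordered` are hypothetical names, the flag value is assumed, and the `locked` field only stands in for a real row-level lock on which a concurrent writer would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_STATUS_UNORDERED 2 /* name from the PR discussion, value assumed */

/* Hypothetical in-memory stand-in for the chunk's catalog row. */
typedef struct ChunkRow
{
	int32_t status;
	bool locked; /* models a row-level lock; no real blocking happens here */
} ChunkRow;

void
lock_row(ChunkRow *row)
{
	assert(!row->locked); /* a real second writer would wait instead */
	row->locked = true;
}

void
unlock_row(ChunkRow *row)
{
	assert(row->locked);
	row->locked = false;
}

/* Returns true only if this call actually wrote a new status. */
bool
mark_unordered(ChunkRow *row)
{
	bool updated = false;

	lock_row(row);
	/* Re-read the status under the lock; another writer may have set it. */
	if ((row->status & CHUNK_STATUS_UNORDERED) == 0)
	{
		row->status |= CHUNK_STATUS_UNORDERED;
		updated = true;
	}
	unlock_row(row);
	return updated;
}
```

The second caller finds the flag already set and skips the write, so it never attempts to update a row version that the first caller has already replaced.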
This patch adds a recompress procedure that may be used as custom
job when compression and recompression should run as separate
background jobs.
@gayyappan gayyappan merged commit fe872cb into master May 24, 2021
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request May 25, 2021
This release adds major new features since the 2.2.1 release. We deem
it moderate priority for upgrading.

This release adds support for inserting data into compressed chunks
and improves performance when inserting data into distributed
hypertables. Distributed hypertables now also support triggers and
compression policies.

The bug fixes in this release address issues related to the handling
of privileges on compressed hypertables, locking, and triggers with
transition tables.

**Features**
* timescale#3116 Add distributed hypertable compression policies
* timescale#3162 Use COPY when executing distributed INSERTs
* timescale#3199 Add GENERATED column support on distributed hypertables
* timescale#3210 Add trigger support on distributed hypertables
* timescale#3230 Support for inserts into compressed chunks

**Bugfixes**
* timescale#3213 Propagate grants to compressed hypertables
* timescale#3229 Use correct lock mode when updating chunk
* timescale#3243 Fix assertion failure in decompress_chunk_plan_create
* timescale#3250 Fix constraint triggers on hypertables
* timescale#3251 Fix segmentation fault due to incorrect call to chunk_scan_internal
* timescale#3252 Fix blocking triggers with transition tables

**Thanks**
* @yyjdelete for reporting a crash with decompress_chunk and identifying the bug in the code
erimatnor added a commit to erimatnor/timescaledb that referenced this pull request May 25, 2021
This release adds major new features since the 2.2.1 release. We deem
it moderate priority for upgrading.

This release adds support for inserting data into compressed chunks
and improves performance when inserting data into distributed
hypertables. Distributed hypertables now also support triggers and
compression policies.

The bug fixes in this release address issues related to the handling
of privileges on compressed hypertables, locking, and triggers with
transition tables.

**Features**
* timescale#3116 Add distributed hypertable compression policies
* timescale#3162 Use COPY when executing distributed INSERTs
* timescale#3199 Add GENERATED column support on distributed hypertables
* timescale#3210 Add trigger support on distributed hypertables
* timescale#3230 Support for inserts into compressed chunks

**Bugfixes**
* timescale#3213 Propagate grants to compressed hypertables
* timescale#3229 Use correct lock mode when updating chunk
* timescale#3243 Fix assertion failure in decompress_chunk_plan_create
* timescale#3250 Fix constraint triggers on hypertables
* timescale#3251 Fix segmentation fault due to incorrect call to chunk_scan_internal
* timescale#3252 Fix blocking triggers with transition tables

**Thanks**
* @yyjdelete for reporting a crash with decompress_chunk and identifying the bug in the code
* @fabriziomello for documenting the prerequisites when compiling against PostgreSQL 13
erimatnor added a commit that referenced this pull request May 25, 2021
This release adds major new features since the 2.2.1 release. We deem
it moderate priority for upgrading.

This release adds support for inserting data into compressed chunks
and improves performance when inserting data into distributed
hypertables. Distributed hypertables now also support triggers and
compression policies.

The bug fixes in this release address issues related to the handling
of privileges on compressed hypertables, locking, and triggers with
transition tables.

**Features**
* #3116 Add distributed hypertable compression policies
* #3162 Use COPY when executing distributed INSERTs
* #3199 Add GENERATED column support on distributed hypertables
* #3210 Add trigger support on distributed hypertables
* #3230 Support for inserts into compressed chunks

**Bugfixes**
* #3213 Propagate grants to compressed hypertables
* #3229 Use correct lock mode when updating chunk
* #3243 Fix assertion failure in decompress_chunk_plan_create
* #3250 Fix constraint triggers on hypertables
* #3251 Fix segmentation fault due to incorrect call to chunk_scan_internal
* #3252 Fix blocking triggers with transition tables

**Thanks**
* @yyjdelete for reporting a crash with decompress_chunk and identifying the bug in the code
* @fabriziomello for documenting the prerequisites when compiling against PostgreSQL 13