New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Server crash when using duplicate segmentby column #6044
Conversation
Codecov Report
@@ Coverage Diff @@
## main #6044 +/- ##
==========================================
- Coverage 81.40% 76.10% -5.30%
==========================================
Files 243 242 -1
Lines 55940 53150 -2790
Branches 12385 11675 -710
==========================================
- Hits 45538 40450 -5088
+ Misses 8083 7668 -415
- Partials 2319 5032 +2713
... and 209 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more 📢 Have feedback on the report? Share it here. |
@mkindahl, @fabriziomello: please review this pull request.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but we should do the same test for compress_orderby
.
@@ -99,6 +99,7 @@ ALTER TABLE foo set (timescaledb.compress, timescaledb.compress_segmentby = 'ran | |||
ALTER TABLE foo set (timescaledb.compress, timescaledb.compress_segmentby = 'c LIMIT 1'); | |||
ALTER TABLE foo set (timescaledb.compress, timescaledb.compress_segmentby = 'c + b'); | |||
ALTER TABLE foo set (timescaledb.compress, timescaledb.compress_orderby = 'a, p'); | |||
ALTER TABLE foo set (timescaledb.compress, timescaledb.compress_segmentby = 'b, b'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test if duplicated compress_orderby
have the same problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mkindahl for pointing this. I have updated the patch to handle duplicate columns as part of both compress_segmentby
and compress_orderby
options.
cf28cdf
to
d42ad8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit that just looks strange, but approving since the code works anyway.
tsl/src/compression/create.c
Outdated
segorder_colindex[col_attno - 1] = i++; | ||
colindex[col_attno - 1] = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be?
segorder_colindex[col_attno - 1] = i++; | |
colindex[col_attno - 1] = i; | |
segorder_colindex[col_attno - 1] = i; | |
colindex[col_attno - 1] = i++; |
It does not really make a difference since the value just has to be non-zero, but if you start to debug this, it will look strange. You could also just set it to 1 (or any other non-zero value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, changed this to colindex[col_attno - 1] = 1
.
tsl/src/compression/create.c
Outdated
segorder_colindex[col_attno - 1] = i++; | ||
colindex[col_attno - 1] = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, shouldn't this be:
segorder_colindex[col_attno - 1] = i++; | |
colindex[col_attno - 1] = i; | |
segorder_colindex[col_attno - 1] = i; | |
colindex[col_attno - 1] = i++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The segmentby column info array is populated by using the column attribute number as an array index. This is done as part of validating and creating segment by column info in function `compresscolinfo_init`. Since the column is duplicated the attribute number for both the segmentby column is same. When this attribute number is used as an index, only one of the array element is populated correctly with the detailed column info whareas the other element of the array ramins NULL. This segmentby column info is updated in catalog as part of processing compression options (ALTER TABLE ...). When the chunk is being compressed this segmentby column information is being retrieved from the catalog to create the scan key in order to identify any existing index on the table that matches the segmentby column. Out of the two keys one key gets updated correctly whereas the second key contains NULL values. This results into a crash during index scan to identify any existing index on the table. The proposed change avoid this crash by raising an error if user has specified duplicated columns as part of compress_segmentby or compress_orderby options. Also, added postgresql-client package in linux-32bit build dependencies to avoid failure as part of uploading the regression results.
The Linux i386 regression failure ( |
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * timescale#5137 Insert into index during chunk compression * timescale#5150 MERGE support on hypertables * timescale#5515 Make hypertables support replica identity * timescale#5586 Index scan support during UPDATE/DELETE on compressed hypertables * timescale#5596 Support for partial aggregations at chunk level * timescale#5599 Enable ChunkAppend for partially compressed chunks * timescale#5655 Improve the number of parallel workers for decompression * timescale#5758 Enable altering job schedule type through `alter_job` * timescale#5805 Make logrepl markers for (partial) decompressions * timescale#5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * timescale#5839 Support CAgg names in chunk_detailed_size * timescale#5852 Make set_chunk_time_interval CAggs aware * timescale#5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * timescale#5875 Add job exit status and runtime to log * timescale#5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * timescale#5860 Fix interval calculation for hierarchical CAggs * timescale#5894 Check unique indexes when enabling compression * timescale#5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * timescale#5988 Move functions to _timescaledb_functions schema * timescale#5788 Chunk_create must add an existing table or fail * timescale#5872 Fix duplicates on partially compressed chunk reads * timescale#5918 Fix crash in COPY from program returning error * timescale#5990 Place data in first/last function in correct mctx * timescale#5991 Call eq_func correctly in time_bucket_gapfill * timescale#6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * timescale#6035 Fix server crash on UPDATE of compressed chunk * timescale#6044 Fix server crash when using duplicate segmentby column * timescale#6045 Fix segfault in set_integer_now_func * timescale#6053 Fix approximate_row_count for CAggs * timescale#6081 Improve compressed DML datatype handling * timescale#6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
This release contains performance improvements for compressed hypertables and continuous aggregates and bug fixes since the 2.11.2 release. We recommend that you upgrade at the next available opportunity. This release moves all internal functions from the _timescaleb_internal schema into the _timescaledb_functions schema. This separates code from internal data objects and improves security by allowing more restrictive permissions for the code schema. If you are calling any of those internal functions you should adjust your code as soon as possible. This version also includes a compatibility layer that allows calling them in the old location but that layer will be removed in 2.14.0. **PostgreSQL 12 support removal announcement** Following the deprecation announcement for PostgreSQL 12 in TimescaleDB 2.10, PostgreSQL 12 is not supported starting with TimescaleDB 2.12. Currently supported PostgreSQL major versions are 13, 14 and 15. PostgreSQL 16 support will be added with a following TimescaleDB release. **Features** * #5137 Insert into index during chunk compression * #5150 MERGE support on hypertables * #5515 Make hypertables support replica identity * #5586 Index scan support during UPDATE/DELETE on compressed hypertables * #5596 Support for partial aggregations at chunk level * #5599 Enable ChunkAppend for partially compressed chunks * #5655 Improve the number of parallel workers for decompression * #5758 Enable altering job schedule type through `alter_job` * #5805 Make logrepl markers for (partial) decompressions * #5809 Relax invalidation threshold table-level lock to row-level when refreshing a Continuous Aggregate * #5839 Support CAgg names in chunk_detailed_size * #5852 Make set_chunk_time_interval CAggs aware * #5868 Allow ALTER TABLE ... REPLICA IDENTITY (FULL|INDEX) on materialized hypertables (continuous aggregates) * #5875 Add job exit status and runtime to log * #5909 CREATE INDEX ONLY ON hypertable creates index on chunks **Bugfixes** * #5860 Fix interval calculation for hierarchical CAggs * #5894 Check unique indexes when enabling compression * #5951 _timescaledb_internal.create_compressed_chunk doesn't account for existing uncompressed rows * #5988 Move functions to _timescaledb_functions schema * #5788 Chunk_create must add an existing table or fail * #5872 Fix duplicates on partially compressed chunk reads * #5918 Fix crash in COPY from program returning error * #5990 Place data in first/last function in correct mctx * #5991 Call eq_func correctly in time_bucket_gapfill * #6015 Correct row count in EXPLAIN ANALYZE INSERT .. ON CONFLICT output * #6035 Fix server crash on UPDATE of compressed chunk * #6044 Fix server crash when using duplicate segmentby column * #6045 Fix segfault in set_integer_now_func * #6053 Fix approximate_row_count for CAggs * #6081 Improve compressed DML datatype handling * #6084 Propagate parameter changes to decompress child nodes * #6102 Schedule compression policy more often **Thanks** * @ajcanterbury for reporting a problem with lateral joins on compressed chunks * @alexanderlaw for reporting multiple server crashes * @lukaskirner for reporting a bug with monthly continuous aggregates * @mrksngl for reporting a bug with unusual user names * @willsbit for reporting a crash in time_bucket_gapfill
The segmentby column info array is populated by using the column attribute number as an array index. This is done as part of validating and creating segment by column info in function
compresscolinfo_init
.Since the column is duplicated the attribute number for both the segmentby column is same. When this attribute number is used as an index, only one of the array element is populated correctly with the detailed column info whareas the other element of the array ramins NULL. This segmentby column info is updated in catalog as part of processing compression options (ALTER TABLE ...).
When the chunk is being compressed this segmentby column information is being retrieved from the catalog to create the scan key in order to identify any existing index on the table that matches the segmentby column. Out of the two keys one key gets updated correctly whereas the second key contains NULL values. This results into a crash during index scan to identify any existing index on the table.
The proposed change avoid this crash by raising an error if user has specified duplicated columns as part of compress_segmentby option.