Fix segfault after column drop on compressed table #5462
Conversation
@nikkhils, @fabriziomello: please review this pull request.
Force-pushed from 888ed3d to 63f20d1
Codecov Report

@@           Coverage Diff           @@
##             main    #5462   +/-  ##
======================================
- Coverage   90.75%   90.72%   -0.04%
======================================
  Files         229      229
  Lines       53707    53702       -5
======================================
- Hits        48742    48721      -21
- Misses       4965     4981      +16

... and 7 files with indirect coverage changes
/* some technical columns may be left unpopulated in the slot;
 * setting isnull prevents issues for row expressions */
for (i = 0; i < slot->tts_tupleDescriptor->natts; i++)
{
	slot->tts_isnull[i] = true;
}
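The defensive effect of this patch can be illustrated outside PostgreSQL. Below is a minimal Python sketch (all names are hypothetical, not TimescaleDB or PostgreSQL code) of a slot whose attributes are marked null up front, so attributes that are never populated, such as dropped columns, read as NULL instead of as uninitialized memory:

```python
# Hypothetical, simplified model of a tuple slot: a list of values plus a
# parallel isnull[] array. Names do not correspond to the real code.

def make_slot(natts):
    # Mirror the fix: mark every attribute null before decompression fills it.
    return {"values": [None] * natts, "isnull": [True] * natts}

def store_attr(slot, i, value):
    slot["values"][i] = value
    slot["isnull"][i] = False

def whole_row(slot):
    # A whole-row expression reads every attribute; null ones are safe to skip.
    return tuple(v if not null else None
                 for v, null in zip(slot["values"], slot["isnull"]))

slot = make_slot(3)                # attribute 2 stands in for "pg.dropped.3"
store_attr(slot, 0, "2023-01-01")
store_attr(slot, 1, 21.5)
print(whole_row(slot))             # ('2023-01-01', 21.5, None)
```

Without the blanket isnull initialization, the equivalent of `whole_row` would read whatever happened to be in the unpopulated third attribute.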
Which columns exactly? Shouldn't "drop column battery_status" drop them as well?
Probably makes sense to add a test for whole-row var + compressed chunks with different columns, like in compression_alter.sql. I'm not sure how row type conversion works in this case, and why it even tries to read these columns that are not in the resulting row...
After the drop there is a 3rd column in tts_tupleDescriptor with the name {data = "........pg.dropped.3........", '\000' <repeats 35 times>}, which is not populated.
I've looked into fixing it differently, but the removed column can be anywhere in the row, so it's not that easy to erase it; we also have a projection embedded around here, which further complicates things.
The problem is that the projections from the two chunks don't match. The compressed chunk output should be transformed to match that of the uncompressed chunk.
Compare the plan below with "select * from readings". We already have code that adjusts for "select *". Need to check why that is not working for this case.
test=# explain verbose select c from readings c;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=1.01..43.76 rows=2850 width=32)
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_9_57_chunk c_1 (cost=1.01..1.01 rows=1000 width=32)
Output: c_1.*
-> Seq Scan on _timescaledb_internal.compress_hyper_10_58_chunk (cost=0.00..1.01 rows=1 width=44)
Output: compress_hyper_10_58_chunk._ts_meta_count, compress_hyper_10_58_chunk."time", compress_hyper_10_58_chunk.battery_temperature
-> Seq Scan on _timescaledb_internal._hyper_9_59_chunk c_2 (cost=0.00..28.50 rows=1850 width=32)
Output: c_2.*
(7 rows)
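The mismatch between the two Append branches in the plan above can be sketched with a toy simulation (hypothetical names, not planner code): an Append that insists its children produce the same attribute count fails when the uncompressed branch still carries a dropped-column slot that the decompressed branch never emits, and agrees again once the missing attribute is filled with an explicit NULL:

```python
# Illustrative only: an "Append" that concatenates two child row sources and
# requires their rowtypes (attribute counts) to match.

def append_node(*branches):
    width = None
    for branch in branches:
        for row in branch:
            if width is None:
                width = len(row)
            elif len(row) != width:
                raise ValueError("branch rowtypes differ")
            yield row

uncompressed = [("t1", 1.0, None)]   # 3 attrs; the third is a dropped-column slot
decompressed = [("t2", 2.0)]         # only the live columns were decompressed

try:
    list(append_node(uncompressed, decompressed))
except ValueError as e:
    print(e)                         # branch rowtypes differ

# Padding the decompressed rows with an explicit NULL for the dropped
# attribute makes both branches agree again:
padded = [row + (None,) for row in decompressed]
print(list(append_node(uncompressed, padded)))
```

This is the shape of the problem the later discussion circles: both branches must agree on a rowtype, and the dropped attribute has to be accounted for somewhere.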
See decompress_chunk.c for whole_row var handling
I've debugged this a bit more and I'm now more convinced that this is the right solution; let me try to walk you through why:
- decompression with uncompressed rows works by utilizing an Append node to concatenate the rows of 2 operators; note that in this case we are selecting specific columns from the standard hypertable as well
- Append also has the requirement that the columns should match up between its operands
  - in case there are exact column requests this is not that big of an issue - they are placed into the targetlist, and as long as the number/types of the targets are in sync I guess Append is happy to do its job
- if we have to select the whole row, it will do it by adding a Var with attno=0; there will be 1 column with a complex type...
- whole rows for compression are handled by interpreting the attno=0 and expanding it for compressed_rel, to help build the underlying SeqScan on the compressed table
  - based on the actual scanned column list on the compressed table, the compression_map is computed, which will be there during execution to route inputs to outputs after decompression
- to make Append work, the code actually piggybacks on the original rel by copying its targetlist for the decompression path
  - however, this means that in the above plan it will copy the row reference as well: c_2.* - this makes Append happy because there is no difference between the 2 operands' output 👍
- the hypertable already has a column dropped!
  - there will be a pg.dropped column in the output rowtype of Seq Scan c_2 for c_2.*
  - because this rowtype was copied to be the output of the DecompressChunk, the deleted column will appear there as well... but since it was never really planned to get that column (whole_row stuff in decompress_chunk.c), compression_map will not have any instructions on how to fill it based on the actual input
- I think the removal of pg.dropped. columns is done internally by postgres automatically

I think Append will not work correctly unless all branches return the same rowtype. And that's why I think the options are:
- go with the current fix
  - or it might be possible to pre-determine which columns will not have their is_null populated
- map: it might be possible to add some marker to the decompress_map to handle unknown output columns and set is_null based on that
- wild1: if the input contains whole-row stuff, reference all columns directly for both the hypertable and the compressed table; move all expression evaluation into a new projection on top of the Append
- wild2: push in a projection for the standard seq-scan part to remove these removed columns
- other ideas?
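The "map" option above can be sketched as follows. This is a hypothetical Python simulation, not the real compression_map: each output attribute either names an input column or carries a marker meaning "no input column; emit NULL", so unknown or dropped outputs get filled explicitly instead of being left unpopulated:

```python
# Sketch of the "map" idea (hypothetical names, not TimescaleDB code).

NULL_MARKER = None  # marker meaning "no input column; emit NULL"

def decompress_row(input_row, decompress_map):
    """Route decompressed input attributes to output positions."""
    out_values, out_isnull = [], []
    for src in decompress_map:
        if src is NULL_MARKER:
            out_values.append(None)
            out_isnull.append(True)   # dropped/unknown column: forced NULL
        else:
            out_values.append(input_row[src])
            out_isnull.append(False)
    return out_values, out_isnull

# Output rowtype has 3 attrs; attribute 2 is a pg.dropped placeholder with no
# corresponding input, so the map routes it to the NULL marker.
values, isnull = decompress_row(("2023-01-01", 21.5), [0, 1, NULL_MARKER])
print(values, isnull)
```

Compared to the merged fix, this would make the NULL-ing explicit per output column rather than defaulting the whole slot, at the cost of extending the map structure.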
@gayyappan @akuzm Could you please take another look?
@svenklemm; replying to your comment on the other PR here instead.
The row schema for the compressed path is copied over from the non-compressed hypertable, which may have dropped columns. CompressionInfo.chunk_rel is the base hypertable relation; decompression uses that to declare that it produces a relation like that: path->cpath.path.parent = info->chunk_rel; Actually this kind of makes sense, because for a compressed chunk that's the "closest thing" to what the decompression will be producing.
We could try other approaches, but setting the isnull bools to true by default can fix this segfault harmlessly without extensive changes.
I ran into a similar problem in #4268. It seems like some of the code in PostgreSQL does not check the attisdropped column and relies on these columns to be NULL. Check, for instance, heap_compute_data_size and toast_build_flattened_tuple, which only check for NULL but do not check whether the column is dropped.
Also check how make_tuple_from_row in pl_exec.c handles dropped columns by setting them to null.
Approving since I think this demonstrably solves the problem and does not really risk introducing any new bugs.
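For reference, a loose Python analogue of the pattern described for make_tuple_from_row (the structure is illustrative, not a translation of pl_exec.c): attributes flagged as dropped contribute a NULL rather than being read from the source row, which is exactly the invariant the NULL-only checks elsewhere depend on:

```python
# Loose analogue of handling dropped columns when building a tuple: any
# attribute marked attisdropped is emitted as NULL and consumes no input.

def make_tuple_from_row(attrs, row_values):
    values, isnull = [], []
    src = iter(row_values)          # the source row has no dropped columns
    for att in attrs:
        if att["attisdropped"]:
            values.append(None)
            isnull.append(True)     # dropped column: always NULL
        else:
            values.append(next(src))
            isnull.append(False)
    return values, isnull

attrs = [
    {"attname": "time", "attisdropped": False},
    {"attname": "battery_temperature", "attisdropped": False},
    {"attname": "........pg.dropped.3........", "attisdropped": True},
]
print(make_tuple_from_row(attrs, ["2023-01-01", 21.5]))
```

The merged fix enforces the same invariant from the other direction: any slot attribute nobody populates stays NULL.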
Force-pushed from 63f20d1 to 579b5c5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor cosmetic comments that are up to you. Approved anyway!!!
Force-pushed from 579b5c5 to 742a48c
Decompression produces records which have all the decompressed data set, but it also retains the fields which are used internally during decompression. These didn't cause any problems unless an operation is being done with the whole row, in which case all the fields which ended up being marked non-null can be a potential segfault source. Fixes timescale#5458 timescale#5411
Force-pushed from 742a48c to d170d5a
Automated backport to 2.10.x not done: cherry-pick failed.
Thank you @mkindahl and @fabriziomello for reviewing this PR!
## 2.10.2 (2023-04-20)

**Bugfixes**
* timescale#5410 Fix file trailer handling in the COPY fetcher
* timescale#5446 Add checks for malloc failure in libpq calls
* timescale#5233 Out of on_proc_exit slots on guc license change
* timescale#5428 Use consistent snapshots when scanning metadata
* timescale#5499 Do not segfault on large histogram() parameters
* timescale#5470 Ensure superuser perms during copy/move chunk
* timescale#5500 Fix when no FROM clause in continuous aggregate definition
* timescale#5433 Fix join rte in CAggs with joins
* timescale#5556 Fix duplicated entries on timescaledb_experimental.policies view
* timescale#5462 Fix segfault after column drop on compressed table
* timescale#5543 Copy scheduled_jobs list before sorting it
* timescale#5497 Allow named time_bucket arguments in Cagg definition
* timescale#5544 Fix refresh from beginning of Continuous Aggregate with variable time bucket
* timescale#5558 Use regrole for job owner
* timescale#5542 Enable indexscan on uncompressed part of partially compressed chunks

**Thanks**
* @nikolaps for reporting an issue with the COPY fetcher
* @S-imo-n for reporting the issue on Background Worker Scheduler crash
* @geezhu for reporting an issue on segfault in histogram()
* @mwahlhuetter for reporting the issue with joins in CAggs
* @mwahlhuetter for reporting an issue with duplicated entries on the timescaledb_experimental.policies view
* @H25E for reporting an error refreshing from the beginning of a Continuous Aggregate with variable time bucket