
Fix segfault after column drop on compressed table #5462

Merged
merged 1 commit into timescale:main from comp-rowe-segfault-fix on Apr 6, 2023

Conversation

kgyrtkirk
Contributor

Decompression produces records that have all of the decompressed data set, but the slot also retains fields that are only used internally during decompression.
These never caused a problem unless an operation was performed on the whole row, in which case any field that ended up non-null could be a source of segfaults.

Fixes #5458 #5411
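
A minimal repro sketch of the scenario, assuming a hypertable shaped like the `readings` table that appears in the plan later in this thread; the DDL, data, and timestamps below are illustrative and not taken verbatim from the linked issues:

```sql
CREATE TABLE readings (
    "time"              timestamptz NOT NULL,
    battery_temperature float,
    battery_status      text
);
SELECT create_hypertable('readings', 'time');

-- one row in a chunk that we then compress
INSERT INTO readings VALUES ('2023-03-01', 0.2, 'charging');
ALTER TABLE readings SET (timescaledb.compress);
SELECT compress_chunk(c) FROM show_chunks('readings') c;

-- drop a column afterwards; the chunk row type keeps a pg.dropped attribute
ALTER TABLE readings DROP COLUMN battery_status;

-- another row lands in a new, uncompressed chunk
INSERT INTO readings VALUES ('2023-04-01', 0.3);

-- whole-row reference: Append of DecompressChunk + Seq Scan, previously a segfault
SELECT c FROM readings c;
```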

@github-actions

@nikkhils, @fabriziomello: please review this pull request.


@kgyrtkirk kgyrtkirk added compression segfault Segmentation fault labels Mar 17, 2023
@kgyrtkirk kgyrtkirk force-pushed the comp-rowe-segfault-fix branch 2 times, most recently from 888ed3d to 63f20d1 on March 20, 2023 08:08
@codecov

codecov bot commented Mar 20, 2023

Codecov Report

Merging #5462 (742a48c) into main (c2941a3) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head 742a48c differs from pull request most recent head d170d5a. Consider uploading reports for the commit d170d5a to get more accurate results

@@            Coverage Diff             @@
##             main    #5462      +/-   ##
==========================================
- Coverage   90.75%   90.72%   -0.04%     
==========================================
  Files         229      229              
  Lines       53707    53702       -5     
==========================================
- Hits        48742    48721      -21     
- Misses       4965     4981      +16     
Impacted Files Coverage Δ
tsl/src/nodes/decompress_chunk/exec.c 93.69% <100.00%> (ø)

... and 7 files with indirect coverage changes


Comment on lines 437 to 495
/* some technical columns may be left unpopulated in the slot;
 * setting isnull prevents issues for row expressions */
for (i = 0; i < slot->tts_tupleDescriptor->natts; i++)
{
slot->tts_isnull[i] = true;
}

Member

Which columns exactly? Shouldn't drop column battery_status drop them as well?

Probably makes sense to add a test for whole-row var + compressed chunks with different columns, like in compression_alter.sql. I'm not sure how row type conversion works in this case, and why it even tries to read these columns that are not in the resulting row...
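
Not from compression_alter.sql itself, but a rough sketch of the kind of test being suggested here (table, column names, and values are made up): chunks whose row types diverge through ALTER TABLE, read back through a whole-row var.

```sql
CREATE TABLE metrics ("time" timestamptz NOT NULL, device int, temp float);
SELECT create_hypertable('metrics', 'time');
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- first chunk, compressed with the original column set
INSERT INTO metrics VALUES ('2023-01-01', 1, 1.0);
SELECT compress_chunk(c) FROM show_chunks('metrics') c;

-- change the hypertable's shape: later chunks see a different column layout
ALTER TABLE metrics ADD COLUMN humidity float;
ALTER TABLE metrics DROP COLUMN temp;
INSERT INTO metrics VALUES ('2023-02-01', 2, 0.5);

-- whole-row var across a compressed chunk and an uncompressed chunk
-- with different physical row types
SELECT m FROM metrics m ORDER BY "time";
```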

Contributor Author

After the drop there is a third column in tts_tupleDescriptor with the name {data = "........pg.dropped.3........", '\000' <repeats 35 times>}, which is not populated.

I've looked into fixing it differently, but the removed column can be anywhere in the row, so it's not that easy to erase it; we also have a projection embedded around here, which further complicates things.

Contributor

The problem is the projections from the 2 chunks don't match. The compressed chunk output should be transformed to match that of the uncompressed chunk.
Compare the plan below with select * from readings. We already have code that adjusts for select *. Need to check why that is not working for this case.

test=# explain verbose select c from readings c;
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=1.01..43.76 rows=2850 width=32)
   ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_9_57_chunk c_1  (cost=1.01..1.01 rows=1000 width=32)
         Output: c_1.*
         ->  Seq Scan on _timescaledb_internal.compress_hyper_10_58_chunk  (cost=0.00..1.01 rows=1 width=44)
               Output: compress_hyper_10_58_chunk._ts_meta_count, compress_hyper_10_58_chunk."time", compress_hyper_10_58_chunk.battery_temperature
   ->  Seq Scan on _timescaledb_internal._hyper_9_59_chunk c_2  (cost=0.00..28.50 rows=1850 width=32)
         Output: c_2.*
(7 rows)

Contributor

See decompress_chunk.c for whole_row var handling

Contributor Author

I've debugged this a bit more and I'm now more convinced that this is the right solution; let me try to walk you through why:

  • decompression combined with uncompressed rows works by using an Append node to concatenate the rows of two operators
  • Append also requires that the columns match up between its operands
    • when specific columns are requested this is not a big issue: they are placed into the targetlist, and as long as the number and types of the targets are in sync, Append is happy to do its job
      • note that in this case we are selecting specific columns from the standard hypertable chunk as well
    • if we have to select the whole row, it does so by adding a Var with attno=0; there will be one column with a composite type...
  • whole-row references for compression are handled by interpreting the attno=0 Var and expanding it for compressed_rel, to help build the underlying SeqScan on the compressed table
    • based on the actual scanned column list on the compressed table, the compression_map is computed, which is used during execution to route inputs to outputs after decompression
  • to make Append work, the code piggybacks on the original rel by copying its targetlist for the decompression path
    • however, this means that in the plan above it copies the whole-row reference as well: c_2.*
    • this makes Append happy because there is no difference between the two operands' outputs 👍
  • but the hypertable already has a column dropped!
    • there will be a pg.dropped column in the output rowtype of Seq Scan c_2 for c_2.*
    • because this rowtype was copied to be the output of DecompressChunk, the deleted column appears there as well... but since that column was never actually planned for (the whole_row handling in decompress_chunk.c), compression_map has no instructions on how to fill it from the actual input
  • I think the removal of pg.dropped. columns is done internally by Postgres automatically

I think Append will not work correctly unless all branches return the same rowtype. And that's why I think the options are:

  • go with the current fix
    • or it might be possible to pre-determine which columns will not have their is_null populated (see the sketch after this list)
  • map: it might be possible to add some marker to the decompress_map to handle unknown output columns and set is_null based on that
  • wild1: if the input contains whole-row references, reference all columns directly for both the hypertable and the compressed table, and move all expression evaluation into a new projection on top of the Append
  • wild2: push a projection into the plain seq-scan part to strip the dropped columns
  • other ideas?
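
A minimal sketch of the "pre-determine which columns will not have their is_null populated" alternative, assuming it runs where the decompression slot's tuple descriptor is available; the function name and placement are hypothetical, only TupleDescAttr and attisdropped come from PostgreSQL itself:

```c
#include "postgres.h"

#include "access/tupdesc.h"
#include "executor/tuptable.h"

/*
 * Hypothetical helper: instead of nulling every attribute of the slot up
 * front, mark only the attributes that decompression will never populate,
 * e.g. columns dropped from the hypertable (attisdropped = true).
 */
static void
init_unfilled_columns(TupleTableSlot *slot)
{
	TupleDesc	desc = slot->tts_tupleDescriptor;

	for (int i = 0; i < desc->natts; i++)
	{
		Form_pg_attribute attr = TupleDescAttr(desc, i);

		if (attr->attisdropped)
		{
			/* dropped attributes must read as NULL, otherwise whole-row
			 * consumers may dereference a garbage Datum */
			slot->tts_values[i] = (Datum) 0;
			slot->tts_isnull[i] = true;
		}
	}
}
```

Whether other internal, never-populated fields would also need clearing under this approach is something that would still have to be verified.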

Contributor Author

@kgyrtkirk kgyrtkirk Mar 24, 2023

@gayyappan @akuzm Could you please take another look?

Contributor Author

@svenklemm: replying to your comment on the other PR here instead.

The row schema for the compressed path is copied over from the non-compressed hypertable, which may have dropped columns.

  • CompressionInfo.chunk_rel is the base hypertable relation
  • decompression uses that to declare that it could produce a relation like that:
    path->cpath.path.parent = info->chunk_rel;
    • this actually kind of makes sense, because for a compressed chunk that's the "closest thing" to what the decompression will be producing

We could try other approaches... but setting the isnull booleans to true by default fixes this segfault harmlessly and without extensive changes.

Contributor

@mkindahl mkindahl left a comment

I ran into a similar problem in #4268. It seems like some of the code in PostgreSQL does not check the attisdropped flag and relies on these columns being NULL. Check, for instance, heap_compute_data_size and toast_build_flattened_tuple, which only check for NULL but do not check whether the column is dropped.

Also check how make_tuple_from_row in pl_exec.c handles dropped columns by setting them to null.

Approving since I think this demonstrably solves the problem and does not really risk introducing any new bugs.

tsl/src/nodes/decompress_chunk/exec.c (outdated, resolved)
Contributor

@fabriziomello fabriziomello left a comment

Left some minor cosmetic comments that are up to you. Approved anyway!!!

Decompression produces records which have all the decompressed data
set, but it also retains the fields which are used internally during
decompression.
These didn't cause any problem - unless an operation is being done
with the whole row - in which case all the fields which have ended up
being non-null can be a potential segfault source.

Fixes timescale#5458 timescale#5411
@kgyrtkirk kgyrtkirk enabled auto-merge (rebase) April 5, 2023 10:08
@kgyrtkirk kgyrtkirk merged commit 975e9ca into timescale:main Apr 6, 2023
47 of 48 checks passed
@timescale-automation

Automated backport to 2.10.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.10.x
You are currently cherry-picking commit 975e9ca1.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   CHANGELOG.md
	modified:   tsl/test/expected/compression_errors.out
	modified:   tsl/test/sql/compression_errors.sql

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   tsl/src/nodes/decompress_chunk/exec.c


Job log

@timescale-automation timescale-automation added the auto-backport-not-done Automated backport of this PR has failed non-retriably (e.g. conflicts) label Apr 6, 2023
@kgyrtkirk
Contributor Author

Thank you @mkindahl and @fabriziomello for reviewing this PR!

akuzm added a commit to akuzm/timescaledb that referenced this pull request Apr 19, 2023
 ## 2.10.2 (2023-04-20)

**Bugfixes**
* timescale#5410 Fix file trailer handling in the COPY fetcher
* timescale#5446 Add checks for malloc failure in libpq calls
* timescale#5233 Out of on_proc_exit slots on guc license change
* timescale#5428 Use consistent snapshots when scanning metadata
* timescale#5499 Do not segfault on large histogram() parameters
* timescale#5470 Ensure superuser perms during copy/move chunk
* timescale#5500 Fix when no FROM clause in continuous aggregate definition
* timescale#5433 Fix join rte in CAggs with joins
* timescale#5556 Fix duplicated entries on timescaledb_experimental.policies view
* timescale#5462 Fix segfault after column drop on compressed table
* timescale#5543 Copy scheduled_jobs list before sorting it
* timescale#5497 Allow named time_bucket arguments in Cagg definition
* timescale#5544 Fix refresh from beginning of Continuous Aggregate with variable time bucket
* timescale#5558 Use regrole for job owner
* timescale#5542 Enable indexscan on uncompressed part of partially compressed chunks

**Thanks**
* @nikolaps for reporting an issue with the COPY fetcher
* @S-imo-n for reporting the issue on Background Worker Scheduler crash
* @geezhu for reporting an issue with a segfault in histogram()
* @mwahlhuetter for reporting the issue with joins in CAggs
* @mwahlhuetter for reporting issue with duplicated entries on timescaledb_experimental.policies view
* @H25E for reporting error refreshing from beginning of a Continuous Aggregate with variable time bucket
akuzm added six more commits referencing this pull request on Apr 19-20, 2023 (to main and to akuzm/timescaledb), each carrying the same 2.10.2 changelog as above.
Labels
auto-backport-not-done (Automated backport of this PR has failed non-retriably, e.g. conflicts), backported-2.10.x, compression, segfault (Segmentation fault)
Development

Successfully merging this pull request may close these issues.

[Bug]: Segmentation fault after column dropped from compressed table
6 participants