Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always reset expr context in DecompressChunk #3747

Merged
merged 1 commit into from
Oct 27, 2021

Conversation

mkindahl
Copy link
Contributor

When decompressing very large compressed chunks as part of executing a
DecompressChunk, the expression memory context was not reset after
each qual evaluation if the qual did not match, meaning that if the
qual did not match often, memory allocations would start to accumulate.

This commit fixes this by resetting the expression memory context for
each tuple when executing a DecompressChunk node, even if the qual
did not match.

Fixes #3620

@mkindahl
Copy link
Contributor Author

mkindahl commented Oct 27, 2021

Tested locally on a database with 12 million rows where not all match the conditions.

Before

Result of executing query from #3620 without patch

mats=# EXPLAIN ANALYZE SELECT * FROM events 
WHERE event->'timestamp' >= '0' 
AND event->'timestamp' < '1630454535290' 
AND event-> 'div' in ('"ABB"')
AND event->'mnemonic' = '"KQN_805WbKD_7_Z1Go4"';
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> 

image

After

Result of executing query from #3620 with patch

mats=# EXPLAIN ANALYZE SELECT * FROM events 
WHERE event->'timestamp' >= '0' 
AND event->'timestamp' < '1630454535290' 
AND event-> 'div' in ('"ABB"')
AND event->'mnemonic' = '"KQN_805WbKD_7_Z1Go4"';
                                                                                                              QUERY P
LAN                                                                                                               
---------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------
 Custom Scan (DecompressChunk) on _hyper_29_880_chunk  (cost=0.02..317.02 rows=13002000 width=522) (actual time=16724
.921..31391.073 rows=1084133 loops=1)
   Filter: (((event -> 'timestamp'::text) >= '0'::jsonb) AND ((event -> 'timestamp'::text) < '1630454535290'::jsonb) 
AND ((event -> 'div'::text) = '"ABB"'::jsonb) AND ((event -> 'mnemonic'::text) = '"KQN_805WbKD_7_Z1Go4"'::jsonb))
   Rows Removed by Filter: 11915867
   ->  Seq Scan on compress_hyper_30_881_chunk  (cost=0.00..317.02 rows=13002 width=109) (actual time=0.190..152.605 
rows=13002 loops=1)
 Planning Time: 24.489 ms
 Execution Time: 43933.637 ms
(6 rows)

image

@mkindahl mkindahl marked this pull request as ready for review October 27, 2021 08:25
@mkindahl mkindahl requested a review from a team as a code owner October 27, 2021 08:25
@mkindahl mkindahl requested review from svenklemm, akuzm and gayyappan and removed request for a team October 27, 2021 08:25
@mkindahl mkindahl self-assigned this Oct 27, 2021
@codecov
Copy link

codecov bot commented Oct 27, 2021

Codecov Report

Merging #3747 (1646de9) into master (0fecefd) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3747      +/-   ##
==========================================
- Coverage   90.29%   90.29%   -0.01%     
==========================================
  Files         213      213              
  Lines       37655    37654       -1     
==========================================
- Hits        34001    33999       -2     
- Misses       3654     3655       +1     
Impacted Files Coverage Δ
tsl/src/chunk_api.c 94.88% <100.00%> (ø)
tsl/src/nodes/decompress_chunk/exec.c 96.36% <100.00%> (ø)
tsl/src/nodes/data_node_dispatch.c 97.10% <0.00%> (-0.25%) ⬇️
tsl/src/bgw_policy/job.c 87.50% <0.00%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6db012d...1646de9. Read the comment docs.

Copy link
Contributor

@erimatnor erimatnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wish we could have those screenshots auto-generated for every PR :)

When decompressing very large compressed chunks as part of executing a
`DecompressChunk`, the expression memory context was not reset after
each qual evaluation if the qual did not match, meaning that if the
qual did not match often, memory allocations would start to accumulate.

This commit fixes this by resetting the expression memory context for
each tuple when executing a `DecompressChunk` node, even if the qual
did not match.

Fixes timescale#3620
@mkindahl mkindahl enabled auto-merge (rebase) October 27, 2021 14:12
@mkindahl mkindahl merged commit f8bf3b9 into timescale:master Oct 27, 2021
@mkindahl mkindahl deleted the jsonb-compressed-crash branch October 27, 2021 15:04
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`
* timescale#3598 Improve evaluation of stable functions such as now() on access node
* timescale#3717 Support transparent decompression on individual chunks

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
@fabriziomello fabriziomello mentioned this pull request Oct 27, 2021
fabriziomello added a commit to fabriziomello/timescaledb that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* timescale#3034 Add support for PostgreSQL 14
* timescale#3435 Add continuous aggregates for distributed hypertables
* timescale#3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* timescale#3580 Fix memory context bug executing TRUNCATE
* timescale#3592 Allow alter column type on distributed hypertable
* timescale#3598 Improve evaluation of stable functions such as now() on access
node
* timescale#3618 Fix execution of refresh_caggs from user actions
* timescale#3625 Add shared dependencies when creating chunk
* timescale#3626 Fix memory context bug executing TRUNCATE
* timescale#3627 Schema qualify UDTs in multi-node
* timescale#3638 Allow owner change of a data node
* timescale#3654 Fix index attnum mapping in reorder_chunk
* timescale#3661 Fix SkipScan path generation with constant DISTINCT column
* timescale#3667 Fix compress_policy for multi txn handling
* timescale#3673 Fix distributed hypertable DROP within a procedure
* timescale#3701 Allow anyone to use size utilities on distributed hypertables
* timescale#3708 Fix crash in get_aggsplit
* timescale#3709 Fix ordered append pathkey check
* timescale#3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* timescale#3717 Support transparent decompression on individual chunks
* timescale#3724 Fix inserts into compressed chunks on hypertables with caggs
* timescale#3727 Fix DirectFunctionCall crash in distributed_exec
* timescale#3728 Fix SkipScan with varchar column
* timescale#3733 Fix ANALYZE crash with custom statistics for custom types
* timescale#3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
fabriziomello added a commit that referenced this pull request Oct 27, 2021
This release adds major new features since the 2.4.2 release.
We deem it moderate priority for upgrading.

This release includes these noteworthy features:

* Continuous Aggregates for Distributed Hypertables
* Support for PostgreSQL 14
* Experimental: Support for timezones in `time_bucket_ng()`, including
the `origin` argument

This release also includes several bug fixes.

**Features**
* #3034 Add support for PostgreSQL 14
* #3435 Add continuous aggregates for distributed hypertables
* #3505 Add support for timezones in `time_bucket_ng()`

**Bugfixes**
* #3580 Fix memory context bug executing TRUNCATE
* #3592 Allow alter column type on distributed hypertable
* #3598 Improve evaluation of stable functions such as now() on access
node
* #3618 Fix execution of refresh_caggs from user actions
* #3625 Add shared dependencies when creating chunk
* #3626 Fix memory context bug executing TRUNCATE
* #3627 Schema qualify UDTs in multi-node
* #3638 Allow owner change of a data node
* #3654 Fix index attnum mapping in reorder_chunk
* #3661 Fix SkipScan path generation with constant DISTINCT column
* #3667 Fix compress_policy for multi txn handling
* #3673 Fix distributed hypertable DROP within a procedure
* #3701 Allow anyone to use size utilities on distributed hypertables
* #3708 Fix crash in get_aggsplit
* #3709 Fix ordered append pathkey check
* #3712 Fix GRANT/REVOKE ALL IN SCHEMA handling
* #3717 Support transparent decompression on individual chunks
* #3724 Fix inserts into compressed chunks on hypertables with caggs
* #3727 Fix DirectFunctionCall crash in distributed_exec
* #3728 Fix SkipScan with varchar column
* #3733 Fix ANALYZE crash with custom statistics for custom types
* #3747 Always reset expr context in DecompressChunk

**Thanks**
* @binakot and @sebvett for reporting an issue with DISTINCT queries
* @hardikm10, @DavidPavlicek and @pafiti for reporting bugs on TRUNCATE
* @mjf for reporting an issue with ordered append and JOINs
* @phemmer for reporting the issues on multinode with aggregate queries
and evaluation of now()
* @abolognino for reporting an issue with INSERTs into compressed
hypertables that have cagg
* @tanglebones for reporting the ANALYZE crash with custom types on
multinode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants