Dont apply sort optimization when interval length is not fixed #7161

svenklemm · 2024-07-28T19:46:18Z

We applied our sort transformation for interval calculations too aggressively, even in situations where it is not safe to do so. This could lead to incorrectly sorted output or `mergejoin input data is out of order` error messages.

Fixes #7097

fabriziomello · 2024-07-29T14:00:04Z

@svenklemm looks like this PR also fixes issue #6872.

svenklemm · 2024-07-29T14:17:36Z

> @svenklemm looks like this PR also fix this issue #6872

Did you verify this fixes #6872?

fabriziomello · 2024-07-29T14:37:02Z

> > @svenklemm looks like this PR also fix this issue #6872
>
> Did you verify this fixes #6872?

Yep... against your PR the reproducible test case doesn't fail anymore. Maybe we can also include it in your PR.

akuzm · 2024-07-29T15:16:35Z

> > > @svenklemm looks like this PR also fix this issue #6872
> >
> > Did you verify this fixes #6872?
>
> Yep... against your PR the reproducible test case don't fail anymore. Maybe we can also include it in your PR.

Is it a fix though? The optimization is now disabled for variable-length intervals, but we're going to have the same problem for constant intervals, no?

svenklemm · 2024-07-29T19:05:32Z

Yeah, I think it's incidental that it fixes #6872, and the problem will probably resurface with smaller intervals.

erimatnor · 2024-07-30T09:15:38Z

Can you explain in the description why this is a fix for the issue? The current description just says it was previously unsafe and now it is fixed, but not how.

I am not sure I understand why this is a fix for this issue. Yes, it fixes the issue for some cases (where month or day is non-zero). But, AFAICT, the issue seems to be the use of timestamp with timezone, not non-fixed intervals. The problem seems to happen when you add an interval (fixed or non-fixed) and you "cross" a timezone change (e.g., a daylight saving time change). So, it seems to me that the issue remains and that the optimization is inherently unsafe for timestamp with timezone (unless the timezone is, e.g., UTC).

svenklemm · 2024-07-30T09:40:36Z

> Can you explain in the description why this is a fix for the issue. The current description just says it was previously unsafe and now it is fixed, but not how.
>
> I am not sure I understand why this is a fix for this issue. Yes, it fixes the issue for some cases (where month and day is non-zero). But, AFAICT, the issue seems to be the use of timestamp with timezone, not non-fixed intervals. The problem seems to happen when you add an interval (fixed or non-fixed) and you "cross" a timezone change (e.g., daylight savings change). So, it seems to me that the issue remains and that the optimization is inherently unsafe for timestamp with timezone (unless the timezone is, e.g., UTC).

The optimization is always safe for intervals with fixed length (intervals with no day or month component), but for any calculations that involve calendar time it may not be, so this PR disables the optimization for those cases.

erimatnor · 2024-07-30T09:48:52Z

> > Can you explain in the description why this is a fix for the issue. The current description just says it was previously unsafe and now it is fixed, but not how.
> >
> > I am not sure I understand why this is a fix for this issue. Yes, it fixes the issue for some cases (where month and day is non-zero). But, AFAICT, the issue seems to be the use of timestamp with timezone, not non-fixed intervals. The problem seems to happen when you add an interval (fixed or non-fixed) and you "cross" a timezone change (e.g., daylight savings change). So, it seems to me that the issue remains and that the optimization is inherently unsafe for timestamp with timezone (unless the timezone is, e.g., UTC).
>
> The optimization is always safe for intervals with fixed length (intervals with no day or month component), but for any calculations that involve calendar time it may not be, so this PR disables the optimization for those cases.

You are still not explaining why this is so, just stating the fact like before.

svenklemm · 2024-07-30T10:09:10Z

Internally, timestamptz is always stored as UTC, and any calculation with an interval that has no day or month component is plain integer arithmetic. So for this specific case, the requirement for the optimization holds: sorting with and without the calculation produces the same ordering. DST is not relevant, since the calculation internally uses UTC time and is therefore order-preserving even if there is a DST switch.
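The argument above can be illustrated with a minimal Python sketch. This is not TimescaleDB or PostgreSQL code: it assumes a toy time zone with a single DST switch (modeled on Europe/Berlin on 2024-03-31, where the offset changes from +1h to +2h at 01:00 UTC), and it assumes that wall-clock times falling into the skipped hour are resolved with the standard-time offset. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Toy model of one DST switch (assumption: offset goes from +1h to +2h
# at 2024-03-31 01:00 UTC, as in Europe/Berlin).
TRANSITION_UTC = datetime(2024, 3, 31, 1, 0, tzinfo=timezone.utc)
STD_OFFSET = timedelta(hours=1)
DST_OFFSET = timedelta(hours=2)
# First wall-clock time that exists after the switch (03:00 local).
FIRST_DST_WALL = (TRANSITION_UTC + DST_OFFSET).replace(tzinfo=None)


def utc_to_wall(ts):
    """Convert a UTC instant to naive local wall-clock time."""
    offset = DST_OFFSET if ts >= TRANSITION_UTC else STD_OFFSET
    return (ts + offset).replace(tzinfo=None)


def wall_to_utc(wall):
    """Convert naive wall-clock time back to UTC.

    Assumption: times in the skipped hour (02:00-03:00) use the
    standard-time offset.
    """
    offset = DST_OFFSET if wall >= FIRST_DST_WALL else STD_OFFSET
    return (wall - offset).replace(tzinfo=timezone.utc)


def add_fixed(ts, delta):
    """Fixed-length interval: plain arithmetic on the UTC value."""
    return ts + delta


def add_calendar_day(ts):
    """Calendar interval ('1 day'): add one day to the wall-clock time."""
    return wall_to_utc(utc_to_wall(ts) + timedelta(days=1))


# Two instants on the day before the switch, with t1 < t2.
t1 = datetime(2024, 3, 30, 1, 30, tzinfo=timezone.utc)  # 02:30 local
t2 = datetime(2024, 3, 30, 2, 0, tzinfo=timezone.utc)   # 03:00 local

# Fixed 24 hours: order is preserved.
print(add_fixed(t1, timedelta(hours=24)) < add_fixed(t2, timedelta(hours=24)))  # True

# Calendar day: t1 lands in the skipped hour and ends up after t2.
print(add_calendar_day(t1) < add_calendar_day(t2))  # False
```

In this model the fixed-length addition is monotonic because it never leaves UTC, while the calendar-day addition goes through local wall-clock time and can reorder values around the switch.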

@jledentu

This release contains performance improvements and bug fixes since the 2.15.3 release. We recommend that you upgrade at the next available opportunity.

**Features**
* timescale#6880: Add support for the array operators used for compressed DML batch filtering.
* timescale#6895: Improve the compressed DML expression pushdown.
* timescale#6897: Add support for replica identity on compressed hypertables.
* timescale#6918: Remove support for PG13.
* timescale#6920: Rework compression activity wal markers.
* timescale#6989: Add support for foreign keys when converting plain tables to hypertables.
* timescale#7020: Add support for the chunk column statistics tracking.
* timescale#7048: Add an index scan for INSERT DML decompression.
* timescale#7075: Reduce decompression on the compressed INSERT.
* timescale#7101: Reduce decompressions for the compressed UPDATE/DELETE.
* timescale#7108 Reduce decompressions for INSERTs with UNIQUE constraints
* timescale#7116 Use DELETE instead of TRUNCATE after compression
* timescale#7134 Refactor foreign key handling for compressed hypertables
* timescale#7161 Fix `mergejoin input data is out of order`

**Bugfixes**
* timescale#6987 Fix REASSIGN OWNED BY for background jobs
* timescale#7018: Fix `search_path` quoting in the compression defaults function.
* timescale#7046: Prevent locking for compressed tuples.
* timescale#7055: Fix the `scankey` for `segment by` columns, where the type `constant` is different to `variable`.
* timescale#7064: Fix the bug in the default `order by` calculation in compression.
* timescale#7069: Fix the index column name usage.
* timescale#7074: Fix the bug in the default `segment by` calculation in compression.

**Thanks**
* @jledentu For reporting a problem with mergejoin input order

@jledentu

This release contains significant performance improvements when working with compressed data, extended join support in continuous aggregates, and the ability to define foreign keys from regular tables towards hypertables. We recommend that you upgrade at the next available opportunity.

In TimescaleDB v2.16.0 we:
* Introduce multiple performance focused optimizations for data manipulation operations (DML) over compressed chunks. Improved upsert performance by more than 100x in some cases and more than 1000x in some update/delete scenarios.
* Add the ability to define chunk skipping indexes on non-partitioning columns of compressed hypertables. TimescaleDB v2.16.0 extends chunk exclusion to use those skipping (sparse) indexes when queries filter on the relevant columns, and prune chunks that do not include any relevant data for calculating the query response.
* Offer new options for use cases that require foreign keys defined. You can now add foreign keys from regular tables towards hypertables. We have also removed some really annoying locks in the reverse direction that blocked access to referenced tables while compression was running.
* Extend Continuous Aggregates to support more types of analytical queries. More types of joins are supported, additional equality operators on join clauses, and support for joins between multiple regular tables.

**Highlighted features in this release**
* Improved query performance through chunk exclusion on compressed hypertables.
  You can now define chunk skipping indexes on compressed chunks for any column with one of the following integer data types: `smallint`, `int`, `bigint`, `serial`, `bigserial`, `date`, `timestamp`, `timestamptz`. After you call `enable_chunk_skipping` on a column, TimescaleDB tracks the min and max values for that column. TimescaleDB uses that information to exclude chunks for queries that filter on that column, and would not find any data in those chunks.
* Improved upsert performance on compressed hypertables.
  By using index scans to verify constraints during inserts on compressed chunks, TimescaleDB speeds up some ON CONFLICT clauses by more than 100x.
* Improved performance of updates, deletes, and inserts on compressed hypertables.
  By filtering data while accessing the compressed data and before decompressing, TimescaleDB has improved performance for updates and deletes on all types of compressed chunks, as well as inserts into compressed chunks with unique constraints. By signaling constraint violations without decompressing, or decompressing only when matching records are found in the case of updates, deletes and upserts, TimescaleDB v2.16.0 speeds up those operations more than 1000x in some update/delete scenarios, and 10x for upserts.
* You can add foreign keys from regular tables to hypertables, with support for all types of cascading options. This is useful for hypertables that partition using sequential IDs, and need to reference those IDs from other tables.
* Lower locking requirements during compression for hypertables with foreign keys.
  Advanced foreign key handling removes the need for locking referenced tables when new chunks are compressed. DML is no longer blocked on referenced tables while compression runs on a hypertable.
* Improved support for queries on Continuous Aggregates.
  `INNER/LEFT` and `LATERAL` joins are now supported. Plus, you can now join with multiple regular tables, and you can have more than one equality operator on join clauses.

**PostgreSQL 13 support removal announcement**

Following the deprecation announcement for PostgreSQL 13 in TimescaleDB v2.13, PostgreSQL 13 is no longer supported in TimescaleDB v2.16. The currently supported PostgreSQL major versions are 14, 15 and 16.

**Features**
* #6880: Add support for the array operators used for compressed DML batch filtering.
* #6895: Improve the compressed DML expression pushdown.
* #6897: Add support for replica identity on compressed hypertables.
* #6918: Remove support for PG13.
* #6920: Rework compression activity wal markers.
* #6989: Add support for foreign keys when converting plain tables to hypertables.
* #7020: Add support for the chunk column statistics tracking.
* #7048: Add an index scan for INSERT DML decompression.
* #7075: Reduce decompression on the compressed INSERT.
* #7101: Reduce decompressions for the compressed UPDATE/DELETE.
* #7108 Reduce decompressions for INSERTs with UNIQUE constraints
* #7116 Use DELETE instead of TRUNCATE after compression
* #7134 Refactor foreign key handling for compressed hypertables
* #7161 Fix `mergejoin input data is out of order`

**Bugfixes**
* #6987 Fix REASSIGN OWNED BY for background jobs
* #7018: Fix `search_path` quoting in the compression defaults function.
* #7046: Prevent locking for compressed tuples.
* #7055: Fix the `scankey` for `segment by` columns, where the type `constant` is different to `variable`.
* #7064: Fix the bug in the default `order by` calculation in compression.
* #7069: Fix the index column name usage.
* #7074: Fix the bug in the default `segment by` calculation in compression.

**Thanks**
* @jledentu For reporting a problem with mergejoin input order

svenklemm self-assigned this Jul 28, 2024

svenklemm requested review from fabriziomello, antekresic and akuzm July 28, 2024 19:47

svenklemm force-pushed the mergejoin_order branch from 848b53d to 277d3d5 Compare July 28, 2024 20:41

svenklemm force-pushed the mergejoin_order branch from 277d3d5 to 6e52926 Compare July 28, 2024 20:43

akuzm approved these changes Jul 29, 2024

fabriziomello mentioned this pull request Jul 29, 2024

[Bug]: ORDER/GROUP BY expression not found in targetlist #6872

Closed

svenklemm added this to the TimescaleDB 2.16.0 milestone Jul 29, 2024

svenklemm requested a review from pallavisontakke July 30, 2024 07:39

svenklemm enabled auto-merge (rebase) July 30, 2024 09:11

fabriziomello approved these changes Jul 30, 2024

svenklemm merged commit 5a81be8 into main Jul 30, 2024
44 of 45 checks passed

svenklemm deleted the mergejoin_order branch July 30, 2024 13:05

timescale-automation mentioned this pull request Jul 30, 2024

Backport to 2.15.x: #7161: Dont apply sort optimization when interval length is not fixed #7165

Closed

pallavisontakke mentioned this pull request Jul 31, 2024

Release 2.16.0 #7169

Closed

bayandin mentioned this pull request Aug 1, 2024

timescaledb 2.16.0 bayandin/homebrew-tap#173

Closed
