New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark partialize_agg as parallel safe #4307
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4307 +/- ##
==========================================
- Coverage 90.84% 90.84% -0.01%
==========================================
Files 220 220
Lines 40505 40504 -1
==========================================
- Hits 36798 36794 -4
- Misses 3707 3710 +3
Continue to review full report at Codecov.
|
@@ -81,6 +81,7 @@ ${PSQL} -U ${TEST_PGUSER} \ | |||
sed -e '/<exclude_from_test>/,/<\/exclude_from_test>/d' \ | |||
-e 's!_[0-9]\{1,\}_[0-9]\{1,\}_chunk!_X_X_chunk!g' \ | |||
-e 's!^ \{1,\}QUERY PLAN \{1,\}$!QUERY PLAN!' \ | |||
-e 's!: actual rows!: actual rows!' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not a fan of this. Can you get a stable test output without this change as the number of rows is useful for other tests so I would rather not have this removed globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm keeping the number, just replacing double space for single. The formatting was changed between postgres versions, so I'm normalizing it to avoid having separate references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Could you expand in the PR description (commit message) on why it is OK to make the function parallel safe.
I assume it is because partial aggregates actually exists in PG to do parallel computation, so they are parallel safe by definition? But it would be nice to make that clear if this is the reason.
Yes, Postgres knows whether a given aggregate is parallel-safe and paralellizes based on that. We don't need to artificially disable parallel plans with our partial aggregation wrapper, because this wrapper is a pure function that produces the unfinished aggregation state as the result, and doesn't change parallel safety. I'll try to describe this in more detail... |
e6a4384
to
b6fc663
Compare
Postgres knows whether a given aggregate is parallel-safe, and creates parallel aggregation plans based on that. The `partialize_agg` is a wrapper we use to perform partial aggregation on data nodes. It is a pure function that produces serialized aggregation state as a result. Being pure, it doesn't influence parallel safety. This means we don't need to mark it parallel-unsafe to artificially disable the parallel plans for partial aggregation. They will be chosen as usual based on the parallel-safety of the underlying aggregate function.
b6fc663
to
3e9254f
Compare
This release adds major new features since the 2.7.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * time_bucket now supports bucketing by month, year and timezone * Improve performance of bulk SELECT and COPY for distributed hypertables * 1 step CAgg policy management * Migrate Continuous Aggregates to the new format **Features** * timescale#4188 Use COPY protocol in row-by-row fetcher * timescale#4307 Mark partialize_agg as parallel safe * timescale#4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE * timescale#4384 Add schedule_interval to policies * timescale#4390 Faster lookup of chunks by point * timescale#4393 Support intervals with day component when constifying now() * timescale#4397 Support intervals with month component when constifying now() * timescale#4405 Support ON CONFLICT ON CONSTRAINT for hypertables * timescale#4412 Add telemetry about replication * timescale#4415 Drop remote data when detaching data node * timescale#4416 Handle TRUNCATE TABLE on chunks * timescale#4425 Add parameter check_config to alter_job * timescale#4430 Create index on Continuous Aggregates * timescale#4439 Allow ORDER BY on continuous aggregates * timescale#4443 Add stateful partition mappings * timescale#4484 Use non-blocking data node connections for COPY * timescale#4495 Support add_dimension() with existing data * timescale#4502 Add chunks to baserel cache on chunk exclusion * timescale#4545 Add hypertable distributed argument and defaults * timescale#4552 Migrate Continuous Aggregates to the new format * timescale#4556 Add runtime exclusion for hypertables * timescale#4561 Change get_git_commit to return full commit hash * timescale#4563 1 step CAgg policy management * timescale#4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill * timescale#4642 Add timezone support to time_bucket **Bugfixes** * timescale#4359 Create composite index on segmentby columns * timescale#4374 Remove constified now() constraints from plan * timescale#4416 Handle TRUNCATE TABLE on chunks * timescale#4478 Synchronize chunk cache sizes * timescale#4486 Adding boolean column with default value doesn't work on compressed table * timescale#4512 Fix unaligned pointer access * timescale#4519 Throw better error message on incompatible row fetcher settings * timescale#4549 Fix dump_meta_data for windows * timescale#4553 Fix timescaledb_post_restore GUC handling * timescale#4573 Load TSL library on compressed_data_out call * timescale#4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function * timescale#4577 Fix segfaults in compression code with corrupt data * timescale#4580 Handle default privileges on CAggs properly * timescale#4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA * timescale#4583 Fix partitioning functions * timescale#4589 Fix rename for distributed hypertable * timescale#4601 Reset compression sequence when group resets * timescale#4611 Fix a potential OOM when loading large data sets into a hypertable * timescale#4624 Fix heap buffer overflow * timescale#4627 Fix telemetry initialization * timescale#4631 Ensure TSL library is loaded on database upgrades * timescale#4646 Fix time_bucket_ng origin handling * timescale#4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause. **Thanks** * @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function * @Creatation for reporting an issue with renaming hypertables * @janko for reporting an issue when adding bool column with default value to compressed hypertable * @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk * @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates * @mwahlhuetter for reporting error in joins in RETURNING clause * @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable * @PBudmark for reporting an issue with dump_meta_data.sql on Windows * @ssmoss for reporting an issue with time_bucket_ng origin handling
This release adds major new features since the 2.7.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * time_bucket now supports bucketing by month, year and timezone * Improve performance of bulk SELECT and COPY for distributed hypertables * 1 step CAgg policy management * Migrate Continuous Aggregates to the new format **Features** * #4188 Use COPY protocol in row-by-row fetcher * #4307 Mark partialize_agg as parallel safe * #4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE * #4384 Add schedule_interval to policies * #4390 Faster lookup of chunks by point * #4393 Support intervals with day component when constifying now() * #4397 Support intervals with month component when constifying now() * #4405 Support ON CONFLICT ON CONSTRAINT for hypertables * #4412 Add telemetry about replication * #4415 Drop remote data when detaching data node * #4416 Handle TRUNCATE TABLE on chunks * #4425 Add parameter check_config to alter_job * #4430 Create index on Continuous Aggregates * #4439 Allow ORDER BY on continuous aggregates * #4443 Add stateful partition mappings * #4484 Use non-blocking data node connections for COPY * #4495 Support add_dimension() with existing data * #4502 Add chunks to baserel cache on chunk exclusion * #4545 Add hypertable distributed argument and defaults * #4552 Migrate Continuous Aggregates to the new format * #4556 Add runtime exclusion for hypertables * #4561 Change get_git_commit to return full commit hash * #4563 1 step CAgg policy management * #4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill * #4642 Add timezone support to time_bucket **Bugfixes** * #4359 Create composite index on segmentby columns * #4374 Remove constified now() constraints from plan * #4416 Handle TRUNCATE TABLE on chunks * #4478 Synchronize chunk cache sizes * #4486 Adding boolean column with default value doesn't work on compressed table * #4512 Fix unaligned pointer access * #4519 Throw better error message on incompatible row fetcher settings * #4549 Fix dump_meta_data for windows * #4553 Fix timescaledb_post_restore GUC handling * #4573 Load TSL library on compressed_data_out call * #4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function * #4577 Fix segfaults in compression code with corrupt data * #4580 Handle default privileges on CAggs properly * #4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA * #4583 Fix partitioning functions * #4589 Fix rename for distributed hypertable * #4601 Reset compression sequence when group resets * #4611 Fix a potential OOM when loading large data sets into a hypertable * #4624 Fix heap buffer overflow * #4627 Fix telemetry initialization * #4631 Ensure TSL library is loaded on database upgrades * #4646 Fix time_bucket_ng origin handling * #4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause. **Thanks** * @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function * @Creatation for reporting an issue with renaming hypertables * @janko for reporting an issue when adding bool column with default value to compressed hypertable * @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk * @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates * @mwahlhuetter for reporting error in joins in RETURNING clause * @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable * @PBudmark for reporting an issue with dump_meta_data.sql on Windows * @ssmoss for reporting an issue with time_bucket_ng origin handling
This release adds major new features since the 2.7.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * time_bucket now supports bucketing by month, year and timezone * Improve performance of bulk SELECT and COPY for distributed hypertables * 1 step CAgg policy management * Migrate Continuous Aggregates to the new format **Features** * #4188 Use COPY protocol in row-by-row fetcher * #4307 Mark partialize_agg as parallel safe * #4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE * #4384 Add schedule_interval to policies * #4390 Faster lookup of chunks by point * #4393 Support intervals with day component when constifying now() * #4397 Support intervals with month component when constifying now() * #4405 Support ON CONFLICT ON CONSTRAINT for hypertables * #4412 Add telemetry about replication * #4415 Drop remote data when detaching data node * #4416 Handle TRUNCATE TABLE on chunks * #4425 Add parameter check_config to alter_job * #4430 Create index on Continuous Aggregates * #4439 Allow ORDER BY on continuous aggregates * #4443 Add stateful partition mappings * #4484 Use non-blocking data node connections for COPY * #4495 Support add_dimension() with existing data * #4502 Add chunks to baserel cache on chunk exclusion * #4545 Add hypertable distributed argument and defaults * #4552 Migrate Continuous Aggregates to the new format * #4556 Add runtime exclusion for hypertables * #4561 Change get_git_commit to return full commit hash * #4563 1 step CAgg policy management * #4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill * #4642 Add timezone support to time_bucket **Bugfixes** * #4359 Create composite index on segmentby columns * #4374 Remove constified now() constraints from plan * #4416 Handle TRUNCATE TABLE on chunks * #4478 Synchronize chunk cache sizes * #4486 Adding boolean column with default value doesn't work on compressed table * #4512 Fix unaligned pointer access * #4519 Throw better error message on incompatible row fetcher settings * #4549 Fix dump_meta_data for windows * #4553 Fix timescaledb_post_restore GUC handling * #4573 Load TSL library on compressed_data_out call * #4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function * #4577 Fix segfaults in compression code with corrupt data * #4580 Handle default privileges on CAggs properly * #4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA * #4583 Fix partitioning functions * #4589 Fix rename for distributed hypertable * #4601 Reset compression sequence when group resets * #4611 Fix a potential OOM when loading large data sets into a hypertable * #4624 Fix heap buffer overflow * #4627 Fix telemetry initialization * #4631 Ensure TSL library is loaded on database upgrades * #4646 Fix time_bucket_ng origin handling * #4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause. **Thanks** * @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function * @Creatation for reporting an issue with renaming hypertables * @janko for reporting an issue when adding bool column with default value to compressed hypertable * @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk * @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates * @mwahlhuetter for reporting error in joins in RETURNING clause * @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable * @PBudmark for reporting an issue with dump_meta_data.sql on Windows * @ssmoss for reporting an issue with time_bucket_ng origin handling
This release adds major new features since the 2.7.2 release. We deem it moderate priority for upgrading. This release includes these noteworthy features: * time_bucket now supports bucketing by month, year and timezone * Improve performance of bulk SELECT and COPY for distributed hypertables * 1 step CAgg policy management * Migrate Continuous Aggregates to the new format **Features** * #4188 Use COPY protocol in row-by-row fetcher * #4307 Mark partialize_agg as parallel safe * #4380 Enable chunk exclusion for space dimensions in UPDATE/DELETE * #4384 Add schedule_interval to policies * #4390 Faster lookup of chunks by point * #4393 Support intervals with day component when constifying now() * #4397 Support intervals with month component when constifying now() * #4405 Support ON CONFLICT ON CONSTRAINT for hypertables * #4412 Add telemetry about replication * #4415 Drop remote data when detaching data node * #4416 Handle TRUNCATE TABLE on chunks * #4425 Add parameter check_config to alter_job * #4430 Create index on Continuous Aggregates * #4439 Allow ORDER BY on continuous aggregates * #4443 Add stateful partition mappings * #4484 Use non-blocking data node connections for COPY * #4495 Support add_dimension() with existing data * #4502 Add chunks to baserel cache on chunk exclusion * #4545 Add hypertable distributed argument and defaults * #4552 Migrate Continuous Aggregates to the new format * #4556 Add runtime exclusion for hypertables * #4561 Change get_git_commit to return full commit hash * #4563 1 step CAgg policy management * #4641 Allow bucketing by month, year, century in time_bucket and time_bucket_gapfill * #4642 Add timezone support to time_bucket **Bugfixes** * #4359 Create composite index on segmentby columns * #4374 Remove constified now() constraints from plan * #4416 Handle TRUNCATE TABLE on chunks * #4478 Synchronize chunk cache sizes * #4486 Adding boolean column with default value doesn't work on compressed table * #4512 Fix unaligned pointer access * #4519 Throw better error message on incompatible row fetcher settings * #4549 Fix dump_meta_data for windows * #4553 Fix timescaledb_post_restore GUC handling * #4573 Load TSL library on compressed_data_out call * #4575 Fix use of `get_partition_hash` and `get_partition_for_key` inside an IMMUTABLE function * #4577 Fix segfaults in compression code with corrupt data * #4580 Handle default privileges on CAggs properly * #4582 Fix assertion in GRANT .. ON ALL TABLES IN SCHEMA * #4583 Fix partitioning functions * #4589 Fix rename for distributed hypertable * #4601 Reset compression sequence when group resets * #4611 Fix a potential OOM when loading large data sets into a hypertable * #4624 Fix heap buffer overflow * #4627 Fix telemetry initialization * #4631 Ensure TSL library is loaded on database upgrades * #4646 Fix time_bucket_ng origin handling * #4647 Fix the error "SubPlan found with no parent plan" that occurred if using joins in RETURNING clause. **Thanks** * @AlmiS for reporting error on `get_partition_hash` executed inside an IMMUTABLE function * @Creatation for reporting an issue with renaming hypertables * @janko for reporting an issue when adding bool column with default value to compressed hypertable * @jayadevanm for reporting error of TRUNCATE TABLE on compressed chunk * @michaelkitson for reporting permission errors using default privileges on Continuous Aggregates * @mwahlhuetter for reporting error in joins in RETURNING clause * @ninjaltd and @mrksngl for reporting a potential OOM when loading large data sets into a hypertable * @PBudmark for reporting an issue with dump_meta_data.sql on Windows * @ssmoss for reporting an issue with time_bucket_ng origin handling
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR timescale#4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes timescale#4922
Previous PR #4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes #4922
Previous PR #4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes #4922
Previous PR #4307 mark `partialize_agg` and `finalize_agg` as parallel safe but this change is leading to incorrect results in some cases. Those functions are supposed work in parallel but seems is not the case and it is not evident yet the root cause and how to properly use it in parallel queries so we decided to revert this change and provide correct results to users. Fixes #4922
Just an experiment to see what happens.
UPD: the experiment turned out to be a success. Seems that we can mark the partialize_agg function as parallel safe always, and the parallel plans are generated depending on the underlying aggregate function. Postgres does the full expression tree traversal to check for parallel unsafe functions, as expected.
Fixes #4306
Commit message
Mark partialize_agg as parallel-safe
This function is used to push down aggregation to data nodes, and making it parallel-safe allows to use parallel plans on data nodes, if the underlying aggregate function is parallel-safe as well. Add negative and positive tests for this.