Copy chunk merge PR #3446
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #3446      +/-   ##
==========================================
+ Coverage   90.61%   90.70%   +0.08%
==========================================
  Files         211      212       +1
  Lines       35684    36256     +572
==========================================
+ Hits        32335    32885     +550
- Misses       3349     3371      +22
==========================================
```
Continue to review full report at Codecov.
Individual commits were already reviewed.
Approve merge of feature to master.
Adds an internal API function to create an empty chunk table according to the given hypertable for the given chunk table name and dimension slices. This function creates a chunk table inheriting from the hypertable, so it guarantees the same schema. No TimescaleDB metadata is updated. To be able to create the chunk table in a tablespace attached to the hypertable, this commit allows calculating the tablespace id without requiring the dimension slice to exist in the catalog. If there is already a chunk that collides on dimension slices, the function fails to create the chunk table. The function will be used internally in multi-node to replicate a chunk from one data node to another.
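A call to the new function might look like the following sketch. The argument names, order, and dimension-slice format are assumptions based on the description above, not quoted from the PR:

```sql
-- Hypothetical sketch: create an empty chunk table for a hypertable.
-- No TimescaleDB metadata is created for the chunk.
SELECT _timescaledb_internal.create_chunk_table(
    'conditions'::regclass,                                   -- hypertable
    '{"time": [1514419200000000, 1515024000000000]}'::jsonb,  -- dimension slices
    '_timescaledb_internal',                                  -- chunk schema
    '_dist_hyper_1_1_chunk'                                   -- chunk table name
);
```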
Gives errors if any argument of create_chunk_table is NULL, instead of marking the function STRICT. Utilizes newly added macros for this.
When an empty chunk table is created, it is not associated with its hypertable in metadata; however, it was still inheriting from the hypertable. This commit removes the inheritance, so the chunk table is completely standalone and cannot affect queries on the hypertable. In the future, this fix of dropping the inheritance can be replaced by a chunk table creation path that does not set up inheritance in the first place. Since that is a larger amount of work, it was deferred for now.
Creates a table for chunk replica on the given data node. The table gets the same schema and name as the chunk. The created chunk replica table is not added into metadata on the access node or data node. The primary goal is to use it during copy/move chunk.
This function drops a chunk on a specified data node. It then removes the metadata about the data node to chunk association on the access node. This function is meant for internal use as part of the "move chunk" functionality. If only one chunk replica remains, the function refuses to drop it to avoid data loss.
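Based on the description, an invocation might look like this sketch; the function name is taken from the test name below, while the argument names and types are assumptions:

```sql
-- Hypothetical sketch: drop the replica of a chunk on one data node.
-- Refuses to drop the last remaining replica to avoid data loss.
SELECT _timescaledb_internal.chunk_drop_replica(
    '_timescaledb_internal._dist_hyper_1_1_chunk'::regclass,  -- chunk
    'data_node_2'                                             -- data node name
);
```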
The "chunk_drop_replica" test is part of the overall chunk copy/move functionality. All related tests will go into this dist_chunk test. Also, fix the earlier flakiness in the dist_chunk test by moving the compress_chunk calls into the shared setup.
The `create_chunk` API has been extended to allow creating a chunk from an existing relational table. The table is turned into a chunk by attaching it to the root hypertable via inheritance. The purpose of this functionality is to allow copying a chunk to another node. First, the chunk table and data are copied. After that, `create_chunk` can be executed to make the new table part of the hypertable. Currently, the relational table used to create the chunk has to match the hypertable in terms of constraints, triggers, etc. PostgreSQL itself enforces the existence of same-named CHECK constraints, but no enforcement currently exists for other objects, including triggers and UNIQUE, PRIMARY KEY, or FOREIGN KEY constraints. Such enforcement can be implemented in the future, if deemed necessary. Another option is to automatically add all the required objects (triggers, constraints) based on the hypertable equivalents. However, that might also lead to duplicate objects in case some of them exist on the table prior to creating the chunk.
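The extended API could be exercised as in the following sketch, where an extra argument points at the pre-copied table; the argument names and slice format are assumptions, not quoted from the PR:

```sql
-- Hypothetical sketch: attach a pre-copied table as a chunk of the
-- hypertable. The table must match the hypertable's schema and
-- same-named CHECK constraints, as described above.
SELECT _timescaledb_internal.create_chunk(
    'conditions'::regclass,                                   -- hypertable
    '{"time": [1514419200000000, 1515024000000000]}'::jsonb,  -- dimension slices
    '_timescaledb_internal',                                  -- chunk schema
    '_dist_hyper_1_1_chunk',                                  -- chunk name
    '_timescaledb_internal._dist_hyper_1_1_chunk'::regclass   -- existing table
);
```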
Add an internal copy_chunk_data() function which implements a way to copy chunk data between data nodes using logical replication. This patch was prepared together with @nikkhils.
The building blocks required for implementing end-to-end copy/move chunk functionality have now been wrapped in a procedure. A procedure is required because multiple transactions are needed to carry out the activity across the access node and the two involved data nodes. The following steps are encapsulated in this procedure:
1) Create an empty chunk table on the destination data node.
2) Copy the data from the source data node chunk to this newly created destination node chunk. This is done via built-in PostgreSQL logical replication functionality.
3) Attach this chunk to the hypertable on the destination data node.
4) Remove this chunk from the source data node to complete the move, if requested.
A new catalog table "chunk_copy_activity" has been added to track the progress of the above stages. A unique id gets assigned to each activity, and it is updated with the completed stages as things progress.
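The steps above, wrapped in a procedure, would be driven with `CALL`, since a plain function cannot span multiple transactions. A sketch, with the schema, procedure, node, and chunk names as placeholder assumptions:

```sql
-- Hypothetical sketch: move a chunk from one data node to another.
-- CALL is required because the procedure commits multiple transactions
-- across the access node and the two involved data nodes.
CALL timescaledb_experimental.move_chunk(
    chunk => '_timescaledb_internal._dist_hyper_1_1_chunk'::regclass,
    source_node => 'data_node_1',
    destination_node => 'data_node_2'
);
```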
Remove copy_chunk_data() function and code needed to support it, such as the 'transactional' argument. Rework copy chunk logic using separate stages. Introduce copy_chunk() API function as an internal wrapper for the move_chunk().
We used to run transactions in autocommit mode on DN while running the chunk copy/move activity. This meant that any failures on the access node were de-coupled from the activity on the DN. This can make future cleanup messy since we wouldn't know what's failed/succeeded on the data nodes. We now drive the entire activity via transactions started on the access node.
A new view in the experimental schema shows information related to chunk replication. The view can be used to learn the replication status of a chunk while also providing a way to easily find nodes to move or copy chunks between in order to ensure a fully replicated multi-node cluster. Tests have been added to illustrate the potential usage.
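A query against such a view might look like the sketch below; the view and column names are assumptions based on the description, not quoted from the PR:

```sql
-- Hypothetical sketch: find under-replicated chunks and the nodes
-- that do not yet hold a replica, as candidates for a copy.
SELECT chunk_name, replica_nodes, non_replica_nodes
FROM timescaledb_experimental.chunk_replication_status
WHERE num_replicas < desired_num_replicas;
```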
Chunk constraints are now added after a chunk has been copied from one data node to another. The constraints are added when the chunk is made visible on the destination node, i.e., after data has been copied and the chunk's metadata is created. As an alternative, the constraints could be added when the chunk table is first created, but before the metadata for the chunk is added. This would have the benefit of validating each copied (inserted) row against the constraints during the data copy phase. However, this would also necessitate decoupling the step of creating the constraint metadata from the creation of the actual constraints since the other chunk metadata that is referenced does not yet exist. Such decoupling would require validating that the metadata actually matches the constraints of the table when the metadata is later created. One downside of adding the constraints after data copying is that it necessitates validating all the chunk's rows against the constraints after insertion as opposed to during insertion. If this turns out to be a performance issue, validation could be initially deferred. This is left as a future optimization.
Add tests to ensure API constraints and checks are working as expected, along with the basic functionality.
The internal function `copy_chunk_data` was removed as part of refactoring and is no longer necessary to remove in the downgrade script since the function was never part of a release.
This change fixes issues in the shared `dist_chunk` test that caused flakiness. Since the test shares a database and state with other tests running in parallel, it should not modify the database (e.g., by creating new tables and chunks) while the test runs. Such modifications cause non-deterministic behavior that varies depending on the order in which the tests are run. To fix this issue, all the table creations have been moved into the shared setup script and the test itself has been made less dependent on hard-coded IDs and chunk names. One of the tables used in the test has been changed to use space partitioning to make chunk placement on nodes more predictable.
A chunk copy/move operation is carried out in stages and can fail in any of them. We track the last completed stage in the "chunk_copy_operation" catalog table. In case of failure, a "chunk_copy_cleanup" function can be invoked to bring the chunk back to its original state on the source data node; all transient objects like the replication slot, publication, subscription, empty chunk, and metadata updates are cleaned up. Includes test case changes for failures induced at each and every stage. To avoid confusion between chunk copy activity and chunk copy operation, this patch also consistently uses "operation" everywhere instead of "activity".
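A cleanup invocation might look like the following sketch; the schema, procedure name, and operation id format are assumptions (the PR text only names a "chunk_copy_cleanup" function and the "chunk_copy_operation" catalog table):

```sql
-- Hypothetical sketch: roll back a failed copy/move operation,
-- dropping transient objects (replication slot, publication,
-- subscription, empty chunk) and reverting metadata updates.
CALL timescaledb_experimental.cleanup_copy_chunk_operation('ts_copy_1_31');
```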
This release adds new experimental features since the 2.3.1 release. The experimental features in this release are:

* APIs for chunk manipulation across data nodes in a distributed hypertable setup. This includes the ability to add a data node and move chunks to the new data node for cluster rebalancing.
* The `time_bucket_ng` function, a newer version of `time_bucket`. This function supports years, months, days, hours, minutes, and seconds.

We're committed to developing these experiments, giving the community a chance to provide early feedback and influence the direction of TimescaleDB's development. We'll travel faster with your input! Please create your feedback as a GitHub issue (using the experimental-schema label), describe what you found, and tell us the steps or share a code snippet to recreate it.

This release also includes several bug fixes.

**PostgreSQL 11 deprecation announcement**

Timescale is working hard on our next exciting features. To make that possible, we require functionality that is available in Postgres 12 and above. Postgres 11 is not supported with TimescaleDB 2.4.

**Experimental Features**
* timescale#3293 Add timescaledb_experimental schema
* timescale#3302 Add block_new_chunks and allow_new_chunks API to experimental schema. Add chunk based refresh_continuous_aggregate.
* timescale#3211 Introduce experimental time_bucket_ng function
* timescale#3366 Allow use of experimental time_bucket_ng function in continuous aggregates
* timescale#3408 Support for seconds, minutes and hours in time_bucket_ng
* timescale#3446 Implement cleanup for chunk copy/move

**Bugfixes**
* timescale#3401 Fix segfault for RelOptInfo without fdw_private
* timescale#3411 Verify compressed chunk validity for compressed path
* timescale#3416 Fix targetlist names for continuous aggregate views
* timescale#3434 Remove extension check from relcache invalidation callback
* timescale#3440 Fix remote_tx_heal_data_node to work with only current database

**Thanks**
* @fvannee for reporting an issue with hypertable expansion in functions
* @amalek215 for reporting an issue with cache invalidation during pg_class vacuum full
* @hardikm10 for reporting an issue with inserting into compressed chunks
* @dberardo-com and @iancmcc for reporting an issue with extension updates after renaming columns of continuous aggregates
Rebase on master and merge copy/chunk feature branch https://github.com/timescale/timescaledb/tree/multinode-feature-copy-chunk
Disable-check: commit-count