I searched in the issues and found nothing similar.
Motivation
Support at-least-once semantics
At present, the CDC connectors only support exactly-once semantics. To achieve this, each CDC connector has to read the backfill log for each snapshot split.
However, reading the backfill log also increases the load on the source database. For example, the Postgres CDC connector establishes many logical replication connections to the Postgres database, which can easily hit the max_wal_senders or max_replication_slots limit. Assuming there are 10 Postgres CDC sources and each runs with a parallelism of 4, a total of 10*(4+1) = 50 replication connections will be created.
In many situations, the sink database provides idempotence. Therefore, we can also support at-least-once semantics by skipping the backfill period, which reduces the load on the source database. Users can choose between at-least-once and exactly-once based on their needs.
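To illustrate why an idempotent sink makes at-least-once acceptable, here is a minimal, self-contained sketch. Everything in it is hypothetical and not part of the connector code: `ChangeEvent`, `UpsertSink`, and `AtLeastOnceDemo` are illustration-only names, and the event sequences are made up. It assumes that skipping backfill causes a change already visible in the snapshot to be delivered again by the stream phase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change event: an upsert of key -> value (not a real connector type).
final class ChangeEvent {
    final int key;
    final String value;

    ChangeEvent(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical idempotent sink: applying the same upsert twice leaves the same state.
final class UpsertSink {
    private final Map<Integer, String> table = new HashMap<>();

    void apply(ChangeEvent event) {
        table.put(event.key, event.value);
    }

    Map<Integer, String> state() {
        return new HashMap<>(table);
    }
}

final class AtLeastOnceDemo {
    // Applies all events in order to a fresh sink and returns the final table.
    static Map<Integer, String> replay(List<ChangeEvent> events) {
        UpsertSink sink = new UpsertSink();
        for (ChangeEvent event : events) {
            sink.apply(event);
        }
        return sink.state();
    }

    // Exactly-once: the snapshot row already reflects the update to "b".
    static List<ChangeEvent> exactlyOnce() {
        List<ChangeEvent> events = new ArrayList<>();
        events.add(new ChangeEvent(1, "b")); // snapshot row
        events.add(new ChangeEvent(1, "c")); // later change from the stream phase
        return events;
    }

    // At-least-once (assumed): the update to "b" is delivered again by the stream phase.
    static List<ChangeEvent> atLeastOnce() {
        List<ChangeEvent> events = new ArrayList<>();
        events.add(new ChangeEvent(1, "b")); // snapshot row
        events.add(new ChangeEvent(1, "b")); // duplicate of a change already in the snapshot
        events.add(new ChangeEvent(1, "c")); // later change from the stream phase
        return events;
    }
}
```

Under these assumptions, `AtLeastOnceDemo.replay(AtLeastOnceDemo.atLeastOnce())` and `AtLeastOnceDemo.replay(AtLeastOnceDemo.exactlyOnce())` end in the same table state, because the duplicate only overwrites a value with itself. A sink that appends rows instead of upserting would not have this property.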
Add snapshot hooks for better testing
Currently, there is no suitable way to simulate what happens during a real snapshot split read, where data changes can occur while a split is being read. Most tests perform no database operations while a snapshot split is being read, so the backfill process has nothing to do, which makes it meaningless in those tests.
Some tests, such as SqlServerScanFetchTaskTest or SnapshotSplitReaderTest, utilize a MakeChangeEventTaskContext or MakeBinlogEventTaskContext to execute SQL commands before reaching the high watermark. However, this approach not only results in redundant code but also lacks flexibility.
For example:
What if I want to insert a message after the low watermark? This becomes necessary when implementing the ability to skip backfill, as the logs between the low watermark and snapshot completion would be duplicated.
What if I want to perform some operations on only one specific split?
Only by adding hooks in the framework layer can we provide more flexible testing options.
Solution
Support at-least-once semantics.
Add a SnapshotPhaseHooks with 4 hooks:
preLowWatermarkAction
postLowWatermarkAction
preHighWatermarkAction
postHighWatermarkAction
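A minimal sketch of how the four hooks could fit together, to make the proposal concrete. Only the name SnapshotPhaseHooks and the four hook names come from this issue; the `SnapshotPhaseHook` callback, its `String splitId` parameter, `FakeSnapshotSplitTask`, and the exact position of the snapshot read between the watermarks are assumptions made for illustration, not the final API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type; it receives the id of the split being read.
@FunctionalInterface
interface SnapshotPhaseHook {
    void accept(String splitId);
}

// Holder for the four hooks named in the proposal; each defaults to a no-op.
final class SnapshotPhaseHooks {
    SnapshotPhaseHook preLowWatermarkAction = splitId -> {};
    SnapshotPhaseHook postLowWatermarkAction = splitId -> {};
    SnapshotPhaseHook preHighWatermarkAction = splitId -> {};
    SnapshotPhaseHook postHighWatermarkAction = splitId -> {};
}

// Stand-in for a snapshot split read task; it only records the order of steps.
final class FakeSnapshotSplitTask {
    static void run(String splitId, SnapshotPhaseHooks hooks, List<String> log) {
        hooks.preLowWatermarkAction.accept(splitId);
        log.add(splitId + ":low-watermark");
        hooks.postLowWatermarkAction.accept(splitId);
        log.add(splitId + ":read-snapshot"); // assumed to happen between the watermarks
        hooks.preHighWatermarkAction.accept(splitId);
        log.add(splitId + ":high-watermark");
        hooks.postHighWatermarkAction.accept(splitId);
    }
}

final class SnapshotHooksDemo {
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        SnapshotPhaseHooks hooks = new SnapshotPhaseHooks();
        // Act right after the low watermark, but only for one specific split.
        hooks.postLowWatermarkAction = splitId -> {
            if (splitId.equals("split-1")) {
                log.add(splitId + ":INSERT (test SQL would run here)");
            }
        };
        FakeSnapshotSplitTask.run("split-0", hooks, log);
        FakeSnapshotSplitTask.run("split-1", hooks, log);
        return log;
    }
}
```

In this sketch a test can inject a change after the low watermark and restrict it to a single split, which covers both questions raised in the motivation. A real implementation would likely pass the source configuration and the split object to the hook instead of a plain string.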
Are you willing to submit a PR?
I'm willing to submit a PR!