[Feature] Cross-cluster replication for cloud-native table (Part-1: implementation in FE ) #60586

Open: wants to merge 1 commit into base branch main

wxl24life
Contributor

Why I'm doing:

What I'm doing:

This PR introduces an online data migration solution tailored for shared-data clusters, complementing the earlier storage-coupled migration strategy.

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.5
    • 3.4
    • 3.3

Signed-off-by: drake_wang <wxl250059@alibaba-inc.com>
@wxl24life wxl24life requested a review from a team as a code owner July 3, 2025 22:32
@wanpengfei-git wanpengfei-git requested review from a team July 3, 2025 22:32

github-actions bot commented Jul 3, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Jul 4, 2025

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

Contributor

@alvin-celerdata alvin-celerdata left a comment

Requesting changes because of a potential incompatibility.

@@ -216,6 +217,7 @@ enum TTaskType {
     REPLICATE_SNAPSHOT,
     UPDATE_SCHEMA,
     COMPACTION_CONTROL,
+    REPLICATE_LAKE_REMOTE_STORAGE,
Contributor

All newly added enum items should be appended at the end of the enum so that it stays compatible during upgrades.
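
A minimal sketch of why the position matters, assuming the enum relies on implicit, position-based values (Thrift numbers members sequentially when no explicit value is given). The member lists are illustrative only: LATER_TASK is a made-up trailing member, and this is not the real TTaskType definition.

```python
# Hypothetical illustration: implicit enum values are assigned by position,
# so inserting a member in the middle renumbers everything after it.

def implicit_values(names):
    """Assign 0, 1, 2, ... by declaration order, as implicit enum numbering does."""
    return {name: value for value, name in enumerate(names)}

# An old node's view of the enum (LATER_TASK is a made-up trailing member).
old = implicit_values(["REPLICATE_SNAPSHOT", "UPDATE_SCHEMA",
                       "COMPACTION_CONTROL", "LATER_TASK"])

# New member inserted in the middle: LATER_TASK gets a different number.
inserted = implicit_values(["REPLICATE_SNAPSHOT", "UPDATE_SCHEMA",
                            "COMPACTION_CONTROL",
                            "REPLICATE_LAKE_REMOTE_STORAGE", "LATER_TASK"])

# New member appended at the end: every existing value is preserved.
appended = implicit_values(["REPLICATE_SNAPSHOT", "UPDATE_SCHEMA",
                            "COMPACTION_CONTROL", "LATER_TASK",
                            "REPLICATE_LAKE_REMOTE_STORAGE"])

def changed_members(before, after):
    """Return the members whose numeric value differs between two versions."""
    return sorted(name for name in before if before[name] != after[name])

print(changed_members(old, inserted))  # ['LATER_TASK']
print(changed_members(old, appended))  # []
```

During a rolling upgrade, nodes on the old and new versions exchange these numeric values, so any renumbered member would be decoded as a different task type on the other side.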

Contributor

This is not needed; REPLICATE_SNAPSHOT could be reused.

@wxl24life wxl24life changed the title from "[Feature] Add support for shared-data cross-cluster replication (Part-1: replication implementation in FE )" to "[Feature] Cross-cluster replication for cloud-native table (Part-1: implementation in FE )" Jul 4, 2025
@@ -1922,6 +1923,11 @@ struct TTableReplicationRequest {
     7: optional i64 src_table_data_size
     8: optional map<i64, TPartitionReplicationInfo> partition_replication_infos
     9: optional string job_id
+    10: optional Types.TRunMode src_cluster_run_mode
Contributor

@xiangguangyxg xiangguangyxg Jul 4, 2025

You could use src_table_type to tell whether the source is shared-data or shared-nothing; src_cluster_run_mode is not needed.

Comment on lines +426 to +444
struct TReplicateLakeRemoteStorageRequest {
    1: optional Types.TTransactionId transaction_id
    2: optional Types.TTableId table_id
    3: optional Types.TPartitionId partition_id
    4: optional Types.TTabletId tablet_id
    5: optional TTabletType tablet_type
    6: optional Types.TSchemaHash schema_hash
    7: optional Types.TVersion visible_version
    8: optional Types.TTabletId src_tablet_id
    9: optional TTabletType src_tablet_type
    10: optional Types.TSchemaHash src_schema_hash
    11: optional Types.TVersion src_visible_version
    12: optional binary encryption_meta
    13: optional Types.TVersion data_version
    14: optional Types.TTabletId faked_shard_id
    15: optional Types.TDatabaseId src_db_id
    16: optional Types.TTableId src_table_id
    17: optional Types.TPartitionId src_partition_id
}
Contributor

I think you can add some fields to TReplicateSnapshotRequest and reuse it, using src_tablet_type to tell whether the source is shared-data or shared-nothing.
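
A rough sketch of what this suggestion could look like, modelled in Python rather than Thrift. Everything here is an assumption for illustration: the TabletType members, the ReplicateSnapshotRequest class and the replication_path helper are hypothetical stand-ins, and only the field names are taken from the struct quoted above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TabletType(Enum):
    # Hypothetical stand-ins for TTabletType values.
    DISK = 0  # shared-nothing source
    LAKE = 1  # shared-data (cloud-native) source


@dataclass
class ReplicateSnapshotRequest:
    # A subset of fields; all optional, mirroring the Thrift struct style.
    transaction_id: Optional[int] = None
    tablet_id: Optional[int] = None
    src_tablet_id: Optional[int] = None
    src_tablet_type: Optional[TabletType] = None
    # Extra optional fields that only a shared-data source would set.
    src_db_id: Optional[int] = None
    src_table_id: Optional[int] = None
    src_partition_id: Optional[int] = None


def replication_path(req: ReplicateSnapshotRequest) -> str:
    """Choose the handling path from src_tablet_type alone.

    An unset src_tablet_type falls back to the existing snapshot path,
    which is what a request from an older sender would look like.
    """
    if req.src_tablet_type is TabletType.LAKE:
        return "lake-remote-storage"
    return "snapshot"


legacy = ReplicateSnapshotRequest(transaction_id=1, tablet_id=10, src_tablet_id=20)
lake = ReplicateSnapshotRequest(transaction_id=2, tablet_id=11, src_tablet_id=21,
                                src_tablet_type=TabletType.LAKE, src_db_id=5,
                                src_table_id=6, src_partition_id=7)
print(replication_path(legacy))  # snapshot
print(replication_path(lake))    # lake-remote-storage
```

Because the added fields are optional and the discriminator defaults to the existing path, a request that omits them is handled exactly as before, which is the compatibility property the reviewers are asking for.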
