Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ndc history resender to handle multiple remote clusters #2866

Merged
merged 3 commits into from
May 19, 2022

Conversation

yycptt
Copy link
Member

@yycptt yycptt commented May 18, 2022

What changed?

  • Refactor ndc history resender to handle multiple remote clusters

Why?

  • Existing history resend is bind to particular remote cluster and can't be used for resending history from other remote clusters. This means we need to create one resender for each remote cluster and also worry about cluster metadata update.
  • Needed by Guarantee history task execution #2864, so that we don't need to refactor existing standby task executor implementation (or creating multiple standby executors and add callback for cluster metadata update) but still be able to handle standby tasks which need to resend history from different remote clusters.

How did you test it?

  • Existing tests

Potential risks

Is hotfix candidate?

@yycptt yycptt requested a review from yux0 May 18, 2022 23:45
@yycptt yycptt requested a review from a team as a code owner May 18, 2022 23:45
@@ -66,6 +67,7 @@ type (
// NewTaskExecutor creates a replication task executor
// The executor uses by 1) DLQ replication task handler 2) history replication task processor
func NewTaskExecutor(
remoteCluster string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the remoteCluster be the active cluster from namespace cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the question is why we still have one replication task executor for each remote cluster, I think we can refactor that too and have only one executor? cc @yux0

If it's about why not letting xdc resender itself figure out the remote cluster name from ns cache, then I thought about it as well. The main concern is that if the caller can't control which remote cluster the resender is talking to, there might be a mismatch between caller and resender. E.g. when caller is replication task executor for cluster A, the resend may resend from cluster B. Another example is in standby timer/transfer task executor, we may connect to cluster A to refresh task, but may resend history from cluster B. It's rare but will make behavior reasoning much harder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the question is why we still have one replication task executor for each remote cluster, I think we can refactor that too and have only one executor?

Yes. We can do this

@yycptt yycptt enabled auto-merge (squash) May 19, 2022 21:20
@yycptt yycptt disabled auto-merge May 19, 2022 21:20
@yycptt yycptt merged commit d36291f into temporalio:master May 19, 2022
@yycptt yycptt deleted the refactor-history-resender branch May 19, 2022 23:36
Sushisource pushed a commit to Sushisource/temporal that referenced this pull request Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants