Misc changes to migration workflow #2838
Conversation
- Separates workflows and activities out into separate files for cleanliness.
- Fixes WorkflowTaskTimeout if the ConcurrentActivityTask count was too high.
- Replaces the PerActivityRPS field of ForceReplicationInput with an OverallRPS.
- Adds basic unit tests for the Namespace Handover Workflow.
- Misc code cleanup.
- Adds a QueryHandler to the force-replication workflow so a caller can better gauge the status of the force-replication part (sketched below).
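As a rough illustration of that last point, registering a query handler in a Temporal Go workflow looks roughly like this. A minimal sketch, assuming the Temporal Go SDK; the query name, status struct, and package are assumptions, not the PR's actual code:

package migration

import (
	"go.temporal.io/sdk/workflow"
)

// ReplicationStatus is a hypothetical snapshot a caller could poll.
type ReplicationStatus struct {
	WorkflowsEnqueued int
	Done              bool
}

func ForceReplicationWorkflow(ctx workflow.Context) error {
	var status ReplicationStatus
	// Register the query up front so callers can gauge progress at any time.
	if err := workflow.SetQueryHandler(ctx, "force-replication-status", func() (ReplicationStatus, error) {
		return status, nil
	}); err != nil {
		return err
	}
	// ... main loop would update status as executions are listed and enqueued ...
	status.Done = true
	return nil
}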
@@ -470,6 +134,9 @@ func (a *activities) checkReplicationOnce(ctx context.Context, waitRequest waitR
 		tag.NewDurationTag("AllowedLagging", waitRequest.AllowedLagging),
 		tag.NewDurationTag("ActualLagging", shard.ShardLocalTime.Sub(*clusterInfo.AckedTaskVisibilityTime)),
 		tag.NewStringTag("RemoteCluster", waitRequest.RemoteCluster),
+		tag.NewInt64("MaxReplicationTaskId", shard.MaxReplicationTaskId),
+		tag.NewTimeTag("ShardLocalTime", *shard.ShardLocalTime),
Can you trace AllowedLaggingTasks? (We're already tracing AllowedLagging anyway.) It might also be easier to debug if we trace the subtraction of MaxTaskID and AckedTaskID (slightly less math someone has to perform :))
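Concretely, that could be two more entries in the tag list from the hunk above, reusing the same tag helpers; both tag names are suggestions, not existing tags in the PR:

	// Hypothetical additions to the log tags above (names are illustrative):
	tag.NewInt64("AllowedLaggingTasks", waitRequest.AllowedLaggingTasks),
	tag.NewInt64("ActualLaggingTasks", shard.MaxReplicationTaskId-clusterInfo.AckedTaskId),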
 	for _, shard := range resp.Shards {
 		clusterInfo, hasClusterInfo := shard.RemoteClusters[waitRequest.RemoteCluster]
 		if hasClusterInfo {
-			if clusterInfo.AckedTaskId == shard.MaxReplicationTaskId ||
+			if shard.MaxReplicationTaskId-clusterInfo.AckedTaskId <= waitRequest.AllowedLaggingTasks ||
 				(clusterInfo.AckedTaskId >= waitRequest.WaitForTaskIds[shard.ShardId] &&
 					shard.ShardLocalTime.Sub(*clusterInfo.AckedTaskVisibilityTime) <= waitRequest.AllowedLagging) {
With the altered first condition in this if statement, how often do you expect AllowedLaggingTasks to be outside its threshold while the times are within theirs?
This could happen when load is high? The threshold for AllowedLaggingTasks should be very low (I'm thinking 3 or 5 at most).
We need to make sure this is always true: clusterInfo.AckedTaskId >= waitRequest.WaitForTaskIds[shard.ShardId]
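One way to guarantee that invariant, sketched from the condition quoted above rather than taken from the PR, is to hoist the WaitForTaskIds check so it gates both lagging branches:

	// Hypothetical restructuring: the acked-task floor must hold no matter
	// which lagging check (task-count or time-based) lets the shard pass.
	reachedFloor := clusterInfo.AckedTaskId >= waitRequest.WaitForTaskIds[shard.ShardId]
	withinTaskLag := shard.MaxReplicationTaskId-clusterInfo.AckedTaskId <= waitRequest.AllowedLaggingTasks
	withinTimeLag := shard.ShardLocalTime.Sub(*clusterInfo.AckedTaskVisibilityTime) <= waitRequest.AllowedLagging
	if reachedFloor && (withinTaskLag || withinTimeLag) {
		// shard considered caught up
	}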
 	workflow.Go(ctx, func(ctx workflow.Context) {
 		listWorkflowsErr = listWorkflowsForReplication(ctx, workflowExecutionsCh, &params)
 	})

 	// enqueueReplicationTasks only returns when listWorkflowCh is closed (or if it encounters an error).
The comment references listWorkflowCh, but we should change it to workflowExecutionsCh.
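For context, this is the shape of that producer/consumer pairing as I read it. Everything beyond the identifiers quoted in the diff (the buffer size, error handling, and the enqueueReplicationTasks signature) is assumed:

	// Sketch only: the producer closes workflowExecutionsCh when listing finishes,
	// which is what eventually lets enqueueReplicationTasks return.
	workflowExecutionsCh := workflow.NewBufferedChannel(ctx, 100) // buffer size assumed
	var listWorkflowsErr error
	workflow.Go(ctx, func(ctx workflow.Context) {
		defer workflowExecutionsCh.Close() // signals the consumer that listing is done
		listWorkflowsErr = listWorkflowsForReplication(ctx, workflowExecutionsCh, &params)
	})

	// enqueueReplicationTasks drains the channel until it is closed (or errors);
	// its real signature in the PR may differ.
	err := enqueueReplicationTasks(ctx, workflowExecutionsCh)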
Co-authored-by: Manu Srivastava <manu@temporal.io>