Run one per-ns worker per namespace instead of namespace × component #3116
Conversation
LGTM
service/worker/pernamespaceworker.go (Outdated)
}

func (wm *perNamespaceWorkerManager) responsibleForNamespace(ns *namespace.Namespace, queueName string, num int) (int, error) {
	// This can result in fewer than the intended number of workers if num > 1, because
func (wm *perNamespaceWorkerManager) responsibleForNamespace(ns *namespace.Namespace) (int, error) {
maybe rename to getWorkerCount?
I used getWorkerMultiplicity, since that's what it returns: the number of those workers that land on the current node.
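For illustration, a self-contained sketch of how a multiplicity calculation like this could work: each of a namespace's desired worker slots is routed over a hash ring, and we count how many land on this node. The `Resolver`, `ringResolver`, and `getWorkerMultiplicity` names below are illustrative stand-ins, not the actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Resolver maps a routing key to the identity of the node that owns it.
// It is an illustrative stand-in for the membership ring.
type Resolver interface {
	Lookup(key string) (string, error)
}

// ringResolver picks an owner by hashing the key over a fixed host list.
type ringResolver struct{ hosts []string }

func (r ringResolver) Lookup(key string) (string, error) {
	h := fnv.New32a()
	h.Write([]byte(key))
	return r.hosts[int(h.Sum32())%len(r.hosts)], nil
}

// getWorkerMultiplicity counts how many of a namespace's desired worker
// slots resolve to this node; slot i is routed by "namespaceID/i".
func getWorkerMultiplicity(res Resolver, self, namespaceID string, desired int) (int, error) {
	multiplicity := 0
	for i := 0; i < desired; i++ {
		owner, err := res.Lookup(fmt.Sprintf("%s/%d", namespaceID, i))
		if err != nil {
			return 0, err
		}
		if owner == self {
			multiplicity++
		}
	}
	return multiplicity, nil
}

func main() {
	res := ringResolver{hosts: []string{"hostA", "hostB", "hostC"}}
	n, _ := getWorkerMultiplicity(res, "hostA", "ns-1234", 4)
	fmt.Println("workers on this node:", n)
}
```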
policy.SetMaximumInterval(1 * time.Minute)
policy.SetExpirationInterval(backoff.NoInterval)
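These two lines cap the backoff at one minute and disable expiration, i.e. the worker retries indefinitely. A self-contained sketch of that behavior in plain Go (not the server's actual backoff package; intervals and names are illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryForever calls op until it succeeds, doubling the delay between
// attempts up to maxInterval. With no expiration it never gives up, so a
// transient failure recovers on its own instead of waiting for an
// external trigger such as a membership change.
func retryForever(op func() error, initial, maxInterval time.Duration) {
	delay := initial
	for attempt := 1; ; attempt++ {
		err := op()
		if err == nil {
			return
		}
		fmt.Printf("attempt %d failed: %v; retrying in %v\n", attempt, err, delay)
		time.Sleep(delay)
		delay *= 2
		if delay > maxInterval {
			delay = maxInterval
		}
	}
}

func main() {
	calls := 0
	retryForever(func() error {
		calls++
		if calls < 4 {
			return errors.New("transient failure") // succeeds on the 4th try
		}
		return nil
	}, 100*time.Millisecond, time.Minute)
	fmt.Println("succeeded after", calls, "attempts")
}
```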
If something goes wrong, will the worker fall into this retry loop forever? Are there metrics to detect this and logs for troubleshooting?
Should it stop retrying at some point? That seems worse: if there were some transient problem blocking the creation of a client and we gave up, but it was fixed after ten minutes, then we would have to wait for a membership change to try again.
I added some logging.
For metrics, maybe we'd want a gauge of how many are stuck retrying at once?
Keeping a slow retry is fine, but we want to be able to alert in case it keeps failing for a long time, and have logs to investigate.
I added a TODO for metrics as well, but I think we can merge this for now.
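A minimal sketch of the gauge idea mentioned above, assuming a simple atomic counter that a metrics reporter would export; none of these names are from the actual code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stuckRetrying counts workers currently inside their start/retry loop;
// a metrics reporter would export this as a gauge and alert on it staying
// high for too long. Names here are illustrative.
var stuckRetrying atomic.Int64

func startWorkerWithRetries(start func() error) {
	stuckRetrying.Add(1)
	defer stuckRetrying.Add(-1)
	for start() != nil {
		// back off between attempts (elided)
	}
}

func main() {
	calls := 0
	startWorkerWithRetries(func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("client not ready")
		}
		return nil
	})
	fmt.Println("stuck retrying:", stuckRetrying.Load()) // 0 once started
}
```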
What changed?
Rearrange per-namespace worker manager to always run all components in the same worker+client+task queue.
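As a rough sketch of the shape of this change using the Go SDK: one client, one worker, and one task queue per namespace, with every component registered on that single worker. The `Component` interface, the task queue name, and the wiring below are assumptions for illustration; the real server components and manager differ:

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

// Component stands in for a per-namespace worker component; illustrative.
type Component interface {
	Register(w worker.Registry)
}

// startNamespaceWorker creates one client, one worker, and one task queue
// for the namespace and registers every component on that single worker,
// rather than one worker per (namespace, component) pair.
func startNamespaceWorker(ns string, components []Component) (worker.Worker, error) {
	c, err := client.Dial(client.Options{Namespace: ns})
	if err != nil {
		return nil, err
	}
	w := worker.New(c, "per-ns-worker-tq", worker.Options{}) // queue name is illustrative
	for _, cmp := range components {
		cmp.Register(w) // all components share the worker and task queue
	}
	if err := w.Start(); err != nil {
		c.Close()
		return nil, err
	}
	return w, nil
}

func main() {
	if _, err := startNamespaceWorker("my-namespace", nil); err != nil {
		log.Fatal(err)
	}
}
```

Running all components on one worker means a single set of task queue pollers and one service client per namespace, which is where the load reduction described below comes from.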
Why?
Reduces load when we have more than one component.
How did you test it?
Updated unit tests.
Potential risks
Is hotfix candidate?