[Scheduler] Add percentage-based dialup for creation and migration#10348
Conversation
| // The hash function and its inputs (algorithm, input order, separator byte) | ||
| // are a load-bearing rollout key: changing any of them re-shuffles every | ||
| // namespace's cohort and breaks monotonicity for in-flight rollouts. | ||
| func RolloutAccepts(namespace, businessID string, percent int) bool { |
There was a problem hiding this comment.
this is similar to dynamicconfig.GradualChange.. can we put it in the dynamicconfig package? (we shouldn't add more to common, it has way too much stuff and should be broken up)
also how about using farm.Fingerprint32 for consistency with all the other hashing stuff in the server?
There was a problem hiding this comment.
this is similar to dynamicconfig.GradualChange.. can we put it in the dynamicconfig package?
Sure, will do.
also how about using farm.Fingerprint32 for consistency with all the other hashing stuff in the server?
Yup, will fix.
There was a problem hiding this comment.
could we do something like this?
// constants.go — replaces EnableCHASMSchedulerCreation + CHASMSchedulerCreationRolloutPercent
CHASMSchedulerCreation = NewNamespaceTypedSettingWithConverter(
"history.chasmSchedulerCreation",
ConvertPercentRollout, // handles plain bool for back-compat: true → {Enabled:true, Percent:100}
PercentRollout{},
`CHASMSchedulerCreation controls whether new ....`,
)key := fmt.Appendf(nil, "%s\x00%s", namespaceName, scheduleID)
return wh.config.CHASMSchedulerCreation(namespaceName).Accepts(key)
CHASMSchedulerCreation dynamicconfig.TypedPropertyFnWithNamespaceFilter[dynamicconfig.PercentRollout]
There was a problem hiding this comment.
As elegant as that is, I'm nervous to go with that since the behavior isn't clear to me in a rollback situation (e.g., a version without this patch getting a value like 100). Let's keep this a little bit simpler, even if a bit kludgy.
|
It'd be great if this new dynamic config type could have Namespace exclusion. Cell wide percentage rollout with exclusions would be awesome. |
# Conflicts: # tests/schedule_test.go
| wfFunc := func(ctx workflow.Context, args *schedulespb.StartScheduleArgs) error { | ||
| key := fmt.Appendf(nil, "%s\x00%s", nsName, args.State.ScheduleId) | ||
| enableMigration := s.enableCHASMMigration(nsName) && | ||
| dynamicconfig.RolloutAccepts(key, s.chasmMigrationRolloutPercent(nsName)) |
There was a problem hiding this comment.
could you make this dc actually dynamic? I should have done it the first time. I think you can do it by passing a closure to schedulerWorkflowWithSpecBuilder
There was a problem hiding this comment.
Passing a closure doesn't avoid non-determinism, it should be in a side effect of some kind.. how about rolling this into the "tweakables" mutablesideeffect? That avoids extra markers if the config doesn't change.
The specBuilder thing is okay just because it's a pure cache
There was a problem hiding this comment.
temporal/service/worker/scheduler/workflow.go
Line 1269 in e22e630
temporal/service/worker/scheduler/workflow.go
Line 162 in e22e630
we do that now.
There was a problem hiding this comment.
Also, once a v1 schedule is marked to migrate, it will keep attempting to migrate until it succeeds.
There was a problem hiding this comment.
Will do (making this MutableSideEffect update during run rather than at scheduler spawn)
nvm, exclusions is probably a bad idea. |
# Conflicts: # tests/schedule_migration_test.go
What changed?
Why?
How did you test it?