
[Scheduler] Add percentage-based dialup for creation and migration #10348

Merged
lina-temporal merged 8 commits into main from sched-chasm-percent
Jun 2, 2026

@lina-temporal

Contributor

What changed?

  • Gates CHASM scheduler creation and migration against a new rollout percentage, based on a stable hash of namespace + business ID.
  • Defaults to 100%, to preserve existing rollouts (rollout tooling will be updated to also set these keys)
  • Added a RolloutApplies helper, since we'll likely want to do this for other features going forward
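As a rough illustration of the gating described above, here is a minimal sketch of a percentage gate keyed on a stable hash of namespace + business ID. The function name `rolloutAccepts`, the 100-bucket scheme, and the use of FNV-1a are assumptions made to keep the sketch stdlib-only; the review thread asks for `farm.Fingerprint32`, and the merged helper may differ in all of these details.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// rolloutAccepts reports whether the (namespace, businessID) pair falls into
// one of the first `percent` buckets out of 100. This is a hypothetical
// stand-in for the PR's helper; FNV-1a is an assumption, not the PR's hash.
func rolloutAccepts(namespace, businessID string, percent int) bool {
	if percent <= 0 {
		return false
	}
	if percent >= 100 {
		return true
	}
	h := fnv.New32a()
	// The NUL separator keeps ("ab", "c") and ("a", "bc") from hashing the
	// same byte sequence.
	h.Write([]byte(namespace))
	h.Write([]byte{0})
	h.Write([]byte(businessID))
	return int(h.Sum32()%100) < percent
}

func main() {
	// Count how many of 1000 synthetic schedule IDs are accepted at each dial.
	for _, pct := range []int{0, 25, 50, 100} {
		accepted := 0
		for i := 0; i < 1000; i++ {
			if rolloutAccepts("my-namespace", fmt.Sprintf("schedule-%d", i), pct) {
				accepted++
			}
		}
		fmt.Printf("percent=%d accepted=%d/1000\n", pct, accepted)
	}
}
```

Because each key maps to a fixed bucket, raising the percentage only ever adds keys to the accepted cohort. Changing the hash, the input order, or the separator would move keys between buckets and break that property.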

Why?

  • We want fine-grained control for rollouts even within a namespace.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Comment thread common/rollout.go Outdated
// The hash function and its inputs (algorithm, input order, separator byte)
// are a load-bearing rollout key: changing any of them re-shuffles every
// namespace's cohort and breaks monotonicity for in-flight rollouts.
func RolloutAccepts(namespace, businessID string, percent int) bool {

Contributor

this is similar to dynamicconfig.GradualChange.. can we put it in the dynamicconfig package? (we shouldn't add more to common, it has way too much stuff and should be broken up)

also how about using farm.Fingerprint32 for consistency with all the other hashing stuff in the server?

Contributor Author

> this is similar to dynamicconfig.GradualChange.. can we put it in the dynamicconfig package?

Sure, will do.

> also how about using farm.Fingerprint32 for consistency with all the other hashing stuff in the server?

Yup, will fix.

Contributor

could we do something like this?

// constants.go: replaces EnableCHASMSchedulerCreation + CHASMSchedulerCreationRolloutPercent
CHASMSchedulerCreation = NewNamespaceTypedSettingWithConverter(
    "history.chasmSchedulerCreation",
    ConvertPercentRollout, // handles plain bool for back-compat: true → {Enabled:true, Percent:100}
    PercentRollout{},
    `CHASMSchedulerCreation controls whether new ....`,
)

// call site
key := fmt.Appendf(nil, "%s\x00%s", namespaceName, scheduleID)
return wh.config.CHASMSchedulerCreation(namespaceName).Accepts(key)

// config struct field
CHASMSchedulerCreation dynamicconfig.TypedPropertyFnWithNamespaceFilter[dynamicconfig.PercentRollout]

Contributor Author

As elegant as that is, I'm nervous to go with that since the behavior isn't clear to me in a rollback situation (e.g., a version without this patch getting a value like 100). Let's keep this a little bit simpler, even if a bit kludgy.
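To make the rollback concern concrete, here is a hedged sketch of the trade-off. `convertPercentRollout` is a hypothetical converter that accepts both a legacy bool and a structured value, and `legacyBoolConvert` is a toy model of an older binary that only understands bool. The map key names, the error behavior, and the legacy model are all assumptions for illustration; the thread does not show how the real server treats a value it cannot parse.

```go
package main

import "fmt"

// PercentRollout mirrors the shape suggested in the thread. The field names
// come from the review comment; everything else here is hypothetical.
type PercentRollout struct {
	Enabled bool
	Percent int
}

// convertPercentRollout accepts either a legacy bool or a map with
// "enabled"/"percent" keys (key names assumed for illustration).
func convertPercentRollout(v any) (PercentRollout, error) {
	switch t := v.(type) {
	case bool:
		if t {
			return PercentRollout{Enabled: true, Percent: 100}, nil
		}
		return PercentRollout{}, nil
	case map[string]any:
		var out PercentRollout
		if e, ok := t["enabled"].(bool); ok {
			out.Enabled = e
		}
		switch p := t["percent"].(type) {
		case int:
			out.Percent = p
		case float64: // encoding/json decodes numbers into float64
			out.Percent = int(p)
		}
		if out.Percent < 0 || out.Percent > 100 {
			return PercentRollout{}, fmt.Errorf("percent out of range: %d", out.Percent)
		}
		return out, nil
	default:
		return PercentRollout{}, fmt.Errorf("unsupported value type %T", v)
	}
}

// legacyBoolConvert models an older binary that only understands bool.
func legacyBoolConvert(v any) (bool, error) {
	b, ok := v.(bool)
	if !ok {
		return false, fmt.Errorf("expected bool, got %T", v)
	}
	return b, nil
}

func main() {
	fmt.Println(convertPercentRollout(true))
	fmt.Println(convertPercentRollout(map[string]any{"enabled": true, "percent": 25}))
	// After a rollback, the old converter sees a value it cannot interpret.
	fmt.Println(legacyBoolConvert(100))
}
```

Keeping the existing bool key and adding a separate percent key sidesteps this: an older binary keeps reading the bool it already understands and ignores the key it does not know about.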

Comment thread common/dynamicconfig/constants.go Outdated
@chaptersix

chaptersix commented May 21, 2026

Contributor

It'd be great if this new dynamic config type could have Namespace exclusion.

Cell wide percentage rollout with exclusions would be awesome.

Comment thread service/worker/scheduler/fx.go Outdated
wfFunc := func(ctx workflow.Context, args *schedulespb.StartScheduleArgs) error {
	key := fmt.Appendf(nil, "%s\x00%s", nsName, args.State.ScheduleId)
	enableMigration := s.enableCHASMMigration(nsName) &&
		dynamicconfig.RolloutAccepts(key, s.chasmMigrationRolloutPercent(nsName))

Contributor

could you make this dc actually dynamic? I should have done it the first time. I think you can do it by passing a closure to schedulerWorkflowWithSpecBuilder

Contributor

Passing a closure doesn't avoid non-determinism; it should be in a side effect of some kind. How about rolling this into the "tweakables" MutableSideEffect? That avoids extra markers if the config doesn't change.

The specBuilder thing is okay just because it's a pure cache

Contributor

get := func(ctx workflow.Context) any {

EnableCHASMMigration bool // Whether to automatically migrate this schedule to CHASM (V2)

we do that now.

Contributor

Also, once a v1 schedule is marked to migrate, it will keep attempting to migrate until it succeeds.

Contributor Author

Will do (making this MutableSideEffect update during run rather than at scheduler spawn)
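The "no extra markers if the config doesn't change" point can be illustrated with a toy model. This is not the Temporal SDK: `recorder` is a hypothetical stand-in for workflow history, `tweakables` is an assumed config bundle, and replay is not modeled. In real workflow code the equivalent call would be the Go SDK's `workflow.MutableSideEffect`.

```go
package main

import "fmt"

// tweakables is a hypothetical bundle of config values read by a workflow.
type tweakables struct {
	EnableCHASMMigration bool
	MigrationPercent     int
}

// recorder is a toy stand-in for workflow history: it stores a marker only
// when the observed value differs from the last recorded one.
type recorder struct {
	markers []tweakables
}

// mutableSideEffect reads the live value, appends a marker only if it differs
// from the most recent marker, and returns the most recent recorded value.
func (r *recorder) mutableSideEffect(read func() tweakables) tweakables {
	cur := read()
	if n := len(r.markers); n == 0 || r.markers[n-1] != cur {
		r.markers = append(r.markers, cur)
	}
	return r.markers[len(r.markers)-1]
}

func main() {
	live := tweakables{EnableCHASMMigration: true, MigrationPercent: 10}
	rec := &recorder{}
	read := func() tweakables { return live }

	// Repeated reads of an unchanged config record a single marker.
	for i := 0; i < 3; i++ {
		rec.mutableSideEffect(read)
	}

	// The operator dials the rollout up while the workflow is running.
	live.MigrationPercent = 50
	got := rec.mutableSideEffect(read)

	fmt.Println("latest:", got, "markers:", len(rec.markers))
}
```

Reading the value on each iteration, rather than once at scheduler spawn, is what lets a running schedule pick up a changed percentage.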

@chaptersix

Contributor

nvm, exclusions is probably a bad idea.

@lina-temporal lina-temporal merged commit d4e4958 into main Jun 2, 2026
49 checks passed
@lina-temporal lina-temporal deleted the sched-chasm-percent branch June 2, 2026 23:09
4 participants