Skip to content

Implement SignalWithStart as a system nexus endpoint#9833

Merged
spkane31 merged 51 commits into
mainfrom
spk/signal-with-start
May 20, 2026
Merged

Implement SignalWithStart as a system nexus endpoint#9833
spkane31 merged 51 commits into
mainfrom
spk/signal-with-start

Conversation

@spkane31
Copy link
Copy Markdown
Contributor

@spkane31 spkane31 commented Apr 6, 2026

What changed?

This PR adds SignalWithStartWorkflowExecution as a synchronous Nexus operation exposed via __temporal_system endpoint, allowing workflows to signal-with-start other workflows through the CHASM Nexus operation framework.

Key changes:

  • chasm/lib/workflow/nexus_service.go: workflowServiceNexusHandler implements the SignalWithStartWorkflowExecution Nexus sync operation by resolving the namespace and delegating to the History service. A SignalWithStartOperationProcessor handles input enrichment (namespace, request ID, links) and routing via CHASM's NexusOperationProcessorResult.
  • chasm/lib/workflow/library.go — library now holds the workflowServiceNexusHandler, the workflow Config, the SA mapper provider, and the SA validator. newLibrary (used by fx) takes those dependencies, while the public NewLibrary keeps its old signature for external callers. Adds NexusServices() so the library registers its Nexus service via CHASM.
  • chasm/lib/workflow/validator.go: RequestValidator consolidates the SignalWithStartWorkflowExecution validation logic (previously inlined in WorkflowHandler) into a reusable, injectable struct. This same validator is used by both the frontend handler and the new CHASM processor.
  • common/dynamicconfig/constants.go — adds EnableSignalWithStartFromWorkflow (namespace-scoped, default false).
  • service/frontend/workflow_handler.go: Removed the SignalWithStart validation block
  • service/history/fx.go: Provides a HistoryServiceServerProvider so the CHASM workflow library can call the history handler directly.
  • temporal/fx.go: Removes the now-redundant ChasmLibraryOptions grouping; each service module registers its own CHASM libraries.
  • components/nexusoperations/workflow/commands.go: NotFound and InvalidArgument errors during Nexus command handling are now surfaced as workflow task failures instead of being treated as transient handler errors.
  • common/payloads: Adds EncodeSingle, MustEncodeSingle, and MustEncode helpers used in tests.
  • cmd/tools/getproto: Adds support for nexus-proto-annotations proto imports.
  • tests/signal_with_start_from_workflow_test.go: Functional test suite covering the happy path, duplicate detection, conflict policies, and validation rejection for the new Nexus operation.

Why?

This functionality is one of our most requested GitHub issues.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

The history service now directly exposes a HistoryServiceServer interface via fx for injection into the CHASM workflow library. This tight coupling between the CHASM workflow library and the history handler could complicate future layering — callers outside the history service should not adopt this pattern. The feature is gated by history.enableSignalWithStartFromWorkflow for rollout.

@spkane31 spkane31 requested a review from lina-temporal April 7, 2026 16:19
@spkane31 spkane31 requested a review from bergundy April 10, 2026 22:05
Copy link
Copy Markdown
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Make sure to unit test the validator and I would revert the changes to the error strings that I made in my prototype. It's best not to change the existing experience for all of the common workflow APIs.

s.Nil(failure)
s.True(resp.Started, "expected Started=true when starting a new workflow")
s.NotEmpty(resp.RunId)
defer func() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't make to put the defer at the bottom here.

Comment thread chasm/lib/workflow/nexus_service.go Outdated
}

// SignalWithStartWorkflowExecution implements the SignalWithStartWorkflowExecution Nexus operation.
func (h *workflowServiceNexusHandler) SignalWithStartWorkflowExecution(name string) nexus.Operation[*workflowservice.SignalWithStartWorkflowExecutionRequest, *workflowservice.SignalWithStartWorkflowExecutionResponse] {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this function, change this to register below avoid the closure:

nexus.NewSyncOperation(workflowservicenexus.WorkflowService.SignalWithStartWorkflowExecution.Name(), h.signalWithStartWorkflowExecution) 

@spkane31 spkane31 requested a review from bergundy April 16, 2026 19:38
Copy link
Copy Markdown
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with some small comments.

Comment thread chasm/lib/workflow/fx.go
// HistoryHandlerModule wires the workflow library's Nexus handler to the
// history service. Only include this in services that provide
// historyservice.HistoryServiceServer (the history service).
var HistoryHandlerModule = fx.Invoke(func(library *library, historyHandler historyservice.HistoryServiceServer) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this be an fx.Module?

Comment thread chasm/lib/workflow/nexus_service.go Outdated
"go.temporal.io/server/common/searchattribute"
)

var ErrSystemNexusOperationsDisabled = serviceerror.NewUnimplemented("System Nexus operations are disabled")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this specific to the signal with start operation. We will want to enable each operation separately same way that we do for gRPC methods.

System nexus will already error out when a handler for the given operation isn't registered during command processing (or task execution, not sure exactly which one).

if err != nil {
return nil, err
}
link := commonnexus.ConvertLinkWorkflowEventToNexusLink(&commonpb.Link_WorkflowEvent{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backlink should be generated in the history handler so it can ensure that the request ID is properly mapped to an event ID in order for the UI to properly follow the link (the mapping is available as part of the DescribeWorkflowExecution response for other events).
I missed this in the initial review, that's something I haven't completed when I handed over the work to you. I don't want to block the PR for this because it's a non-trivial amount of work that the Nexus team was going to tackle anyways.

if err != nil {
var notFoundErr *serviceerror.NotFound
var invalidArgumentErr *serviceerror.InvalidArgument
if errors.As(err, &notFoundErr) || errors.As(err, &invalidArgumentErr) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use the more modern errors.AsType.

currentWorkflowLease.GetMutableState().IsWorkflowExecutionRunning() &&
signalWithStartRequest.WorkflowIdConflictPolicy != enumspb.WORKFLOW_ID_CONFLICT_POLICY_TERMINATE_EXISTING {
signalWithStartRequest.WorkflowIdConflictPolicy != enumspb.WORKFLOW_ID_CONFLICT_POLICY_TERMINATE_EXISTING &&
signalWithStartRequest.WorkflowIdConflictPolicy != enumspb.WORKFLOW_ID_CONFLICT_POLICY_FAIL {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change this to explicitly check for use existing?

switch r := result.(type) {
case interface{ ValueAsAny() any }:
ps, err := payloads.Encode(r.ValueAsAny())
ps, err := sdkconverter.PreferProtoDataConverter.ToPayloads(r.ValueAsAny())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we aligned with UI and SDK that we are going to use binary proto for responses?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tconley1428 is aligned on this.

Copy link
Copy Markdown
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@lina-temporal lina-temporal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments mostly for my comprehension, approved

Comment thread tests/signal_with_start_from_workflow_test.go
Comment thread tests/signal_with_start_from_workflow_test.go
Comment thread tests/signal_with_start_from_workflow_test.go Outdated
Comment on lines +322 to +326
// desc, err := s.SdkClient().DescribeWorkflowExecution(ctx, workflowID, response.RunID)
// s.NoError(err)
// requestIDInfos := desc.GetWorkflowExtendedInfo().GetRequestIdInfos()
// requestID := slices.Collect(maps.Keys(requestIDInfos))[0]
// s.Equal(opScheduledEvent.GetNexusOperationScheduledEventAttributes().GetRequestId(), requestID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why's this commented out?

Comment thread service/worker/fx.go
migration.Module,
resource.Module,
deletenamespace.Module,
chasmscheduler.Module,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these have to be included? Worker shouldn't need chasmscheduler's module injected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without these the containers fail to start with:

 logger.go:146: 2026-05-20T15:21:09.404Z	panic	FATAL: unable to start worker service	{"error": "could not build arguments for function \"go.temporal.io/server/service/worker\".ServiceLifetimeHooks (/home/runner/work/temporal/temporal/service/worker/fx.go:192): failed to build *worker.Service: could not build arguments for function \"go.temporal.io/server/service/worker\".NewService (/home/runner/work/temporal/temporal/service/worker/service.go:111): failed to build *worker.PerNamespaceWorkerManager: could not build arguments for function \"go.temporal.io/server/service/worker\".PerNamespaceWorkerManagerProvider (/home/runner/work/temporal/temporal/service/worker/fx.go:207): could not build value group common.PerNSWorkerComponent[group=\"perNamespaceWorkerComponent\"]: missing dependencies for function \"go.temporal.io/server/service/worker/scheduler\".NewResult (/home/runner/work/temporal/temporal/service/worker/scheduler/fx.go:81): missing type: *scheduler.SpecBuilder", "errorVerbose": "could not build arguments for function \"go.temporal.io/server/service/worker\".ServiceLifetimeHooks\n\t/home/runner/work/temporal/temporal/service/worker/fx.go:192:\nfailed to build *worker.Service:\ncould not build arguments for function \"go.temporal.io/server/service/worker\".NewService\n\t/home/runner/work/temporal/temporal/service/worker/service.go:111:\nfailed to build *worker.PerNamespaceWorkerManager:\ncould not build arguments for function \"go.temporal.io/server/service/worker\".PerNamespaceWorkerManagerProvider\n\t/home/runner/work/temporal/temporal/service/worker/fx.go:207:\ncould not build value group common.PerNSWorkerComponent[group=\"perNamespaceWorkerComponent\"]:\nmissing dependencies for function \"go.temporal.io/server/service/worker/scheduler\".NewResult\n\t/home/runner/work/temporal/temporal/service/worker/scheduler/fx.go:81:\nmissing type:\n\t- *scheduler.SpecBuilder (did you mean to Provide it?)", "host": "127.0.0.1:42869", "logging-call-at": "/home/runner/work/temporal/temporal/tests/testcore/onebox.go:663"}

Comment thread service/matching/fx.go Outdated
Comment on lines +36 to +37
scheduler.Module,
callback.Module,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does matching need these now?

)

type Config struct {
maxIDLengthLimit dynamicconfig.IntPropertyFn
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this (and maybe linkMaxSize) could be simple constants, since we probably wouldn't want these particular limits for specific customers. Up to you.

@spkane31 spkane31 enabled auto-merge (squash) May 20, 2026 15:35
@awln-temporal awln-temporal self-requested a review May 20, 2026 17:42
@spkane31 spkane31 merged commit 01aa279 into main May 20, 2026
70 of 72 checks passed
@spkane31 spkane31 deleted the spk/signal-with-start branch May 20, 2026 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants