[Nexus-Chasm] Migrate OperationInvocationTaskExecutor #9680

Merged
bergundy merged 9 commits into nexus/hsm-to-chasm-migration from cg/nexus/task_executors_2 on Apr 6, 2026

gow commented Mar 26, 2026

What changed?

This PR migrates the Nexus operation invocation task handler from the HSM version to CHASM.

Why?

Migrating from HSM to Chasm

How did you test it?

  • [x] built
  • [ ] run locally and tested manually
  • [ ] covered by existing tests
  • [x] added new unit test(s)
  • [ ] added new functional test(s)

Note

Medium Risk
Introduces a new CHASM-based Nexus StartOperation execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling.

Overview
Migrates Nexus operation invocation execution to CHASM by implementing OperationInvocationTaskHandler.Validate/Execute end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions.

Adds supporting plumbing: OperationStore.NexusOperationInvocationData and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new MSPointer.LoadHistoryEvent/NodeBackend.LoadHistoryEvent API. Configuration is extended to parse CallbackURLTemplate into a *template.Template, add UseSystemCallbackURL, and pass NumHistoryShards for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic.

Written by Cursor Bugbot for commit 4b13977.

gow requested review from a team as code owners on March 26, 2026 06:39
gow marked this pull request as draft on March 26, 2026 06:39
gow force-pushed the cg/nexus/task_executors_2 branch from 78f3c98 to 761370c on March 27, 2026 16:49
gow force-pushed the cg/nexus/task_executors_1 branch 2 times, most recently from 350d41a to b29957d, on March 30, 2026 04:43
gow force-pushed the cg/nexus/task_executors_2 branch from 761370c to c0a5d28 on March 30, 2026 04:44
Base automatically changed from cg/nexus/task_executors_1 to nexus/hsm-to-chasm-migration on March 30, 2026 05:26
gow force-pushed the cg/nexus/task_executors_2 branch 3 times, most recently from 7492591 to c480b70, on March 30, 2026 06:47
gow requested review from bergundy and stephanos on March 30, 2026 07:12
gow marked this pull request as ready for review on March 30, 2026 07:12
gow requested a review from Quinn-With-Two-Ns on March 30, 2026 17:31
bergundy (Member) left a comment

Submitting what I have so far since there's going to be another pass when you add the tests.

query complexity. Consider the cardinality impact when enabling these tags.`,
)

var UseSystemCallbackURL = dynamicconfig.NewGlobalBoolSetting(
Member

We'll deprecate this soon enough. I'm kinda on the fence if we should even include it in the CHASM implementation since we only guarantee compatibility with the previous minor version and 1.30 supports system callback URLs already.

Contributor Author

I'm inclined towards removing this since, by the time the CHASM version makes it to production, we might not need it anymore.

type ClientProvider func(ctx context.Context, namespaceID string, entry *persistencespb.NexusEndpointEntry, service string) (*nexusrpc.HTTPClient, error)

// OperationInvocationTaskHandlerOptions is the fx parameter object for the invocation task executor.
type OperationInvocationTaskHandlerOptions struct {
Member

This should go next to the handler definition IMHO, it "belongs" to that struct.

Contributor Author

Done.

}

// loadStartArgs is a ReadComponent callback that loads the start arguments from the operation.
func (o *Operation) loadStartArgs(
Member

Put struct methods in the file the struct is defined in please to keep the code organized.

Contributor Author

Done. Moved these methods to operation.go next to the operation struct.

ctx chasm.Context,
_ chasm.NoValue,
) (startArgs, error) {
invocationData, err := o.GetInvocationData(ctx)
Member

This is unnecessary. The operation can just call its parent here or get whatever is embedded in the component for standalone. You may need to add a TODO for the standalone path to implement what's needed here. CC @stephanos.

Contributor Author

I had the TODO in that method. Removed the method and kept everything inline.

}

// saveResult is an UpdateComponent callback that saves the invocation outcome.
func (o *Operation) saveResult(
Member

Same here, it should be in operation.go.

Contributor Author

Done.

}

// GetInvocationData loads invocation data from the store or returns an error if no store is present.
func (o *Operation) GetInvocationData(ctx chasm.Context) (InvocationData, error) {
Member

I don't think you need this function, and please don't use the Get prefix for getters in Go.

Contributor Author

I added this method as a placeholder so that @stephanos can implement loading the data from operation state.
Renamed the method to InvocationData()
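For readers less familiar with the naming convention behind this exchange: in hand-written Go, a getter is conventionally named after the value it returns, without a `Get` prefix, while setters keep `Set`. The `operation` type below is a hypothetical illustration, not the PR's `Operation` component.

```go
package main

import "fmt"

// operation is a hypothetical type used only to illustrate naming.
type operation struct {
	invocationData string
}

// InvocationData is the getter: named after the field, with no Get prefix.
func (o *operation) InvocationData() string { return o.invocationData }

// SetInvocationData is the setter; the Set prefix is conventional.
func (o *operation) SetInvocationData(d string) { o.invocationData = d }

func main() {
	op := &operation{}
	op.SetInvocationData("payload")
	fmt.Println(op.InvocationData())
}
```

Generated protobuf accessors such as `GetAttempt()` keep their prefix because the code generator emits them that way; the convention applies to hand-written methods.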

}

// OnStarted applies the started transition or delegates to the store if one is present.
func (o *Operation) OnStarted(ctx chasm.MutableContext, _ *Operation, operationToken string, links []*commonpb.Link) error {
Member

As I mentioned here and in another review where my comment was lost, there's no need to take Operation as an argument on methods of Operation.

Contributor Author

Done
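The point about the redundant parameter can be sketched as follows. The `operation` type and both method names are hypothetical simplifications; the real `OnStarted` also takes a `chasm.MutableContext` and links, which are omitted here.

```go
package main

import "fmt"

// operation is a hypothetical, simplified stand-in for the Operation component.
type operation struct {
	token string
}

// onStartedRedundant mirrors the reviewed shape: the extra *operation
// parameter repeats what the receiver already provides.
func (o *operation) onStartedRedundant(_ *operation, token string) {
	o.token = token
}

// onStarted drops the extra parameter and relies on the receiver alone.
func (o *operation) onStarted(token string) {
	o.token = token
}

func main() {
	op := &operation{}
	op.onStartedRedundant(op, "a") // the caller has to pass op twice
	op.onStarted("b")
	fmt.Println(op.token)
}
```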

Comment on lines +57 to +66
config: opts.Config,
namespaceRegistry: opts.NamespaceRegistry,
metricsHandler: opts.MetricsHandler,
logger: opts.Logger,
callbackTokenGenerator: opts.CallbackTokenGenerator,
clientProvider: opts.ClientProvider,
endpointRegistry: opts.EndpointRegistry,
httpTraceProvider: opts.HTTPTraceProvider,
historyClient: opts.HistoryClient,
chasmRegistry: opts.ChasmRegistry,
Member

Alternatively you can just store opts on the struct. You can't embed options in the handler because it has an fx.In embedded in it already and AFAIR that causes issues.

Contributor Author

I think it's better to copy just the necessary dependencies than to save the whole opts object. Will leave this as is for now.


// Skip endpoint lookup for system-internal operations.
if args.endpointName != commonnexus.SystemEndpoint {
if args.endpointID == "" {
Member

I see a bunch of comments that were not transferred to this function's implementation too, specifically the one we had here.

Contributor Author

Copied over the comments (from HSM version) that looked important.


callbackURL, err := buildCallbackURL(h.config.UseSystemCallbackURL(), h.config.CallbackURLTemplate(), ns, endpoint)
if err != nil {
return serviceerror.NewFailedPreconditionf("failed to build callback URL: %v", err)
Member

This doesn't need to be a failed precondition but it doesn't hurt apart from losing the wrapping of the original error with %w.

Contributor Author

fixed.

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

}
if op.GetAttempt() != task.GetAttempt() {
return false, serviceerror.NewFailedPreconditionf("task attempt %d does not match operation attempt %d", task.GetAttempt(), op.GetAttempt())
}

Validate returns error for stale tasks instead of dropping

Medium Severity

Validate returns (false, error) when the operation is no longer in the scheduled state or the attempt doesn't match. Per the TaskValidator contract, (false, nil) means silently drop an obsolete task, while (anything, error) means validation failed. Stale invocation tasks (e.g., operation already started/completed, or attempt advanced) are expected and should be silently dropped, not treated as validation failures. The other task handlers (backoff, timeout) correctly return (bool, nil).
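Following the contract as the comment above describes it, a validator that drops stale tasks silently might look like the sketch below. All types here are hypothetical simplifications of the operation and task, and the page does not show whether the PR adopted this shape.

```go
package main

import "fmt"

type status int

const (
	statusScheduled status = iota
	statusStarted
)

// operation and invocationTask are hypothetical, simplified stand-ins.
type operation struct {
	status  status
	attempt int32
}

type invocationTask struct {
	attempt int32
}

// validate reports whether the task should still run. A stale task is
// dropped with (false, nil); an error is reserved for real validation failures.
func validate(op operation, task invocationTask) (bool, error) {
	if op.status != statusScheduled {
		return false, nil // operation already moved on; the task is obsolete
	}
	if op.attempt != task.attempt {
		return false, nil // the task belongs to an earlier attempt
	}
	return true, nil
}

func main() {
	ok, err := validate(operation{status: statusScheduled, attempt: 2}, invocationTask{attempt: 1})
	fmt.Println(ok, err)
}
```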

gow force-pushed the nexus/hsm-to-chasm-migration branch from d07f7e9 to 48c21b2 on April 1, 2026 06:18
gow force-pushed the cg/nexus/task_executors_2 branch from 35b4a1f to 312e79a on April 1, 2026 06:30
gow requested review from S15 and bergundy on April 1, 2026 16:53
gow force-pushed the nexus/hsm-to-chasm-migration branch from 48c21b2 to 5c6bbd9 on April 3, 2026 16:58
gow force-pushed the cg/nexus/task_executors_2 branch from 312e79a to 4b13977 on April 3, 2026 16:59
bergundy merged commit 291cb26 into nexus/hsm-to-chasm-migration on Apr 6, 2026 (47 checks passed)
bergundy deleted the cg/nexus/task_executors_2 branch on April 6, 2026 17:09
bergundy pushed a commit to bergundy/temporal that referenced this pull request Apr 7, 2026
This PR migrates the Nexus operation invocation task handler from HSM
version to Chasm.

Migrating from HSM to Chasm

- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)
bergundy pushed a commit to bergundy/temporal that referenced this pull request Apr 7, 2026
bergundy pushed a commit that referenced this pull request Apr 8, 2026
bergundy pushed a commit that referenced this pull request Apr 9, 2026
bergundy pushed a commit that referenced this pull request Apr 9, 2026