[Nexus-Chasm] Migrate OperationInvocationTaskExecutor#9680
[Nexus-Chasm] Migrate OperationInvocationTaskExecutor#9680bergundy merged 9 commits intonexus/hsm-to-chasm-migrationfrom
Conversation
78f3c98 to
761370c
Compare
350d41a to
b29957d
Compare
761370c to
c0a5d28
Compare
7492591 to
c480b70
Compare
bergundy
left a comment
There was a problem hiding this comment.
Submitting what I have so far since there's going to be another pass when you add the tests.
| query complexity. Consider the cardinality impact when enabling these tags.`, | ||
| ) | ||
|
|
||
| var UseSystemCallbackURL = dynamicconfig.NewGlobalBoolSetting( |
There was a problem hiding this comment.
We'll deprecate this soon enough. I'm kinda on the fence if we should even include it in the CHASM implementation since we only guarantee compatibility with the previous minor version and 1.30 supports system callback URLs already.
There was a problem hiding this comment.
I'm inclined towards removing this since by the time chasm version makes it to production, we might not need it anymore.
| type ClientProvider func(ctx context.Context, namespaceID string, entry *persistencespb.NexusEndpointEntry, service string) (*nexusrpc.HTTPClient, error) | ||
|
|
||
| // OperationInvocationTaskHandlerOptions is the fx parameter object for the invocation task executor. | ||
| type OperationInvocationTaskHandlerOptions struct { |
There was a problem hiding this comment.
This should go next to the handler definition IMHO, it "belongs" to that struct.
| } | ||
|
|
||
| // loadStartArgs is a ReadComponent callback that loads the start arguments from the operation. | ||
| func (o *Operation) loadStartArgs( |
There was a problem hiding this comment.
Put struct methods in the file the struct is defined in please to keep the code organized.
There was a problem hiding this comment.
Done. Moved these methods to operation.go next to the the operation struct.
| ctx chasm.Context, | ||
| _ chasm.NoValue, | ||
| ) (startArgs, error) { | ||
| invocationData, err := o.GetInvocationData(ctx) |
There was a problem hiding this comment.
This is unnecessary. The operation can just call its parent here or get whatever is embedded in the component for standalone. You may need to add a TODO for the standalone path to implement what's needed here. CC @stephanos.
There was a problem hiding this comment.
I had the TODO in that method. Removed the method and kept everything inline.
| } | ||
|
|
||
| // saveResult is an UpdateComponent callback that saves the invocation outcome. | ||
| func (o *Operation) saveResult( |
There was a problem hiding this comment.
Same here, it should be in operation.go.
| } | ||
|
|
||
| // GetInvocationData loads invocation data from the store or returns an error if no store is present. | ||
| func (o *Operation) GetInvocationData(ctx chasm.Context) (InvocationData, error) { |
There was a problem hiding this comment.
I don't think you need this function, and please don't use the Get prefix for getters in Go.
There was a problem hiding this comment.
I added this method as a placeholder so that @stephanos can implement loading the data from operation state.
Renamed the method to InvocationData()
| } | ||
|
|
||
| // OnStarted applies the started transition or delegates to the store if one is present. | ||
| func (o *Operation) OnStarted(ctx chasm.MutableContext, _ *Operation, operationToken string, links []*commonpb.Link) error { |
There was a problem hiding this comment.
As I mentioned here and in another review where my comment was lost. There's no need to take Operation as an argument on methods of Operation.
| config: opts.Config, | ||
| namespaceRegistry: opts.NamespaceRegistry, | ||
| metricsHandler: opts.MetricsHandler, | ||
| logger: opts.Logger, | ||
| callbackTokenGenerator: opts.CallbackTokenGenerator, | ||
| clientProvider: opts.ClientProvider, | ||
| endpointRegistry: opts.EndpointRegistry, | ||
| httpTraceProvider: opts.HTTPTraceProvider, | ||
| historyClient: opts.HistoryClient, | ||
| chasmRegistry: opts.ChasmRegistry, |
There was a problem hiding this comment.
Alternatively you can just store opts on the struct. You can't embed options in the handler because it has an fx.In embedded in it already and AFAIR that causes issues.
There was a problem hiding this comment.
I think it's better to copy just the necessary dependencies than to save the whole opts object. Will leave this as is for now.
|
|
||
| // Skip endpoint lookup for system-internal operations. | ||
| if args.endpointName != commonnexus.SystemEndpoint { | ||
| if args.endpointID == "" { |
There was a problem hiding this comment.
I see a bunch of comments that were not transferred to this function's implementation too, specifically the one we had here.
There was a problem hiding this comment.
Copied over the comments (from HSM version) that looked important.
|
|
||
| callbackURL, err := buildCallbackURL(h.config.UseSystemCallbackURL(), h.config.CallbackURLTemplate(), ns, endpoint) | ||
| if err != nil { | ||
| return serviceerror.NewFailedPreconditionf("failed to build callback URL: %v", err) |
There was a problem hiding this comment.
This doesn't need to be a failed precondition but it doesn't hurt apart from losing the wrapping of the original error with %w.
| } | ||
| if op.GetAttempt() != task.GetAttempt() { | ||
| return false, serviceerror.NewFailedPreconditionf("task attempt %d does not match operation attempt %d", task.GetAttempt(), op.GetAttempt()) | ||
| } |
There was a problem hiding this comment.
Validate returns error for stale tasks instead of dropping
Medium Severity
Validate returns (false, error) when the operation is no longer in the scheduled state or the attempt doesn't match. Per the TaskValidator contract, (false, nil) means silently drop an obsolete task, while (anything, error) means validation failed. Stale invocation tasks (e.g., operation already started/completed, or attempt advanced) are expected and should be silently dropped, not treated as validation failures. The other task handlers (backoff, timeout) correctly return (bool, nil).
d07f7e9 to
48c21b2
Compare
35b4a1f to
312e79a
Compare
48c21b2 to
5c6bbd9
Compare
312e79a to
4b13977
Compare
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm. Migrating from HSM to Chasm - [x] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) - [ ] added new functional test(s) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Introduces a new CHASM-based Nexus `StartOperation` execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling. > > **Overview** > Migrates Nexus operation invocation execution to CHASM by implementing `OperationInvocationTaskHandler.Validate/Execute` end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions. > > Adds supporting plumbing: `OperationStore.NexusOperationInvocationData` and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new `MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API. Configuration is extended to parse `CallbackURLTemplate` into a `*template.Template`, add `UseSystemCallbackURL`, and pass `NumHistoryShards` for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 4b13977. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm. Migrating from HSM to Chasm - [x] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) - [ ] added new functional test(s) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Introduces a new CHASM-based Nexus `StartOperation` execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling. > > **Overview** > Migrates Nexus operation invocation execution to CHASM by implementing `OperationInvocationTaskHandler.Validate/Execute` end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions. > > Adds supporting plumbing: `OperationStore.NexusOperationInvocationData` and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new `MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API. Configuration is extended to parse `CallbackURLTemplate` into a `*template.Template`, add `UseSystemCallbackURL`, and pass `NumHistoryShards` for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 4b13977. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm. Migrating from HSM to Chasm - [x] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) - [ ] added new functional test(s) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Introduces a new CHASM-based Nexus `StartOperation` execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling. > > **Overview** > Migrates Nexus operation invocation execution to CHASM by implementing `OperationInvocationTaskHandler.Validate/Execute` end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions. > > Adds supporting plumbing: `OperationStore.NexusOperationInvocationData` and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new `MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API. Configuration is extended to parse `CallbackURLTemplate` into a `*template.Template`, add `UseSystemCallbackURL`, and pass `NumHistoryShards` for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 4b13977. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm. Migrating from HSM to Chasm - [x] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) - [ ] added new functional test(s) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Introduces a new CHASM-based Nexus `StartOperation` execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling. > > **Overview** > Migrates Nexus operation invocation execution to CHASM by implementing `OperationInvocationTaskHandler.Validate/Execute` end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions. > > Adds supporting plumbing: `OperationStore.NexusOperationInvocationData` and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new `MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API. Configuration is extended to parse `CallbackURLTemplate` into a `*template.Template`, add `UseSystemCallbackURL`, and pass `NumHistoryShards` for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 4b13977. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm. Migrating from HSM to Chasm - [x] built - [ ] run locally and tested manually - [ ] covered by existing tests - [x] added new unit test(s) - [ ] added new functional test(s) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Introduces a new CHASM-based Nexus `StartOperation` execution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling. > > **Overview** > Migrates Nexus operation invocation execution to CHASM by implementing `OperationInvocationTaskHandler.Validate/Execute` end-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions. > > Adds supporting plumbing: `OperationStore.NexusOperationInvocationData` and workflow implementation that loads invocation input/headers from the scheduled history event, plus a new `MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API. Configuration is extended to parse `CallbackURLTemplate` into a `*template.Template`, add `UseSystemCallbackURL`, and pass `NumHistoryShards` for internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 4b13977. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->


What changed?
This PR migrates the Nexus operation invocation task handler from HSM version to Chasm.
Why?
Migrating from HSM to Chasm
How did you test it?
Note
Medium Risk
Introduces a new CHASM-based Nexus
StartOperationexecution path with endpoint lookup, callback URL/token generation, and error classification; mistakes could cause failed invocations, incorrect retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by strict task validation and added unit coverage, but the change touches critical workflow/history integration and outbound request handling.Overview
Migrates Nexus operation invocation execution to CHASM by implementing
OperationInvocationTaskHandler.Validate/Executeend-to-end, including endpoint resolution (ID with name fallback), callback URL selection (system vs templated), callback token generation, timeout budgeting, outbound StartOperation calls (HTTP or internal history service), metrics/logging, and classification of results into operation state transitions.Adds supporting plumbing:
OperationStore.NexusOperationInvocationDataand workflow implementation that loads invocation input/headers from the scheduled history event, plus a newMSPointer.LoadHistoryEvent/NodeBackend.LoadHistoryEventAPI. Configuration is extended to parseCallbackURLTemplateinto a*template.Template, addUseSystemCallbackURL, and passNumHistoryShardsfor internal routing; new helper utilities centralize callback building, error/failure conversion, and internal/HTTP start logic.Written by Cursor Bugbot for commit 4b13977. This will update automatically on new commits. Configure here.