Nexus chasm cleanup #9843

Merged
bergundy merged 5 commits into temporalio:nexus/hsm-to-chasm-migration from bergundy:nexus-chasm-cleanup
Apr 8, 2026

Conversation

@bergundy
Member

@bergundy bergundy commented Apr 7, 2026

bergundy added 3 commits April 7, 2026 11:23
- Introduce RetryPolicyConfig struct and typed DC setting (replaces
  separate initial/max interval settings)
- Rename MaxConcurrentOperations to MaxConcurrentOperationsPerWorkflow
  with updated DC key
- Fix CallbackURLTemplate converter to use direct type assertion
- Add PrincipalType/PrincipalName to disallowed operation headers

Proto schema:
- Add started_time and closed_time to OperationState
- Remove attempt from timeout task messages (only invocation/backoff
  tasks track attempt)
- Reorder task messages (timeouts first, then invocation)

State machine:
- Increment attempt on schedule/reschedule (before task emission)
  rather than after attempt completion - makes attempt number
  approximate but available in the task
- Track StartedTime and ClosedTime in transitions
- Support explicit StartTime/CompleteTime in events for async
  completion race conditions
- Propagate Failure through Failed/Canceled transitions
- Add resolveUnsuccessfully() for terminal state bookkeeping
- Clear NextAttemptScheduleTime in all terminal transitions
- Base start-to-close timeout on actual start time, not schedule time
- Move RequestedTime assignment to Operation.Cancel()

Task handlers:
- Privatize all task handler types, options, and constructors
- Inject task handlers into Library via constructor
- Extract invocation interface with HTTP, system, and timeout
  implementations
- Unify error classification in newInvocationResult()
- Simplify Validate to return (bool, nil) for invalid states
- Fix OnCancelled -> OnCanceled spelling
- Fix fx module name

Workflow registry:
- Add Library interface with CommandHandlers() and EventDefinitions()
- Replace individual Register* methods with single Register(Library)
- Add EventDefinitionByGoType[D] for type-safe event lookup
- Add generic AddAndApplyHistoryEvent[D] free function
- Rename EventDefinition() to EventDefinitionByEventType()
- Privatize workflow Library struct, expose NewLibrary constructor
- Use serviceerror.NewInternalf for internal errors
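
The registry shape described above (a single `Register(Library)` plus a lookup keyed by Go type) might look roughly like the following. This is a simplified sketch under stated assumptions: the `EventDefinition` and `Library` interfaces are reduced to one method each, and `Registry`, `scheduledEvent`, and `nexusLibrary` are hypothetical names, so the real signatures in the PR will differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// EventDefinition is a simplified, hypothetical interface.
type EventDefinition interface {
	EventType() string
}

// Library groups the definitions one component contributes (events only here).
type Library interface {
	EventDefinitions() []EventDefinition
}

// Registry indexes definitions by event type and by concrete Go type.
type Registry struct {
	byEventType map[string]EventDefinition
	byGoType    map[reflect.Type]EventDefinition
}

func NewRegistry() *Registry {
	return &Registry{
		byEventType: map[string]EventDefinition{},
		byGoType:    map[reflect.Type]EventDefinition{},
	}
}

// Register adds every definition from the library and rejects duplicates.
func (r *Registry) Register(lib Library) error {
	for _, def := range lib.EventDefinitions() {
		if _, ok := r.byEventType[def.EventType()]; ok {
			return fmt.Errorf("event type %q already registered", def.EventType())
		}
		r.byEventType[def.EventType()] = def
		r.byGoType[reflect.TypeOf(def)] = def
	}
	return nil
}

// EventDefinitionByGoType looks a definition up by its concrete Go type D.
func EventDefinitionByGoType[D EventDefinition](r *Registry) (D, bool) {
	def, ok := r.byGoType[reflect.TypeOf((*D)(nil)).Elem()]
	if !ok {
		var zero D
		return zero, false
	}
	d, ok := def.(D)
	return d, ok
}

// scheduledEvent and nexusLibrary are example implementations.
type scheduledEvent struct{}

func (scheduledEvent) EventType() string { return "Scheduled" }

type nexusLibrary struct{}

func (nexusLibrary) EventDefinitions() []EventDefinition {
	return []EventDefinition{scheduledEvent{}}
}

func main() {
	r := NewRegistry()
	if err := r.Register(nexusLibrary{}); err != nil {
		panic(err)
	}
	def, ok := EventDefinitionByGoType[scheduledEvent](r)
	fmt.Println(def.EventType(), ok) // prints: Scheduled true
}
```

Keying the lookup on the type parameter means callers name the definition type directly, which is what lets editors offer go-to-definition on it.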

Nexus workflow:
- Add library type implementing workflow Library interface
- Replace registerCommandHandlers/registerEvents with Library methods
- Use type-safe AddAndApplyHistoryEvent[D] in command handlers
- Consolidate two fx.Invoke calls into one
- Set initial Attempt to 0 (TransitionScheduled now increments)
- Rename chasmCtx to ctx
- Fix OnNexusOperationCancelled -> OnNexusOperationCanceled spelling
@bergundy bergundy requested review from a team as code owners April 7, 2026 18:35

func newLibrary() *Library {
return &Library{}
func newLibrary(
Member Author

@stephanos I left it up to you to break out the components-only library.

Member Author

@stephanos there are some key behavior changes here: attempt is now incremented on schedule, and start and close times can be provided externally (needed for async completion).

@stephanos stephanos self-requested a review April 7, 2026 20:30

var MaxConcurrentOperations = dynamicconfig.NewNamespaceIntSetting(
"nexusoperation.limit.operation.concurrency",
var MaxConcurrentOperationsPerWorkflow = dynamicconfig.NewNamespaceIntSetting(
Contributor

Do we have/need a limit for standalone?

Member Author

No, this is just pending nexus operations in a single workflow.

RequestTimeout: RequestTimeout.Get(dc),
MinRequestTimeout: MinRequestTimeout.Get(dc),
MaxConcurrentOperations: MaxConcurrentOperations.Get(dc),
MaxConcurrentOperations: MaxConcurrentOperationsPerWorkflow.Get(dc),
Contributor

Should we rename the field, too?

Member Author

💡

func (w *Workflow) AddAndApplyHistoryEvent(
// AddAndApplyHistoryEvent adds a history event to the workflow and applies the corresponding event definition,
// looked up by Go type. This is the preferred way to add and apply events as it provides go-to-definition navigation.
func AddAndApplyHistoryEvent[D EventDefinition](
Contributor

I like that a lot.

Member Author

Yeah, but we should consider moving all of the workflow events into chasm/lib/workflow so we can actually refer to them, otherwise we run into circular dependency issues that are annoying to break.

// Clear the next attempt schedule time when leaving BACKING_OFF state. This field is only valid in
// BACKING_OFF state.
o.NextAttemptScheduleTime = nil
o.ClosedTime = timestamppb.New(ctx.Now(o))
Contributor

Should this use CompleteTime if non-nil?

Member Author

💡

// The number of attempts made to deliver the start operation request.
// This number represents a minimum bound since the attempt is incremented after the request completes.
int32 attempt = 12;
// This number is approximate, it is incremeted when a task is added to the history queue.
Contributor

Suggested change
// This number is approximate, it is incremeted when a task is added to the history queue.
// This number is approximate, it is incremented when a task is added to the history queue.

// Special marker for Temporal->Temporal calls to indicate that the original failure should be unwrapped.
// Temporal uses a wrapper operation error with no additional information to transmit the OperationError over the network.
// The meaningful information is in the operation error's cause.
unwrapError := opErr.OriginalFailure.Metadata["unwrap-error"] == "true"
Contributor

Is OriginalFailure always set? In the branch before this is called, I see: if handlerErr.OriginalFailure != nil

Member Author

Hmm.. it would only be nil if it is constructed in process, and we only do that for HandlerError instances. But good callout, I'll add that here.
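
A nil guard of the kind discussed here could look like the following sketch. The `failure` and `operationError` types are minimal stand-ins defined for the example, not the SDK types, and `shouldUnwrap` is a hypothetical helper name.

```go
package main

import "fmt"

// failure and operationError are minimal stand-ins for the real types.
type failure struct {
	Metadata map[string]string
}

type operationError struct {
	OriginalFailure *failure
}

// shouldUnwrap reports whether the "unwrap-error" marker is set. It guards
// against a nil OriginalFailure, which can occur when the error is
// constructed in process. Reading a missing key from a nil map is safe in Go.
func shouldUnwrap(opErr *operationError) bool {
	if opErr == nil || opErr.OriginalFailure == nil {
		return false
	}
	return opErr.OriginalFailure.Metadata["unwrap-error"] == "true"
}

func main() {
	fmt.Println(shouldUnwrap(&operationError{})) // prints false
	marked := &operationError{OriginalFailure: &failure{
		Metadata: map[string]string{"unwrap-error": "true"},
	}}
	fmt.Println(shouldUnwrap(marked)) // prints true
}
```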

}
return invocationResultFail{failure: failure}, nil
}
if opTimeoutBelowMinErr, ok := errors.AsType[*operationTimeoutBelowMinError](callErr); ok {
Contributor

nit^2: is the missing newline above this if block intentional? It signals to me that these two belong together more than the other ones but I don't quite see why they do.

Member Author

I think it was because both of these are cases for errors from the nexus SDK but it's fine either way.

}

if errors.Is(callErr, context.DeadlineExceeded) || errors.Is(callErr, context.Canceled) {
// If timed out, don't leak internal info to the user
Contributor

Suggested change
// If timed out, don't leak internal info to the user
// If timed out, don't leak internal info to the user.

Comment on lines -223 to -225
if errors.Is(callErr, ErrResponseBodyTooLarge) || errors.Is(callErr, ErrInvalidOperationToken) {
return nonRetryableFailResult(callErr)
}
Contributor

If I'm not mistaken these two became invocationResultRetry now instead of being non-retryable?

Member Author

Thanks!

// saveResult is an UpdateComponent callback that saves the invocation outcome.
func (o *Operation) saveResult(
// saveResultInput is the input to the Operation.saveResult method used in UpdateComponent.
type saveResultInput struct {
Contributor

nit: should this be saveInvocationResultInput? Granted, a bit long but would match the other types and methods.

Member Author

👍

@bergundy bergundy merged commit 6421ece into temporalio:nexus/hsm-to-chasm-migration Apr 8, 2026
45 checks passed
@bergundy bergundy deleted the nexus-chasm-cleanup branch April 8, 2026 04:16
bergundy added a commit that referenced this pull request Apr 8, 2026
## What changed?

- [[Nexus-Chasm] Improve nexus operation dynamic
config](59a7068)
- [[Nexus-Chasm] Improve operation state machine and task
handlers](c6f1fa8)
- [[Nexus-Chasm] Refactor workflow registry and nexus workflow
integration](45d0cfc)

## Why?

- Cleanup
- Improved test coverage
- Bug fixes
bergundy added a commit that referenced this pull request Apr 9, 2026
bergundy added a commit that referenced this pull request Apr 9, 2026
2 participants