Add caller-side Nexus Operation metrics. #10026

Closed
S15 wants to merge 18 commits into temporalio:feature/nexus-standalone from S15:nexus-metrics

S15 (Contributor) commented Apr 22, 2026

What changed?

New metrics:

Counters

| Metric | Description |
| --- | --- |
| `nexus_operation_success_count` | Successfully completed operations |
| `nexus_operation_failed_count` | Failed operations |
| `nexus_operation_cancel_count` | Cancelled operations |
| `nexus_operation_terminate_count` | Operations terminated before completion |
| `nexus_operation_timeout_count` | Operations timed out before completion |

Histograms

| Metric | Description |
| --- | --- |
| `nexus_operation_schedule_to_close_latency` | Time between schedule and close for sync and async operations |
| `nexus_operation_schedule_to_start_latency` | Time between schedule and start for sync and async operations |
| `nexus_operation_start_to_close_latency` | Time between start and close for async operations only |

Labels

All metrics use the following labels:

| Label | Notes |
| --- | --- |
| `namespace` | |
| `nexus_endpoint` | |
| `nexus_service` | Controlled by dynamic config |
| `nexus_operation` | Controlled by dynamic config |
| `workflowType` | |

Latency metrics also include:

| Label | Values |
| --- | --- |
| `outcome` | `Succeeded`, `Failed`, `Canceled`, `Terminated`, `TimedOut` |

The timeout counter (`nexus_operation_timeout_count`) also includes:

| Label | Values |
| --- | --- |
| `timeout_type` | `StartToClose`, `ScheduleToStart`, `ScheduleToClose` |
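
As a rough illustration of how the outcomes and labels listed above could fit together, here is a minimal, self-contained Go sketch. The `closeMetric` function, the `tag` type, and the `counterByOutcome` map are hypothetical stand-ins and not code from this PR; only the metric names, label names, and label values come from the description above. It assumes each outcome maps to exactly one counter and that `timeout_type` is attached only to the timeout counter.

```go
package main

import "fmt"

// tag is a hypothetical key/value label; the server has its own metrics tag type.
type tag struct{ key, value string }

// counterByOutcome maps each outcome value to a counter named in the description.
// The one-to-one mapping is an assumption made for this sketch.
var counterByOutcome = map[string]string{
	"Succeeded":  "nexus_operation_success_count",
	"Failed":     "nexus_operation_failed_count",
	"Canceled":   "nexus_operation_cancel_count",
	"Terminated": "nexus_operation_terminate_count",
	"TimedOut":   "nexus_operation_timeout_count",
}

// closeMetric returns the counter name and any extra tags for a closed operation.
// timeoutType is only used when outcome is "TimedOut".
func closeMetric(outcome, timeoutType string) (string, []tag, error) {
	name, ok := counterByOutcome[outcome]
	if !ok {
		return "", nil, fmt.Errorf("unknown outcome %q", outcome)
	}
	var tags []tag
	if outcome == "TimedOut" {
		tags = append(tags, tag{"timeout_type", timeoutType})
	}
	return name, tags, nil
}

func main() {
	name, tags, err := closeMetric("TimedOut", "ScheduleToClose")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, tags)
}
```

The common labels (`namespace`, `nexus_endpoint`, and so on) are left out here because they apply to every metric equally.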

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Metric emission is not nil-safe, and `EnrichMetricsHandler` could panic if misconfigured.

stephanos and others added 17 commits April 21, 2026 12:34
Add API boilerplate for standalone Nexus Operations.

- [x] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
Add Nexus Standalone feature flag.

Tests will be added to respective API impl.
Add Nexus Standalone Describe and Start handlers.

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [x] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Nexus Standalone List and Count handlers.

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [x] added new functional test(s)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## What changed?

Wire up Nexus Standalone with CHASM.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Roey Berman <roey@temporal.io>
## What changed?

Add identity to `TerminatedFailureInfo`.

PS: `CanceledFailureInfo` will be a separate PR.

## Why?

Allow clients to access identity via API.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [x] added new functional test(s)
@S15 S15 marked this pull request as ready for review April 22, 2026 18:25
@S15 S15 requested review from a team as code owners April 22, 2026 18:25
@S15 S15 marked this pull request as draft April 22, 2026 18:25
@S15 S15 changed the title Add caller-side nexus operation metrics. Add caller-side Nexus Operation metrics. Apr 22, 2026
S15 (Contributor, Author) commented Apr 22, 2026

The currently failing tests are shared with the feature/nexus-standalone branch

@S15 S15 marked this pull request as ready for review April 22, 2026 18:42
bergundy (Member) left a comment

Overall looks great. I think we should be done in another round.

}
},
GoCtx: context.WithValue(context.Background(), nexusoperation.OperationContextKey, &nexusoperation.OperationContext{
NamespaceRegistry: nsRegistry,
Member

You don't need the namespace registry in the context; the current namespace entry is already on the CHASM context.

Comment on lines +15 to +34
var NexusOperationSuccessCount = metrics.NewCounterDef(
"nexus_operation_success_count",
metrics.WithDescription("Nexus Operations successfully completed."),
)
var NexusOperationFailedCount = metrics.NewCounterDef(
"nexus_operation_failed_count",
metrics.WithDescription("Nexus Operations failures."),
)
var NexusOperationCancelCount = metrics.NewCounterDef(
"nexus_operation_cancel_count",
metrics.WithDescription("Nexus Operations cancellations."),
)
var NexusOperationTerminateCount = metrics.NewCounterDef(
"nexus_operation_terminate_count",
metrics.WithDescription("Nexus Operations that were terminated before completion."),
)
var NexusOperationTimeoutCount = metrics.NewCounterDef(
"nexus_operation_timeout_count",
metrics.WithDescription("Nexus Operations that timed out before completion."),
)
Member

I'm on the fence about the naming here. On the one hand, it's consistent with activities; on the other hand, I would prefer to use the statuses and the same tense for all outcomes ("failed" is past tense and inconsistent with the rest).

if store, ok := o.Store.TryGet(ctx); ok {
return store.OnNexusOperationCompleted(ctx, o, result, links)
}
metricsHandler, err := o.EnrichMetricsHandler(ctx)
Member

Instead of passing in a metrics handler, you can obtain it in the transition function.

func (o *Operation) EnrichMetricsHandler(ctx chasm.Context) (metrics.Handler, error) {
// nolint:revive // unchecked-type-assertion: intentional panic on missing context value
opCtx := ctx.Value(OperationContextKey).(*OperationContext)
namespaceName, err := opCtx.NamespaceRegistry.GetNamespaceName(namespace.ID(ctx.ExecutionKey().NamespaceID))
Member

ctx.NamespaceEntry().Name().String()

metrics.NexusEndpointTag(o.GetEndpoint()),
metrics.WorkflowTypeTag(WorkflowTypeTag),
}
if opCtx.MetricTagConfig != nil {
Member

Let's just ensure that this is never nil. It shouldn't be.
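
One way to guarantee this, sketched below under assumptions: set a default at construction time so call sites can drop the nil check. `MetricTagConfig`, its fields, and `NewOperationContext` are hypothetical simplifications for illustration, not the types in this diff.

```go
package main

import "fmt"

// MetricTagConfig is a hypothetical stand-in for the dynamic-config-backed
// settings that decide which optional tags are emitted.
type MetricTagConfig struct {
	IncludeServiceTag   bool
	IncludeOperationTag bool
}

// OperationContext is a simplified stand-in for the context value in the diff.
type OperationContext struct {
	// MetricTagConfig is never nil once built by NewOperationContext.
	MetricTagConfig func() MetricTagConfig
}

// NewOperationContext substitutes a getter returning the zero-value config
// when none is supplied, so readers of the field never see nil.
func NewOperationContext(cfg func() MetricTagConfig) *OperationContext {
	if cfg == nil {
		cfg = func() MetricTagConfig { return MetricTagConfig{} }
	}
	return &OperationContext{MetricTagConfig: cfg}
}

func main() {
	opCtx := NewOperationContext(nil)
	// Safe to call without a nil check.
	fmt.Printf("%+v\n", opCtx.MetricTagConfig())
}
```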

}

func (o *Operation) emitOnFailedMetrics(handler metrics.Handler, closeTime time.Time) {
outcomeTag := metrics.OutcomeTag(nexusoperationpb.OPERATION_STATUS_FAILED.String())
Member

Maybe lower case?
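
A minimal sketch of what that could look like, assuming the status strings match the `outcome` values listed in the PR description. `outcomeTagValue` is a hypothetical helper, not part of the diff.

```go
package main

import (
	"fmt"
	"strings"
)

// outcomeTagValue lower-cases a status string for use as the outcome label value.
func outcomeTagValue(status string) string {
	return strings.ToLower(status)
}

func main() {
	for _, s := range []string{"Succeeded", "Failed", "Canceled", "Terminated", "TimedOut"} {
		fmt.Println(outcomeTagValue(s))
	}
}
```

Plain lower-casing turns `TimedOut` into `timedout`; a snake_case form such as `timed_out` would need a separate conversion.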

startedTime := o.GetStartedTime()
if startedTime != nil {
// Async operation that was started.
NexusOperationScheduleToStartLatency.With(handler).Record(startedTime.AsTime().Sub(scheduledTime), outcomeTag)
Member

Emit this on transition to start instead of only emitting it when the operation completes.

tags := []metrics.Tag{
metrics.NamespaceTag(namespaceName.String()),
metrics.NexusEndpointTag(o.GetEndpoint()),
metrics.WorkflowTypeTag(WorkflowTypeTag),
Member

If you're in a workflow, you'll need to emit the workflow's type.

Member

You can add a method to the Store interface: WorkflowTypeTag() string.
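
A minimal sketch of that suggestion. Only the `WorkflowTypeTag() string` method name comes from the comment above; the trimmed-down `Store`, the `workflowStore` implementation, the `standaloneWorkflowType` placeholder, and the `workflowTypeTag` helper are assumptions for illustration.

```go
package main

import "fmt"

// standaloneWorkflowType is a hypothetical placeholder used when no workflow is involved.
const standaloneWorkflowType = "_standalone"

// Store is a trimmed-down stand-in for the parent store interface in the diff.
type Store interface {
	// WorkflowTypeTag returns the value for the workflowType label.
	WorkflowTypeTag() string
}

// workflowStore is a hypothetical Store implementation backed by a workflow.
type workflowStore struct{ workflowType string }

func (s workflowStore) WorkflowTypeTag() string { return s.workflowType }

// workflowTypeTag prefers the store's value and falls back to the placeholder
// when the operation has no parent store (the standalone case).
func workflowTypeTag(store Store) string {
	if store == nil {
		return standaloneWorkflowType
	}
	return store.WorkflowTypeTag()
}

func main() {
	fmt.Println(workflowTypeTag(workflowStore{workflowType: "OrderWorkflow"}))
	fmt.Println(workflowTypeTag(nil))
}
```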

func newComponentOnlyLibrary(namespaceRegistry namespace.Registry, dc *dynamicconfig.Collection) *componentOnlyLibrary {
return &componentOnlyLibrary{
namespaceRegistry: namespaceRegistry,
metricTagConfig: MetricTagConfiguration.Get(dc),
Member

Hmmm... it's okay to reuse this, since I figure users will want to turn it on for both callers and handlers, but this config struct was originally meant for the frontend (handler side). You'll want to document how this config is used now that it has other purposes, and mention that HeaderTagMappings is not relevant on the caller side.

@stephanos stephanos force-pushed the feature/nexus-standalone branch from 38923fc to 340f40a Compare April 23, 2026 17:11
@stephanos stephanos requested review from a team as code owners April 23, 2026 19:09
@stephanos stephanos force-pushed the feature/nexus-standalone branch 3 times, most recently from ce242b7 to 397eca3 Compare April 23, 2026 20:21
@S15 S15 force-pushed the feature/nexus-standalone branch from 397eca3 to 7ae0421 Compare April 23, 2026 20:35
@stephanos stephanos force-pushed the feature/nexus-standalone branch 21 times, most recently from 378e20c to 80b781d Compare April 24, 2026 04:25
@stephanos stephanos deleted the branch temporalio:feature/nexus-standalone April 24, 2026 05:49
@stephanos stephanos closed this Apr 24, 2026