@swgillespie
Contributor

What changed?

This metric is a counter that is incremented whenever an activity fails or is canceled. It is incremented every time RespondActivityFailed/Canceled is called, so it counts every failed attempt of an activity that retries, not only the final failure. The metric has five labels:

  1. The workflow type hosting the failed activity,
  2. The activity type of the failed activity,
  3. A string identifying the failure, currently "canceled" or "failed",
  4. The namespace, and
  5. The task queue that this activity ran on.
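
To illustrate how a counter keyed by these five labels behaves across retries, here is a minimal in-memory sketch. It is not the code from this PR and does not use the server's metrics package; `ActivityFailureLabels`, `FailureCounter`, and their methods are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// ActivityFailureLabels mirrors the five labels listed above.
// The type and field names are hypothetical, not taken from the PR.
type ActivityFailureLabels struct {
	WorkflowType string
	ActivityType string
	Failure      string // "failed" or "canceled"
	Namespace    string
	TaskQueue    string
}

// FailureCounter is a toy labeled counter: one running total per
// distinct combination of label values.
type FailureCounter struct {
	mu     sync.Mutex
	counts map[ActivityFailureLabels]int64
}

// NewFailureCounter returns an empty counter.
func NewFailureCounter() *FailureCounter {
	return &FailureCounter{counts: make(map[ActivityFailureLabels]int64)}
}

// Inc adds one to the series identified by the given labels.
func (c *FailureCounter) Inc(l ActivityFailureLabels) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[l]++
}

// Value returns the current total for the given labels.
func (c *FailureCounter) Value(l ActivityFailureLabels) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[l]
}

func main() {
	c := NewFailureCounter()
	labels := ActivityFailureLabels{
		WorkflowType: "Workflow",
		ActivityType: "Activity",
		Failure:      "failed",
		Namespace:    "default",
		TaskQueue:    "hello_world",
	}
	// Each failed attempt bumps the counter, including attempts that
	// will be retried afterwards.
	for attempt := 0; attempt < 3; attempt++ {
		c.Inc(labels)
	}
	fmt.Println(c.Value(labels))
}
```

In this sketch, three failed attempts of the same activity produce a value of 3 for a single series, which is why the count can exceed the number of workflows that ultimately fail.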

Why?

The intention is to use this metric to monitor for specific activity failures and potentially raise alerts on activity failures prior to failing a workflow.

How did you test it?

I ran this locally with a worker running a failing activity:

$ http localhost:8000/metrics | grep activity_failed
# HELP activity_failed activity_failed counter
# TYPE activity_failed counter
activity_failed{activityType="Activity",failure="failed",namespace="default",service_name="history",taskqueue="hello_world",workflowType="Workflow"} 8

Potential risks

Adding a new metric with potentially high-cardinality labels such as namespace and task queue is a potential risk to downstream observability systems.
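
The cardinality concern can be made concrete with a rough upper bound: in the worst case, the number of time series for one metric is the product of the number of distinct values each label takes. The sketch below computes that bound. `SeriesUpperBound` is a hypothetical helper, and the counts in `main` are invented for illustration, not measurements from any deployment.

```go
package main

import "fmt"

// SeriesUpperBound returns the worst-case number of time series for a
// single metric, given how many distinct values each label can take.
func SeriesUpperBound(distinctValues ...int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical counts: 20 workflow types, 50 activity types,
	// 2 failure strings, 10 namespaces, 30 task queues.
	fmt.Println(SeriesUpperBound(20, 50, 2, 10, 30)) // prints 600000
}
```

Real deployments usually sit well below this bound because most label combinations never occur together, but the product shows how quickly namespace and task queue can multiply the series count.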

Documentation

Not applicable.

Is hotfix candidate?

No.

@swgillespie requested a review from a team as a code owner on April 22, 2025, 18:08
@swgillespie marked this pull request as draft on April 22, 2025, 19:12
@swgillespie closed this on May 1, 2025