Add a metric activity_failed for tracking failed activity #7642
What changed?
This metric is a counter that is bumped whenever an activity attempt fails or is canceled. It is bumped every time `RespondActivityFailed/Canceled` is called, so it increases on each failed attempt, including every retry that fails. This metric has five labels, including `namespace` and `task_queue`.
Why?
The intention is to use this metric to monitor for specific activity failures and potentially raise alerts on activity failures prior to failing a workflow.
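As a rough illustration of the behavior described under "What changed?", the sketch below models a labeled counter that is bumped once per failed attempt, plus a threshold check of the kind an alert might use. It is not the server's metrics code: `labelSet`, `failureCounter`, `recordFailedAttempt`, and `shouldAlert` are hypothetical names, the label values are made up, and only the two labels named in this description are modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// labelSet holds only the two labels named in this description (assumption:
// the real metric carries additional labels that are not modeled here).
type labelSet struct {
	Namespace string
	TaskQueue string
}

// failureCounter is a hypothetical in-memory stand-in for a labeled counter.
type failureCounter struct {
	mu     sync.Mutex
	counts map[labelSet]int64
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[labelSet]int64)}
}

// recordFailedAttempt bumps the counter once per failed or canceled attempt,
// so an activity that retries and fails three times contributes three.
func (c *failureCounter) recordFailedAttempt(l labelSet) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[l]++
}

// value returns the current count for a label set (zero if never bumped).
func (c *failureCounter) value(l labelSet) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[l]
}

// shouldAlert reports whether failures for a label set reached the threshold.
func (c *failureCounter) shouldAlert(l labelSet, threshold int64) bool {
	return c.value(l) >= threshold
}

func main() {
	c := newFailureCounter()
	l := labelSet{Namespace: "default", TaskQueue: "orders"} // made-up values

	// Simulate one activity failing on three consecutive attempts.
	for attempt := 1; attempt <= 3; attempt++ {
		c.recordFailedAttempt(l)
	}

	fmt.Println(c.value(l), c.shouldAlert(l, 3)) // prints "3 true"
}
```

Because the count grows per attempt rather than per activity, a threshold like the one above can trip while the activity is still retrying, before the workflow itself fails.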
How did you test it?
I ran this locally with a worker running a failing activity.
Potential risks
Adding new metrics with potentially high-cardinality labels like `namespace` and `task_queue` is a potential risk to downstream observability systems.
Documentation
Not applicable.
Is hotfix candidate?
No.