From c70020a6680d2ac43b752eef0ecb70eef5b9897e Mon Sep 17 00:00:00 2001
From: Stephan Behnke
Date: Wed, 8 May 2024 16:32:29 -0700
Subject: [PATCH] Add docs describing retry behavior

---
 docs/architecture/retry.md | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 docs/architecture/retry.md

diff --git a/docs/architecture/retry.md b/docs/architecture/retry.md
new file mode 100644
index 000000000000..ce2eb3aeb48a
--- /dev/null
+++ b/docs/architecture/retry.md
@@ -0,0 +1,38 @@
+# Retry Mechanisms
+
+## gRPC
+
+### Interceptor
+
+All services use the gRPC interceptor `interceptor.RetryableInterceptor`, which can retry a failed gRPC request.
+
+Its behavior is defined by:
+- `backoff.IsRetryable`, which decides whether to retry based on the service error type
+- `backoff.RetryPolicy`, which decides how long to back off before the next attempt, or not to retry at all
+
+The `RetryableInterceptor` delegates the actual retry behavior to `backoff.ThrottleRetryContext`, which:
+- never retries context timeout or cancellation errors
+- retries `ResourceExhausted` service errors using a custom retry policy
+
+Tip: To inspect a service's retry behavior, look for its call to `NewRetryableInterceptor`.
+
+### Service Error
+
+Service errors are Go errors that can be converted into a gRPC `Status` (see [status.proto](https://github.com/grpc/grpc/blob/master/src/proto/grpc/status/status.proto)).
+A gRPC status contains a gRPC `Code` (see [code.proto](https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto)), a message, and (optionally) a payload with more details.
+
+```go
+type ServiceError interface {
+	error
+	Status() *status.Status
+}
+```
+
+[api-go](https://github.com/temporalio/api-go/tree/master/serviceerror) defines most service errors:
+- general-purpose errors
+(such as `Canceled`, `NotFound` or `Unavailable`)
+- specialized errors that carry additional details
+(such as `NamespaceNotActive` with the gRPC code `FailedPrecondition`)
+
+A few additional Server-specific service errors are defined in this repository, such as
+`ShardOwnershipLost` or `TaskAlreadyStarted`.
\ No newline at end of file