Return Unavailable to frontend rpcs until healthy #5069
common/rpc/interceptor/health.go
Outdated
Why bother splitting the path? You can just check for a prefix:
- servicePrefix, _ := SplitMethodName(info.FullMethod)
- if servicePrefix == api.WorkflowServicePrefix {
+ if strings.HasPrefix(info.FullMethod, api.WorkflowServicePrefix) {
fair point. all the other interceptors use that function so it seemed fitting
also, comments welcome on whether it should apply to other services too. on second thought it should probably be workflowservice + operatorservice
Are there services it shouldn't apply to?
I would argue not AdminService, since theoretically there might be something in there to repair an unhealthy frontend (although there's nothing like that now).
And also not grpc health service, since that should be able to return successful "not serving" responses.
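For illustration, the selection discussed in this thread (gate WorkflowService and OperatorService, leave AdminService and the gRPC health service alone) could be sketched as a prefix allow-list. The prefix strings and the `isGated` helper below are assumptions made for this sketch, not code from the PR; the real constants live in the `api` package.

```go
package main

import "strings"

// Assumed full-method prefixes in gRPC's "/package.Service/" form.
// These values are illustrative; the PR refers to api.WorkflowServicePrefix.
const (
	workflowServicePrefix = "/temporal.api.workflowservice.v1.WorkflowService/"
	operatorServicePrefix = "/temporal.api.operatorservice.v1.OperatorService/"
)

// gatedPrefixes lists the services that should see Unavailable while the
// frontend is unhealthy. AdminService and the gRPC health service are
// deliberately absent, so they keep working during startup.
var gatedPrefixes = []string{workflowServicePrefix, operatorServicePrefix}

// isGated reports whether fullMethod belongs to a gated service.
// (Hypothetical helper name.)
func isGated(fullMethod string) bool {
	for _, p := range gatedPrefixes {
		if strings.HasPrefix(fullMethod, p) {
			return true
		}
	}
	return false
}
```

An allow-list means any service added later is not gated unless someone opts it in, which matches the concern about keeping repair and health endpoints reachable.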
Co-authored-by: Tim Deeb-Swihart <409226+tdeebswihart@users.noreply.github.com>
What changed?
Add an interceptor to return Unavailable to WorkflowService methods until the frontend considers itself "healthy", which currently means "membership is initialized".
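A minimal, dependency-free sketch of the gating logic described above. It models the interceptor with local types rather than the real gRPC API: `healthGate`, `handler`, and `errUnavailable` are hypothetical names, and `errUnavailable` stands in for a gRPC status error with code Unavailable. It is not the code in health.go.

```go
package main

import (
	"context"
	"errors"
	"strings"
	"sync/atomic"
)

// errUnavailable stands in for a gRPC status error with code Unavailable.
var errUnavailable = errors.New("unavailable: frontend is not healthy yet")

// handler mirrors the shape of a unary RPC handler (local stand-in type).
type handler func(ctx context.Context, req any) (any, error)

// healthGate rejects calls under prefix until healthy is set to true,
// for example once membership has been initialized.
type healthGate struct {
	healthy atomic.Bool
	prefix  string
}

// check returns errUnavailable for gated methods while unhealthy, else nil.
func (g *healthGate) check(fullMethod string) error {
	if strings.HasPrefix(fullMethod, g.prefix) && !g.healthy.Load() {
		return errUnavailable
	}
	return nil
}

// intercept runs check first and only then passes the request to next.
func (g *healthGate) intercept(ctx context.Context, fullMethod string, req any, next handler) (any, error) {
	if err := g.check(fullMethod); err != nil {
		return nil, err
	}
	return next(ctx, req)
}
```

Startup code would call `g.healthy.Store(true)` once the frontend considers itself healthy; from then on the gate is a single atomic load per request.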
Why?
Fixes #5015
How did you test it?
mostly manually
Potential risks
This adds a window of time during which the frontend can return Unavailable where previously it might have succeeded or returned a different error code. Note in particular that client.Dial in the Go SDK (at least) fails fast on this error, so the caller will need to retry.
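As a rough sketch of the caller-side retry this risk implies, a client could wrap its dial call in a bounded backoff loop. Everything here is an assumption for illustration: `dialWithRetry` is a hypothetical helper, `dial` stands in for a closure around client.Dial, and `errUnavailable` stands in for however the real error exposes the Unavailable code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a stand-in for the Unavailable error returned while the
// frontend is still starting up (not the real gRPC status type).
var errUnavailable = errors.New("unavailable")

// dialWithRetry calls dial up to attempts times, doubling the wait between
// tries. Any error other than errUnavailable is returned immediately.
func dialWithRetry(dial func() error, attempts int, backoff time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = dial(); err == nil {
			return nil
		}
		if !errors.Is(err, errUnavailable) {
			return err // not a startup-window error; do not retry
		}
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return fmt.Errorf("frontend still unavailable after %d attempts: %w", attempts, err)
}
```

In real code the `dial` closure would assign the resulting client to a variable in the enclosing scope, and the attempt count and initial backoff would be tuned to how long the frontend typically takes to initialize membership.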