New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rpc panic recovery middleware #240
Conversation
@yutongp, thanks for your PR! By analyzing the history of the files in this pull request, we identified @anuptalwalkar, @alsamylkin and @sectioneight to be potential reviewers. |
modules/rpc/handler.go
Outdated
func panicRecovery(ctx context.Context) { | ||
if err := recover(); err != nil { | ||
stats.RPCPanicCounter.Inc(1) | ||
fx.Logger(ctx).Error("Panic recovered serving request", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I also log debug.Stack()
here as well? raven does send stack trace to sentry, but even users are not running with sentry, they will miss the stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an interceptor for error logs, and print the stack trace by default. @alsamylkin to confirm that it works in this case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also print some request specific information here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not log a stacktrace here, because err can technically not implement error interface :)
The easiest way to get the stacktrace, is to wrap err with errors.Wrap and log it, then you'll get a stacktrace for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of request specific information we are thinking about to log here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glibsm We are talking about logging additional information about panic: request, how can we stripe the PII information from it and safely log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I think there are too many threads in here that got mixed up
PII concern applies to all logging and sentry, lets not try to solve it only for YARPC
PII in response on failure is what we should be talking about here, and returning a custom error seems to satisfy the need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so what's the conclusion we have here? keep what I have right now?
I agree obscuring PII is a seperate issue, before we have a story about that, not logging request make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not logging the request makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synced offline, it does make sense to not log request information, because it is not safe in general case.
modules/rpc/handler.go
Outdated
@@ -30,6 +30,8 @@ import ( | |||
"go.uber.org/yarpc/api/transport" | |||
) | |||
|
|||
const panicRespnose = "Server Error" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: panicResponse
modules/rpc/handler.go
Outdated
fx.Logger(ctx).Error("Panic recovered serving request", "error", err) | ||
// rethrow panic back to yarpc | ||
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error. | ||
panic(panicRespnose) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be something along the lines of panic(err.Error()). server error is too generic and does not convey much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return general error message to client (since client can be something out of DC) so we don't leak something like: server use golang, server has nil pointer etc. Meaningful info can be present through introspector and sentry to service owners .
modules/rpc/handler.go
Outdated
stats.RPCPanicCounter.Inc(1) | ||
fx.Logger(ctx).Error("Panic recovered serving request", "error", err) | ||
// rethrow panic back to yarpc | ||
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the issue, it looks like yarpc already implements a panic handler. Why does UberFx need one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yarpc only log to stderr, this is more for log to all hooks in fx like sentry. Of course, if filebeat is almost ready, it will be not that useful for us. But should be still useful for open source when they hook with sentry
modules/rpc/handler.go
Outdated
fx.Logger(ctx).Error("Panic recovered serving request", "error", err) | ||
// rethrow panic back to yarpc | ||
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error. | ||
panic(panicResponse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we rethrow panic here? I thought the point of this filter was to not cause panics, like https://github.com/uber-go/fx/blob/master/modules/uhttp/filters.go#L102
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure what the correct behavior should be here, but I don't think we should just "swallow" the panic. We do want to let the caller know something went wrong
Even if we re-throw in this case, we got a chance to record a metric and log an error. Any other processing that we like can also be added easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Services can return errors, that's what we should do in this case. Panic here is an inconsistent behavior between different modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason I rethrow a panic is: here in https://github.com/yarpc/yarpc-go/blob/61fa69b4a04faa5d1dc57b5a2ce44f93c16e6485/api/transport/handler.go#L100
yarpc does have recover from panic and log the panic to std output. but I don't like how they return the panic msg back to client, so I rethrow a more generic msg...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, since they documented recovery, its fine with me to panic here.
@@ -88,3 +90,27 @@ func authorize(ctx context.Context, host service.Host) (context.Context, error) | |||
} | |||
return ctx, nil | |||
} | |||
|
|||
type panicInboundMiddleware struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this follows the naming convention with contextInboundMiddleware
and authInboundMiddleware
right now
maybe change all of them in another diff?
modules/rpc/handler.go
Outdated
@@ -30,6 +30,8 @@ import ( | |||
"go.uber.org/yarpc/api/transport" | |||
) | |||
|
|||
const panicResponse = "Server Error" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two things:
- This var looks familiar from another diff
- Underscore the const variables, please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ya, I want to through a same general error msg for panic. do we need to share this const between
rpc
anduhttp
? - sure
modules/rpc/handler.go
Outdated
fx.Logger(ctx).Error("Panic recovered serving request", "error", err) | ||
// rethrow panic back to yarpc | ||
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error. | ||
panic(panicResponse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not sure what the correct behavior should be here, but I don't think we should just "swallow" the panic. We do want to let the caller know something went wrong
Even if we re-throw in this case, we got a chance to record a metric and log an error. Any other processing that we like can also be added easily
0904456
to
d71bcf9
Compare
@@ -126,10 +126,12 @@ func (c *dispatcherController) addDefaultMiddleware(host service.Host) { | |||
cfg := yarpcConfig{ | |||
inboundMiddleware: []middleware.UnaryInbound{ | |||
contextInboundMiddleware{host}, | |||
panicInboundMiddleware{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think panic middleware should be the very first in the slice, to catch all possible issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then we don't have the logger with host in context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, ok
modules/rpc/handler_test.go
Outdated
@@ -116,3 +116,32 @@ func (f fakeOnewayHandler) HandleOneway(ctx context.Context, p *transport.Reques | |||
assert.NotNil(f.t, ctx) | |||
return errors.New("oneway handle") | |||
} | |||
|
|||
func TestInboundMiddleware_panic(t *testing.T) { | |||
defer testPanicHandler(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use t.Parallel()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, then maybe we should apply parallel for all tests in this file. Let me pull another PR to address that. I do want to do some refactor as well.
modules/rpc/handler_test.go
Outdated
assert.EqualValues(t, r, _panicResponse) | ||
} | ||
|
||
type panicUnaryHandler struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: name it alwaysPanicUnary?
|
||
func panicRecovery(ctx context.Context) { | ||
if err := recover(); err != nil { | ||
stats.RPCPanicCounter.Inc(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where this counter update is tested..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there way to test metrics count/timer is called? I would like to test in #251 as well. Maybe you can point me somewhere metrics is tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example:
https://github.com/uber-go/fx/blob/master/modules/task/execution_test.go#L172
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple nits, but overall looks good.
Lets wait for Tuesday, to let others to review.
modules/rpc/handler.go
Outdated
func panicRecovery(ctx context.Context) { | ||
if err := recover(); err != nil { | ||
stats.RPCPanicCounter.Inc(1) | ||
fx.Logger(ctx).Error("Panic recovered serving request", "error", errors.Errorf("panic: %+v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alsamylkin I cannot use errors.Wrap
since it takes in an error
type. let me know if I am doing it wrong. This is to get the stack trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fx.Loger(ctx).Error("Panic recovered serving request", "error", errors.Wrap(err, "panic in handler"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errors.Wrap
asked for an error
interface, recover
only returns interface{}
, so we cannot use errors.Wrap
here. I will change the text
modules/rpc/handler.go
Outdated
@@ -28,8 +28,12 @@ import ( | |||
"go.uber.org/fx/modules/rpc/internal/stats" | |||
"go.uber.org/fx/service" | |||
"go.uber.org/yarpc/api/transport" | |||
|
|||
"github.com/pkg/errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should be grouped with yarpc since it's a "3rd-party" import
refer to #240
I talked with @abhinav. He said the panic recovery middleware does not exist yet. If they add one, that one will be in rpcinit with go-common integration.
In that case, I think fx need one as well for sentry and metrics integrations.