Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rpc panic recovery middleware #240

Merged
8 commits merged into from Feb 15, 2017
Merged

Conversation

yutongp
Copy link
Contributor

@yutongp yutongp commented Feb 12, 2017

refer to #240
I talked with @abhinav. He said the panic recovery middleware does not exist yet. If they add one, that one will be in rpcinit with go-common integration.
In that case, I think fx need one as well for sentry and metrics integrations.

@mention-bot
Copy link

@yutongp, thanks for your PR! By analyzing the history of the files in this pull request, we identified @anuptalwalkar, @alsamylkin and @sectioneight to be potential reviewers.

func panicRecovery(ctx context.Context) {
if err := recover(); err != nil {
stats.RPCPanicCounter.Inc(1)
fx.Logger(ctx).Error("Panic recovered serving request", "error", err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I also log debug.Stack() here as well? raven does send stack trace to sentry, but even users are not running with sentry, they will miss the stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an interceptor for error logs, and print the stack trace by default. @alsamylkin to confirm that it works in this case as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also print some request specific information here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not log a stacktrace here, because err can technically not implement error interface :)
The easiest way to get the stacktrace, is to wrap err with errors.Wrap and log it, then you'll get a stacktrace for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of request specific information we are thinking about to log here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glibsm We are talking about logging additional information about panic: request, how can we stripe the PII information from it and safely log.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I think there are too many threads in here that got mixed up

PII concern applies to all logging and sentry, lets not try to solve it only for YARPC

PII in response on failure is what we should be talking about here, and returning a custom error seems to satisfy the need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what's the conclusion we have here? keep what I have right now?
I agree obscuring PII is a seperate issue, before we have a story about that, not logging request make sense to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not logging the request makes sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced offline, it does make sense to not log request information, because it is not safe in general case.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.784% when pulling b05d5d2 on yutongp:yutong/rpc_panic into 13f8007 on uber-go:master.

@@ -30,6 +30,8 @@ import (
"go.uber.org/yarpc/api/transport"
)

const panicRespnose = "Server Error"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: panicResponse

fx.Logger(ctx).Error("Panic recovered serving request", "error", err)
// rethrow panic back to yarpc
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error.
panic(panicRespnose)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be something along the lines of panic(err.Error()). server error is too generic and does not convey much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return general error message to client (since client can be something out of DC) so we don't leak something like: server use golang, server has nil pointer etc. Meaningful info can be present through introspector and sentry to service owners .

stats.RPCPanicCounter.Inc(1)
fx.Logger(ctx).Error("Panic recovered serving request", "error", err)
// rethrow panic back to yarpc
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the issue, it looks like yarpc already implements a panic handler. Why does UberFx need one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yarpc only log to stderr, this is more for log to all hooks in fx like sentry. Of course, if filebeat is almost ready, it will be not that useful for us. But should be still useful for open source when they hook with sentry

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.784% when pulling 0904456 on yutongp:yutong/rpc_panic into 13f8007 on uber-go:master.

fx.Logger(ctx).Error("Panic recovered serving request", "error", err)
// rethrow panic back to yarpc
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error.
panic(panicResponse)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we rethrow panic here? I thought the point of this filter was to not cause panics, like https://github.com/uber-go/fx/blob/master/modules/uhttp/filters.go#L102

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure what the correct behavior should be here, but I don't think we should just "swallow" the panic. We do want to let the caller know something went wrong

Even if we re-throw in this case, we got a chance to record a metric and log an error. Any other processing that we like can also be added easily

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Services can return errors, that's what we should do in this case. Panic here is an inconsistent behavior between different modules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason I rethrow a panic is: here in https://github.com/yarpc/yarpc-go/blob/61fa69b4a04faa5d1dc57b5a2ce44f93c16e6485/api/transport/handler.go#L100
yarpc does have recover from panic and log the panic to std output. but I don't like how they return the panic msg back to client, so I rethrow a more generic msg...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, since they documented recovery, its fine with me to panic here.

@@ -88,3 +90,27 @@ func authorize(ctx context.Context, host service.Host) (context.Context, error)
}
return ctx, nil
}

type panicInboundMiddleware struct{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this follows the naming convention with contextInboundMiddleware and authInboundMiddleware right now
maybe change all of them in another diff?

@@ -30,6 +30,8 @@ import (
"go.uber.org/yarpc/api/transport"
)

const panicResponse = "Server Error"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two things:

  1. This var looks familiar from another diff
  2. Underscore the const variables, please

Copy link
Contributor Author

@yutongp yutongp Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. ya, I want to through a same general error msg for panic. do we need to share this const between rpc and uhttp?
  2. sure

fx.Logger(ctx).Error("Panic recovered serving request", "error", err)
// rethrow panic back to yarpc
// before https://github.com/yarpc/yarpc-go/issues/734 fixed, throw a generic error.
panic(panicResponse)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure what the correct behavior should be here, but I don't think we should just "swallow" the panic. We do want to let the caller know something went wrong

Even if we re-throw in this case, we got a chance to record a metric and log an error. Any other processing that we like can also be added easily

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.779% when pulling d71bcf9 on yutongp:yutong/rpc_panic into 9de6ba0 on uber-go:master.

@@ -126,10 +126,12 @@ func (c *dispatcherController) addDefaultMiddleware(host service.Host) {
cfg := yarpcConfig{
inboundMiddleware: []middleware.UnaryInbound{
contextInboundMiddleware{host},
panicInboundMiddleware{},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think panic middleware should be the very first in the slice, to catch all possible issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then we don't have the logger with host in context

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, ok

@@ -116,3 +116,32 @@ func (f fakeOnewayHandler) HandleOneway(ctx context.Context, p *transport.Reques
assert.NotNil(f.t, ctx)
return errors.New("oneway handle")
}

func TestInboundMiddleware_panic(t *testing.T) {
defer testPanicHandler(t)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use t.Parallel()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, then maybe we should apply parallel for all tests in this file. Let me pull another PR to address that. I do want to do some refactor as well.

assert.EqualValues(t, r, _panicResponse)
}

type panicUnaryHandler struct{}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name it alwaysPanicUnary?


func panicRecovery(ctx context.Context) {
if err := recover(); err != nil {
stats.RPCPanicCounter.Inc(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where this counter update is tested..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there way to test metrics count/timer is called? I would like to test in #251 as well. Maybe you can point me somewhere metrics is tested.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits, but overall looks good.

Lets wait for Tuesday, to let others to review.

func panicRecovery(ctx context.Context) {
if err := recover(); err != nil {
stats.RPCPanicCounter.Inc(1)
fx.Logger(ctx).Error("Panic recovered serving request", "error", errors.Errorf("panic: %+v", err))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alsamylkin I cannot use errors.Wrap since it takes in an error type. let me know if I am doing it wrong. This is to get the stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fx.Loger(ctx).Error("Panic recovered serving request", "error", errors.Wrap(err, "panic in handler"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors.Wrap asked for an error interface, recover only returns interface{}, so we cannot use errors.Wrap here. I will change the text

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.779% when pulling e1c28fa on yutongp:yutong/rpc_panic into 9de6ba0 on uber-go:master.

@@ -28,8 +28,12 @@ import (
"go.uber.org/fx/modules/rpc/internal/stats"
"go.uber.org/fx/service"
"go.uber.org/yarpc/api/transport"

"github.com/pkg/errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be grouped with yarpc since it's a "3rd-party" import

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.776% when pulling c4574f3 on yutongp:yutong/rpc_panic into 9de6ba0 on uber-go:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.65% when pulling cc4b154 on yutongp:yutong/rpc_panic into 734b80d on uber-go:master.

@ghost ghost merged commit 0626e44 into uber-go:master Feb 15, 2017
@yutongp yutongp deleted the yutong/rpc_panic branch February 15, 2017 03:41
@yutongp yutongp mentioned this pull request Feb 15, 2017
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants