-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support graceful shutdown incase of multiple errors #617
Support graceful shutdown incase of multiple errors #617
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really, really good start - thanks for following up with a PR just a few days after learning Go! I'd like a few changes before we merge this. (I'm sorry that some of the existing code doesn't match what I'm asking of you - this project had a large change of ownership and direction between the beta and 1.0.)
Also, please make sure that the build passes. We can't merge anything that doesn't get a clean bill of health from Travis CI.
If any of this is confusing or doesn't make sense, schedule some time to pair with me - my calendar accurately reflects my availability.
service/manager.go
Outdated
@@ -1,4 +1,4 @@ | |||
// Copyright (c) 2017 Uber Technologies, Inc. | |||
// Copyright (c) 2018 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: this should be 2017-2018
, since copyright starts when we initially created the file. (Yes, this is the Official Legal Position on copyright dates. No, we don't actually follow it consistently...but ya gotta start somewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for sharing the details.
service/manager_test.go
Outdated
require.NoError(t, s.addModule(moduleProvider2)) | ||
|
||
time.AfterFunc(10*time.Second, func() { | ||
log.Fatalf("Service dint shut down on its own for over 10 secs so forcefully killing it!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't actually need this, since go tests automatically time out. If you want a shorter timeout, use t.Fatalf
instead - that only terminates this test instead of making the whole process exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it never auto timed out for me. Guessing because the default timeout is 10 mins(https://golang.org/cmd/go/#hdr-Description_of_testing_flags) and I usually gave up waiting in ~ 5mins. Anyway updated this to t.Fatalf, good suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented before thorough testing my bad. Anyway using t.Fatalf doesn't kill the underlying stuck go process running manager.go so have to use log.Fatalf. My understanding is t.Fatalf calls FailNow which only kills the running test but does not stop other goroutines created during the test. Lmk if you know a better way to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer to remove this. Now that you're done debugging, we shouldn't need this; the tests will only get stuck if there's a bug. If we're running into this, I'd rather adjust the default timeout via go test
command-line flags - it'll be chaos if every test chooses a timeout.
service/manager_test.go
Outdated
control := s.StartAsync() | ||
go func() { | ||
<-control.ExitChan | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this needs to be in a separate goroutine. You should be able to:
<- s.StartAsync()
require.Error(t, control.ServiceError)
assert.Contains(t, control.ServiceError.Error(), "can't start stubModule1")
assert.Contains(t, control.ServiceError.Error(), "can't start stubModule2")
If this hangs (which I expect it to, given the code above), there's a bug.
(And yes, I realize that this is copy-pasted from the existing test. Sorry.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you called it out. I never really understood the motivation for this separate go routine but since I was lacking context dint touch it.
service/manager.go
Outdated
} | ||
|
||
// emit all errors over the errChan channel for logging purposes | ||
errChan := make(chan Exit, len(errs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually make this quite a bit simpler: rather than attempting to send separate errors for each failing module, we can compose all failures into a single error. That should eliminate most of the complexity in handling channels and mutexes.
Use go.uber.org/multierr
, and try something like this:
if len(errs) > 0 {
combined := multierr.Combine(errs...)
errChan := make(chan Exit, 1)
errChan <- Exit{ ...stuff... }
zap.L().Error("Error starting modules", zap.Error(combined))
m.shutdownMu.Unlock()
return Control{...stuff...}
}
Since that brings some sanity to the mutex handling (lock once at the beginning of the method, unlock before every return), we can remove the duplicated unlocks and change the beginning of the method to
m.shutdownMu.Lock()
defer m.shutdownMu.Unlock()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for introducing me to go.uber.org/multierr, its great! Cleaned up the code as you suggested, so much easier to read now :)
Looks like the build is failing because of an unrelated (and now unnecessary) lint check - I'll open a PR to remove that check. |
0b5993a
to
78c5579
Compare
Two small nits, otherwise looks great. |
@@ -35,6 +35,7 @@ import: | |||
- package: github.com/uber/cherami-thrift | |||
subpackages: | |||
- .generated/go/cherami | |||
- package: go.uber.org/multierr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please pin a version here:
- package: go.uber.org/multierr
version: ^1
service/manager_test.go
Outdated
require.NoError(t, s.addModule(moduleProvider2)) | ||
|
||
time.AfterFunc(10*time.Second, func() { | ||
log.Fatalf("Service dint shut down on its own for over 10 secs so forcefully killing it!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer to remove this. Now that you're done debugging, we shouldn't need this; the tests will only get stuck if there's a bug. If we're running into this, I'd rather adjust the default timeout via go test
command-line flags - it'll be chaos if every test chooses a timeout.
In case of multiple errors, manager.go is stuck because of the below 2 issues:
Added a TestStartManager_WithMultipleErrors that runs successfully with this branch but fails on the base branch(glue-dig-b3).