-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestMaintenanceSignal is failing #32
Comments
Yeah, there's definitely still some kind of race in the setup/teardown of the signal handlers. I gave justen bad advice on this. I'll have a PR with a fix shortly. |
I ran it again and it succeeded - so its a race https://travis-ci.org/justenwalker/containerbuddy/builds/94959813 Is it the order in which sendAndWaitForSignal blocks? <-sig
runtime.Gosched() vs. runtime.Gosched()
<-sig |
With that ordering it fails consistently when I run it locally. I suspect the mental model we're using here is flawed somehow about how signal handlers should be set up. |
I've got a good workaround and I'm testing it now on a WIP branch... we'll set up the signal handler in a helper and then mark that as such in a global bool in the test module. This way we verify that we've only set it up once; the state should be safe to reuse between the signal tests so long as each test covers different signals. |
runtime.Gosched()
<-sig
runtime.Gosched()
signal.Stop(sig) //May not be needed
func TestMaintenanceSignalRace(t *testing.T) {
for i := 0; i < 100; i++ {
TestMaintenanceSignal(t)
}
} |
The only problem with that is that it might only work on your machine (the test that's failing never failed on my machine before, for example, and it still doesn't). If we remove the need to reset the handlers, we can avoid the question entirely. I've got a PR coming to that effect. |
Figured - thats what I get for coding by coincidence |
Interesting... apparently I'm totally wrong. It worked the first time I pushed up the commit (on a very dirty work-in-progress branch) and then not afterwards: https://travis-ci.org/tgross/containerbuddy/builds/94967601 |
Working on this is a little tricky because I can't force the bug on my machine. |
@justenwalker I tried what you'd suggested in https://travis-ci.org/tgross/containerbuddy/builds/94972692 and it results in the same kinds of errors on Travis. I'm going to do some integration testing on this and make sure that the signal handler is actually working as I expect and that this isn't just a test artifact. |
Ok, so fortunately in an integration test this seems to work as we'd expect. That doesn't eliminate a possible race in the implementation that happens when signals arrive close together, of course. But it does lead me to believe that the problem is in the test rig. |
Think I got it: #33 |
@justenwalker I slept on this one a bit and woke up this morning with some new insights. We're making what is probably a classic mistake of trying to force behavior in the test by forcing the test thread to yield or by sleeping. We're trying to fix a concurrency bug within asynchronous signals, and we can't coerce the signals to be received by the I've spent a little time trying to force synchronicity into this and I think the best approach will be to factor out the signal handlers from the signal receivers such that we can unit test the signal handling code properly without worrying about concurrency, and then just have a simple test that verifies we've wired up the handlers correctly. I'll have a PR inbound for this shortly. |
I've opened #36 with what I believe to be the correct solution to the problem. |
Fix merged and build is now marked as passing. Thanks for the assist on this, @justenwalker |
Our first build on TravisCI failed https://travis-ci.org/joyent/containerbuddy/builds/94954762
The failing test is one that passed when run locally. I believe there's a race in the signal sending set up or tear-down.
I'm looking into it, but I'm going to cc @justenwalker because he's been in this area of the code recently.
The text was updated successfully, but these errors were encountered: