New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructure GitHub auth test to remove flakiness #1358
Conversation
As best as I can tell, the flakiness here was coming from two places: 1. Plain ol' data races, where we were sharing a variable between threads, updating it one thread and reading it in another. Sometimes these wouldn't happen in the right order, causing failures. 2. Race conditions when interacting with the `clock` library, where (for example) we would kick off a piece of code that we knew would call `Sleep`, call `runtime.Gosched` to give it a chance to execute, and then tell the `clock` library to advance time. It looks like these, again, weren't always happening in order. A bunch of the complexity here stems from the fact that we're poking our clock from one end with the `Sleep` function, then poking it from the other end to tell it time has advanced. Since all we're relying on in our non-test code is a `sleep` function, we can fake this out with a small `struct`, then use the `mockRoundTripper` to keep track of what the "current time" is when it receives requests. Another change is using a channel to communicate when `pollAuthStatus` has completed, so that we can wait for this (and know nothing else is going to change) before testing the state of the system. Finally, I took the opportunity to restructure the tests a little bit to take advantage of Ginkgo's `Context`s, providing a slightly clearer separation between what scenario we're testing and what behavior we want to see.
Because we're no longer using a goroutine-based model of sleeping, we don't need to run the code under test in a separate goroutine: we can run it straight in the main test thread, saving us a level of indirection from wrapping the results up in an `outcome` struct. We could probably even get away without needing to use a channel to record the request timestamps, but that might keep us safe from another race condition further down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very smart solution. Nice work. Only questions for me.
|
||
// pollTimes is a convenience function to convert from a series of polling intervals | ||
// to their respective polling timestamps, relative to the sleeper type's starting time | ||
pollTimes := func(intervals []time.Duration) []time.Time { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Why a func
literal here? I might be missing it but I don't see what we are closing over (if that is the itnent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re correct that we’re not closing over anything here. The intent is to keep the helper function scoped to the tests where we’re using it, for test readability as much as anything else.
I’m not sure why Go doesn’t allow func
declarations within functions to use the same syntax as top-level ones; there’s probably some reason it would make the compiler more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the argument is "it's only ever used in this one place, so I put it right here"? Makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much clearer.
Closes: #1199
As best as I can tell, the flakiness here was coming from two places:
clock
library, where (for example) we would kick off a piece of code that we knew would callSleep
, callruntime.Gosched
to give it a chance to execute, and then tell theclock
library to advance time. It looks like these, again, weren't always happening in order.A bunch of the complexity here stems from the fact that we're poking our clock from one end with the
Sleep
function, then poking it from the other end to tell it time has advanced. Since all we're relying on in our non-test code is asleep
function, we can fake this out with a smallstruct
, then use themockRoundTripper
to keep track of what the "current time" is when it receives requests.Another change is using a channel to communicate when
pollAuthStatus
has completed, so that we can wait for this (and know nothing else is going to change) before testing the state of the system.Finally, I took the opportunity to restructure the tests a little bit to take advantage of Ginkgo's
Context
s, providing a slightly clearer separation between what scenario we're testing and what behavior we want to see.How did you test it?
I mostly tested this manually (and repeatedly), but also with a bit of help from Go’s race detector.