Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure GitHub auth test to remove flakiness #1358

Merged
merged 2 commits into from Feb 1, 2022

Conversation

dhwthompson
Copy link
Contributor

Closes: #1199

As best as I can tell, the flakiness here was coming from two places:

  1. Plain ol' data races, where we were sharing a variable between threads, updating it one thread and reading it in another. Sometimes these wouldn't happen in the right order, causing failures.
  2. Race conditions when interacting with the clock library, where (for example) we would kick off a piece of code that we knew would call Sleep, call runtime.Gosched to give it a chance to execute, and then tell the clock library to advance time. It looks like these, again, weren't always happening in order.

A bunch of the complexity here stems from the fact that we're poking our clock from one end with the Sleep function, then poking it from the other end to tell it time has advanced. Since all we're relying on in our non-test code is a sleep function, we can fake this out with a small struct, then use the mockRoundTripper to keep track of what the "current time" is when it receives requests.

Another change is using a channel to communicate when pollAuthStatus has completed, so that we can wait for this (and know nothing else is going to change) before testing the state of the system.

Finally, I took the opportunity to restructure the tests a little bit to take advantage of Ginkgo's Contexts, providing a slightly clearer separation between what scenario we're testing and what behavior we want to see.

How did you test it?

I mostly tested this manually (and repeatedly), but also with a bit of help from Go’s race detector.

As best as I can tell, the flakiness here was coming from two places:

1. Plain ol' data races, where we were sharing a variable between
   threads, updating it one thread and reading it in another. Sometimes
   these wouldn't happen in the right order, causing failures.
2. Race conditions when interacting with the `clock` library, where (for
   example) we would kick off a piece of code that we knew would call
   `Sleep`, call `runtime.Gosched` to give it a chance to execute, and
   then tell the `clock` library to advance time. It looks like these,
   again, weren't always happening in order.

A bunch of the complexity here stems from the fact that we're poking our
clock from one end with the `Sleep` function, then poking it from the
other end to tell it time has advanced. Since all we're relying on in
our non-test code is a `sleep` function, we can fake this out with a
small `struct`, then use the `mockRoundTripper` to keep track of what
the "current time" is when it receives requests.

Another change is using a channel to communicate when `pollAuthStatus`
has completed, so that we can wait for this (and know nothing else is
going to change) before testing the state of the system.

Finally, I took the opportunity to restructure the tests a little bit to
take advantage of Ginkgo's `Context`s, providing a slightly clearer
separation between what scenario we're testing and what behavior we want
to see.
Because we're no longer using a goroutine-based model of sleeping, we
don't need to run the code under test in a separate goroutine: we can
run it straight in the main test thread, saving us a level of
indirection from wrapping the results up in an `outcome` struct.

We could probably even get away without needing to use a channel to
record the request timestamps, but that might keep us safe from another
race condition further down the line.
Copy link
Contributor

@jpellizzari jpellizzari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very smart solution. Nice work. Only questions for me.


// pollTimes is a convenience function to convert from a series of polling intervals
// to their respective polling timestamps, relative to the sleeper type's starting time
pollTimes := func(intervals []time.Duration) []time.Time {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Why a func literal here? I might be missing it but I don't see what we are closing over (if that is the itnent).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re correct that we’re not closing over anything here. The intent is to keep the helper function scoped to the tests where we’re using it, for test readability as much as anything else.

I’m not sure why Go doesn’t allow func declarations within functions to use the same syntax as top-level ones; there’s probably some reason it would make the compiler more complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the argument is "it's only ever used in this one place, so I put it right here"? Makes sense to me.

pkg/services/auth/github_test.go Show resolved Hide resolved
Copy link
Contributor

@J-Thompson12 J-Thompson12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much clearer.

@dhwthompson dhwthompson merged commit a1f07f6 into main Feb 1, 2022
@dhwthompson dhwthompson deleted the deflakenize-github-auth-tests branch February 1, 2022 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix flaky unit test
3 participants