Restructure GitHub auth test to remove flakiness #1358

dhwthompson · 2022-01-27T23:16:57Z

Closes: #1199

As best as I can tell, the flakiness here was coming from two places:

1. Plain ol' data races, where we were sharing a variable between goroutines, updating it in one and reading it in another. Sometimes these wouldn't happen in the right order, causing failures.
2. Race conditions when interacting with the `clock` library, where (for example) we would kick off a piece of code that we knew would call `Sleep`, call `runtime.Gosched` to give it a chance to execute, and then tell the `clock` library to advance time. `runtime.Gosched` only yields the processor; it doesn't guarantee the other goroutine has reached `Sleep` by the time it returns, so these, again, weren't always happening in order.
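To make the first source of flakiness concrete, here is a minimal sketch of the usual fix for this kind of data race. It uses hypothetical names and is not code from this PR: the background goroutine owns its variable and hands the final value back over a channel, so the receive orders the write before the read. Running the tests with `go test -race` is what typically surfaces the unsynchronized version, where the caller reads a shared variable directly.

```go
package main

// countAttemptsAsync is a hypothetical stand-in for code under test that does
// its work in a separate goroutine. Instead of writing to a variable that the
// caller later reads (a data race), the goroutine keeps the counter private
// and sends the final value on a channel.
func countAttemptsAsync(n int) <-chan int {
	// Buffered so the goroutine can finish even if nobody receives.
	result := make(chan int, 1)
	go func() {
		attempts := 0 // owned by this goroutine only
		for i := 0; i < n; i++ {
			attempts++
		}
		// The send happens before the corresponding receive completes,
		// so the caller always observes the fully updated value.
		result <- attempts
	}()
	return result
}
```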

A bunch of the complexity here stems from the fact that we're poking our clock from one end with the `Sleep` function, then poking it from the other end to tell it time has advanced. Since all we're relying on in our non-test code is a `sleep` function, we can fake this out with a small `struct`, then use the `mockRoundTripper` to keep track of what the "current time" is when it receives requests.

Another change is using a channel to communicate when `pollAuthStatus` has completed, so that we can wait for this (and know nothing else is going to change) before testing the state of the system.

Finally, I took the opportunity to restructure the tests a little bit to take advantage of Ginkgo's `Context`s, providing a slightly clearer separation between the scenario we're testing and the behavior we want to see.

How did you test it?

I mostly tested this manually (and repeatedly), but also with a bit of help from Go’s race detector (`go test -race`).


Because we're no longer using a goroutine-based model of sleeping, we don't need to run the code under test in a separate goroutine: we can run it straight in the main test goroutine, saving us a level of indirection from wrapping the results up in an `outcome` struct. We could probably even get away without using a channel to record the request timestamps, but keeping it might protect us from another race condition further down the line.

jpellizzari

A very smart solution. Nice work. Only questions from me.

jpellizzari · 2022-01-28T16:06:02Z

pkg/services/auth/github_test.go

```diff
+
+		// pollTimes is a convenience function to convert from a series of polling intervals
+		// to their respective polling timestamps, relative to the sleeper type's starting time
+		pollTimes := func(intervals []time.Duration) []time.Time {
```


❓ Why a func literal here? I might be missing it, but I don't see what we are closing over (if that is the intent).

dhwthompson · 2022-01-28

You’re correct that we’re not closing over anything here. The intent is to keep the helper function scoped to the tests where we’re using it, for test readability as much as anything else.

I’m not sure why Go doesn’t allow func declarations within functions to use the same syntax as top-level ones; there’s probably some reason it would make the compiler more complicated.

jpellizzari · 2022-01-28

So the argument is "it's only ever used in this one place, so I put it right here"? Makes sense to me.

pkg/services/auth/github_test.go

J-Thompson12

This is much clearer.

dhwthompson added 2 commits January 27, 2022 15:04

dhwthompson added the exclude from release notes label Jan 27, 2022

dhwthompson requested review from jpellizzari and J-Thompson12 January 27, 2022 23:16

jpellizzari approved these changes Jan 28, 2022

J-Thompson12 approved these changes Jan 31, 2022

dhwthompson merged commit a1f07f6 into main Feb 1, 2022

dhwthompson deleted the deflakenize-github-auth-tests branch February 1, 2022 17:54
