Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fxtest: Enforceable Timeouts on Lifecycle #1203

Merged
merged 4 commits into from
May 29, 2024

Conversation

JacobOaks
Copy link
Contributor

In Fx, if starting/stopping the application takes longer than the context's timeout, the fx.App will bail early, regardless of the status of the currently running hooks.

This prevents stalling an application when hooks (accidentally) block forever.

In order to test hook behavior, Fx provides fxtest.Lifecycle to interact with. This Lifecycle is a simple wrapper around the actual fx Lifecycle type, meaning it does not check for timeout and bail early like fx.App does. This is an issue because:

  • It allows for long-running hooks (which should be considered bugs) to pass tests.
  • It allows for tests to completely stall for hooks that block forever.

See #1180 for more details.

This PR adds an option that can be passed to fxtest.NewLifecycle to cause it to immediately fail when context expires, similar to fx.App.

lc := fxtest.NewLifecycle(fxtest.EnforceTimeout(true))
lc.Append(fx.StartHook(func() { for {} }))
ctx, _ := context.WithTimeout(context.Background(), time.Second)
err := lc.Start(ctx) // Will return deadline exceeded after a second

This PR doesn't provide a way to test timeouts using RequireStart and RequireStop. However, since those don't take contexts anyways, my assumption is that usage of those APIs represents an intentional decision to not care about timeouts by the test writer.

However, if people feel differently, we can instead do something like expose two new APIs RequireStartWithContext(ctx) and RequireStopWithContext(ctx) or something (names obviously up for discussion).

In Fx, if starting/stopping the application takes longer than the context's timeout,
the fx.App will bail early, regardless of the status of the currently running hooks.

This prevents stalling an application when hooks (accidentally) block forever.

In order to test hook behavior, Fx provides fxtest.Lifecycle to interact with.
This Lifecycle is a simple wrapper around the actual fx Lifecycle type,
meaning it does not check for timeout and bail early like fx.App does.
This is an issue because:
* It allows for long-running hooks (which should be considered bugs) to pass tests.
* It allows for tests to completely stall for hooks that block forever.

See uber-go#1180 for more details.

This PR adds an option that can be passed to `fxtest.NewLifecycle`
to cause it to immediately fail when context expires, similar to fx.App.

```
lc := fxtest.NewLifecycle(fxtest.EnforceTimeout(true))
lc.Append(fx.StartHook(func() { for {} }))
ctx, _ := context.WithTimeout(context.Background(), time.Second)
err := lc.Start(ctx) // Will return deadline exceeded after a second
```

This PR doesn't provide a way to test timeouts using `RequireStart` and
`RequireStop`. However, since those don't take contexts anyways,
my assumption is that usage of those APIs represents an intentional decision
to not care about timeouts by the test writer.

However, if people feel differently, we can instead do something like expose two new APIs
`RequireStartWithContext(ctx)` and `RequireStopWithContext(ctx)` or something
(names obviously up for discussion).
@JacobOaks JacobOaks requested a review from abhinav May 21, 2024 14:22
Copy link

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.51%. Comparing base (abda254) to head (1c5db75).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1203      +/-   ##
==========================================
- Coverage   98.57%   98.51%   -0.06%     
==========================================
  Files          34       34              
  Lines        2871     2900      +29     
==========================================
+ Hits         2830     2857      +27     
- Misses         35       36       +1     
- Partials        6        7       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JacobOaks JacobOaks requested a review from tchung1118 May 21, 2024 14:24
@sywhang sywhang mentioned this pull request May 21, 2024
Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's a bit unexpected that RequireStart and RequireStop don't enforce the timeout, but I also understand, and agree with why—where would the timeout come from? We can expand those APIs in a backwards compatible manner in the future if necessary (e.g. with an option to control their timeouts, or the WithContext method variants which may be more obvious, if a bit verbose).

}

var _ fx.Lifecycle = (*Lifecycle)(nil)

// NewLifecycle creates a new test lifecycle.
func NewLifecycle(t TB) *Lifecycle {
func NewLifecycle(t TB, opts ...LifecycleOption) *Lifecycle {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, this is considered a breaking change per Go compatibility guarantees.
Adding variadic options to a function that didn't previously accept them still changes the signature.
It breaks cases like passing a reference to the function, for example.

I'll defer to y'all about whether you'd like to take the risk, since passing fastest.NewLifecycle would be a pretty niche use.

Alternatively, we can add one of NewLifecycleWith, NewLifecycleWithOptions, or NewStrictLifecycle instead. (The last defaulting to enforceTimeout = true. In that case, we could drop the EnforceTimeout option completely, and instead just add Options for future expansion.)

Copy link
Contributor Author

@JacobOaks JacobOaks May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - makes sense. I did consider that it is technically a breaking change, but I'm assuming that the impact radius is minimal/non-existent. To justify this assumption, I did run Uber's entire unit test suite with this PR and did not see any issues, so I'm personally comfortable with this change to avoid adding multiple NewLifecycleX APIs. Especially since any issues that do exist will (or at least, should) be isolated to test code.

That said, I'm open to adding a new API instead if somebody feels strongly about this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this; will defer to other maintainers.
I think this is fine as-is. Passing fxtest.NewLifecycle as a function parameter is a niche enough use case that I expect zero breakage.

This isn't authoritative, but just searching public code on GitHub, there are no references to fxtest.NewLifecycle not followed by a (, so they're always invocations, and won't be broken.
https://github.com/search?q=%2Ffxtest%5C.NewLifecycle%5B%5E%5C%28%5D%2F+path%3A%2F.go%24%2F&type=code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @sywhang or @tchung1118 - do you have strong opinions on this? Otherwise I will merge this as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as-is, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion against this either but I do prefer making a slightly breaking change in this API rather than overloading the API surface with a variant of NewLifecycle given how niche the set of use cases that break from this.

fxtest/lifecycle_test.go Outdated Show resolved Hide resolved
fxtest/lifecycle_test.go Outdated Show resolved Hide resolved
fxtest/lifecycle_test.go Outdated Show resolved Hide resolved
fxtest/lifecycle_test.go Outdated Show resolved Hide resolved
fxtest/lifecycle_test.go Outdated Show resolved Hide resolved
fxtest/lifecycle_test.go Show resolved Hide resolved
Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with one minor behavioral improvement suggested

fxtest/lifecycle.go Outdated Show resolved Hide resolved
Comment on lines +131 to +133
go func() {
c <- fn(ctx)
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing: Just for the sake of easier/faster cleanup, we should wrap the context with a WithCancel here so that when this function returns because of a timeout, the context that fn(ctx) is operating on is also closed, and the function returns early (if possible).

Suggested change
go func() {
c <- fn(ctx)
}()
ctx, cancel := context.WithCancel(ctx)
defer cancel()
go func() {
c <- fn(ctx)
}()

Copy link
Contributor Author

@JacobOaks JacobOaks May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, shouldn't the context that fn is operating on already be closed if its deadline is exceeded? Or are you referring to a case where fn is only returning early when canceled and not exceeded deadline for some reason? I can add this to cover this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 You're right. My mistake.
The only way this function returns early is when the context has expired.

@JacobOaks
Copy link
Contributor Author

lgtm with one minor behavioral improvement suggested

Thank you for all your helpful review @abhinav!

Copy link
Contributor

@sywhang sywhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :shipit:

@JacobOaks JacobOaks merged commit 9e6f6c2 into uber-go:master May 29, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants