New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filewatch: avoid accepting stale/duplicate events #4397
Conversation
This fixes the somewhat legitimate flakiness experienced by one of the tests (`TestCrashRebuildTwoContainersTwoImages`). Previously, we added logic to avoid missed live updates, but the safer logic means that it's possible to get a duplicate change: in the case of the test that was failing intermittently, it had two image targets that watched overlapping sets of files. If `foo` is changed, once `img1` target sees the event, it adds a record to its pending file set for `time.Now()`, and a build is triggered. Then, `img2` target sees the *same* event, and adds a record to its pending file set for `time.Now()`, which will _intentionally_ not be cleared after the build finishes because it appears to have been a change that came from _after_ the build was started. As a result, it'll end up triggering another build. The revised logic in this commit tries to handle cases safely but also avoid redundant triggered builds: 1) The event pre-dates the start time of the last finished build, so just drop it. (In practice, this is exceedingly unlikely to happen because it would mean an event was very delayed.) 2) There's a current build in progress * If it was started _after_ the file was seen AND already knows about the file, just ignore it. (This handles the case of the overlapping target watch sets as in the flaky test.) * Otherwise, use `time.Now()` so that it won't be erroneously cleared at the finish of the build. (This is the missed live update case.) It's still possible to get a redundant build in some cases, but that is impossible to avoid reliably given the current logic. Since we do make an effort to batch file events (within a target), the most likely cause would be two image targets with non-overlapping changed files, where the later-to-be-processed file change happened before the build was started, but we use `time.Now()` since it isn't in the list of affected files for the build. The likelihood of multiple almost instantaneous changes to files across targets is low enough that this is still an acceptable trade-off vs the alternative of missing updates. If extra builds become a common complaint, we could add a slight waiting period of ~100ms or so after the latest file change before we would start a build to increase the likelihood that changes across targets have successfully propagated.
For context, the failure in the tilt/internal/engine/buildcontroller_test.go Line 451 in 228fba6
We expect the flag to be exactly I actually originally tried to just put a wait further up in the test until all pending changes had been consumed, but they were getting "stuck" because buildcontrol fails the tilt/internal/store/build_result.go Lines 472 to 478 in 228fba6
That said, while I'm sure it's possible to band-aid that test, it is indicative of a potential cause for redundant builds due to timing issues, so seemed worth trying to improve the pending file change logic further. |
this might be breaking Windows tests?
FYI I had to drop out the first case (event received that's older than the last finished build) because the Windows tests are hitting timing issues. That was a totally new optimization, so it's not a huge surprise. I'll see if I can patch up the tests and re-add that in its own PR later. Realistically that case should never hit anyway 🤷 (I wanted to get this merged because the changes to exit controller in #4367 seem to exacerbate the flaky test failing due to timing changes) |
PendingConfigFileChanges had its own change tracking, which didn't get the fixes we applied in #4397 This removes that state completely and unifies Tiltfile change tracking with other stuff
PendingConfigFileChanges had its own change tracking, which didn't get the fixes we applied in #4397 This removes that state completely and unifies Tiltfile change tracking with other stuff
PendingConfigFileChanges had its own change tracking, which didn't get the fixes we applied in #4397 This removes that state completely and unifies Tiltfile change tracking with other stuff
This fixes the somewhat legitimate flakiness experienced by one of
the tests (
TestCrashRebuildTwoContainersTwoImages
).Previously, we added logic to avoid missed live updates, but the
safer logic means that it's possible to get a duplicate change:
in the case of the test that was failing intermittently, it had two
image targets that watched overlapping sets of files. If
foo
ischanged, once
img1
target sees the event, it adds a record to itspending file set for
time.Now()
, and a build is triggered.Then,
img2
target sees the same event, and adds a record to itspending file set for
time.Now()
, which will intentionally notbe cleared after the build finishes because it appears to have been
a change that came from after the build was started. As a result,
it'll end up triggering another build.
The revised logic in this commit tries to handle cases safely but
also avoid redundant triggered builds:
1) The event pre-dates the start time of the last finished build, sojust drop it. (In practice, this is exceedingly unlikely to happen
because it would mean an event was very delayed.)
2) There's a current build in progress
* If it was started after the file was seen AND already knows
about the file, just ignore it. (This handles the case of the
overlapping target watch sets as in the flaky test.)
* Otherwise, use
time.Now()
so that it won't be erroneouslycleared at the finish of the build. (This is the missed live
update case.)
NOTE: The first case was removed because it's causing issues with Windows tests due to timing issues -- will fix & re-add in its own PR because it's really a "new" optimization
It's still possible to get a redundant build in some cases, but that
is impossible to avoid reliably given the current logic. Since we do
make an effort to batch file events (within a target), the most likely
cause would be two image targets with non-overlapping changed files,
where the later-to-be-processed file change happened before the build
was started, but we use
time.Now()
since it isn't in the list ofaffected files for the build. The likelihood of multiple almost
instantaneous changes to files across targets is low enough that this
is still an acceptable trade-off vs the alternative of missing updates.
If extra builds become a common complaint, we could add a slight
waiting period of ~100ms or so after the latest file change before we
would start a build to increase the likelihood that changes across
targets have successfully propagated.