Remove race conditions from stream piping. r=garndt #332
Conversation
This certainly seems a lot nicer, I wasn't aware of promisepipe!
curious to see how the tests turn out, but this looks promising (no pun intended)
I don't understand streams well enough to say much about this. But Greg seems happy!
Appears the tests aren't happy with this. We'll need to investigate those.
Yeah, I expected that, had to push the PR to run the tests on CI.
scopes.hasPrefixedScopes is an async function and any call to it must be awaited. For reasons we will probably never know, the code used to pass until we changed it in [1]. Now it returns a pending promise. We fix this bad behavior by awaiting hasPrefixedScopes. [1] taskcluster#332
This seems like a general improvement... But I don't really believe that you are missing end events...
No event can happen while you're waiting for it in the same synchronous function. There are no awaits before you create the promise...

// state1
stream1.pipe(stream2);
// state2
await waitForEvent(stream1, 'end');
// state3

No events can fire between state1 and state2. If they do, the events go onto the event queue and get fired when the promise is awaited. NO, I'm wrong... I realize as I'm writing... events can fire... pipe can call all the event handlers synchronously... And then we would miss the event... In that case the fix could have been:

const done = waitForEvent(stream1, 'end');
stream1.pipe(stream2);
await done;

But yeah, using this magic...
Those are some rather heroic measures to keep tests running. Nice work! If these tests continue to cause issues, I'd be happy deleting the run-in-parallel test (we're pretty sure docker can run multiple tasks in parallel now, and that's unlikely to regress), for example.
got is a more stream-friendly library.
In several parts of the code there is a common pattern combining the nodejs pipe method and waitForEvent:

stream1.pipe(stream2);
await waitForEvent(stream1, 'end');

It turns out this code is racy and may end up in a deadlock: if the 'end' event fires before waitForEvent installs its handler, waitForEvent will get stuck forever. We solve it by using the promisepipe package, which supplies a promised version of the pipe function, and replace the code above with promisepipe.
* fs.exists is an asynchronous callback-based function; replace it with its sync version. * lastUsed isn't necessarily monotonic; this really depends on how fast caches are added.
Some binary packages of nodejs randomly fail to build during yarn install, causing the task to fail.
The parallel task test infers whether tasks ran in parallel by testing if they completed within a bounded amount of time. This methodology is weak because the time a task takes to complete may vary dramatically when the virtual machine where the tests execute is under heavy load. Theoretically we should apply some statistics: repeat the tests a certain number of times, remove the outliers, calculate the average time along with its variance, and then check if the results are acceptable. As development time is pressing, we take a simpler approach: we run the tests until the time spent executing the tasks is bounded, with a maximum of 10 tries, and if at least one try succeeds, the test passes. Since the parallelism of a task is, by itself, deterministic, one green iteration is enough.
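The retry strategy above can be sketched as a small helper (passesEventually and the inline check are hypothetical names, not the real test harness):

```javascript
// Run a timing-sensitive check up to maxTries times; one success is
// enough, since parallelism itself is deterministic.
async function passesEventually(check, maxTries = 10) {
  for (let i = 0; i < maxTries; i++) {
    if (await check()) return true;
  }
  return false;
}

// Hypothetical check that only meets the time bound on the third try.
let attempts = 0;
passesEventually(async () => ++attempts >= 3)
  .then(ok => console.log(ok, attempts)); // true 3
```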
The reasoning behind blocking this test in the development environment was that we didn't want to mess up the developer machine's iptables configuration. Since the tests are intended to run inside a Vagrant machine, we reenable them.
Upload failures are, in most cases, due to a short-lived external problem; it doesn't make sense to wait several minutes before trying to upload again. That said, we set the maximum backoff time to 30 seconds. As we retry 10 times, we must adjust the exponential factor so the delay is at most 30 seconds in the worst case, to avoid timeout saturation.
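To see how the factor interacts with the cap, here is a sketch; the base delay, factor, and function name are illustrative, not the worker's actual retry configuration:

```javascript
// Exponential backoff with a hard cap: delay_i = min(base * factor^i, max).
function backoffDelays(retries, baseMs, factor, maxMs) {
  return Array.from({ length: retries },
    (_, i) => Math.min(baseMs * Math.pow(factor, i), maxMs));
}

// 10 retries, 1s base, factor 1.5: the schedule saturates at the 30s
// cap on the last retry instead of growing into minutes.
const delays = backoffDelays(10, 1000, 1.5, 30000);
console.log(Math.max(...delays)); // 30000
```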
Dashes and underscores in the container image name may produce patterns that are invalid in a docker reference. We replace all dashes and underscores with '0' to make sure no such invalid patterns are built. We also fix the unlink call, which expects a callback instead of returning a Promise.
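A sketch of the sanitization (sanitizeImageName is an illustrative name, not necessarily the one used in the worker):

```javascript
// Replace every dash and underscore with '0' so the resulting name
// cannot form separator runs that are invalid in a docker reference.
function sanitizeImageName(name) {
  return name.replace(/[-_]/g, '0');
}

console.log(sanitizeImageName('my-task__image')); // my0task00image
```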
The strategy to simulate a network failure when testing the artifact upload retry feature is to add a rule to iptables when we detect that the worker is uploading an artifact to S3. Due to the asynchronous nature of nodejs, other parts of the worker may run while waiting for an I/O response, and with iptables blocking all outgoing traffic, unrelated network calls would be blocked too. The solution is to block only the outgoing packets to the Amazon S3 service. We do so by downloading the IP ranges from Amazon and filtering those that belong to the us-west-2 region and whose service is S3. We then block those IP ranges.
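The filtering step can be sketched against the shape of Amazon's published ip-ranges.json document; the sample data and the function name below are illustrative:

```javascript
// Keep only the prefixes whose region is us-west-2 and whose service
// is S3; these are the only ranges iptables needs to block.
function s3PrefixesUsWest2(ipRanges) {
  return ipRanges.prefixes
    .filter(p => p.region === 'us-west-2' && p.service === 'S3')
    .map(p => p.ip_prefix);
}

const sample = {
  prefixes: [
    { ip_prefix: '52.218.128.0/17', region: 'us-west-2', service: 'S3' },
    { ip_prefix: '54.239.0.0/28',   region: 'us-east-1', service: 'S3' },
    { ip_prefix: '52.94.76.0/22',   region: 'us-west-2', service: 'EC2' },
  ],
};
console.log(s3PrefixesUsWest2(sample)); // [ '52.218.128.0/17' ]
```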
The app.use in the koa package must receive a generator, even if we don't yield anything.
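Koa 1.x distinguishes generator middleware from plain functions; a minimal sketch of that distinction (isGenerator is an illustrative helper, not koa's internal check):

```javascript
// koa 1.x middleware must be a generator function, even a no-op one.
function isGenerator(fn) {
  return fn.constructor.name === 'GeneratorFunction';
}

const noop = function* () { /* nothing to yield */ };
console.log(isGenerator(noop), isGenerator(function () {})); // true false
```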
For indexed image pulling tests, we have a fixed task containing a test image. Unfortunately, the task expired and broke the tests. We expand the create_lz4_images script to also upload tar and zst images and use a unique task for the image pulling tests. We also update the task IDs accordingly.
waitForEvent installs an event handler and then waits for it to be called. But between the event source starting and the handler being installed, the event may be triggered, which would cause the Promise to wait forever.
As we use an event-catching mechanism to simulate a network outage for the upload retry tests, the upload may resume and finish while the iptables process is spawning, before the S3 IPs are blocked. To avoid a false negative in these tests, we retry a few times before marking the test as failed.
Update: this branch evolved and now includes a lot of fixes for intermittent test failures.