
Test of sleep has flaked in CI #4010

Closed · rk-for-zulip opened this issue Mar 28, 2020 · 6 comments · Fixed by #4061

Comments

@rk-for-zulip (Contributor)

In a recent PR with no changes to either async.js or async-test.js, one of the tests for "sleep" flaked, mysteriously sleeping for only 999 milliseconds rather than 1000.

 FAIL  src/utils/__tests__/async-test.js
  ● sleep (real) › waits for approximately the right number of milliseconds
    expect(received).toBeLessThanOrEqual(expected)
    Expected: <= 999
    Received:    1000
      45 |     // `sleep` should sleep for the specified number of milliseconds (and not,
      46 |     // for example, that many seconds or microseconds).
    > 47 |     expect(expectedMs).toBeLessThanOrEqual(actualMs);
         |                                ^
      48 |     expect(actualMs).toBeLessThan(10 * expectedMs); // [α]
      49 | 
      50 |     // [α] In theory, we can't be sure of this test; the time between the
      at _callee2$ (src/utils/__tests__/async-test.js:47:32)
      at tryCatch (node_modules/regenerator-runtime/runtime.js:62:40)
      at Generator.invoke [as _invoke] (node_modules/regenerator-runtime/runtime.js:296:22)
      at Generator.prototype.(anonymous function) [as next] (node_modules/regenerator-runtime/runtime.js:114:21)
      at tryCatch (node_modules/regenerator-runtime/runtime.js:62:40)
      at invoke (node_modules/regenerator-runtime/runtime.js:152:20)
      at node_modules/regenerator-runtime/runtime.js:162:13
      at tryCallOne (node_modules/promise/lib/core.js:37:12)
      at node_modules/promise/lib/core.js:123:15
      at flush (node_modules/asap/raw.js:50:29)

[Screenshot of the CI failure attached: Screenshot_20200328_153252]
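For context, the pieces under discussion presumably look roughly like the following sketch. The `sleep` implementation here is an assumption, and the test body is reconstructed from the excerpt and the names in the failure output rather than copied from the repo:

```js
// Assumed shape of `sleep` in src/utils/async.js.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

describe('sleep (real)', () => {
  test('waits for approximately the right number of milliseconds', async () => {
    const expectedMs = 1000;
    const start = Date.now();
    await sleep(expectedMs);
    const actualMs = Date.now() - start;

    // `sleep` should sleep for the specified number of milliseconds (and not,
    // for example, that many seconds or microseconds).
    expect(expectedMs).toBeLessThanOrEqual(actualMs);
    expect(actualMs).toBeLessThan(10 * expectedMs);
  });
});
```

With `expectedMs` passed as the first argument, Jest labels it "received" and the measured duration "expected", which is why the failure message above reads backwards; that point comes up again later in the thread.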

@agrawal-d (Member)

Can you please share a link to the PR? This is an interesting situation!

@rk-for-zulip (Contributor, Author)

Can you please share a link to the PR? This is an interesting situation!

It is, isn't it! The PR in question was #3958.

@agrawal-d (Member) commented Mar 30, 2020

Okay, so luckily it appears to be a one-off situation. It may be because we use Date.now() to measure the time deltas here, which relies on the system clock.
Maybe this article is relevant: https://www.sitepoint.com/discovering-the-high-resolution-time-api/
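A minimal sketch of the distinction that article draws, assuming a plain Node environment: `Date.now()` reads the wall clock, which the OS can adjust, while `performance.now()` (exposed in Node via `perf_hooks`) is monotonic.

```js
// Wall clock: can be adjusted by NTP or the user, so deltas measured with it
// aren't guaranteed to be well behaved.
const startWall = Date.now();

// Monotonic high-resolution clock: deltas can't be skewed by clock changes.
const { performance } = require('perf_hooks');
const startMono = performance.now();

setTimeout(() => {
  console.log(`wall-clock delta: ${Date.now() - startWall} ms`);
  console.log(`monotonic delta:  ${(performance.now() - startMono).toFixed(1)} ms`);
}, 1000);
```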

@chrisbobbe (Contributor)

Okay, so luckily it appears to be a one-off situation.

This is partly lucky, and partly very unlucky. 🙂 A "flake" like this, where the test sometimes fails when nothing is actually wrong, reduces people's confidence in the tests, and makes it more tempting for people to ignore failures instead of learning from them.

What's definitely lucky is that, as I understand it, Ray and Greg have a lot of experience with some pretty bad test flakes on projects before they were employed at Zulip, so avoiding them in Zulip has been a priority. @ray-kraesig added the use of Lolex, in #3886, for most tests involving time, to avoid exactly this kind of flake.

As this test's "(real)" label suggests, in contrast to the other test labeled "(ideal)", it looks like it doesn't use Lolex; it actually waits roughly 1000 milliseconds when instructed to wait 1000 milliseconds. The reason for keeping a non-Lolex sleep test is best explained in the commit message for 019cc09e3:

The original test remains in place, to test that sleep also has
reasonably robust behavior in the presence of a non-ideal clock.

Apparently, on that failure, the actual sleep time was 999 milliseconds, which is surprising to me; I'd considered the instructed number of milliseconds to be an inclusive lower bound on the actual number of milliseconds. But now that I'm looking, I haven't found documentation to confirm that.
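For comparison, a Lolex-driven test along the lines of the "(ideal)" one might look roughly like this (a sketch, not the actual test in the repo): the fake clock is advanced explicitly, so the test involves no real waiting and no wall-clock jitter.

```js
const lolex = require('lolex');

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms)); // as sketched above

describe('sleep (ideal)', () => {
  let clock;
  beforeEach(() => { clock = lolex.install(); }); // replaces setTimeout with a fake
  afterEach(() => { clock.uninstall(); });

  test('resolves only once the fake clock has advanced far enough', async () => {
    let resolved = false;
    sleep(1000).then(() => { resolved = true; });

    clock.tick(999);
    await Promise.resolve(); // flush pending promise callbacks
    expect(resolved).toBe(false);

    clock.tick(1);
    await Promise.resolve();
    expect(resolved).toBe(true);
  });
});
```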

@Maskedman99 (Contributor) commented Apr 21, 2020

PR #4045 failed Travis CI due to this (my PR failing again while resolving conflicts :( )
https://travis-ci.org/github/zulip/zulip-mobile/builds/677579703
commit 309b7ff

rk-for-zulip added a commit to rk-for-zulip/zulip-mobile that referenced this issue Apr 21, 2020
Sometimes, `setTimeout(1000)` will fire after only 999 milliseconds.
This appears to be due to an upstream bug in Node.js itself [1].

Closes zulip#4010.

[1] nodejs/node#26578
gnprice added a commit to rk-for-zulip/zulip-mobile that referenced this issue Apr 24, 2020
When a test fails, Jest's error messages lean quite hard on this
expectation (namely, that the value passed to `expect()` is the received
value and the matcher's argument is the expected value). Here's a failure
from zulip#4010:

  ● sleep (real) › waits for approximately the right number of milliseconds

    expect(received).toBeLessThanOrEqual(expected)

    Expected: <= 999
    Received:    1000

(Plus the "received" and "1000" are highlighted in red, and the
"expected" and "999" in green.)

The truth is that we expected a value >= 1000 and received 999,
so having it shown backward comes out pretty confusing.
Flip the arguments so Jest shows them the right way around.
@gnprice (Member) commented Apr 24, 2020

Ray found the cause of this -- here's his commit message for the fix (which I just merged):

async: Fudge test value to defeat flake in `sleep` test.

Sometimes, `setTimeout(1000)` will fire after only 999 milliseconds.
This appears to be due to an upstream bug in Node.js itself [1].

Closes #4010.

[1] https://github.com/nodejs/node/issues/26578

As I wrote on the PR: it's reassuring, in a way, that this turns out to be a specific bug in Node with an understood cause that puts an upper bound on its effect. Less reassuring that the maintainers seem pretty blasé about whether to do anything about it. :-/
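Putting the two commit messages together, the fixed assertion presumably looks something like this sketch; the exact fudge value and wording in the real fix may differ.

```js
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms)); // as sketched above

test('waits for approximately the right number of milliseconds', async () => {
  const expectedMs = 1000;
  const fudgeMs = 10; // hypothetical value, chosen only to illustrate the idea

  const start = Date.now();
  await sleep(expectedMs);
  const actualMs = Date.now() - start;

  // Tolerate setTimeout firing slightly early (nodejs/node#26578), and keep the
  // measured value as the first argument so Jest labels "expected"/"received"
  // the right way around.
  expect(actualMs).toBeGreaterThanOrEqual(expectedMs - fudgeMs);
  expect(actualMs).toBeLessThan(10 * expectedMs);
});
```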
