Tests timing out don't say timeout #233

remy · 2016-02-10T17:13:42Z

We've got much more complex suite of tests, but we're finding that we're getting reports of "missing plan" as a failing test, and yet the total number of passing tests doesn't actually match the plan.

This is one example, the only clue that it's actually timed out is the 30s, but otherwise it's not clear.

I'm not sure if this is something that's combined with request, but I'm able to replicate using it.

I've created a package tap-test-timeout-fail that replicates the issue. You'll need to cd into the dir and run npm test to see the issue - this is what it gives me (plan count issue, rather than timeout):

The index.js is this:

var test = require('tap').test;
var request = require('request');

test('test', t => {
  t.plan(2);

  t.pass('normal');
  request({
    url: `http://hang.leftlogic.com/?${32 * 1000}`,
  }, () => {
    t.pass('request complete');
  });
});

Note that the hang url will not respond until 32 seconds - i.e. outside the default timeout.

Should this actually show a timeout error? I tried replicating with unref'ing a timer, but tap exited early, so I don't think it's that.

The text was updated successfully, but these errors were encountered:

isaacs · 2016-02-11T08:04:31Z

Yeah, this is a bug for sure. Here's a simpler reproduction case:

var t = require('./')

var opt = {}
if (!isNaN(process.argv[2])) {
  opt.timeout = +process.argv[2]
}

t.test('test', opt, function (t) {
  t.plan(2)
  t.pass('this is fine')
  setTimeout(function (res) {
    t.pass('request complete')
  }, 2000)
})

Running directly with an argument, it prints the timeout pretty nicely:

$ node t.js 1000
TAP version 13
    # Subtest: test
    1..2
    ok 1 - this is fine
    not ok 2 - timeout!
      ---
      expired: test
      timeout: 1000
      at:
        line: 8
        column: 3
        file: t.js
      source: |
        t.test('test', opt, function (t) {
      ...
    # failed 1 of 2 tests
not ok 1 - test # time=1011.291ms
  ---
  timeout: 1000
  at:
    line: 8
    column: 3
    file: t.js
  results:
    plan:
      start: 1
      end: 2
    count: 2
    pass: 1
    ok: false
    fail: 1
    time: 1011.291
  source: |
    t.test('test', opt, function (t) {
  ...

1..1
# failed 1 of 1 tests
# time=2033.463ms

However, when executed with tap, the parent test runner process sends a SIGTERM, which it interprets as being a failure (which is correct), but hte output is kinda funky. Note the missing test in the planned grandchild test, then the 1..0 of the child t.js test, and then the top level getting that sigterm failure (indicating that the child test ended with a signal, instead of exiting normally).

$ tap -t1 t.js -Rtap
TAP version 13
    # Subtest: t.js
        # Subtest: test
        1..2
        ok 1 - this is fine
    1..0
not ok 1 - t.js # time=1021.292ms
  ---
  results:
    ok: false
    count: 0
    pass: 0
    plan:
      start: 1
      end: 0
  command: /usr/local/bin/node
  timeout: 1000
  signal: SIGTERM
  arguments:
    - t.js
  file: t.js
  failure: timeout
  ...

1..1
# failed 1 of 1 tests
# time=1041.976ms

The "classic" reporter doesn't handle that funky output well at all, and kinda barfs on it:

$ tap -t1 t.js
t.js 1..0
t.js .................................................. 1/2 1s
  test
  not ok test count !== plan
    +++ found
    --- wanted
    -2
    +1
    results:
      ok: false
      count: 1
      pass: 1
      plan:
        start: 1
        end: 2

total ................................................. 1/2


  1 passing (1s)
  1 failing

What's happening is that it's seeing invalid TAP (plan not matching test count), and (correctly?) printing that out as some kind of disaster, but then missing the bigger (and more actionable) problem, that there was a timeout.

This patch is a pretty easy solution:

diff --git a/lib/root.js b/lib/root.js
index 2e16a81..5bdf2e1 100644
--- a/lib/root.js
+++ b/lib/root.js
@@ -96,5 +96,5 @@ onExit(function (code, signal) {
     at: null
   })

-  tap.end()
+  tap.endAll()
 })

which changes the output to be this:

$ tap t.js -t1
t.js .................................................. 1/3 1s
  test
  not ok test unfinished: test
    at:
      line: 8
      column: 3
      file: t.js
    plan: 2
    count: 1
    source: |
      t.test('test', opt, function (t) {

  not ok test point left in queue: not ok - received SIGTERM with pending event queue activity
    requests: []
    handles:
      - type: Timer
        msecs: 2000

total ................................................. 1/3


  1 passing (1s)
  2 failing

It still doesn't actually say "timeout", which is reasonable to infer if a SIGTERM comes, but it's a step in the right direction.

Another solution, maybe worth exploring, is to use the TAP_TIMEOUT environment var that the parent can set for spawned child procs. The problem there is that it's very hard to distinguish between "the root tap in the parent test timed out" and "the child's root tap timed out based on an environment variable", and also it's totally unclear why that might've happened (an open socket or whatever) which is why the current SIGTERM behavior exists (but it's not being done properly, it seems).

It seems like perhaps the best behavior would be to let the parent runner proc send the SIGTERM, but interpret SIGTERM the same as a test timeout, but include any active handles at the time of getting the signal, so you'd get something like this:

$ tap t.js -t1
t.js .................................................. 1/2 1s
  child test
  not ok timeout!
    signal: SIGTERM
    handles:
      - type: Timer
        msecs: 20000

total ................................................. 1/2


  1 passing (1s)
  1 failing

The issue there is that then you might (in some desert island scenario) have a test where the process is getting a SIGTERM, and you don't want TAP to handle it, or it doesn't indicate a timeout.

Fixes #233, provided that we can assume that SIGTERM always indicates a timeout from the calling process.

lgomez · 2016-02-21T20:06:41Z

Hm....

I think I may be seeing the same issue but wanted to share it to confirm. I've got:

process.env.NODE_ENV = 'test';
var test = require('tap').test;
var config = require('../config');
var server = require('../lib/server');

test('Starting server', function (t) {
  t.plan(2)
  t.pass('Server starting');
  server.listen(config.server.port, function () {
    t.pass('Server started');
  });
});

And I get:

test/00-setup.js ...................................... 2/3 30s
  not ok received SIGTERM with pending event queue activity
    requests: []
    handles:
      - type: Server
        events:
          - connection
          - clientError
          - close
          - error
          - listening
          - secureConnection
          - upgrade
          - checkContinue
          - request
        connectionKey: '6::::8080'

Is this what you are referring to?

Thanks

isaacs · 2016-02-21T22:58:17Z

@lgomez The issue you're running into is that node-tap runs each test script as a separate process. This is generally good, because you don't have to worry about state leaking between tests, and you always have a clean process environment.

However, if you have your tests structured like you would in mocha or lab, where the first test script starts a server, and then the others all do stuff with it, and then the last one closes it, then that obviously wont work, because the 00-setup.js test will always time out.

I usually approach this in one of two ways:

Every test file starts the server and tears it down.
Run all the tests in a single JavaScript file.
The 00-setup.js script spawns the server in a detached child process, and puts the child process pid in a file. Then a zz-cleanup.js script reads the pid file, and kills the child process.

Option 1 is simpler, but a little bit slower, and if you want to test things that modify state (create a user, then have that user do a thing, then verify that another user can see the thing, etc.) then it won't work out so great, because the state is gone between processes.

Using Option 2 is pretty easy, but not as organized and can get quite messy. Also, you give up all the advantages of having clean process env for each test!

Option 3 is a bit more tricky, but I've been meaning to write a module to make it easier, since I've hand-rolled this for a few different things now. https://github.com/npm/npm-registry-couchapp/blob/master/test/00-setup.js has an example of doing this with CouchDB, but the same basic approach would work just fine for a node server as well.

isaacs · 2016-02-22T06:13:28Z

@lgomez Check out https://github.com/tapjs/t-up

lgomez · 2016-02-23T00:02:26Z

Haha.. You beat me to it. I was planning to code the detached process based on your couchapp reference from before.

This is one of those cases where I just realized I'm overcomplicating my tests based on optimistic expectations instead of keeping them simple and tabling this for when they actually need to share a server. The fact is my current case doesn't have enough tests and I can just keep it to a single file.

Thank you for tup though... I'm going to use it soon.

isaacs · 2016-02-23T17:49:42Z

@remy Can you try with tap version 5.7 and see if it fails more helpfully?

remy · 2016-04-07T12:08:50Z

Sorry for the super late reply - but yes. It's all good (though I'm about to raise a new bug related to timeouts! 😄 )

playground · 2016-08-23T04:21:13Z

What is the solution to received SIGTERM with pending event queue activity?
I'm running tap 6.3.2

Here's the repo https://github.com/playground/sample

I'm getting timeout error even though the tests passed.

user.test.js
var request = require('request');

var app = require('../server');

var test = require('tap').test;

test('First test!', function (assert) {
assert.plan(1);

request.get('http://localhost:3000/api/users', function(err, res) {
let json = JSON.parse(res.body);
assert.equal(json[0], 'John', 'name should be John');
//assert.equal(json[1], 'Joe', 'name should be John');
assert.end();
});
});

test('Second test!', function (assert) {
assert.plan(1);

request.get('http://localhost:3000/api/users', function(err, res) {
let json = JSON.parse(res.body);
assert.equal(json[0], 'John', 'name should be John');
//assert.equal(json[1], 'Joe', 'name should be Joe');
assert.end();
});
});

Server running on port 3000
First test!
✓ name should be John

Second test!
✓ name should be John

timeout!

2 passing (2s)
1 failing

test/user.test.js timeout!:
timeout!

isaacs · 2016-08-23T15:48:48Z

@playground The I believe I did, Bob answer is "Don't leave event queue activity running long enough to get a timeout and SIGTERM".

Answered in #293.

isaacs added a commit that referenced this issue Feb 11, 2016

Assume that SIGTERM = timeout, and surface to user better

61c6185

Fixes #233, provided that we can assume that SIGTERM always indicates a timeout from the calling process.

isaacs added a commit that referenced this issue Feb 11, 2016

Assume that SIGTERM = timeout, surface to user

bf0b2b3

Fixes #233, provided that we can assume that SIGTERM always indicates a timeout from the calling process.

isaacs mentioned this issue Feb 11, 2016

Assume that SIGTERM = timeout, surface to user #235

Merged

isaacs added a commit that referenced this issue Feb 20, 2016

Assume that SIGTERM = timeout, surface to user

5a641a4

Fixes #233, provided that we can assume that SIGTERM always indicates a timeout from the calling process.

isaacs closed this as completed in #235 Feb 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tests timing out don't say timeout #233

Tests timing out don't say timeout #233

remy commented Feb 10, 2016

isaacs commented Feb 11, 2016

lgomez commented Feb 21, 2016

isaacs commented Feb 21, 2016

isaacs commented Feb 22, 2016

lgomez commented Feb 23, 2016

isaacs commented Feb 23, 2016

remy commented Apr 7, 2016

playground commented Aug 23, 2016

isaacs commented Aug 23, 2016

Tests timing out don't say timeout #233

Tests timing out don't say timeout #233

Comments

remy commented Feb 10, 2016

isaacs commented Feb 11, 2016

lgomez commented Feb 21, 2016

isaacs commented Feb 21, 2016

isaacs commented Feb 22, 2016

lgomez commented Feb 23, 2016

isaacs commented Feb 23, 2016

remy commented Apr 7, 2016

playground commented Aug 23, 2016

isaacs commented Aug 23, 2016