Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests timing out don't say timeout #233

Closed
remy opened this issue Feb 10, 2016 · 9 comments
Closed

Tests timing out don't say timeout #233

remy opened this issue Feb 10, 2016 · 9 comments

Comments

@remy
Copy link

remy commented Feb 10, 2016

We've got much more complex suite of tests, but we're finding that we're getting reports of "missing plan" as a failing test, and yet the total number of passing tests doesn't actually match the plan.

This is one example, the only clue that it's actually timed out is the 30s, but otherwise it's not clear.

screen shot 2016-02-10 at 16 30 23

I'm not sure if this is something that's combined with request, but I'm able to replicate using it.

I've created a package tap-test-timeout-fail that replicates the issue. You'll need to cd into the dir and run npm test to see the issue - this is what it gives me (plan count issue, rather than timeout):

screen shot 2016-02-10 at 17 12 31

The index.js is this:

var test = require('tap').test;
var request = require('request');

test('test', t => {
  t.plan(2);

  t.pass('normal');
  request({
    url: `http://hang.leftlogic.com/?${32 * 1000}`,
  }, () => {
    t.pass('request complete');
  });
});

Note that the hang url will not respond until 32 seconds - i.e. outside the default timeout.

Should this actually show a timeout error? I tried replicating with unref'ing a timer, but tap exited early, so I don't think it's that.

@isaacs
Copy link
Member

isaacs commented Feb 11, 2016

Yeah, this is a bug for sure. Here's a simpler reproduction case:

var t = require('./')

var opt = {}
if (!isNaN(process.argv[2])) {
  opt.timeout = +process.argv[2]
}

t.test('test', opt, function (t) {
  t.plan(2)
  t.pass('this is fine')
  setTimeout(function (res) {
    t.pass('request complete')
  }, 2000)
})

Running directly with an argument, it prints the timeout pretty nicely:

$ node t.js 1000
TAP version 13
    # Subtest: test
    1..2
    ok 1 - this is fine
    not ok 2 - timeout!
      ---
      expired: test
      timeout: 1000
      at:
        line: 8
        column: 3
        file: t.js
      source: |
        t.test('test', opt, function (t) {
      ...
    # failed 1 of 2 tests
not ok 1 - test # time=1011.291ms
  ---
  timeout: 1000
  at:
    line: 8
    column: 3
    file: t.js
  results:
    plan:
      start: 1
      end: 2
    count: 2
    pass: 1
    ok: false
    fail: 1
    time: 1011.291
  source: |
    t.test('test', opt, function (t) {
  ...

1..1
# failed 1 of 1 tests
# time=2033.463ms

However, when executed with tap, the parent test runner process sends a SIGTERM, which it interprets as being a failure (which is correct), but hte output is kinda funky. Note the missing test in the planned grandchild test, then the 1..0 of the child t.js test, and then the top level getting that sigterm failure (indicating that the child test ended with a signal, instead of exiting normally).

$ tap -t1 t.js -Rtap
TAP version 13
    # Subtest: t.js
        # Subtest: test
        1..2
        ok 1 - this is fine
    1..0
not ok 1 - t.js # time=1021.292ms
  ---
  results:
    ok: false
    count: 0
    pass: 0
    plan:
      start: 1
      end: 0
  command: /usr/local/bin/node
  timeout: 1000
  signal: SIGTERM
  arguments:
    - t.js
  file: t.js
  failure: timeout
  ...

1..1
# failed 1 of 1 tests
# time=1041.976ms

The "classic" reporter doesn't handle that funky output well at all, and kinda barfs on it:

$ tap -t1 t.js
t.js 1..0
t.js .................................................. 1/2 1s
  test
  not ok test count !== plan
    +++ found
    --- wanted
    -2
    +1
    results:
      ok: false
      count: 1
      pass: 1
      plan:
        start: 1
        end: 2

total ................................................. 1/2


  1 passing (1s)
  1 failing

What's happening is that it's seeing invalid TAP (plan not matching test count), and (correctly?) printing that out as some kind of disaster, but then missing the bigger (and more actionable) problem, that there was a timeout.

This patch is a pretty easy solution:

diff --git a/lib/root.js b/lib/root.js
index 2e16a81..5bdf2e1 100644
--- a/lib/root.js
+++ b/lib/root.js
@@ -96,5 +96,5 @@ onExit(function (code, signal) {
     at: null
   })

-  tap.end()
+  tap.endAll()
 })

which changes the output to be this:

$ tap t.js -t1
t.js .................................................. 1/3 1s
  test
  not ok test unfinished: test
    at:
      line: 8
      column: 3
      file: t.js
    plan: 2
    count: 1
    source: |
      t.test('test', opt, function (t) {

  not ok test point left in queue: not ok - received SIGTERM with pending event queue activity
    requests: []
    handles:
      - type: Timer
        msecs: 2000

total ................................................. 1/3


  1 passing (1s)
  2 failing

It still doesn't actually say "timeout", which is reasonable to infer if a SIGTERM comes, but it's a step in the right direction.

Another solution, maybe worth exploring, is to use the TAP_TIMEOUT environment var that the parent can set for spawned child procs. The problem there is that it's very hard to distinguish between "the root tap in the parent test timed out" and "the child's root tap timed out based on an environment variable", and also it's totally unclear why that might've happened (an open socket or whatever) which is why the current SIGTERM behavior exists (but it's not being done properly, it seems).

It seems like perhaps the best behavior would be to let the parent runner proc send the SIGTERM, but interpret SIGTERM the same as a test timeout, but include any active handles at the time of getting the signal, so you'd get something like this:

$ tap t.js -t1
t.js .................................................. 1/2 1s
  child test
  not ok timeout!
    signal: SIGTERM
    handles:
      - type: Timer
        msecs: 20000

total ................................................. 1/2


  1 passing (1s)
  1 failing

The issue there is that then you might (in some desert island scenario) have a test where the process is getting a SIGTERM, and you don't want TAP to handle it, or it doesn't indicate a timeout.

isaacs added a commit that referenced this issue Feb 11, 2016
Fixes #233, provided that we can assume that SIGTERM always indicates
a timeout from the calling process.
isaacs added a commit that referenced this issue Feb 11, 2016
Fixes #233, provided that we can assume that SIGTERM always indicates a
timeout from the calling process.
isaacs added a commit that referenced this issue Feb 20, 2016
Fixes #233, provided that we can assume that SIGTERM always indicates a
timeout from the calling process.
@lgomez
Copy link

lgomez commented Feb 21, 2016

Hm....

I think I may be seeing the same issue but wanted to share it to confirm. I've got:

process.env.NODE_ENV = 'test';
var test = require('tap').test;
var config = require('../config');
var server = require('../lib/server');

test('Starting server', function (t) {
  t.plan(2)
  t.pass('Server starting');
  server.listen(config.server.port, function () {
    t.pass('Server started');
  });
});

And I get:

test/00-setup.js ...................................... 2/3 30s
  not ok received SIGTERM with pending event queue activity
    requests: []
    handles:
      - type: Server
        events:
          - connection
          - clientError
          - close
          - error
          - listening
          - secureConnection
          - upgrade
          - checkContinue
          - request
        connectionKey: '6::::8080'

Is this what you are referring to?

Thanks

@isaacs
Copy link
Member

isaacs commented Feb 21, 2016

@lgomez The issue you're running into is that node-tap runs each test script as a separate process. This is generally good, because you don't have to worry about state leaking between tests, and you always have a clean process environment.

However, if you have your tests structured like you would in mocha or lab, where the first test script starts a server, and then the others all do stuff with it, and then the last one closes it, then that obviously wont work, because the 00-setup.js test will always time out.

I usually approach this in one of two ways:

  1. Every test file starts the server and tears it down.
  2. Run all the tests in a single JavaScript file.
  3. The 00-setup.js script spawns the server in a detached child process, and puts the child process pid in a file. Then a zz-cleanup.js script reads the pid file, and kills the child process.

Option 1 is simpler, but a little bit slower, and if you want to test things that modify state (create a user, then have that user do a thing, then verify that another user can see the thing, etc.) then it won't work out so great, because the state is gone between processes.

Using Option 2 is pretty easy, but not as organized and can get quite messy. Also, you give up all the advantages of having clean process env for each test!

Option 3 is a bit more tricky, but I've been meaning to write a module to make it easier, since I've hand-rolled this for a few different things now. https://github.com/npm/npm-registry-couchapp/blob/master/test/00-setup.js has an example of doing this with CouchDB, but the same basic approach would work just fine for a node server as well.

@isaacs
Copy link
Member

isaacs commented Feb 22, 2016

@lgomez Check out https://github.com/tapjs/t-up

@lgomez
Copy link

lgomez commented Feb 23, 2016

Haha.. You beat me to it. I was planning to code the detached process based on your couchapp reference from before.

This is one of those cases where I just realized I'm overcomplicating my tests based on optimistic expectations instead of keeping them simple and tabling this for when they actually need to share a server. The fact is my current case doesn't have enough tests and I can just keep it to a single file.

Thank you for tup though... I'm going to use it soon.

@isaacs
Copy link
Member

isaacs commented Feb 23, 2016

@remy Can you try with tap version 5.7 and see if it fails more helpfully?

@remy
Copy link
Author

remy commented Apr 7, 2016

Sorry for the super late reply - but yes. It's all good (though I'm about to raise a new bug related to timeouts! 😄 )

@playground
Copy link

What is the solution to received SIGTERM with pending event queue activity?
I'm running tap 6.3.2

Here's the repo https://github.com/playground/sample

I'm getting timeout error even though the tests passed.

user.test.js
var request = require('request');

var app = require('../server');

var test = require('tap').test;

test('First test!', function (assert) {
assert.plan(1);

request.get('http://localhost:3000/api/users', function(err, res) {
let json = JSON.parse(res.body);
assert.equal(json[0], 'John', 'name should be John');
//assert.equal(json[1], 'Joe', 'name should be John');
assert.end();
});
});

test('Second test!', function (assert) {
assert.plan(1);

request.get('http://localhost:3000/api/users', function(err, res) {
let json = JSON.parse(res.body);
assert.equal(json[0], 'John', 'name should be John');
//assert.equal(json[1], 'Joe', 'name should be Joe');
assert.end();
});
});

Server running on port 3000
First test!
✓ name should be John

Second test!
✓ name should be John

  1. timeout!

2 passing (2s)
1 failing

  1. test/user.test.js timeout!:
    timeout!

@isaacs
Copy link
Member

isaacs commented Aug 23, 2016

@playground The I believe I did, Bob answer is "Don't leave event queue activity running long enough to get a timeout and SIGTERM".

Answered in #293.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants