SIGINT causes hang, callbacks never hit #33

@kevinburke

Sourcegraph is using thread-loader to build webpack packages. Sending Ctrl+C (SIGINT) to gulp while webpack is running causes the process to hang.

I think this is what is happening:

  1. thread-loader starts worker subprocesses and begins sending work to them.
  2. thread-loader receives work from webpack via pitch(), calls run(), sends the job to a subprocess, and then tries to read the result from the subprocess pipe.
  3. The user sends SIGINT. Because all of the processes are in the same process group, they all receive the signal, and the subprocesses quit.
  4. An 'end' event is emitted on the pipe, but readBuffer.js does not listen for it. As a result, neither the readBuffer() callback nor the workerPool.run() callback in index.js is ever called, and the process hangs.
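
To make step 4 concrete, here is a minimal sketch of a length-prefixed reader that also listens for 'end'. It is not thread-loader's actual readBuffer.js; the name readExactly and the error message are made up for illustration, and it assumes the pipe is an ordinary Node.js Readable stream.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper, not thread-loader's real readBuffer.js.
// Reads exactly `length` bytes from `pipe`, then calls `callback(err, buffer)`.
function readExactly(pipe, length, callback) {
  if (length === 0) {
    callback(null, Buffer.alloc(0));
    return;
  }

  let remaining = length;
  const chunks = [];

  // Detach both listeners so the callback fires at most once per read.
  const cleanup = () => {
    pipe.removeListener('data', onData);
    pipe.removeListener('end', onEnd);
    pipe.pause();
  };

  function onData(data) {
    const chunk = data.length > remaining ? data.slice(0, remaining) : data;
    const overflow = data.length > remaining ? data.slice(remaining) : null;
    remaining -= chunk.length;
    chunks.push(chunk);
    if (remaining === 0) {
      cleanup();
      // Put back any bytes that belong to the next message.
      if (overflow) pipe.unshift(overflow);
      callback(null, Buffer.concat(chunks, length));
    }
  }

  // Without this listener, a worker that exits mid-read leaves the
  // callback pending forever, which matches the hang described above.
  function onEnd() {
    cleanup();
    callback(new Error(`pipe ended with ${remaining} of ${length} bytes unread`));
  }

  pipe.on('data', onData);
  pipe.on('end', onEnd);
  pipe.resume();
}

// Usage: the writer goes away after 3 of the 8 expected bytes.
const demoPipe = new PassThrough();
readExactly(demoPipe, 8, (err, buf) => {
  console.log(err ? `read failed: ${err.message}` : buf);
});
demoPipe.write(Buffer.from('abc'));
demoPipe.end(); // stands in for the worker exiting on SIGINT
```

The point of the sketch is only that an early 'end' is reported through the same callback as a successful read, so the caller always gets an answer one way or the other.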

There may be a race here: I can reproduce the hang about 85% of the time. The other 15% of the time I get a stack trace like this:

Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (net.js:402:12)
    at PoolWorker.writeJson (/Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/thread-loader/dist/WorkerPool.js:94:22)
    at PoolWorker.run (/Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/thread-loader/dist/WorkerPool.js:74:12)
    at WorkerPool.distributeJob (/Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/thread-loader/dist/WorkerPool.js:366:20)
    at /Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/async/queue.js:10:5
    at Object.process (/Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/async/internal/queue.js:175:17)
    at /Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/async/internal/queue.js:82:19
    at Immediate.<anonymous> (/Users/kevin/src/github.com/sourcegraph/sourcegraph/node_modules/async/internal/setImmediate.js:27:16)
    at runCallback (timers.js:694:18)
    at tryOnImmediate (timers.js:665:5)
    at processImmediate (timers.js:647:5)
    at process.topLevelDomainCallback (domain.js:121:23)

Oddly, I cannot reproduce this issue when I load why-is-node-running - I get the stack trace and a quit 100% of the time instead of ~15% of the time. Perhaps this has to do with the overhead of the instrumentation causing the parent process to lose the race every time.

I have a patch which essentially involves sending an end event to onWorkerMessage when readBuffer fails. This still involves some ugly logging but at least the process terminates when you send a signal to it.
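
The general shape of that idea can be sketched as follows. This is not the patch itself: PendingJobs and handleReadResult are hypothetical names, and the sketch assumes each in-flight job has a callback registered under an id. When a read fails, every pending callback is called with the error instead of being left waiting.

```javascript
// Hypothetical bookkeeping for in-flight jobs; all names are illustrative.
class PendingJobs {
  constructor() {
    this.nextId = 0;
    this.callbacks = new Map();
  }

  // Register a callback for a job and return its id.
  add(callback) {
    const id = this.nextId++;
    this.callbacks.set(id, callback);
    return id;
  }

  // Deliver a successful result to one job.
  complete(id, result) {
    const callback = this.callbacks.get(id);
    if (!callback) return;
    this.callbacks.delete(id);
    callback(null, result);
  }

  // Deliver the same error to every job that is still waiting.
  failAll(err) {
    const pending = [...this.callbacks.values()];
    this.callbacks.clear();
    for (const callback of pending) callback(err);
  }
}

// Hypothetical handler for the outcome of one read from the worker pipe.
// `message` is assumed to look like { id, result }.
function handleReadResult(jobs, err, message) {
  if (err) {
    // The worker is gone, so no further results will arrive.
    jobs.failAll(err);
    return;
  }
  jobs.complete(message.id, message.result);
}
```

Clearing the map before invoking the callbacks means a second failure on the same pipe finds nothing left to notify, so each job callback runs at most once.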

See also mafintosh/why-is-node-running#41.
See also https://github.com/sourcegraph/sourcegraph/issues/186.
