Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSH Commands Occasionally Hanging #271

Closed
johnnyman727 opened this issue Sep 1, 2015 · 8 comments · Fixed by #543
Closed

SSH Commands Occasionally Hanging #271

johnnyman727 opened this issue Sep 1, 2015 · 8 comments · Fixed by #543

Comments

@johnnyman727
Copy link
Contributor

@rwaldron have you seen SSH commands just hanging recently? It seems like every 3rd or 4th time I deploy, the CLI just hangs. It's not failing on any particular bash command, one of them just ends up not closing. Perhaps it closes before event listener is ready?

@rwaldron
Copy link
Contributor

rwaldron commented Sep 1, 2015

Nope, I haven't experienced this issue, sorry :/

@johnnyman727
Copy link
Contributor Author

Here is a little clue:

➜  t2-demo  DEBUG=deploy t2 run index.js --timeout 3 --lan
INFO Looking for your Tessel...
INFO Connected to Harold over LAN
  deploy Executing command: +0ms stopRunningScript
  deploy Command: +75ms stopRunningScript Completed
  deploy Executing command: +2ms deleteFolder

The deleteFolder command never closes. This could be an issue with the USB Daemon. Perhaps it's getting packets out of sync and isn't able to close the process.

This is pretty hard to debug b/c it's very occasional for me (maybe 5% of deploys).

@johnnyman727
Copy link
Contributor Author

Scratch that - it is not an issue with the USB Daemon - as you can see in the logs above, this is a LAN connection.

@Student007
Copy link
Member

@johnnyman727 does #342 fix this ?

@johnnyman727
Copy link
Contributor Author

@Student007 it's hard to say because the issue comes up so rarely and inconsistently. I will try it out though.

@johnnyman727
Copy link
Contributor Author

@Student007 I was able to reproduce the issue and unfortunately, nether close or end events get called :/

@johnnyman727
Copy link
Contributor Author

Another clue:
It appears that the logic gets stuck after the command is run?

➜  t2-cli  t2 rename Frank
INFO Looking for your Tessel...
INFO Connected to Tessel-02A337692559 over LAN
^C%                                                               

➜  t2-cli  t2 rename Frank
INFO Looking for your Tessel...
INFO Connected to Frank over LAN
WARN Name of device is already Frank

@johnnyman727
Copy link
Contributor Author

I believe I've tracked down the source of this error. We run simpleExec when we just want to execute a remote command and get stdout back to the calling function. The issue arises when the command is executed but then the process completes and closes before we set up the event listener for the close event within the receive function. The function is just sitting there forever waiting for an event that was already called.

There are two ways I can think of to resolve this:

  1. Collapse simpleExec and receive into one function so avoid the overhead of returning Promises that wrap event watchers:
Tessel.prototype.simpleExec = function(command) {
  // Stop processes and delete everything in the folder
  return this.connection.exec(command)
    .then((remote) => {
      var error = '';
      var received = new Buffer(0);

      remote.stderr.on('data', function(buffer) {
        error += buffer.toString();
      });

      remote.stdout.on('data', function(buffer) {
        received = Buffer.concat([received, buffer]);
      });

      remote.once('close', function() {
        if (error) {
          return reject(new Error(error));
        } else {
          return resolve(received);
        }
      });
    });
};

The downside being a fair amount of code refactoring (all the functions that call .receive would have to be updated to call simpleExec and, more importantly, it doesn't completely resolve the root cause of the problem. You could still have a process that completes before the .then statement is executed and event listeners set up (albeit a much smaller likelihood than the current implementation).

  1. Check if the process is closed before setting up the closed event listener and resolve immediately:
Tessel.prototype.receive = function(remote) {
  var error = '';
  var received = new Buffer(0);

  // End early if this process already completed
  if (remote._readableState.ended === true) {
    return Promise.resolve(received);
  }

  remote.stderr.on('data', function(buffer) {
    error += buffer.toString();
  });

  remote.stdout.on('data', function(buffer) {
    received = Buffer.concat([received, buffer]);
  });

  return new Promise(function(resolve, reject) {
    console.log('waiting for close...', remote);
    remote.once('close', function() {
      console.log('closed. was there an error in receive?', error.length, error);
      if (error) {
        return reject(new Error(error));
      } else {
        return resolve(received);
      }
    });
  });
};

The major downside here being that neither Node streams nor the ssh2 remote stream object have convenient, public properties or methods for confirming that the stream has already closed. There is a chance that the implementation may change in future Node versions and render this fix broken.

I'm in favor of fix 2. I think it the most simple, logical, and robust but I do wish there was a public API for checking stream lifecycle state.

Any other thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants