incorrect ports in ssh connection to docker container #654

Open
jspaaks opened this issue Jul 25, 2019 · 7 comments

@jspaaks
Member

jspaaks commented Jul 25, 2019

I had some trouble creating an ssh connection to a docker container while preparing materials for a xenon-cli v3-based tutorial, which in turn was based on the earlier xenon-cli v2 tutorial. Here are some details of my setup:

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ docker --version
Docker version 18.09.3, build 774a1f4
$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.04.1)
OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.04.1, mixed mode, sharing)
$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64
$ mkdir -p ~/.local/bin/xenon
$ cd ~/.local/bin/xenon
$ wget https://github.com/xenon-middleware/xenon-cli/releases/download/v3.0.0/xenon-cli-shadow-3.0.0.tar
$ tar -xvf xenon-cli-shadow-3.0.0.tar
$ echo '' >> ~/.bashrc
$ echo '#Add xenon cli to the PATH:' >> ~/.bashrc
$ echo 'PATH=$PATH:~/.local/bin/xenon/xenon-cli-shadow-3.0.0/bin' >> ~/.bashrc
$ source ~/.bashrc

I started the docker slurm container with

$ docker run --detach --publish 10022:22 --hostname slurm17 nlesc/xenon-slurm:17

as per the tutorial text. I tried to connect to it with:

$ ssh -p 10022 xenon@localhost
$ exit

which worked as normal.

But then I ran the following (a slight digression from the tutorial text, probably needed because of the xenon 2 -> 3 upgrade: the location now includes the scheme part, ssh://):

$ xenon scheduler slurm --location ssh://localhost:10022 --username xenon --password javagat queues
ssh adaptor: Connection setup to localhost:10022 failed!

I think I've tracked down the place where things go wrong:

session = client.connect(username, host, port).verify(timeout).getSession();

Even though we pass it port 10022, somewhere inside .connect() port 22 is used instead, and the connection fails (at one point I got an error message from inside .connect() that said something like connection id = xenon@localhost:22).

Anyway, restarting the docker container with

$ docker run --detach --publish 22:22 --hostname slurm17 nlesc/xenon-slurm:17

and then

$ xenon scheduler slurm --location ssh://localhost:22 --username xenon --password javagat queues

gives the expected response:

Available queues: mypartition, otherpartition
Default queue: mypartition

I'd be interested to hear whether this is indeed a bug. It could be that I'm just doing something wrong.

Sidenote: I suspect we are missing a session.close() or something similar, because it takes about a minute for the prompt to return after the answer is printed in the terminal.

@jmaassen
Member

I cannot reproduce the issue with the connection setup. I do see the delay before the prompt returns, though.

@sverhoeven
Member

I also can't reproduce the port mismatch, either inside a VirtualBox VM or on bare-metal Linux.

@jmaassen
Member

After some debugging, it turns out the delay in exiting the application is caused by the mina layer used by SSHD.

The SSHD implementation can use different communication layers: standard Java sockets, MINA, or Netty. The implementation is selected by which dependency you include:

  • sshd-core
  • sshd-mina
  • sshd-netty

We were using sshd-mina in Xenon 3.0.0. Unfortunately, the MINA library creates an internal thread pool that does not seem to shut down immediately when the application tries to exit.

Switching to sshd-core solves the problem.

@jmaassen
Member

For reference, sshd-netty seems to have a similar problem. The hanging threads are named differently, but they also prevent the JVM from shutting down.

It almost seems as if we are supposed to explicitly shut down something that we forget to?

@jmaassen
Member

Turns out we were missing a SshClient.stop() in SSHConnection.close(). The different ssh sessions created by the client were all shut down properly, but the client itself wasn't. For the core implementation this is not a problem, but both the mina and netty implementations have an internal thread pool per client and therefore need to be properly shut down.
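The shutdown problem described above can be illustrated without SSHD itself. The following is a minimal Java sketch, assuming only that a client owns a pool of non-daemon worker threads; `PooledClient` and its methods are hypothetical stand-ins, not the Apache MINA SSHD or Xenon API. With `Executors.newCachedThreadPool()`, idle worker threads are kept alive for 60 seconds, so if nothing shuts the pool down, the JVM only exits once they time out. That is one plausible way to get a delay of about a minute, although it has not been confirmed that the MINA or Netty pools behave exactly like this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ClientShutdownSketch {

    // Hypothetical stand-in for an SSH client that owns an internal worker pool.
    static final class PooledClient implements AutoCloseable {
        // newCachedThreadPool creates non-daemon threads that linger for 60 s when idle.
        private final ExecutorService workers = Executors.newCachedThreadPool();

        void submitWork(Runnable task) {
            workers.submit(task);
        }

        boolean isStopped() {
            return workers.isShutdown();
        }

        // Plays the role of SshClient.stop() being called from SSHConnection.close():
        // without it, main() returns but the JVM waits for the idle workers to expire.
        @Override
        public void close() throws InterruptedException {
            workers.shutdown();
            if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                workers.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (PooledClient client = new PooledClient()) {
            client.submitWork(() -> System.out.println("session work done"));
        } // close() runs here, releasing the worker threads
        System.out.println("client stopped; JVM can exit immediately");
    }
}
```

Because `close()` shuts the pool down, the try-with-resources block releases the threads before `main()` returns, which mirrors the idea of having `SSHConnection.close()` also stop the client rather than only its sessions.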

@sverhoeven
Member

On https://github.com/apache/mina-sshd/blob/master/docs/dependencies.md the sshd-mina and sshd-netty dependencies are explained.

Looks like sshd-mina is the legacy socket implementation. Maybe we should switch to the default or netty implementation?

@jspaaks
Member Author

jspaaks commented Aug 1, 2019

I did some more digging into the connection error.
I tried sidestepping my normal ~/.ssh settings:

mv ~/.ssh ~/.ssh-sidelined
mkdir ~/.ssh
chmod 700 ~/.ssh

Then

xenon scheduler slurm --location ssh://localhost:10022 --username xenon --password javagat queues

works as normal.

Next, I copied files from ~/.ssh-sidelined back to ~/.ssh one at a time to see where things break:

  • copied a bunch of *.pem files, all good
  • copied id_rsa and id_rsa.pub, works
  • copied authorized_keys, still good
  • even known_hosts with 74 entries, still good
  • but then copying config breaks xenon scheduler slurm --location ssh://localhost:10022 --username xenon --password javagat queues
    • Digging further into config: with all of its lines commented out using #, it works again
    • I have a Port 22 line somewhere in there; I really don't know why or what it does, but by toggling that line on and off with # I was able to make the command fail or succeed, respectively. Not sure what to do about it, but at least it works now.
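The effect of a stray Port 22 line can be sketched as follows, assuming OpenSSH `ssh_config` semantics: lines before the first `Host` block apply to every host, and for each parameter the first value obtained is used. If the Port 22 line sits in that global section (or under a `Host` pattern that matches `localhost`), it applies to `localhost` too. A client is normally expected to let an explicitly requested port, such as the 10022 in ssh://localhost:10022, take precedence over the config file; the behaviour reported here suggests the config value won instead. `SshConfigPortSketch` and its methods are hypothetical and heavily simplified (no `Match` blocks, no wildcards other than `*`, no negated patterns, no `Port=22` syntax); this is not the resolution logic of Apache MINA SSHD or Xenon.

```java
public class SshConfigPortSketch {

    // Returns the Port that applies to `host` in a simplified ssh_config text, or -1.
    // Lines before the first Host line apply to every host; only exact host names
    // and the "*" pattern are matched, and the first Port found wins.
    static int configPortFor(String host, String configText) {
        boolean inMatchingSection = true; // the global section before any Host line
        for (String rawLine : configText.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            String key = parts[0];
            String value = parts[1].trim();
            if (key.equalsIgnoreCase("Host")) {
                inMatchingSection = false;
                for (String pattern : value.split("\\s+")) {
                    if (pattern.equals("*") || pattern.equals(host)) {
                        inMatchingSection = true;
                    }
                }
            } else if (key.equalsIgnoreCase("Port") && inMatchingSection) {
                return Integer.parseInt(value); // first obtained value wins
            }
        }
        return -1;
    }

    // An explicitly requested port (> 0) takes precedence over the config file;
    // otherwise fall back to the config value, then to the default port 22.
    static int effectivePort(int explicitPort, String host, String configText) {
        if (explicitPort > 0) {
            return explicitPort;
        }
        int fromConfig = configPortFor(host, configText);
        return fromConfig > 0 ? fromConfig : 22;
    }

    public static void main(String[] args) {
        String config = "Port 22\n\nHost cluster\n    Port 2222\n";
        System.out.println(effectivePort(10022, "localhost", config)); // explicit port wins
        System.out.println(effectivePort(-1, "localhost", config));    // global Port 22 applies
        System.out.println(effectivePort(-1, "cluster", config));      // global line is read first
    }
}
```

The last call shows why a global Port line is easy to overlook: because it is read before any `Host` block, it shadows even host-specific values, so with an incorrect precedence rule it would also override an explicit port like 10022.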
