
Torque integration tests broken? #677

Closed
jmaassen opened this issue Jan 18, 2021 · 9 comments
@jmaassen
Member

When I run the integration tests for xenon, the Torque tests fail with the following error:

java.lang.IllegalArgumentException: No internal port '22' for container 'torque': com.palantir.docker.compose.connection.Container$$Lambda$96/0x0000000840137840@2d2ea655
	at com.palantir.docker.compose.connection.Container.lambda$port$11(Container.java:91)

Other integration tests that use docker images (such as slurm and gridengine) seem to work as expected.

@jmaassen
Member Author

When starting the docker image manually like so:

 docker run --detach --name xenon-torque --hostname xenon-torque --publish 10022:22 --cap-add SYS_RESOURCE xenonmiddleware/torque

and running the liveTest like this:

./gradlew liveTest -Dxenon.scheduler=torque -Dxenon.username=xenon -Dxenon.password=javagat -Dxenon.scheduler.location=ssh://localhost:10022 -Dxenon.scheduler.workdir=/home/xenon

the tests run successfully. So the issue seems to be in the testing framework itself, not in the code or the docker image. Maybe the healthcheck succeeds too quickly?

@jmaassen
Member Author

Starting the docker image with docker-compose:

docker-compose -f torque-5.0.0.yml up

and running the live tests in the same fashion:

./gradlew liveTest -Dxenon.scheduler=torque -Dxenon.username=xenon -Dxenon.password=javagat -Dxenon.scheduler.location=ssh://localhost:32830 -Dxenon.scheduler.workdir=/home/xenon 

does not work. It results in the same error as with the integration tests.
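For context, a hypothetical sketch of what torque-5.0.0.yml might contain — the service name, image, and port mapping here are assumptions based on the error message and the manual docker run command above, not the actual file. The stack trace points at palantir's docker-compose-rule, which resolves the external port from the ports mapping that docker-compose reports; if that mapping is missing or not reported correctly, looking up internal port 22 fails with "No internal port '22'".

```yaml
# Hypothetical compose file; names and values are assumptions, not the
# actual torque-5.0.0.yml. The "ports" entry is what the test framework
# needs in order to resolve the external port for internal port 22.
version: '2'
services:
  torque:
    image: xenonmiddleware/torque
    hostname: xenon-torque
    cap_add:
      - SYS_RESOURCE
    ports:
      - "22"    # publish SSH on a random host port
```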

@sverhoeven
Member

Works for me

docker --version
Docker version 20.10.2, build 2291f61

docker image inspect xenonmiddleware/torque | jq '.[0].RepoDigests'
[
  "xenonmiddleware/torque@sha256:5a98982c2ad0cefc6994004ce4da69e68b8f0c4d596a9732dd7574e75f2153d4"
]

gradlew integrationTest --tests '*torque*'
Starting a Gradle Daemon, 2 incompatible Daemons could not be reused, use --status for details

Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/5.4.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 59s
6 actionable tasks: 2 executed, 4 up-to-date

PS. xenonmiddleware/torque is the only Docker image where we don't install the scheduler/fs ourselves.

@sverhoeven
Member

The liveTest command also works for me. I did have to prime the known_hosts file by logging in manually before calling gradle.

I also saw that the healthcheck causes entries like 2021-01-18 13:58:45,612 CRIT reaped unknown pid 5465) in the docker-compose log.

@jmaassen
Member Author

The digest of xenonmiddleware/torque matches. I do have an older version of docker though: Docker version 19.03.8, build afacb8b7f0

@jmaassen
Member Author

I also see the unknown pid message, but I also see the following (when starting docker-compose manually):

Starting docker-compose_torque_1 ... done
Attaching to docker-compose_torque_1
torque_1  | /usr/lib/python2.6/site-packages/supervisor/options.py:295: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
torque_1  |   'Supervisord is running as root and it is searching '
torque_1  | 2021-01-18 14:27:22,213 CRIT Supervisor running as root (no user in config file)
torque_1  | 2021-01-18 14:27:22,215 INFO supervisord started with pid 1
torque_1  | 2021-01-18 14:27:23,217 INFO spawned: 'pbsmom' with pid 16
torque_1  | 2021-01-18 14:27:23,218 INFO spawned: 'sshd' with pid 17
torque_1  | 2021-01-18 14:27:23,219 INFO spawned: 'pbssched' with pid 18
torque_1  | 2021-01-18 14:27:23,220 INFO spawned: 'pbsserver' with pid 20
torque_1  | 2021-01-18 14:27:23,220 INFO spawned: 'trqauthd' with pid 21
torque_1  | 2021-01-18 14:27:23,248 CRIT reaped unknown pid 24)
torque_1  | 2021-01-18 14:27:23,345 INFO exited: pbssched (exit status 0; not expected)
torque_1  | 2021-01-18 14:27:23,352 INFO gave up: pbssched entered FATAL state, too many start retries too quickly
torque_1  | 2021-01-18 14:27:24,444 INFO success: pbsmom entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
torque_1  | 2021-01-18 14:27:24,444 INFO success: sshd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
torque_1  | 2021-01-18 14:27:24,444 INFO success: pbsserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
torque_1  | 2021-01-18 14:27:24,444 INFO success: trqauthd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
torque_1  | 2021-01-18 14:27:24,444 CRIT reaped unknown pid 58)
torque_1  | 2021-01-18 14:27:25,604 CRIT reaped unknown pid 77)
torque_1  | 2021-01-18 14:27:26,779 CRIT reaped unknown pid 96)

Does pbssched fail?
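The "too many start retries too quickly" line comes from supervisord's start-retry logic. A hypothetical excerpt of the kind of supervisord program section that would produce the log lines above (the command path and values are assumptions, not the image's actual config):

```ini
; Hypothetical supervisord fragment; path and values are assumptions.
; A process must stay up for "startsecs" seconds to count as RUNNING;
; if it exits sooner "startretries" times in a row it enters FATAL,
; matching "gave up: pbssched entered FATAL state" in the log above.
[program:pbssched]
command=/usr/local/sbin/pbs_sched
startsecs=1
startretries=3
autorestart=unexpected
```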

@sverhoeven
Member

I have the same log message, but when I log in, the pbs_sched process is running and qsub works as expected.

@jmaassen
Copy link
Member Author

Updating docker from 19.03.8 to 20.10.2 did not help, but updating docker-compose from 1.25.0 to 1.27.4 seems to have squashed this bug.
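Since the fix turned out to be a docker-compose upgrade, a small version guard could catch this before running the integration tests. This is only a sketch based on this thread, not part of the xenon build; the version_ge helper and the 1.27.4 minimum are assumptions:

```shell
# Sketch: warn if docker-compose is older than 1.27.4, the version that
# resolved this issue. version_ge is a local helper, not a real tool.
version_ge() {
  # Succeeds if version $1 >= version $2 (relies on GNU sort -V).
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

installed="1.25.0"   # e.g. parsed from: docker-compose --version
if version_ge "$installed" "1.27.4"; then
  echo "docker-compose $installed is new enough"
else
  echo "upgrade docker-compose: $installed is older than 1.27.4"
fi
```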

@jmaassen
Member Author

Resolved as a docker-compose version issue.
