travis_start_sauce_connect invalid retries leading to travis_stop_sauce_connect killing an invalid PID, leading to tunnel not closing #7178

Closed
mangui opened this issue Jan 18, 2017 · 10 comments

@mangui

mangui commented Jan 18, 2017

Hi there, we are using Travis/SauceLabs on https://github.com/dailymotion/hls.js.
While investigating an issue where the Travis Sauce Labs tunnel does not close properly even though travis_stop_sauce_connect is called,
I found the following in the Travis logs: https://travis-ci.org/dailymotion/hls.js/jobs/193023923

If you unfold "Starting Sauce Connect",
you can see that the tunnel is set up correctly on the first try. As the log shows, the sc PID is 1941:

Extracting Sauce Connect
Waiting for Sauce Connect readyfile
18 Jan 14:19:13 - Sauce Connect 4.4.2, build 3154 c8dd102-dirty
18 Jan 14:19:13 - Using CA certificate bundle /etc/ssl/certs/ca-certificates.crt.
18 Jan 14:19:13 - Using CA certificate verify path /etc/ssl/certs.
18 Jan 14:19:13 - Starting up; pid 1941
18 Jan 14:19:13 - Command line arguments: sc-4.4.2-linux//bin/sc -i 1809.9 -f sauce-connect-ready-25615 -l /home/travis/sauce-connect.log 
18 Jan 14:19:13 - Log file: /home/travis/sauce-connect.log
18 Jan 14:19:13 - Pid file: /tmp/sc_client-1809.9.pid
18 Jan 14:19:13 - Timezone: UTC GMT offset: 0h
18 Jan 14:19:13 - Using no proxy for connecting to Sauce Labs REST API.
18 Jan 14:19:13 - Resolving saucelabs.com to 162.222.75.243 took 2 ms.
18 Jan 14:19:14 - Started scproxy on port 38903.
18 Jan 14:19:14 - Please wait for 'you may start your tests' to start your tests.
18 Jan 14:19:14 - Starting secure remote tunnel VM...
18 Jan 14:19:18 - Secure remote tunnel VM provisioned.
18 Jan 14:19:18 - Tunnel ID: ed84d9aca5674fe4a3f9d6bbb5a1e98b
18 Jan 14:19:19 - Secure remote tunnel VM is now: booting
18 Jan 14:19:21 - Secure remote tunnel VM is now: running
18 Jan 14:19:21 - Using no proxy for connecting to tunnel VM.
18 Jan 14:19:21 - Resolving tunnel hostname to 162.222.75.92 took 12ms.
18 Jan 14:19:21 - Starting Selenium listener...
18 Jan 14:19:21 - Establishing secure TLS connection to tunnel...
18 Jan 14:19:22 - Selenium listener started on port 4445.
18 Jan 14:19:34 - Sauce Connect is up, you may start your tests.
~/build/dailymotion/hls.js

I am not sure why, but although the tunnel seems to be set up properly, the script retries.
These retries happen every time (see all batches here for example):

The command "eval travis_start_sauce_connect" failed. Retrying, 2 of 3
...
The command "eval travis_start_sauce_connect" failed. Retrying, 3 of 3

The unit tests then work, because the tunnel was set up correctly on the first attempt.

The issue is that after the tests finish, the tunnel is not destroyed by travis_stop_sauce_connect, which tries to kill ${_SC_PID}.

But at that point, ${_SC_PID} no longer matches the right sauce_connect process:
/home/travis/build.sh: line 356: kill: (2073) - No such process

_SC_PID is 2073 instead of 1941.

As we can see, _SC_PID is captured right after launching sc:
https://github.com/travis-ci/travis-build/blob/1e005eb00653b81bd8a4a64b76ed604c9ca52b94/lib/travis/build/addons/sauce_connect/templates/sauce_connect.sh#L60

=> _SC_PID contains the PID from the 3rd retry, not the one from the working tunnel.
=> I guess _SC_PID should be persisted only if the sc command was successful.
=> The second question is why the retries happened at all...
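
The second point above could be sketched roughly as follows. This is a minimal illustration, not the travis-build implementation: start_tunnel and candidate_pid are hypothetical names, and the readyfile is assumed to be passed as an absolute path. The PID is copied into _SC_PID only once the readyfile has appeared, so a failed retry cannot overwrite the PID of a tunnel that is already running.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; start_tunnel is not a travis-build function.

_SC_PID=""

# Usage: start_tunnel <absolute-readyfile> <command> [args...]
start_tunnel() {
  local readyfile="$1"
  shift

  "$@" &                       # launch the tunnel command in the background
  local candidate_pid="$!"

  # Wait until the readyfile appears or the process goes away.
  while [ ! -f "$readyfile" ] && kill -0 "$candidate_pid" 2>/dev/null; do
    sleep 0.1
  done

  if [ -f "$readyfile" ]; then
    _SC_PID="$candidate_pid"   # persist the PID only for a tunnel that came up
    return 0
  fi

  return 1                     # leave any previously stored PID untouched
}
```

With this shape, a later failing attempt returns non-zero but leaves _SC_PID pointing at the process that actually created the readyfile.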

From the code, it seems that travis_start_sauce_connect returns a failure,
although I don't see this line printed in the logs.

I suspect that the return value of travis_start_sauce_connect is not correct.

Any help investigating and fixing this problem would be greatly appreciated, thanks!
Guillaume aka mangui

@BanzaiMan
Contributor

The current logic is like this:

function travis_start_sauce_connect() {
  ⋮
  sc … &
  _SC_PID="$!"

  echo "Waiting for Sauce Connect readyfile"
  while test ! -f ${sc_readyfile} && ps -f $_SC_PID >&/dev/null; do
    sleep .5
  done

  if test ! -f ${sc_readyfile}; then
    echo "readyfile not created"
  fi

  popd

  test -f ${sc_readyfile}
  return $?
}

The function returns whether or not ${sc_readyfile} is a regular file. I assumed it was a regular file, but maybe it is not.

@tjenkinson

Maybe the -e flag would be better then?
http://ss64.com/bash/test.html
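
For reference, a small sketch of how the two flags differ. classify_path is a hypothetical helper written for illustration, not part of travis-build: -f is true only for a regular file, while -e is true for anything that exists at that path.

```shell
# Hypothetical helper for illustration only.
classify_path() {
  if [ -f "$1" ]; then
    echo "regular file"          # -f: exists and is a regular file
  elif [ -e "$1" ]; then
    echo "exists, not regular"   # -e: exists, any type (directory, FIFO, ...)
  else
    echo "missing"
  fi
}
```

Both checks resolve a relative path against the current working directory, so switching flags alone does not change the result if the directory has changed in the meantime.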

@jpommerening

Hi, I encountered the same issue.

@BanzaiMan: $sc_readyfile seems to be a relative path, and you're calling popd before doing the final test -f, so the check runs in a different directory from the one where the file was created.
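
A minimal sketch of that effect, assuming the readyfile is created inside a working directory that the function later leaves. check_relative and check_absolute are hypothetical helpers, and plain cd stands in for pushd/popd so the example also runs under sh. It is not the patch that was actually deployed.

```shell
# Hypothetical helpers for illustration; not taken from travis-build.

check_relative() {
  local workdir="$1" readyfile="sc-ready-$$"
  local origdir="$PWD"
  cd "$workdir" || return 2
  touch "$readyfile"            # stands in for sc creating its readyfile
  cd "$origdir" || return 2     # equivalent of popd
  test -f "$readyfile"          # resolved against the original directory
}

check_absolute() {
  local workdir="$1" readyfile="sc-ready-$$"
  local origdir="$PWD"
  cd "$workdir" || return 2
  touch "$readyfile"
  readyfile="$PWD/$readyfile"   # pin the path before leaving the directory
  cd "$origdir" || return 2
  test -f "$readyfile"          # still points at the file that was created
}
```

When the caller's directory differs from workdir, check_relative reports failure even though the file exists, which matches the spurious "failed, retrying" behaviour described above, while check_absolute succeeds.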

@mangui
Author

mangui commented Jan 19, 2017

Thanks @BanzaiMan.
Is this change effective immediately on Travis?

@BanzaiMan
Contributor

@mangui Patience, young Padawan. We need to deploy it! :-D

@jpommerening

That was quick, though 🙃

@mangui
Author

mangui commented Jan 19, 2017

Hehe, no issue!
Thanks for the quick resolution.

@BanzaiMan
Contributor

Deployed now. Let us know how it goes for you!

@mangui
Author

mangui commented Jan 19, 2017

Works for me, verified on https://travis-ci.org/dailymotion/hls.js/jobs/193452771

@adriano-di-giovanni

Sauce Connect fails to start every time on OS X images.
https://travis-ci.org/adriano-di-giovanni/cordova-plugin-shared-preferences/jobs/334833608

I can't figure out why. The .travis.yml file is the same as the one from cordova-plugin-device (that is known to be working).

The error seems to be

dyld: lazy symbol binding failed: Symbol not found: _clock_gettime
  Referenced from: /private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/sc.XXXX.PfqqhmZJ/sc-4.4.11-osx/bin/sc (which was built for Mac OS X 10.12)
  Expected in: /usr/lib/libSystem.B.dylib
dyld: Symbol not found: _clock_gettime
  Referenced from: /private/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/sc.XXXX.PfqqhmZJ/sc-4.4.11-osx/bin/sc (which was built for Mac OS X 10.12)
  Expected in: /usr/lib/libSystem.B.dylib

Do I have to upgrade the OS X image, or is there something to fix?

Thanks,
Adriano
