Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

czmq selftests hang or die with zeromq-2 #1825

Closed
jimklimov opened this issue Dec 28, 2017 · 9 comments
Closed

czmq selftests hang or die with zeromq-2 #1825

jimklimov opened this issue Dec 28, 2017 · 9 comments

Comments

@jimklimov
Copy link
Member

jimklimov commented Dec 28, 2017

Several czmq selftests just do not pass with zeromq-2, some others time out over the limit Travis CI gives to the job (10 minutes of quietness in the log, 30 minutes overall).

If the corresponding features are not supported in the old zeromq release, they should get excluded from the test (e.g. like zproc skips itself for zeromq older than v4). If they are expected to work, these tests and/or https://github.com/zeromq/zeromq2-x Git HEAD codebase should be fixed, I guess.

  • zbeacon takes 17 minutes of quiet to end (not sure if it just timed out or was really "OK"):
$ ./src/czmq_selftest -v -t zbeacon
Running czmq test 'zbeacon'...
 * zbeacon: 
I: 17-12-28 11:12:32 zbeacon: API command=CONFIGURE
I: 17-12-28 11:12:32 zbeacon: interface=eth0 address=192.168.56.74 broadcast=192.168.56.255
I: 17-12-28 11:12:32 zbeacon: configured, hostname=192.168.56.74
I: 17-12-28 11:12:32 zbeacon: API command=CONFIGURE
I: 17-12-28 11:12:32 zbeacon: interface=eth0 address=192.168.56.74 broadcast=192.168.56.255
I: 17-12-28 11:12:32 zbeacon: configured, hostname=192.168.56.74
I: 17-12-28 11:12:32 zbeacon: API command=PUBLISH
I: 17-12-28 11:12:32 zbeacon: API command=SUBSCRIBE
I: 17-12-28 11:12:32 zbeacon: API command=$TERM
I: 17-12-28 11:12:32 zbeacon: API command=$TERM
OK

$ date
Thu Dec 28 11:29:21 UTC 2017
  • zgossip : assertion fails
$ time ./src/czmq_selftest -v -t zgossip
Running czmq test 'zgossip'...
 * zgossip: 
D: 17-12-28 11:31:36 server:     API command=BIND
D: 17-12-28 11:31:37    310:server                           : start:
D: 17-12-28 11:31:37    310:server                           :     HELLO
D: 17-12-28 11:31:37    310:server                           :         $ get first tuple
D: 17-12-28 11:31:37    310:server                           :         > have tuple
D: 17-12-28 11:31:37    310:server                           : have tuple:
D: 17-12-28 11:31:37    310:server                           :     finished
D: 17-12-28 11:31:37    310:server                           :         > connected
D: 17-12-28 11:31:37    310:server                           : connected:
D: 17-12-28 11:31:37    310:server                           :     PING
D: 17-12-28 11:31:37    310:server                           :         $ send PONG
D: 17-12-28 11:31:37    310:server                           :         > connected
D: 17-12-28 11:31:37 server:     API command=$TERM
D: 17-12-28 11:31:37 base:     API command=SET
D: 17-12-28 11:31:37 base:     API command=BIND
D: 17-12-28 11:31:37    886:base                             : start:
D: 17-12-28 11:31:37    886:base                             :     HELLO
D: 17-12-28 11:31:37    886:base                             :         $ get first tuple
D: 17-12-28 11:31:37    886:base                             :         > have tuple
D: 17-12-28 11:31:37    886:base                             : have tuple:
D: 17-12-28 11:31:37    886:base                             :     finished
D: 17-12-28 11:31:37    886:base                             :         > connected
D: 17-12-28 11:31:37    886:base                             : connected:
D: 17-12-28 11:31:37    886:base                             :     PUBLISH
D: 17-12-28 11:31:37    886:base                             :         $ store tuple if new
D: 17-12-28 11:31:37    886:base                             : connected:
D: 17-12-28 11:31:37    886:base                             :     forward
D: 17-12-28 11:31:37    886:base                             :         $ get tuple to forward
D: 17-12-28 11:31:37    886:base                             :         $ send PUBLISH
D: 17-12-28 11:31:37    886:base                             :         > connected
D: 17-12-28 11:31:37    886:base                             :         > connected
D: 17-12-28 11:31:37    886:base                             : connected:
D: 17-12-28 11:31:37    886:base                             :     PUBLISH
D: 17-12-28 11:31:37    886:base                             :         $ store tuple if new
D: 17-12-28 11:31:37    886:base                             : connected:
D: 17-12-28 11:31:37    886:base                             :     forward
D: 17-12-28 11:31:37    886:base                             :         $ get tuple to forward
D: 17-12-28 11:31:37    886:base                             :         $ send PUBLISH
D: 17-12-28 11:31:37    886:base                             :         > connected
D: 17-12-28 11:31:37    886:base                             :         > connected
D: 17-12-28 11:31:37    886:base                             : connected:
D: 17-12-28 11:31:37    886:base                             :     expired
D: 17-12-28 11:31:37    886:base                             :         $ terminate
D: 17-12-28 11:31:37    887:base                             : start:
D: 17-12-28 11:31:37    887:base                             :     HELLO
D: 17-12-28 11:31:37    887:base                             :         $ get first tuple
D: 17-12-28 11:31:37    887:base                             :         > have tuple
D: 17-12-28 11:31:37    887:base                             : have tuple:
D: 17-12-28 11:31:37    887:base                             :     ok
D: 17-12-28 11:31:37    887:base                             :         $ send PUBLISH
D: 17-12-28 11:31:37    887:base                             :         $ get next tuple
D: 17-12-28 11:31:37    887:base                             :         > have tuple
D: 17-12-28 11:31:37    887:base                             : have tuple:
D: 17-12-28 11:31:37    887:base                             :     ok
D: 17-12-28 11:31:37    887:base                             :         $ send PUBLISH
D: 17-12-28 11:31:37    887:base                             :         $ get next tuple
D: 17-12-28 11:31:37    887:base                             :         > have tuple
D: 17-12-28 11:31:37    887:base                             : have tuple:
D: 17-12-28 11:31:37    887:base                             :     finished
D: 17-12-28 11:31:37    887:base                             :         > connected
lt-czmq_selftest: src/zgossip.c:572: zgossip_test: Assertion `(!strcmp ((key), ("inproc://alpha-2")))' failed.
Aborted

real	0m0.531s
user	0m0.012s
sys	0m0.004s
@jimklimov
Copy link
Member Author

Configured the codebases about same way as travis does it :

libzmq2$ libtoolize; autoreconf ; automake --add-missing
libzmq2$ ./configure CFLAGS=-I/tmp/stable_zmq/include CPPFLAGS=-I/tmp/stable_zmq/include CXXFLAGS=-I/tmp/stable_zmq/include LDFLAGS=-L/tmp/stable_zmq/lib PKG_CONFIG_PATH=/tmp/stable_zmq/lib/pkgconfig --prefix=/tmp/stable_zmq --with-docs=no --without-documentation --with-libsodium
libzmq2$ make install

czmq$ ./configure CFLAGS=-I/tmp/stable_zmq/include CPPFLAGS=-I/tmp/stable_zmq/include CXXFLAGS=-I/tmp/stable_zmq/include LDFLAGS=-L/tmp/stable_zmq/lib PKG_CONFIG_PATH=/tmp/stable_zmq/lib/pkgconfig --prefix=/tmp/stable_zmq --with-docs=no --without-documentation --with-uuid=yes --with-libsystemd=yes --with-liblz4=yes
czmq$ make check

@bluca
Copy link
Member

bluca commented Dec 28, 2017

Yeah I mentioned this here: #1762 (comment)
weird thing is that it doesn't look like it's anything we did - one day it started hanging. I think it was together with a change in the Travis CI image. Did you manage to reproduce it locally?

@jimklimov
Copy link
Member Author

Yes, "screenshots" above are from my local debian8 vm. So it is reproducible off-travis too.

@bluca
Copy link
Member

bluca commented Dec 29, 2017

Oh that's great, I couldn't manage to reproduce it on my deb9 laptop. Any idea what the problem is?

@jimklimov
Copy link
Member Author

Nope, did not dig into code so far and not sure when/if I would - so feel free to overtake ;)

@bluca
Copy link
Member

bluca commented Dec 29, 2017

Tried a while back but could not repro - what's the config on the VM? Amd64 or i386? Cpu / ram? And what's the host cpu spec?

@jimklimov
Copy link
Member Author

X86-64, I think 2 or 4 virtual cores and 2 or 4gb RAM allocated - not near it today.
Host is a non-Xeon HP Zbook 15 G3, can't be more specific ATM.

@jimklimov
Copy link
Member Author

Intel i7-6820HQ @ 2.70 GHz on host; 2 vcpu and 2gb on VM.

@jimklimov
Copy link
Member Author

I'd hazard a guess that there was some change in another piece of equation, e.g. libsodium or something further along?

bluca added a commit to bluca/czmq that referenced this issue Mar 30, 2018
…ller

Solution: remove it. It is a backward-compat macro to use only with
the zmq_poll API, which used to take nanoseconds until libzmq 3.x
where it was changed to milliseconds.
In the zbeacon test it was used with zpoller_wait, which itself used it,
which means that when building with libzmq 2.x the timeout was twice
multiplied by 1000 and caused the test to hang.
Also remove it from the zmq_poller usage, which is new and has always
taken milliseconds.

Fixes zeromq#1825
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants