Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zmq_proxy_steerable #4600

Merged
merged 2 commits into from Oct 12, 2023
Merged

Fix zmq_proxy_steerable #4600

merged 2 commits into from Oct 12, 2023

Conversation

nnog
Copy link

@nnog nnog commented Oct 12, 2023

Continuing Brett's work on zmq_proxy_steerable here: #4598

2 problems addressed (split into 2 commits):

  1. PAUSE/RESUME commands were seemingly mixed up and the test didn't cover it.
  2. The reimplementation changed behaviour by always replying for the simple commands PAUSE, RESUME, TERMINATE, so ZMQ_PAIR usage would have to change when migrating.

I agree that REP makes the most sense for the control socket type, but prior semantics of the control socket pretty much dictated the use ZMQ_PAIR, so this just restores backwards compatibility.

New test assertions with existing src/proxy.cpp:

steer: sending STATISTICS - post-pause                                             
stats: client pkts out: 15 worker pkts out: 32 { 30 345 64 736 64 736 30 345 }     
tests/test_proxy_steerable.cpp:465:test_proxy_steerable:FAIL: Expected 575 Was 1081

@bluca
Copy link
Member

bluca commented Oct 12, 2023

[  113s] tests/test_proxy_steerable.cpp: In function 'void test_proxy_steerable()':
[  113s] tests/test_proxy_steerable.cpp:461:23: error: unused variable 'bytes' [-Werror=unused-variable]
[  113s]      unsigned long int bytes = statistics (proxy_control, "post-pause");

@nnog
Copy link
Author

nnog commented Oct 12, 2023

bytes is referenced in test macros on lines 465, 472. How are you building/where are you seeing this? I can't find this in any CI logs. Thanks

@bluca
Copy link
Member

bluca commented Oct 12, 2023

bytes is referenced in test macros on lines 465, 472. How are you building/where are you seeing this? I can't find this in any CI logs. Thanks

It's in any of the failed OBS CI jobs listed below, they are all 32bit builds

@bluca
Copy link
Member

bluca commented Oct 12, 2023

please rebase and squash the fixup commit

George Cockshull added 2 commits October 12, 2023 15:22
Problem: the new reimplementation of zmq_proxy_steerable had PAUSE/RESUME
that didn't follow expected behaviour. Possibly mixed up. Test didn't properly
cover the issue.

Solution: improve test coverage, fix the proxy command parsing.

I had no knowledge of the pre-MPL-2.0 implementation. This fix is based
on documented semantics and prior API usage. I contribute this
under MPL-2.0.
Problem: reimplemented zmq_proxy_steerable control socket type is not
backwards compatible - replies are always sent.

In the past, zmq_proxy_steerable never sent a reply for
commands that weren't STATISTICS, so only really worked with PAIR and didn't
work at all with REP. Now it only supports REP and PAIR semantics changed. This
breaks compatibility with PAIR in a subtle and slightly annoying way if
HWMs are hit without reading the replies.

Solution: Add a check to send the empty reply only for
REP control sockets. This restores backwards compatibility, and supports
REP, PAIR, and SUB (for non-reply commands).

I had no knowledge of the pre-MPL-2.0 implementation. This fix is based
on docs and prior API usage. I contribute this under MPL-2.0.
@emusgrave
Copy link

For posterity, when the previous implementation used a sub socket for the control, it made it easy to publish one TERMINATE that would control many listeners. I have code that spawns worker threads which are subscribed to that same socket address and therefore can safely close down with the proxy. I believe this was from one of the original examples of the usage of the proxy.

@emusgrave
Copy link

I was just able to test this PR and things are looking good again for using a sub socket for the TERMINATE control.
Thanks @nnog and @bluca. I was banging my head against the wall for a bit when I had upgraded to 4.3.4 and my proxies stopped working.

@bluca bluca merged commit 6b80df1 into zeromq:master Oct 12, 2023
63 of 67 checks passed
@nnog
Copy link
Author

nnog commented Oct 12, 2023

I was just able to test this PR and things are looking good again for using a sub socket for the TERMINATE control. Thanks @nnog and @bluca. I was banging my head against the wall for a bit when I had upgraded to 4.3.4 and my proxies stopped working.

Yes, SUB control socket will work now as long as you only send non-replying commands. It will still encounter ENOTSUP and exit the proxy call with -1 if you do send STATISTICS. Which from a developer's perspective is probably better than silently dropping the command.

@bluca
Copy link
Member

bluca commented Oct 12, 2023

The test is flacky and fails on slow architectures:

[ 2361s] FAIL: tests/test_proxy_steerable
[ 2361s] ================================
[ 2361s]
[ 2361s] tests/test_proxy_steerable.cpp:458:test_proxy_steerable:FAIL: Expression Evaluated To FALSE

https://build.opensuse.org/build/network:messaging:zeromq:git-stable/Fedora_Rawhide/s390x/libzmq/_log

@swelborn
Copy link

@bluca @nnog is there a release timeline for this? In 4.3.5, can you use PUB/SUB sockets? thanks for your work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants