REQ sockets terminate TCP connection after first heartbeat if ZMQ_HEARTBEAT_IVL is set. #3060

Closed
zpincus opened this issue Apr 27, 2018 · 1 comment

zpincus commented Apr 27, 2018

Issue description

I have verified that the ZMTP 3.1 heartbeat protocol works great with PUB/SUB sockets. However, when I tried to turn it on for a REQ/REP pair, I encountered immediate failures on the REQ-socket side.

Watching the traffic in Wireshark shows that after the first PING/PONG exchange, the REQ socket sends a TCP FIN and then silently stops sending data, without ever attempting to reconnect. This behavior occurs regardless of whether heartbeating is enabled on the REQ side or the REP side, and regardless of whether timeouts/TTL are enabled or disabled.
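
For anyone trying to pick the heartbeat frames out of a similar capture, here is a minimal sketch of how a ZMTP 3.1 PING or PONG command frame is laid out on the wire. It is based on my reading of the ZMTP 3.1 spec (short command frame: flags byte 0x04, one size byte, a length-prefixed command name, then a 2-byte TTL in tenths of a second for PING, followed by an optional context of up to 16 bytes). The function names `build_ping`, `build_pong` and `parse_heartbeat` are hypothetical helpers, not part of libzmq or pyzmq, and the sketch only looks at a single frame at the start of the buffer.

```python
import struct

SHORT_COMMAND = 0x04  # ZMTP flags byte: command frame with a 1-byte size field


def build_ping(ttl_ms=0, context=b""):
    """Build a PING command frame (hypothetical helper; TTL given in ms)."""
    if len(context) > 16:
        raise ValueError("ping context is limited to 16 bytes")
    ttl_ds = ttl_ms // 100  # the wire format carries tenths of a second
    if not 0 <= ttl_ds <= 0xFFFF:
        raise ValueError("TTL does not fit in 16 bits")
    body = b"\x04PING" + struct.pack("!H", ttl_ds) + context
    return bytes([SHORT_COMMAND, len(body)]) + body


def build_pong(context=b""):
    """Build a PONG command frame echoing the PING context."""
    if len(context) > 16:
        raise ValueError("ping context is limited to 16 bytes")
    body = b"\x04PONG" + context
    return bytes([SHORT_COMMAND, len(body)]) + body


def parse_heartbeat(frame):
    """Return ('PING', ttl_ms, context), ('PONG', None, context), or None."""
    if len(frame) < 2 or frame[0] != SHORT_COMMAND:
        return None  # not a short command frame
    size = frame[1]
    body = frame[2:2 + size]
    if len(body) != size or size < 5:
        return None  # truncated, or too short to hold a 4-letter command name
    name_len = body[0]
    name = body[1:1 + name_len]
    data = body[1 + name_len:]
    if name == b"PING" and len(data) >= 2:
        (ttl_ds,) = struct.unpack("!H", data[:2])
        return ("PING", ttl_ds * 100, data[2:])
    if name == b"PONG":
        return ("PONG", None, data)
    return None  # some other command, e.g. READY
```

Feeding the TCP payload bytes copied out of Wireshark to `parse_heartbeat` should make it easy to confirm which packet is the last heartbeat before the FIN.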

Environment

  • libzmq version (commit hash if unreleased): 4.2.3
  • I tested using pyzmq 17.0.0 to keep the code simple, but TCP errors sure seem like an issue in the underlying libzmq.
  • OS: OS X 10.12.6

Minimal test code

Here's an "echo server" script:

import zmq
c = zmq.Context()
s = c.socket(zmq.REP)
s.bind('tcp://127.0.0.1:5555')
while True:
    s.send(s.recv())

And the matching REQ "client":

import zmq
import time
c = zmq.Context()
s = c.socket(zmq.REQ)
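# Heartbeat options are given in milliseconds.
s.HEARTBEAT_IVL = 5000       # send a ZMTP PING every 5 s
s.HEARTBEAT_TIMEOUT = 50000  # give up on the connection if nothing arrives within 50 s of a PING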
s.connect('tcp://127.0.0.1:5555')
i = 0
while True:
    s.send(str(i).encode())
    print(s.recv())
    i += 1
    time.sleep(1)

Running both simultaneously results in the disconnects described below.

The results are the same if the heartbeating is turned on on the server-side instead, with:

s.HEARTBEAT_IVL = 5000
s.HEARTBEAT_TTL = 50000

(Here setting the TTL instead of the TIMEOUT, for symmetry.)

If neither TIMEOUT nor TTL is set, the ping/pong protocol should still be enabled, but missed heartbeats should not cause disconnections. Even in this case, however, the REQ socket disconnects after the first ping/pong.

What's the actual result?

The client can properly send and receive for 5 seconds (or whatever the heartbeat interval is set to). As soon as a heartbeat packet is sent and received (or received and sent, if it originated on the server), the client sends a TCP FIN. Subsequent sends "succeed" on the client side, but no actual data is sent. (This was tested with Wireshark capturing all communication on the loopback interface to/from TCP port 5555.)

What's the expected result?

The expected result is that the REQ socket continues to send heartbeats and, if a disconnect is detected, tries to reconnect.

For example, the following works just fine with a PUB/SUB pair.

"Server":

import zmq
import time
c = zmq.Context()
s = c.socket(zmq.PUB)
s.bind('tcp://127.0.0.1:5555')
i = 0
while True:
    s.send(str(i).encode())
    i += 1
    time.sleep(1)

"Client":

import zmq
c = zmq.Context()
s = c.socket(zmq.SUB)
s.HEARTBEAT_IVL = 5000
s.HEARTBEAT_TIMEOUT = 50000
s.connect('tcp://127.0.0.1:5555')
s.subscribe('')
while True:
    print(s.recv())