Plugout network lead to server cannot receive packet #2224

Closed
wenjians opened this issue Nov 23, 2016 · 14 comments

Comments

@wenjians

Hi all,
I wrote a Windows 10 service in C#. The service itself is quite simple, something like a log service. The client sends requests to the service and the service does not need to respond, so we chose the DEALER/ROUTER pattern (http://zguide.zeromq.org/page:all#The-Asynchronous-Client-Server-Pattern), but without the backend workers because no load balancing is needed.

The service is written in C# as a Windows 10 service and the client is written in C++. The ZeroMQ version is 4.0.4, which is the latest. The server code is similar to https://github.com/metadings/zguide/blob/master/examples/C%23/flserver3.cs.

It works well in the normal case. We then tested some abnormal network cases:

  • Unplug the network link for about 30 seconds and plug it in again: communication resumes once the link is connected.
  • Unplug the network link for more than 1 minute and plug it in again: the C# server no longer receives any packets, although a Wireshark capture shows that the TCP communication is okay.

Has anyone seen a similar issue, and is there any idea how to solve it? What confuses me is why the TCP communication looks fine but the server cannot receive packets.

Best Regards,
Kevin

@bluca
Member

bluca commented Nov 23, 2016

4.0.4 is very old at this point. Does this still happen with 4.2.0?

@wenjians
Author

wenjians commented Nov 24, 2016

Only 4.0.4 is available as a Windows installer: http://zeromq.org/distro:microsoft-windows
I will download the source code and build version 4.2.0.

@bluca
Member

bluca commented Jan 6, 2017

Have you managed to try with 4.2.x?

@wenjians
Author

Sorry for the late reply. I tested with ZeroMQ 4.2 and it does not work either.
The results with the sample code are strange for the following combinations:
1. Client (DEALER) and server (ROUTER) both written in C++: it works well.
2. Client (DEALER in C++) and server (ROUTER in C#): it doesn't work. When we install the C# binding with PM, the version installed automatically is 4.1. Could it be a problem that the client and server don't use the same version?
We will try to make the client and server use the same ZeroMQ library and test again.

@metadings
Member

Yes. When using the C# version, please replace amd64/libzmq.so and/or amd64/libzmq.dll with your own build. I don't have the time or the machines to rebuild that thing for every new version.

@bluca
Member

bluca commented Jan 19, 2017

The protocol itself is compatible between those versions. If as you're saying you can't reproduce the problem using only the libzmq library, then I would recommend moving the issue to the tracker of the binding you are using as there's not much we can do here.

If you can reproduce the issue using only the libzmq library, or find that the root cause is here somewhere, please feel free to reopen this issue.

@bluca bluca closed this as completed Jan 19, 2017
@wenjians
Author

wenjians commented Jan 20, 2017

Hi,
I found that something was wrong with my previous test case. The issue is still there.

The test cases are as follows:
Test Case 1)
disconnect the client's network
start the application (server and client)
after some time, connect the network
it works fine, the server can receive packets

Test Case 2)
start the application (server and client)
disconnect the client's network (for more than 1 minute)
connect the network
the server can NOT receive packets

Test Case 3)
Same as Test Case 2, but the disconnect time is shorter, e.g. less than 10 seconds; the server can then still receive packets from the client after reconnecting.

It can be reproduced every time. I use Visual C++ 2013, ZeroMQ 4.2, Win10.

The sample code is similar to the following.

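#include <zmq.hpp>     // cppzmq binding; include before windows.h to avoid winsock conflicts
#include <windows.h>   // Sleep
#include <cstdio>
#include <cstring>
#include <ctime>
#include <iostream>
#include <string>

using namespace zmq;

void client_run()
{
context_t ctx_(1);
socket_t client_socket_(ctx_, ZMQ_DEALER);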

char identity[100] = { 0 };
int i = time(0) % 10;
sprintf(identity, "Identity of the client is %d", i);
printf("%s\n", identity);

client_socket_.setsockopt(ZMQ_IDENTITY, identity, strlen(identity)+1);
client_socket_.connect("tcp://192.168.1.158:5000");

zmq::pollitem_t items[] = { { client_socket_, 0, ZMQ_POLLIN, 0 } };
int request_nbr = 0;
try
{
    while (true)
    {
        Sleep(1000);
        zmq::message_t msg;
        zmq::poll(items, 1, 10);
        if (items[0].revents & ZMQ_POLLIN)
        {
            zmq::message_t msg;
            client_socket_.recv(&msg);
            int msgSize = msg.size();
            if (msgSize > 0)
            {
                char * buffer = new char[msgSize + 1];
                memset(buffer, 0, msgSize + 1);
                memcpy(buffer, msg.data(), msgSize);
                Sleep(10);
                printf("%s\n", buffer);
                delete[] buffer;
            }
        }

        char request_string[200] = { 0 };
        sprintf(request_string, "request #%d", ++request_nbr);
        client_socket_.send(request_string, strlen(request_string) + 1);
        printf("%s\n", request_string);
    }
}
catch (const std::exception &e) { fprintf(stderr, "client error: %s\n", e.what()); }  // report instead of silently swallowing

}

void server_run()
{
context_t ctx_(1);
socket_t frontend_(ctx_, ZMQ_ROUTER);
int major, minor, patch;
zmq_version(&major, &minor, &patch);

std::cout << "ZeroMQ version: " << major << "." << minor << "." << patch << std::endl;

frontend_.bind("tcp://*:5000");

try {
    while (true)
    {
        zmq::message_t identity;
        zmq::message_t msg;
        zmq::message_t copied_id;
        zmq::message_t copied_msg;


        frontend_.recv(&identity);
        frontend_.recv(&msg);

        if (msg.size() > 0)
        {
            char * buffer = new char[msg.size() + 1];
            memset(buffer, 0, msg.size() + 1);
            memcpy(buffer, msg.data(), msg.size());
            Sleep(10);
            printf("Received %s\n", buffer);

            // Build the reply from length-checked copies: msg.data() and
            // identity.data() are void* and need not be NUL-terminated.
            std::string id(static_cast<const char *>(identity.data()), identity.size());
            std::string reply = std::string("Received ") + buffer + " from " + id.c_str() + "\n";
            frontend_.send(identity, ZMQ_SNDMORE);
            frontend_.send(reply.c_str(), reply.size() + 1);
            delete[] buffer;
        }
    }
}
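catch (const std::exception &e)
{
    // Report the failure instead of silently swallowing it.
    std::cerr << "server error: " << e.what() << std::endl;
}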

}

@bluca
Member

bluca commented Jan 20, 2017

Can you provide a test case that does not use cppzmq bindings? That's an external binding and as such could skew the results

@bluca bluca reopened this Jan 20, 2017
@wenjians
Author

Thanks a lot for your continued help. I made an example based directly on libzmq in C. The server cannot receive any packets after the network link is unplugged (for more than one minute) and plugged in again.

The following is the code that I used.

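#include <zmq.h>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <iostream>
#include <string>

int client_dealer_dealer_c()
{
void* context = zmq_ctx_new();
void* client_socket = zmq_socket(context, ZMQ_DEALER);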

char identity[100] = { 0 };
int i = time(0) % 1000;
sprintf(identity, "Identity of the client is %d", i);
printf("%s\n", identity);

zmq_setsockopt(client_socket, ZMQ_IDENTITY, identity, strlen(identity) + 1);

int linger = 0;
zmq_setsockopt(client_socket, ZMQ_LINGER, &linger, sizeof(linger));

zmq_connect(client_socket, "tcp://192.168.1.160:5000");

int request_nbr = 0;
while (1)
{
    char request_string[200] = { 0 };
    sprintf(request_string, "request #%d", ++request_nbr);
    zmq_send(client_socket, request_string, strlen(request_string) + 1, ZMQ_DONTWAIT);
    printf("%s\n", request_string);


    zmq_pollitem_t items[] = { { client_socket, 0, ZMQ_POLLIN, 0 } };
    zmq_poll(&items[0], 1, 500);

    if (items[0].revents & ZMQ_POLLIN)
    {
        //  We got a reply from the server, must match sequence
        char rcv_buffer[256];
        memset(rcv_buffer, 0, 256);
        int rcv_byte = zmq_recv(client_socket, rcv_buffer, 255, ZMQ_DONTWAIT);
        if (rcv_byte > 0)
        {
            printf("%s\n", rcv_buffer);
        }
    }
}

}

void zmq_server_task_router_c()
{
void* context = zmq_ctx_new();
void* serverSocket = zmq_socket(context, ZMQ_ROUTER);

std::string endpoint = "tcp://*:5000";
int rc = zmq_bind(serverSocket, endpoint.c_str());


while (true)
{
    char identity[256];
    char delimiter[256];
    char rcv_message[256];

    memset(identity, 0, 256);
    memset(delimiter, 0, 256);
    memset(rcv_message, 0, 256);

    rc = zmq_recv(serverSocket, identity, 255, 0);
    rc = zmq_recv(serverSocket, rcv_message, 255, 0);

    std::string snd_message = std::string("Received (") + rcv_message + ") from (" + identity + ")";
    std::cout << snd_message << std::endl;

    zmq_send(serverSocket, identity, strlen(identity)+1, ZMQ_SNDMORE);
    zmq_send(serverSocket, snd_message.c_str(), snd_message.size(), 0);
}

zmq_close(serverSocket);
zmq_ctx_destroy(context);

}

@wenjians
Author

wenjians commented Jan 23, 2017

I used Wireshark to capture the TCP communication and found that:

  • Reconnection happens in the ZeroMQ layer, i.e. both DEALER and ROUTER send a message containing the string "ready" to the peer.
  • Every subsequent message is sent and ACKed at the TCP level, i.e. the server received the packet from the TCP perspective, but the packet is not delivered to the application.

So everything looks fine on the wire, but the server application didn't receive any packets.

Is there any timeout in ZeroMQ? If we disconnect the network and reconnect after a short time, say 10 seconds, the communication works once the network is plugged in again.

I analysed the Wireshark captures (both client and server on Win10) for the 10-second and 1-minute cases and found the following after reconnecting:

  • disconnected for 10 seconds: the same client port is used before and after reconnection
  • disconnected for more than 1 minute: a different client port is used.
    Is it possible that ZeroMQ treats this as a different connection? Both use the same ZeroMQ identity, so could ZeroMQ be treating it as a different connection with the same identity and therefore ignoring the communication?

I also ran the client in a Linux virtual machine communicating with the host (Windows 10), and found that even after being disconnected for more than 1 minute, the communication is still okay. So it looks like there is some difference between Windows and Linux in how the ZeroMQ communication behaves.

@wenjians
Author

wenjians commented Jan 23, 2017

I did some more tests with the following setup: DEALER (Linux VM, client), ROUTER (Win10 host, server).

  • disconnected for 5 minutes: works after reconnect, client port stays the same
  • disconnected for 11 minutes 30 seconds: works after reconnect, client port stays the same
  • disconnected for 12 minutes 30 seconds: works after reconnect, client port stays the same
  • disconnected for 21 minutes: doesn't work after reconnect, client port changed.

So it might be related to TCP keepalive, but when I check the Linux TCP parameters:

  • tcp_keepalive_intvl: 75 seconds
  • tcp_keepalive_probes: 9
    so the TCP connection should be dropped after 750 seconds (even counting 10 probes), i.e. 12 minutes 30 seconds.

So the communication loss might be related to the port change (i.e. a new TCP connection), but I don't know in detail how ZeroMQ deals with this.
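
The timeout arithmetic discussed here can be sketched as below. This is only a rough model under assumptions that are not measured anywhere in this thread: that keepalive probing starts after an idle period (tcp_keepalive_time, 7200 seconds on a stock Linux kernel), and that retransmission backs off from 200 ms up to a 120-second cap with tcp_retries2 = 15. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Seconds until keepalive declares an idle peer dead: the idle time before
// the first probe, plus one interval per unanswered probe.
// (Keepalive only applies if it is enabled on the socket.)
int keepalive_timeout_s(int idle_s, int intvl_s, int probes) {
    return idle_s + intvl_s * probes;
}

// Hypothetical time in milliseconds before a connection with unacknowledged
// data is given up: the initial wait plus `retries` retransmission waits,
// doubling each time and capped at rto_max_ms.
// Assumed defaults: 200 ms minimum RTO, 120 s maximum RTO.
std::int64_t retransmit_timeout_ms(int retries,
                                   std::int64_t rto_min_ms = 200,
                                   std::int64_t rto_max_ms = 120000) {
    std::int64_t total = 0;
    std::int64_t rto = rto_min_ms;
    for (int i = 0; i <= retries; ++i) {
        total += std::min(rto, rto_max_ms);
        rto = std::min(rto * 2, rto_max_ms);
    }
    return total;
}
```

With the values quoted above, keepalive_timeout_s(0, 75, 9) gives 675 seconds and ten probes give 750, but adding the assumed 7200-second idle time pushes keepalive well beyond the observed window. retransmit_timeout_ms(15) gives 924600 ms, roughly 15 minutes 25 seconds, which would sit between the 12 minute 30 second and 21 minute observations; whether that is the actual cause here is not verified.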

@wenjians
Author

I tried changing the TCP keepalive parameters, but that doesn't help. Fortunately, there is an option, ZMQ_ROUTER_HANDOVER, supported since 4.1, which makes the ROUTER switch to a new connection with the same identity. I tested with this option, and it works!

This issue can be closed. Thanks for the support.
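
The effect of the option can be pictured with a toy model of the ROUTER's identity table. This is an illustrative sketch, not libzmq code: `RouterModel`, its methods, and the port numbers are hypothetical names chosen for the example. It assumes the documented behaviour that a ROUTER rejects a new connection presenting an identity already in use unless ZMQ_ROUTER_HANDOVER is set, and that after a long outage the server may still hold the old, dead connection when the client reconnects from a new port.

```cpp
#include <map>
#include <string>

// Toy model of a ROUTER's identity table (not libzmq's implementation).
class RouterModel {
public:
    explicit RouterModel(bool handover) : handover_(handover) {}

    // A peer connects from TCP source port `port` and announces `identity`.
    // Returns true if the router will route messages for this connection.
    bool connect(const std::string& identity, int port) {
        auto it = owner_.find(identity);
        if (it != owner_.end() && it->second != port && !handover_) {
            return false;  // duplicate identity: the new connection is ignored
        }
        owner_[identity] = port;  // first owner, or handover to the new connection
        return true;
    }

    // The router notices that the connection on `port` has gone away.
    void disconnect(int port) {
        for (auto it = owner_.begin(); it != owner_.end();) {
            if (it->second == port) it = owner_.erase(it);
            else ++it;
        }
    }

    // Would a message arriving on `port` for `identity` reach the application?
    bool delivers(const std::string& identity, int port) const {
        auto it = owner_.find(identity);
        return it != owner_.end() && it->second == port;
    }

private:
    bool handover_;
    std::map<std::string, int> owner_;  // identity -> owning connection (port)
};
```

In the model, a second connect with the same identity from a new port is refused when handover is off and takes over when it is on, which matches the "different client port, same identity" observation. In libzmq itself the option is an int passed to zmq_setsockopt with ZMQ_ROUTER_HANDOVER on the ROUTER socket, typically set before zmq_bind.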

@wenjians
Author

The only remaining issue is that some messages are lost during the handover; at least all messages buffered on the old TCP connection are lost. I could not find any rule, based on timing or anything else, for which messages are lost.
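
Since the test client already numbers its payloads ("request #N"), one way to see which messages go missing around a handover is to track the sequence on the server side. The following is a minimal sketch, assuming one detector per client identity and payloads in exactly that format; `GapDetector` is a hypothetical helper, not part of ZeroMQ.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Tracks "request #N" payloads from one client and records skipped ranges.
class GapDetector {
public:
    // Returns true if the payload parsed as "request #N".
    bool observe(const std::string& payload) {
        int n = 0;
        if (std::sscanf(payload.c_str(), "request #%d", &n) != 1) {
            return false;  // not a numbered request; ignore it
        }
        // A jump past last_ + 1 means the numbers in between never arrived.
        if (last_ > 0 && n > last_ + 1) {
            gaps_.push_back(std::make_pair(last_ + 1, n - 1));
        }
        if (n > last_) last_ = n;
        return true;
    }

    // Inclusive [first, last] ranges of sequence numbers that were skipped.
    const std::vector<std::pair<int, int> >& gaps() const { return gaps_; }

private:
    int last_ = 0;  // highest sequence number seen so far (0 = none yet)
    std::vector<std::pair<int, int> > gaps_;
};
```

Feeding it "request #1", "request #2", "request #6" records one gap covering 3 to 5. Logging the gaps together with timestamps of the disconnect and reconnect would show whether the losses line up with the buffered messages on the old connection.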

@bluca
Member

bluca commented Jan 24, 2017

Great, happy to know that option helps your use case, thanks for reporting back
