Yet another attempt to improve event subscription with minimal changes to the existing code base#404
Yet another attempt to improve event subscription with minimal changes to the existing code base#404
Conversation
|
Please find attached an OpenDocument spreadsheet listing all the cases I could identify (there might be more) where the events do not work as expected in Tango 9 and what is wrong for each specific case. I tested this PR with only a few cases listed on this spreadsheet and I saw the same wrong behaviours as before. One important point is that when a user subscribes to an event with a specific hostname in his TANGO_HOST or FQDN, I think he should always get the same hostname in the EventData object from the callback as the one used during the subscription. @taurel, please do not hesitate to contradict me if the initial idea was different or if you think this does not make sense. As an example, here is one case which does not work, even with a single computer:
In this case, some events are received but the name (hostname part) in the events does not correspond to the name used at the time of the subscription (except for the synchronous call which is OK in this case). I didn't test it with the code from this PR, but #350 should be easy to verify with a simple setup. @Ingvord please try to setup your computers in order to be able to reproduce the problems, else it will be very painful to debug and fix all these problems. Do you need some C++ client event subscriber examples? Hoping this helps, |
|
@t-b I think we can consider there are several bugs here... |
|
@bourtemb could you please attach your C++ client code as well? |
|
Here it is: |
|
@bourtemb , super! Thanks!!! |
|
@bourtemb , Please find attached an updated version of your spread sheet: EventsTests.ods.zip. I have tested current codebase against "case 17" - all except netalias. It seems to work apart from the attribute name substitution. Here is the way how I did it:
In this setup Tango host is running in a docker, but exports port 10000 to localhost. Both client and server are running on the same machine but use different TANGO_HOST/FQDN to subscribe to an event. Could you please test the code in your environment (with netaliases)? |
|
Hi, I did some simple tests with TangoTest device server on a single machine. |
|
It looks like the duplicate event after the synchronous call does not happen all the time... I will need deeper investigations to try to reproduce it systematically. |
|
It looks like I can reproduce the duplicated event issue systematically when I do the following:
At that point you should see the same event received twice. From that point on, if you restart any client, it will be systematic on the client side. And if you start several clients, they will all see this duplicated event. |
|
@bourtemb , thanks a lot! I will have a look into it... |
|
Hi, Here is an updated version of the spreadsheet with the results of the tests using an host having a network alias defined: EventsTests-reynald-tests-08Dec2017.zip. |
bb5d79f to
9f73d5d
Compare
|
About the problem with the duplicated event, this is what I understood after talking with @taurel: |
|
@bourtemb , Was it you under "tango-tickets-migrator"? :) Thanks for the info, good to know! In fact I have seen this in the code, there is a comment at zmqeventconsumer.cpp#2024: But this behavior observed (two times event after reconnection) was due to this: Note hardcoded And obviously (after a few hours of debugging) it sent the same event twice to the same client, but due to different Changing I have tested cases B18-24 (except netalias... I tried to setup docker network with aliases, but failed miserably) with reconnection and now they all green (I have not highlighted though - awaiting your approval). |
|
Hi Igor, |
|
@bourtemb , well, this is wiered... I can not reproduce your case localy. Here is my output: Could you please attach your server's out put ( |
|
You are not in the same conditions... You are using another device server, which is pushing events by code. |
|
@bourtemb , nope. Here is CORBA::Any *IOIncValue::execute(Tango::DeviceImpl *device, TANGO_UNUSED(const CORBA::Any &in_any))
{
((static_cast<DevTest *>(device))->attr_event[2])++;
((static_cast<DevTest *>(device))->attr_event64[0])++;
return insert();
}
It simply increases the value of the attribute which is then detected by the polling thread. |
|
You are using a spectrum attribute, right? It's a different type. I don't see the problem with float_spectrum_ro attribute from TangoTest for instance. |
|
OK, it makes sense now. Thanks! |
|
Here are the logs from the server started with -v5: |
|
I got a core dumped on the client side when trying to subscribe to a tango 9 version of TangoTest device server: Here is the backtrace: In this case, it seems the local_callback_key variable is reset to "" at event_consumer.cpp:1899: |
|
@bourtemb , I have been able to track the source of the problem you have observed (this one). This is again due to client_lib being wrong on the server side: Note string::size_type pos = event.find(EVENT_COMPAT);
if (pos != string::npos)
{
client_lib_str = event.substr(pos + 3,1);
stringstream ss;
ss << client_lib_str;//extracted 5
ss >> client_lib;
event.erase(0,EVENT_COMPAT_IDL5_SIZE);
}Should be fixed in 11f826f |
|
It looks like 11f826f is not fixing the duplicated event I was reporting and is also introducing other incompatibility issues with old clients/servers. |
|
It looks very strange to me to have a variable named EVENT_COMPAT_IDL5 and to assign it a value like "idl6_". |
|
Working on tango-9-lts may simplify your life too. |
|
About the network alias issues, there is one case which works now (except compatibility and duplicated events issues), when the client subscribes with FQAN = tango://<network_alias>.<domain_name>:<port_number>/my/device/name/attr... 👍 |
I would wish to think so too... but. There is a clear evidence:
The last line is in attribute.cpp#5945: if (count(client_lib[i].begin(), client_lib[i].end(), _l) == 0)
client_lib[i].push_back(_l);
vector<int> &client_libs = attr.get_client_lib(CHANGE_EVENT);//returns client_lib[i] from 2)
vector<int>::iterator ite;
string ev_name = EventName[CHANGE_EVENT];
bool inc_ctr = true;
cout3 << "EventSupplier::detect_and_push_change_event(): iterating over client libs" << endl;
for (ite = client_libs.begin();ite != client_libs.end();++ite)
{
// Now it has two ite values - 5&6 -> push_event is called twice
//...
cout3 << "push_event" << endl;
push_event(device_impl,
ev_name,
filterable_names,
filterable_data,
filterable_names_lg,
filterable_data_lg,
sent_value,
attr_name,
except,
inc_ctr);
The only way to fix this (as I see it) is to tell the server the exact version of the client (hence f49cb02) But, unlike ZmqEventSubscriptionChange, EventConfirmSubscription accepts DevVarStringArray for multiple events and checks the input argument like this: if ((in_data->length() == 0) || (in_data->length() % 3) != 0)
{
TangoSys_OMemStream o;
o << "Wrong number of input arguments: 3 needed per event: device name, attribute/pipe name and event name"
<< endl;
Except::throw_exception((const char *) API_WrongNumberOfArgs, o.str(),
(const char *) "DServer::event_confirm_subscription");
}
So basically I can not add another argument that specifies client version: //THIS WON'T WORK
if ((in_data->length() == 0) || (in_data->length() % 4) != 0)Because for instance both //eventcmds.cpp#1027
for (size_t i = 0; i < nb_event; i++)
{
string dev_name, obj_name, event, obj_name_lower;
int base = i * 3;
dev_name = (*argin)[base];
obj_name = (*argin)[base + 1];
event = (*argin)[base + 2];This way I ended up with 11f826f
At the moment this does not have much difference. In fact I am not sure this PR is even back portable :( |
I have just got an idea: did you mean - there is no client_lib==6 in Tango 9, so this problem might not occur? |
Yes, that's what I meant. |
|
Closed due to #423 |



see #315, #385