Full Yellow Local - Fix Remoted tests in 4.2 #1530
Comments
While the issues were being created, it became clear that the same error messages appeared in different tests. I am adding the list of error messages that relate to multiple issues.
|
General Research: Done. It allowed us to create an issue for each failure covered by this issue. |
I did a full run of the remoted tests on a branch that combines the changes from #1625 and #1624. The run progressed much further than before, but there were failures in a test that had no failure cases in #1573. We will have to take a closer look at the tests.
|
Update including changes of #1624 and #1625
I performed full
|
2021-07-27
Used Wazuh-QA branch:
The following errors happened in the second run:
The origin of both errors is the same:
|
2021-07-28 Update including changes of #1662
I performed full
|
Update 28-07-21
After merging #1662 the fail I show the results:
|
2021-07-28
Used Wazuh-QA branch:
|
Update 28-07-21
Test Results:
Note: |
2021-07-30
Used Wazuh-QA branch:
The same test has failed in all rounds:
Therefore I have run it standalone using the same branch:
After confirming that errors also occur I ran it again using the
The failure is repeated every time. Although this test appears in issue #1573 (case 1), the error detected is different:
Since the problem seems to be API-related, I attach these logs. |
Update 30-07-21
This is the last execution with all changes.
|
2021-07-30
Used Wazuh-QA branch:
The first round shows failures that should be analyzed in the next tests:
|
I have launched three rounds of
Test results
|
02-08-21
Used Wazuh-QA branch:
|
2021-08-02
Wazuh-QA branch:
The test
Update
These are the results of the tests with the whole test_remoted folder.
Update
The remoted folder was tested again, but this time without the
|
03-08-21
Used Wazuh-QA branch:
|
I spent some time analyzing the issues in
In every execution where we observed failures, the cause was that the Man In The Middle (MITM) did not find the event it was looking for. I therefore analyzed the MITM implementation and how it is used in these tests. Essentially, for these tests the MITM will monitor the UDP socket
Considering the fact that
From this analysis we can draw the following conclusions:
We are analyzing the second item to see whether or not it is the cause of the issue. |
03/08/21
The intermittent failures of
Many different approaches are being carried out to read the
Update
The whole test_remoted folder was run with this new_ossec.conf file. Basically, all the modules are disabled and all the localfile entries were removed. The purpose of this approach is to reduce socket usage and verify how the tests behave in this situation. Here are the results
The test results appear more consistent, but more testing is needed to be sure. |
03/08/21
Continuing with the analysis made in #1530 (comment), I modified the Wazuh testing framework to print to the console the events received by the Man In The Middle from the
https://github.com/wazuh/wazuh-qa/tree/1618-poc-mitm-monitor
After this, I performed multiple executions of the tests in order to reproduce the issue and confirm our suspicion that the QueueMonitor does not analyze the events fast enough to avoid the timeout. We did not manage to reproduce the issue, so the next step will be to run the tests in multiple environments in parallel. |
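To make the timeout hypothesis concrete, here is a minimal sketch of a monitor that waits for one expected event on a queue. It is not the wazuh-qa QueueMonitor; `wait_for_event`, `demo` and the sample event strings are hypothetical names used only for illustration. The point is that the monitor must consume every queued item, including unrelated ones, before the deadline, so a slow consumer or a missing event both end in the same timeout.

```python
import queue
import threading
import time


def wait_for_event(events, matches, timeout):
    """Return the first queued item accepted by `matches`.

    Raises TimeoutError if no matching item is consumed before the deadline.
    Hypothetical helper, not part of the Wazuh testing framework.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("expected event not found before the timeout")
        try:
            # Block only for the time left, so the total wait never exceeds `timeout`.
            item = events.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("expected event not found before the timeout") from None
        if matches(item):
            return item


def demo():
    """Feed unrelated events followed by the expected one from another thread."""
    events = queue.Queue()

    def producer():
        for i in range(5):
            events.put(f"noise {i}")
        events.put("agent 001 keep-alive")

    threading.Thread(target=producer, daemon=True).start()
    return wait_for_event(events, lambda e: "keep-alive" in e, timeout=2.0)
```

Printing each consumed item inside the loop is one way to see whether the expected event ever entered the queue or whether the monitor simply ran out of time.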
04/08/21
These are the results of the whole remoted tests folder but disabling all the modules including logcollector.
These results may suggest that socket saturation is a possible cause, but we still can't be sure.
Update
These tests were repeated 18 times, with the same configuration as before minus the authd daemon. We are not seeing an improvement in the overall results, so we can discard the hypothesis of socket saturation. On the other hand, some extra debugging messages were added to the QA framework to analyze the messages that enter the queue and the messages that are read by the monitor. The test
These results may indicate an issue in the message injector and not in the QueueMonitor, because the queue is clearly not flooded.
Update
New debug messages were added to check whether the sender threads were really sending the messages. We can conclude that every time the test fails, both agents' threads send the corresponding messages, but the messages get lost on the way and never reach the socket. |
04/08/21
I performed full executions of the remoted tests with some messages printed in order to have more information in case of failures. Although the issue in the
|
04-08-21
|
05/08/21
Some debugging messages were added to remoted to verify that the test messages are received and sent correctly. In the failed test, remoted receives both messages but fails while trying to send one of them to the socket. That is why the QA framework is unable to receive the message. More tests are being carried out to determine whether the ManInTheMiddle approach of the test is responsible for the socket communication error.
Details
These are the messages generated by the test. The threads send the message, but only one of them gets into the queue
Here we have the messages printed by remoted related to agents 11075 and 11076
The message for agent 11075 is never sent to the queue.
Update
The tests were run again, but some extra debugging messages were added in mq_op.c, os_net.c and secure.c ( It can be seen that the connection with the socket fails regularly, even in tests that pass. The messages from agent 15556 are received and sent right away
But agent 15555 needs a reconnection to the queue to send it successfully
This behavior is found in every test, and it explains why some messages were found duplicated: it seems that the messages arrive regardless of the errors
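One possible way a reconnect-and-resend cycle produces duplicates can be sketched as follows. This is a simulation under stated assumptions, not remoted's C code: `FlakyQueue` and `send_with_reconnect` are hypothetical names, and the sketch assumes the data reaches the reader even though the sender is told the send failed.

```python
class FlakyQueue:
    """Hypothetical stand-in for the queue socket.

    Every send delivers the data, but the attempts listed in
    `fail_on_attempts` also report an error to the sender.
    """

    def __init__(self, fail_on_attempts):
        self.fail_on_attempts = set(fail_on_attempts)
        self.attempts = 0
        self.reconnects = 0
        self.delivered = []

    def send(self, message):
        self.attempts += 1
        self.delivered.append(message)  # the reader gets the data...
        if self.attempts in self.fail_on_attempts:
            # ...but the sender is told that the send failed.
            raise ConnectionError("send reported a failure")

    def reconnect(self):
        self.reconnects += 1


def send_with_reconnect(q, message, max_retries=1):
    """Send `message`, reconnecting and resending after a reported failure."""
    for attempt in range(max_retries + 1):
        try:
            q.send(message)
            return True
        except ConnectionError:
            if attempt == max_retries:
                return False
            q.reconnect()
    return False
```

With `FlakyQueue(fail_on_attempts=[1])`, a single call to `send_with_reconnect` succeeds after one reconnection, yet the reader sees the same message twice.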
Failed test case analysis
Analyzing a failed case (ossec_queue_87.log,
But the message to queue is sent by remoted
Conclusion
The queue interception is unreliable: sometimes the manager fails to send a message but it arrives anyway, and sometimes it successfully sends a message that never arrives at the socket. The best option is to change the test structure and validate the message received by remoted with another method: the agent's status update, an alert generation, etc. |
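As an illustration of validating through an observable outcome instead of intercepting the queue, a test could poll a condition until it holds or a deadline passes. The sketch below is only an assumption about how such a check might look: `wait_until` and `make_fake_status_source` are hypothetical helpers, and the fake status source stands in for whatever real lookup (agent status, generated alert) the test would use.

```python
import time


def wait_until(check, timeout=10.0, interval=0.5):
    """Poll `check()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def make_fake_status_source(states):
    """Return a function that yields each state once, then repeats the last one.

    Hypothetical replacement for a real agent-status lookup.
    """
    remaining = iter(states)
    last = [None]

    def get_status():
        last[0] = next(remaining, last[0])
        return last[0]

    return get_status
```

For example, `wait_until(lambda: get_status() == "active", timeout=1, interval=0.01)` returns True once the fake source reports `active`, and a check that never succeeds returns False after the timeout instead of hanging.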
A new issue was created with enhancement proposal #1690. The debugging process of the failed tests would be much faster if the required information were easily available. |
TESTS EXECUTION
11-08-21
Note: Deleted comments were added below in a common table.
10-08-21
08-09-21
All the remoted tests were run after merging the fix in #1692.
06-08-21
All the remoted tests were run after merging the fix in #1692.
|
Closed by #1717 |
Description
This issue is part of #1516. After the research in #1493, failures and warnings were detected in the remoted tests for the Wazuh manager. In order to sanitize the current tests, it is necessary to fix the remoted tests.
Test Execution - Results
Settings
Packages details
Environment
Setup local_internal_options.conf
Development
This fix should be developed in a branch from 4.2, and it must be merged in the epic issue full-green branch 1516-4.2.0-full-green.
In order to finish this issue the following tasks should be fulfilled:
remoted integration tests.
: 1530-full-yellow-remoted to 1516-4.2.0-full-green