Full Yellow Local - Fix Remoted tests in 4.2 #1530

juliamagan · 2021-07-01T08:17:03Z

Description

This issue is part of the #1516. After the research #1493, it has been detected fails and warnings in remoted tests for Wazuh manager. In order to sanitize current tests is necessary to fix remoted tests.

Test Execution - Results

Server	Local	Jenkins	Status
Test Execution 1	Results	Build	🔴
Test Execution 2	Results	Build	🔴
Test Execution 3	Results	Build	🔴

Settings

Packages details

Jenkins branch	QA branch
4.2	1516-4.2.0-full-green

Type	Format	Architecture	Version	Revision	Tag	File name
Server	rpm	x86_64	4.2.0	40207	v4.2.0-rc8	wazuh-manager-4.2.0-1.1493.x86_64.rpm

Environment

Provider	Box	OS	CPU	Memory
Vagrant	centos/8	CentOS 8	2	1024

Setup Local_internal_opcion.conf

remoted.debug=2
wazuh_database.interval=1
wazuh_db.commit_time=2
wazuh_db.commit_time_max=3
monitord.rotate_log=0

Development

This fix should be developed in a branch from 4.2, and it must be merged in the epic issue full-green branch 1516-4.2.0-full-green

Tests	Warnings
test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py	8
test_remoted/test_configuration/test_basic_configuration_connection_valid.py	8
test_remoted/test_configuration/test_basic_configuration_ipv6.py	2
test_remoted/test_configuration/test_basic_configuration_local_ip_valid.py	2
test_remoted/test_configuration/test_basic_configuration_queue_size_too_big.py	1
test_remoted/test_configuration/test_basic_configuration_queue_size_valid.py	3
test_remoted/test_configuration/test_basic_configuration_rids_closing_time_valid.py	8
test_remoted/test_configuration/test_basic_configuration_syslog_allowed_denied_ips_valid.py	5
test_remoted/test_configuration/test_basic_configuration_syslog_denied_ips.py	1

In order to finish this issue the following tasks should be fulfilled:

The text was updated successfully, but these errors were encountered:

damarisg · 2021-07-06T20:19:09Z

While the issues were being created, it was possible to see that the same errors message were showing in different tests.
But the method involved was different, so when starting to work, you should synchronize with those involved in the different issues that affect the same error message.

I add the list of error messages related to multiple issues.

Error messages	Issues
`Could not find the remoted key loading log`	#1572 - #1578 - #1576 - #1574

damarisg · 2021-07-06T20:19:48Z

General Research: Done. It allowed creating each fail contained on this issue.

DProvinciani · 2021-07-22T13:45:00Z

I did a full run of the remoted tests on a branch that combines the changes on the #1625 and those on the #1624.

The progress went much further than it was before but there were failures in a test that in fact had no failure cases in #1573. We are going to have to take a closer look at the tests test_remoted/test_configuration as there seem to be some flaky tests as well. For example:

test_basic_configuration_connection_invalid_protocol pass in my local environment but in issue Fix 'test_remoted/test_configuration' tests in 4.2 #1573 it is reported as failing.
test_basic_configuration_connection_valid pass in my local environment but in issue Fix 'test_remoted/test_configuration' tests in 4.2 #1573 it is reported as failing.
test_basic_configuration_ipv6 fails in my local environment but the issue Fix 'test_remoted/test_configuration' tests in 4.2 #1573 does not report any failures.

DProvinciani · 2021-07-22T22:05:01Z

Update including changes of #1624 and #1625

I performed full remoted tests executions in order to determine the global status of the tests of this component including the fixes in #1624 and #1625. I got some intermittent failures that should be analyzed.

Test Executions	Date	By	Status
test_remoted-r1.log	2021-07-22	Diego	🟡
test_remoted-r2.log	2021-07-22	Diego	`test_syslog_message.py::test_syslog_message` 🔴
test_remoted-r3.log	2021-07-22	Diego	`test_agents_switching_protocols.py::test_agents_switching_protocols`🔴

mdengra · 2021-07-27T14:12:55Z

2021-07-27

Used Wazuh-QA branch: 1570-test-socket-communication
Test results with the default settings in the ossec.conf:

Test Executions	Date	By	Status
test_remoted_local_r1.log	2021-07-27	Miguel	🟡
test_remoted_local_r2.log	2021-07-27	Miguel	🔴
test_remoted_local_r3.log	2021-07-27	Miguel	🟡

The following errors happened in the second running:

FAILED test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_56000]
FAILED test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_56000]

The origin of both errors is the same:

                        logger.error(f"Results expected: {accum_results}")
>                   raise TimeoutError(error_message)
E                   TimeoutError: Could not find the expected event in queue socket

/usr/local/lib/python3.6/site-packages/wazuh_testing/tools/monitoring.py:456: TimeoutError

DProvinciani · 2021-07-28T06:21:12Z

2021-07-28 Update including changes of #1662

I performed full remoted tests executions in order to determine the global status of the tests of this component including the fixes in #1662. I got some intermittent failures that should be analyzed.

Test Executions	Date	By	Status
test_remoted-r1.log	2021-07-28	Diego	`test_agent_communication/test_multi_agent_protocols_communication.py` 🔴
test_remoted-r2.log	2021-07-28	Diego	🟡
test_remoted-r3.log	2021-07-28	Diego	🟡

damarisg · 2021-07-28T11:42:57Z

Update 28-07-21

After merging #1662 the fail test_agents_switching_protocols.py reported by Diego was fixed.

I show the results:

Test Executions	Date	By	Status
ResultsAgentCommunication.log	2021-07-28	Seyla	🟡
ResultsAgentCommunication2.log	2021-07-28	Seyla	🟡
ResultsAgentCommunication3.log	2021-07-28	Seyla	🟡

mdengra · 2021-07-28T12:10:27Z

2021-07-28

Used Wazuh-QA branch: 1530-full-yellow-remoted
Test results with the default settings in the ossec.conf:

Test Executions	Date	By	Status
test_remoted_local_r1.log	2021-07-28	Miguel	🟡
test_remoted_local_r2.log	2021-07-28	Miguel	🟡
test_remoted_local_r3.log	2021-07-28	Miguel	🟡

damarisg · 2021-07-28T16:57:42Z

Update 28-07-21

Test Results:

Test Executions	Date	By	Status
FullYellowRemoted.log	2021-07-28	Seyla	🔴 ( #1573)
FullYellowRemoted2.log	2021-07-28	Seyla	🟡
FullYellowRemoted3.log	2021-07-28	Seyla	🔴 (#1575)

Note:
Each test run alone works successfully and therefore is closed, but we can see that they failed in 2 random executions when I execute the Remoted cycle complete.

mdengra · 2021-07-30T09:30:10Z

2021-07-30

Used Wazuh-QA branch: 1571-fix-socket-error-in-ar-test
Test results with the default settings in the ossec.conf:

Test Executions	Date	By	Status
test_remoted_local_r1.log	2021-07-30	Miguel	🔴
test_remoted_local_r2.log	2021-07-30	Miguel	🔴
test_remoted_local_r3.log	2021-07-30	Miguel	🔴

The same test has failed in all rounds:

=========================== short test summary info ============================
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[Testing,UDP_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[TCP,Testing_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[TCP,UDP,Testing_secure_1514]
FAILED test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py::test_invalid_protocol[Testing,UDP_syslog_514]

Therefore I have run it standalone using the same branch:

Test Executions	Date	By	Status
test_basic_configuration_connection_invalid_protocol.py_local_r1.log	2021-07-30	Miguel	🔴
test_basic_configuration_connection_invalid_protocol.py_local_r2.log	2021-07-30	Miguel	🔴
test_basic_configuration_connection_invalid_protocol.py_local_r3.log	2021-07-30	Miguel	🔴

After confirming that errors also occur I ran it again using the 1530-full-yellow-remoted branch:

Test Executions	API log	Date	By	Status
test_basic_configuration_connection_invalid_protocol.py_local_r1.log	api_r1.log	2021-07-30	Miguel	🔴
test_basic_configuration_connection_invalid_protocol.py_local_r2.log	api_r2.log	2021-07-30	Miguel	🔴
test_basic_configuration_connection_invalid_protocol.py_local_r3.log	api_r3.log	2021-07-30	Miguel	🔴

The failure is repeated every time. Although this test appears in the issue #1573 (case 1), the error detected is different:

    def get_manager_configuration(section=None, field=None):
        """Get Wazuh manager configuration response from API using GET /manager/configuration
.
.
.   
        response = requests.get(api_query, headers=api_details['auth_headers'], verify=False)
    
>       assert response.json()['error'] == 0, f"Wazuh API response status different from 0: {response.json()}"
E       AssertionError: Wazuh API response status different from 0: {'title': 'Wazuh Internal Error', 'detail': 'Timeout executing API request', 'dapi_errors': {'node01': {'error': 'Timeout executing API request', 'logfile': 'WAZUH_HOME/logs/api.log'}}, 'error': 3021}

/usr/local/lib/python3.6/site-packages/wazuh_testing/api.py:128: AssertionError

Since the problem seems to be API related I attach these logs.

damarisg · 2021-07-30T17:29:46Z

Update 30-07-21

It is the last execution with all changes.

Test Executions	Date	By	Status
FullYellowRemoted.log	2021-07-30	Seyla	🟡
FullYellowRemoted2.log	2021-07-28	Seyla	🟡
FullYellowRemoted3.log	2021-07-28	Seyla	🟡

DProvinciani · 2021-07-30T20:21:19Z

2021-07-30

Used Wazuh-QA branch: 1530-full-yellow-remoted
Test results with the default settings in the ossec.conf:

Test Executions	Date	By	Status
test_remoted-full_r1.log	2021-07-30	Diego	🔴
test_remoted-full_r2.log	2021-07-30	Diego	🟡
test_remoted-full_r3.log	2021-07-30	Diego	🟡

The first round shows failures that should ve analyzed in the next tests:

FAILED wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]
FAILED wazuh-qa/tests/integration/test_remoted/test_socket_communication/test_syslog_message.py::test_syslog_message[syslog_UDP_514-dummy_bad_formatted_pri_header]

juliamagan · 2021-08-02T11:10:22Z

I have launched three rounds of test_remoted/test_configuration/test_basic_configuration_connection_invalid_protocol.py and everything seems to work properly.

Tests results

	CentOS 8 Manager	Reports
R1	🟡	Test results
R2	🟡	Test results
R3	🟡	Test results

damarisg · 2021-08-02T12:26:25Z

02-08-21

Used Wazuh-QA branch: 1530-full-yellow-remoted

Test Executions	By	Status
FullRemoted.log	Seyla	🔴
FullRemoted2.log	Seyla	🔴
FullRemoted3.log	Seyla	🔴

pereyra-m · 2021-08-02T15:52:37Z

2021-08-02

Wazuh-QA branch: 1530-full-yellow-remoted

The test test_remoted/test_socket_communication/test_syslog_message.py was run isolated on a clean environment many times, but all the tests were successful

Test Executions	Date	By	Status
test_syslog_message_1.log	2021-08-02	Matias	🟡
test_syslog_message_2.log	2021-08-02	Matias	🟡
test_syslog_message_3.log	2021-08-02	Matias	🟡
test_syslog_message_4.log	2021-08-02	Matias	🟡
test_syslog_message_5.log	2021-08-02	Matias	🟡
test_syslog_message_6.log	2021-08-02	Matias	🟡
test_syslog_message_7.log	2021-08-02	Matias	🟡
test_syslog_message_8.log	2021-08-02	Matias	🟡
test_syslog_message_9.log	2021-08-02	Matias	🟡
test_syslog_message_10.log	2021-08-02	Matias	🟡

Update

These are the results of the tests with the whole test_remoted folder.

Test Executions	Date	By	Status
test_remoted_1.log	2021-08-02	Matias	🔴
test_remoted_2.log	2021-08-02	Matias	🔴
test_remoted_3.log	2021-08-02	Matias	🟡
test_remoted_4.log	2021-08-02	Matias	🟡
test_remoted_5.log	2021-08-02	Matias	🟡
test_remoted_6.log	2021-08-02	Matias	🟡
test_remoted_7.log	2021-08-02	Matias	🟡
test_remoted_8.log	2021-08-02	Matias	🔴
test_remoted_9.log	2021-08-02	Matias	🔴
test_remoted_10.log	2021-08-02	Matias	🟡

test_remoted_1 failed in test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]
test_remoted_2 failed in test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514]
test_remoted_3 failed in test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514]
test_remoted_9 failed in test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_56000]

Update

The remoted folder was tested again, but this time without the -x parameter, so the test won't stop if it fails

Test Executions	Date	By	Status
test_remoted_rerun_1.log	2021-08-02	Matias	🔴
test_remoted_rerun_2.log	2021-08-02	Matias	🟡
test_remoted_rerun_3.log	2021-08-02	Matias	🟡
test_remoted_rerun_4.log	2021-08-02	Matias	🟡
test_remoted_rerun_5.log	2021-08-02	Matias	🟡
test_remoted_rerun_6.log	2021-08-02	Matias	🟡
test_remoted_rerun_7.log	2021-08-02	Matias	🟡
test_remoted_rerun_8.log	2021-08-02	Matias	🟡
test_remoted_rerun_9.log	2021-08-02	Matias	🟡
test_remoted_rerun_10.log	2021-08-02	Matias	🟡

test_remoted_rerun_1.log failed in test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514]

Rebits · 2021-08-03T14:43:50Z

03-08-21

Used Wazuh-QA branch: 1530-full-yellow-remoted

Test Executions	Status
R1	🔴

DProvinciani · 2021-08-03T15:16:06Z

I spent some time analyzing the issues in test_multi_agent_protocols_communication.py reported in the next comments:

In all the executions in which we observe failures, they were caused because the Man In The Middle didn't find the event that it was looking for.

Having said that, I analyzed the MITM implementation and how it is used in these tests. Essentially for these tests the MITM will monitor the UDP socket queue/sockets/queue. The way in which it will monitor the queue will be:

Rename the queue to queue.original.
Read the queue socket from the MITM.

Considering the fact that analysisd also reads from that queue, the tests also stop the execution of that daemon durin the test execucion.

From this analysis we can make the next conclusions:

The MITM implementation doesn't work as a MITM in the UDP connections. It is just a change of the receiver.
If the queue receives many events, it could run into the same issue tan the FileMonitor in with it doesn't find the event before achieving the timeout.

We are analyzing the second item to see whether or not is the cause of the issue.

pereyra-m · 2021-08-03T15:45:19Z

03/08/21

The intermittent failures of test_multi_agents_protocols_communication may be related to a socket saturation, so the test is unable to read the required message.

Many different approaches are being carried out to read the queue socket (strace, kernel probes, etc), so we can probe the message is present but the test failed.
Also, even more modules are being removed from the default configuration, to reduce the messages in the socket.

Update

The whole test_remoted folder was run with this new_ossec.conf file. Basically, all the modules are disabled and all the localfile entries were removed. The purpose of this approach is to reduce the socket usage and verify the tests behavior in this situation. Here are the results

Test Executions	Date	By	Status
test_remoted_no_logcollector_1.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_2.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_3.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_4.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_5.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_6.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_7.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_8.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_9.log	2021-08-03	Matias	🟡
test_remoted_no_logcollector_10.log	2021-08-03	Matias	🟡

The tests results suggest a more consistent result, but more testing is needed to be sure.

DProvinciani · 2021-08-03T21:28:15Z

03/08/21

Continuing with the analysis made in #1530 (comment), I modified the Wazuh testing framework to print int the console the events received by the Man In The Middle from the queue socket. The changes are implemented in the next branch:

https://github.com/wazuh/wazuh-qa/tree/1618-poc-mitm-monitor

After this, I performed multiples executions of the tests in order to reproduce the issue and confirm the thinking that we have related to the QueueMonitor not analyzing the events as fast as possible to avoid the timeout. We didn't manage to reproduce the issue so, the next step will be to run the tests in multiples environments in paralell.

juliamagan · 2021-08-04T09:14:08Z

04-08-21

Used Wazuh-QA branch: 1530-full-yellow-remoted

Test Executions	Status
R1	🔴
R2	🔴
R3	🔴

pereyra-m · 2021-08-04T12:15:14Z

04/08/21

These are the results of the whole remoted tests folder but disabling all the modules including logcollector.
This is a re-run of the tests in this comment.

Test Executions	Date	By	Status
test_remoted_no_logcollector_rerun_1.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_2.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_3.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_4.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_5.log	2021-08-04	Matias	🔴
test_remoted_no_logcollector_rerun_6.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_7.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_8.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_9.log	2021-08-04	Matias	🟡
test_remoted_no_logcollector_rerun_10.log	2021-08-04	Matias	🟡

These results may suggest that the socket saturation is a possible cause, but still we can't be sure.

Update

These tests were repeated 18 times, with the same configuration than before minus the authd daemon.
It failed 2 times, here are the results:
test_remoted_no_logcollector-auth.zip

We are not seeing an improvement in the overall results, we can discard the hypothesis of a socket saturation.

In the other hand, some extra debugging messages were added to the QA framework to analyze the messages that enter the queue and the messages that are read by the monitor.

The test test_multi_agent_protocols_communication.py was run 252 times, and it failed 39 times.
Here we have the logs queue_test.zip. We can conclude that

Every time the test passes, both agents send the corresponding logs and both messages are extracted from the queue. Sometimes, one of them is repeated (this may be due to the sender, the agent simulator)
Every time the test fails, the missing message didn't enter the queue.

These results may indicate an issue in the message inyector and not in the QueueMonitor, because the queue is clearly not flooded.

Update

New debug messages were added to check if the sender threads were really sending the messages.
The test test_multi_agent_protocols_communication.py was run 41 times, it failed 6 times. Here are the results
multi_agent_debug.zip.

We can conclude that every time the test fails, both agent's threads send the corresponding messages. But they get lost in the way and they don't get to the socket.
These results discard a failure in the sending threads.

DProvinciani · 2021-08-04T16:55:29Z

04/08/21

I performed full executions of the remoted tests with some messages printed in order to have more information in case of failures. Although the issue in the test_multi_agent_protocols_communication.py was not reproduced, the executions showed issues in the test test_syslog_message.py that should be analyzed.

Test Executions	Date	By	Status
remoted-full-r1.log	2021-08-04	Diego	🔴
remoted-full-r2.log	2021-08-04	Diego	🔴
remoted-full-r3.log	2021-08-04	Diego	🔴
remoted-full-r4.log	2021-08-04	Diego	🔴

damarisg · 2021-08-04T20:13:54Z

04-08-21

Test Executions	Date	By	Status
FullRemoted.log	2021-08-04	Seyla	🔴
FullRemoted2.log	2021-08-04	Seyla	🔴
FullRemoted3.log	2021-08-04	Seyla	🟡

damarisg · 2021-08-05T12:43:04Z

Full Execution - JENKINS

Results Full Remoted

Test Executions	Date	By	Status
Results1	05-08-21	Seyla	🟡
Results2	05-08-21	Seyla	🟡
Results3	05-08-21	Seyla	🟡

pereyra-m · 2021-08-05T14:42:44Z

05/08/21

Some debugging messages were added to remoted to verify the correct reception and sending of the test messages.
Here are the results:

It was possible to see in the failed test that remoted receives both messages, but fails while trying to send one of them to the socket. That is the reason why the QA framework is unable to receive the message.

More tests are being carried out to determine if the ManInTheMiddle approach of the test is responsible for the socket communication error.

Details

These are the messages generated by the test. The threads send the message, but only one of them gets in the queue

wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[UDP_1514] 2021-08-05T13:49:28.1628171368
2021/08/05 13:49:28 wazuh-remoted[547749] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/05 13:49:28 wazuh-remoted[547749] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/05 13:49:28 wazuh-remoted[547749] main.c:148 at main(): DEBUG: This is not a worker
2021-08-05 13:49:59,001 - wazuh_testing - ERROR - Could not find the expected event in queue socket
2021-08-05 13:49:59,002 - wazuh_testing - ERROR - Results accumulated: 0
2021-08-05 13:49:59,002 - wazuh_testing - ERROR - Results expected: 1
Agent simulator sending msg: b'!11075!#AES:\x18^\xe2\xdee\xc9S\x9f\xe2F\xafp\x1b\x9dw\x8c+\xdf@\xc9>\x10iv\xd5\xa4\xffR2~\x9f\x18Y\\\xe3\xd6\xf4\x9b\xad \r\xbd-\xdcD-\xc1\x07/\xd7\x0eg\xeb\x1fJ\xfd\x80\x95\xbe\xcf\x0f\xe2h\xb5\xe6\xf7\x92R\xad\xca.\x0c\x10\xb1!\x98\xb6\r\xba\xce\x02\n\x81\xcf\xa39\x91\xac\xf8P\xc8\x96\xb9\x93l\xe4\xbe*t\xcbU\x04\\6\xb4\xa3F\x06G\x88\xd2^\xc7\x96i\x89K\x81\xf1\xd1\xc1caS\xad1\xe8\x7f'
Agent simulator sending msg: b"!11076!#AES:\xbc\xa0\xd0\xb3^\xeaZl\x10el\xa1-\x058\xf3\xd0\x9b-\xc7\x83\x9b\\\xc2\xf2GC_\xf6\xbaB1\xb6\xd0x~`\xc9\xcc\xea\xcb\xf1\x04\x85\xa9{<\xfa\xa5\xdb\xa3\xbb\xed\x05{\xde\x16\xdf\xe3D\x10\xcf\x89H\xc2\xa4\xb8\x1f\xce\n\t\x8abBUo\x9d\r'9K\x91\xcdp{&\xa3\xc7\xd3\xa6\xe5\xc6\x03\xdf@\xa3\xfaW\xc2\x0ft,\x89\xae^\xb8\xda\xee\x8d0\x89\x93\x06\xd4(3\xdc-\xddT\xd7\xabI\xbe\xaf\xc2'\x96"
Inserting item in queue: b'1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076\x00'
Got msg from queue: b'1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076\x00'
FAILED

Were we have the messages printed by remoted related to the agents 11075 and 11076

2021/08/05 13:49:38 wazuh-remoted[547749] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11075) ([11075] (5-6400c8ef-debian7) any)
2021/08/05 13:49:38 wazuh-remoted[547749] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076) ([11076] (6-2948d7f5-debian7) any)
2021/08/05 13:49:38 wazuh-remoted[547749] os_net.c:506 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[11076] (6-2948d7f5-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 11076) (113)

The message for the agent 11075 is never sent to the queue.

Update

The tests were run again, but some extra debugging messages were added in mq_op.c, os_net.c and secure.c (
modified files.zip
).
The ossec.log file was saved for every test.

It can be seen that the connection with the socket fails regularly, even in tests that pass.
Here is an example:
ossec_queue_2.log
queue_test_2.log

The messages from the agent 15556 are received and sent right away

2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556) ([15556] (2-6c8107e9-debian7) any)

2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556) (113) 

2021/08/05 23:45:12 wazuh-remoted[791381] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts

But the agent 15555 needs a re-connection to queue to send it successfully

2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) ([15555] (1-39ae3654-debian7) any)

2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:504 at OS_SendUnix(): DEBUG: Failed sending event to queue (OS_SOCKTERR): (1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) (113) 

2021/08/05 23:45:11 wazuh-remoted[791292] mq_op.c:93 at SendMSG(): ERROR: socketerr (not available).

2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:556 at HandleSecureMessage(): ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Connection refused'

2021/08/05 23:45:11 wazuh-remoted[791292] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts

2021/08/05 23:45:11 wazuh-remoted[791292] secure.c:561 at HandleSecureMessage(): INFO: Successfully reconnected to 'queue/sockets/queue'

2021/08/05 23:45:11 wazuh-remoted[791292] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555) (113)

This behavior is found in every test, and this explains why some messages were found duplicated: it seems that the messages arrives regardless the errors

wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP_1514] 2021/08/05 23:45:01 wazuh-remoted[791292] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/05 23:45:01 wazuh-remoted[791292] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/05 23:45:01 wazuh-remoted[791292] main.c:148 at main(): DEBUG: This is not a worker
Agent simulator sending msg: b'!15555!#AES:Z\x01\xaa\xda\xe1Y\x8bE\x0455\xa7y\x14\x87\xbeq\xdce\xffp\x9e\x7f\xc9\xcd\xb9<\x80\xb5\xbez\xc7\xcah\xf6\xcd\x04\xbc\xe6\xa8\x81\xf0\x9a\xec%Y\xfap\x9e\xf8\xee\x1ei\xbd\xfa L\x06Aqj;2l\x0ew*\xd4\x10\x1b/\xf4\xa5.\x8c6\xce\xdd"\x86@\x7fbZ]\x03\xac`\x9f\xd6\x08A\xf2j\x18\x9e\xbfS\xf8\x8f\x91\x9d\x07\\H\x06\xcf@\x8ak\t\xaa\xf17\xb1\xab\x86)]\xf13\xdd\xd8&\'|`\x15'
Inserting item in queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Agent simulator sending msg: b'!15556!#AES:\xe7\xe0\xe2\t\xccN\xfdg\x81\x1e\xdd\xb4)|\x1eOn\x94_\x13\x1a\xde\xd8(hk\x14\xd0\xcd\xac#\xa2.>\xbai\x90\xef\xd3>\xa6\xb2\xe9l\x16\x88\x96\xabHP\xaa\xe0\x06\xaa&\x851\x99\x1d\x1e\xb90\xd8\xc0\xab\x1b\xd4\x1cs)\x11\x93\xd0\xea,\xccY\xac=\xe7n\xb6G\xafv\x90\xecM\xb5VO\x9e\x84\x19\x10#\x01\xbe\xe9>\xecx~\xbc\x9a\x1f\\\x18j\xae\xf6\xbda\\\x08\xe8\xda\xd7"$a\xa3=\x90\xa1\xe3\x00\x16'
Inserting item in queue: b'1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556\x00'
Got msg from queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Got msg from queue: b'1:[15555] (1-39ae3654-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15555\x00'
Got msg from queue: b'1:[15556] (2-6c8107e9-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 15556\x00'
PASSED

Failed test case analysis

Analyzing a failed case (ossec_queue_87.log,
queue_test_87.log) we can see that the message from the agent 16583 is found but the one sent by 16584 is not

wazuh-qa/tests/integration/test_remoted/test_agent_communication/test_multi_agent_protocols_communication.py::test_multi_agents_protocols_communication[TCP,UDP_1514] 2021-08-06T01:21:58.1628212918
2021/08/06 01:21:58 wazuh-remoted[841391] debug_op.c:70 at _log(): DEBUG: Logging module auto-initialized
2021/08/06 01:21:58 wazuh-remoted[841391] main.c:130 at main(): DEBUG: Wazuh home directory: /var/ossec
2021/08/06 01:21:58 wazuh-remoted[841391] main.c:148 at main(): DEBUG: This is not a worker
2021-08-06 01:22:38,684 - wazuh_testing - ERROR - Could not find the expected event in queue socket
2021-08-06 01:22:38,685 - wazuh_testing - ERROR - Results accumulated: 0
2021-08-06 01:22:38,685 - wazuh_testing - ERROR - Results expected: 1
Agent simulator sending msg: b'!16583!#AES:\x84\xfc\xfa\xa9Y\xf4\xa5EC\x80\x90U0\x05\xcaA\xb3\xafO\x96\xcd\xb9\xcf\xef\xc2\x998x7\x8d\x88\x97=\xc1\x04\xd8\xb2\x91C&\xd2\x87\x86c\xabOBH\xc1\x05\x18\x9f\x80\xaeq\xa5 \x0e\xd1XN\xd0\xf2^F\x07\x8d\xc7\xf7B*\xfc\\K\xc2f\xe7F\x17\xd1I\x9a\x12m\x04\x8c<\xea\xfa\x1f\xb4G\x18\x0c\xdc\xb2\rTZ\xfaL\xd0\xc7I\xdb\x83\'\x19\x9ek\x82~G\x80"\xb8\xdb#v\x19\xf3\x16^x)v\xaa\xbe'
Agent simulator sending msg: b'!16584!#AES:X\xd4\xb3\x1cA`\xd3t\xeb+\xeb\x9a\xee=!\xc0\x84\xaf\xd2xj!\xcaL\xbb\xf7\xc9G=\x84\x17\x016\x8f,\xd4\xe2]~1\xf1^\x9e \xd9\xab.\xd0C-\x834aF\xfa\x8b\xf3\xe8\xc5\x0fgk>\xea\x96\x1a\xb2A-\xc1\xc2*\x03\xfa\xa4\xdc\xac\x90U@m\x1f\xd2\xb4\xf1\xec\xdd\x16I\xca\x86%\x17\x0b\xcd,\x01\xe7\xa7#S\xd6\t\xd7\x136\x01\xe8\xb7Y~i\xb4\xc3\xb3d:\xd7GO\x00\xad=\xda>\xc8\x9c\x0e'
Inserting item in queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
Got msg from queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
Got msg from queue: b'1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583\x00'
FAILED

But the message to queue is sent by remoted

2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) ([16583] (9-27f5b5fb-debian7) any)
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:504 at OS_SendUnix(): DEBUG: Failed sending event to queue (OS_SOCKTERR): (1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) (113) 
2021/08/06 01:22:08 wazuh-remoted[841391] mq_op.c:93 at SendMSG(): ERROR: socketerr (not available).
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:547 at HandleSecureMessage(): DEBUG: Remoted received: (1:/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16584) ([16584] (10-5079b3c4-debian7) any)
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:556 at HandleSecureMessage(): ERROR: (1210): Queue 'queue/sockets/queue' not accessible: 'Connection refused'
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[16584] (10-5079b3c4-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16584) (114) 
2021/08/06 01:22:08 wazuh-remoted[841391] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts
2021/08/06 01:22:08 wazuh-remoted[841391] secure.c:561 at HandleSecureMessage(): INFO: Successfully reconnected to 'queue/sockets/queue'
2021/08/06 01:22:08 wazuh-remoted[841391] os_net.c:508 at OS_SendUnix(): DEBUG: Sending event to queue: (1:[16583] (9-27f5b5fb-debian7) any->/test.log:Feb 23 17:18:20 manager sshd[40657]: test message from agent 16583) (113) 
2021/08/06 01:22:39 wazuh-remoted[841560] mq_op.c:47 at StartMQ(): DEBUG: Connected succesfully to 'queue/sockets/queue' after 0 attempts

Conclusion

The queue interception is unreliable: sometimes the manager fails to send a message but it arrives anyway, and sometimes it successfully sends a message but it never arrives to the socket.

The best option is to change the test structure, and validate the message received by remoted with another method: the agent's status update, an alert generation, etc.

pereyra-m · 2021-08-06T18:51:17Z

A new issue was created with enhancement proposal #1690.

The debugging process of the failed tests would be much faster if the required information were easily available.

damarisg · 2021-08-10T13:36:20Z

TESTS EXECUTION

11-08-21

Test Executions	Date	By	Status	Fails
ResultPedroRemoted1.2.log	11-08-21	Seyla	🟡
ResultPedroRemoted1.1.log	11-08-21	Seyla	🟡
ResultPedroRemoted.log	11-08-21	Seyla	🟡
ResultPedroRemoted2.log	11-08-21	Seyla	🔴	`test_agent_communication/test_agent_version_shared_configuration_startup_message`
ResultPedroRemoted3.log	11-08-21	Seyla	🔴	`test_agent_communication/test_request_agent_info.py` (related to #1698) `test_agent_communication/test_agent_version_shared_configuration_startup_message`
ResultPedroRemoted4.log	11-08-21	Seyla	🟡
FullRemoted.log	11-08-21	Seyla	🟢
Remoted2.log	11-08-21	Seyla	🔴	`test_socket_communication/test_syslog_message`
Remoted3.log	11-08-21	Seyla	🔴	`test_agent_communication/test_agent_version_shared_configuration_startup_message`

Note: Comments deleted was added bellow in a common table.

10-08-21

Test Executions	Date	By	Status
ResultFullRemoted.log	09-08-21	Seyla	🟡
ResultFullRemoted2.log	09-08-21	Seyla	🟡
ResultFullRemoted3.log	09-08-21	Seyla	🟡
ResultFullRemoted4.log	09-08-21	Diego	🟡
ResultFullRemoted5.log	09-08-21	Diego	🟡
ResultFullRemoted6.log	09-08-21	Diego	🟡
ResultFullRemoted7.log	09-08-21	Matias	🟡
ResultFullRemoted8.log	09-08-21	Matias	🟡
ResultFullRemoted9.log	09-08-21	Matias	🟡
ResultFullRemoted10.log	10-08-21	Seyla	🟡
ResultFullRemoted11.log	10-08-21	Seyla	🟡
ResultFullRemoted12.log	10-08-21	Seyla	🟡

08-09-21

All the remoted tests were run after merging the fix in #1692.
All of them passed.

Test Executions	Date	By	Status
test_remoted-r1.log	09-08-21	Diego	🟡
test_remoted-r2.log	09-08-21	Diego	🟡
test_remoted-r3.log	09-08-21	Diego	🟡
ResultFullRemoted.log	09-08-21	Seyla	🟡
ResultFullRemoted2.log	09-08-21	Seyla	🟡
ResultFullRemoted3.log	09-08-21	Seyla	🔴 `test_active_response_send_ar.py::test_active_response_ar_sending`
ResultFullRemoted4.log	09-08-21	Seyla	🔴 `test_multi_agent_status.py::test_protocols_communication` `test_basic_configuration_queue_size_valid.py::test_queue_size_valid`
ResultFullRemoted5.log	09-08-21	Seyla	🟡
ResultFullRemoted7.log	09-08-21	Seyla	🟡
ResultFullRemoted6.log	09-08-21	Seyla	🔴 `test_multi_agent_status.py::test_protocols_communication`
ResultFullRemoted8.log	09-08-21	Seyla	🔴 `test_active_response_send_ar.py::test_active_response_ar_sending`
ResultFullRemoted9.log	09-08-21	Seyla	🔴 `test_multi_agent_status.py::test_protocols_communication`

06-08-21

All the remoted tests were run after the merging the fix in #1692.
All of them passed.

Test Executions	Date	By	Status
test_full_remoted_after_fix_1.log	2021-08-06	Matias	🟡
test_full_remoted_after_fix_2.log	2021-08-06	Matias	🟡
test_full_remoted_after_fix_3.log	2021-08-06	Matias	🟡
test_full_remoted_after_fix_4.log	2021-08-06	Matias	🟡
test_full_remoted_after_fix_5.log	2021-08-06	Matias	🟡

snaow · 2021-08-17T08:07:45Z

Closed by #1717

juliamagan added feature/remoted team/qa labels Jul 1, 2021

juliamagan mentioned this issue Jul 1, 2021

Integration tests: Reliable full yellow local for 4.2.0 #1516

Closed

damarisg changed the title ~~Full-Green - Fix Remoted tests in 4.2~~ Full Yellow Local - Fix Remoted tests in 4.2 Jul 5, 2021

damarisg self-assigned this Jul 6, 2021

DProvinciani mentioned this issue Jul 20, 2021

Fix 'test_remoted/test_agent_communication/test_multi_agent_protocols_communication' tests in 4.2 #1618

Closed

4 tasks

damarisg assigned DProvinciani Jul 23, 2021

DProvinciani mentioned this issue Jul 27, 2021

Incrementing the time to wait for active status in order to handle the case of having a pending status #1662

Merged

1 task

pereyra-m self-assigned this Aug 2, 2021

DProvinciani mentioned this issue Aug 6, 2021

Replacing the MITM usage by the archives.log monitor in the test_multi_agent_protocols_communication tests #1692

Merged

2 tasks

This was referenced Aug 9, 2021

Full Yellow Jenkins - Fix Remoted Tests in 4.2 #1697

Closed

Disable inconsitent remoted tests #1699

Merged

damarisg mentioned this issue Aug 10, 2021

Remove warnings with unverified HTTPs requests #1703

Merged

2 tasks

wazuh deleted a comment from DProvinciani Aug 11, 2021

wazuh deleted a comment from pereyra-m Aug 11, 2021

snaow mentioned this issue Aug 17, 2021

Full Yellow Local - Fix Remoted tests in 4.2 #1717

Merged

snaow closed this as completed Aug 17, 2021

mdengra mentioned this issue Aug 19, 2021

Full Green Jenkins - Fix Remoted tests in 4.2 #1737

Closed

2 tasks

This was referenced Aug 19, 2021

Full Green Local - Fix Remoted tests in 4.2 #1740

Closed

Integration Tests: Reliable Full Green Local for 4.2.0 #1774

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full Yellow Local - Fix Remoted tests in 4.2 #1530

Full Yellow Local - Fix Remoted tests in 4.2 #1530

juliamagan commented Jul 1, 2021 •

edited by damarisg

damarisg commented Jul 6, 2021

damarisg commented Jul 6, 2021

DProvinciani commented Jul 22, 2021

DProvinciani commented Jul 22, 2021

mdengra commented Jul 27, 2021

DProvinciani commented Jul 28, 2021

damarisg commented Jul 28, 2021 •

edited

mdengra commented Jul 28, 2021

damarisg commented Jul 28, 2021

mdengra commented Jul 30, 2021

damarisg commented Jul 30, 2021

DProvinciani commented Jul 30, 2021

juliamagan commented Aug 2, 2021 •

edited

damarisg commented Aug 2, 2021 •

edited

pereyra-m commented Aug 2, 2021 •

edited

Rebits commented Aug 3, 2021 •

edited

DProvinciani commented Aug 3, 2021

pereyra-m commented Aug 3, 2021 •

edited

DProvinciani commented Aug 3, 2021

juliamagan commented Aug 4, 2021 •

edited

pereyra-m commented Aug 4, 2021 •

edited

DProvinciani commented Aug 4, 2021

damarisg commented Aug 4, 2021 •

edited

damarisg commented Aug 5, 2021

pereyra-m commented Aug 5, 2021 •

edited

pereyra-m commented Aug 6, 2021 •

edited

damarisg commented Aug 10, 2021 •

edited

snaow commented Aug 17, 2021 •

edited

Full Yellow Local - Fix Remoted tests in 4.2 #1530

Full Yellow Local - Fix Remoted tests in 4.2 #1530

Comments

juliamagan commented Jul 1, 2021 • edited by damarisg

Description

Test Execution - Results

Settings

Packages details

Environment

Setup Local_internal_opcion.conf

Development

damarisg commented Jul 6, 2021

damarisg commented Jul 6, 2021

DProvinciani commented Jul 22, 2021

DProvinciani commented Jul 22, 2021

Update including changes of #1624 and #1625

mdengra commented Jul 27, 2021

2021-07-27

DProvinciani commented Jul 28, 2021

2021-07-28 Update including changes of #1662

damarisg commented Jul 28, 2021 • edited

Update 28-07-21

mdengra commented Jul 28, 2021

2021-07-28

damarisg commented Jul 28, 2021

Update 28-07-21

mdengra commented Jul 30, 2021

2021-07-30

damarisg commented Jul 30, 2021

Update 30-07-21

DProvinciani commented Jul 30, 2021

2021-07-30

juliamagan commented Aug 2, 2021 • edited

Tests results

damarisg commented Aug 2, 2021 • edited

02-08-21

pereyra-m commented Aug 2, 2021 • edited

2021-08-02

Update

Update

Rebits commented Aug 3, 2021 • edited

03-08-21

DProvinciani commented Aug 3, 2021

pereyra-m commented Aug 3, 2021 • edited

03/08/21

Update

DProvinciani commented Aug 3, 2021

03/08/21

juliamagan commented Aug 4, 2021 • edited

04-08-21

pereyra-m commented Aug 4, 2021 • edited

04/08/21

Update

Update

DProvinciani commented Aug 4, 2021

04/08/21

damarisg commented Aug 4, 2021 • edited

04-08-21

damarisg commented Aug 5, 2021

Full Execution - JENKINS

Results Full Remoted

pereyra-m commented Aug 5, 2021 • edited

05/08/21

Details

Update

Failed test case analysis

Conclusion

pereyra-m commented Aug 6, 2021 • edited

damarisg commented Aug 10, 2021 • edited

TESTS EXECUTION

11-08-21

10-08-21

08-09-21

06-08-21

snaow commented Aug 17, 2021 • edited

juliamagan commented Jul 1, 2021 •

edited by damarisg

damarisg commented Jul 28, 2021 •

edited

juliamagan commented Aug 2, 2021 •

edited

damarisg commented Aug 2, 2021 •

edited

pereyra-m commented Aug 2, 2021 •

edited

Rebits commented Aug 3, 2021 •

edited

pereyra-m commented Aug 3, 2021 •

edited

juliamagan commented Aug 4, 2021 •

edited

pereyra-m commented Aug 4, 2021 •

edited

damarisg commented Aug 4, 2021 •

edited

pereyra-m commented Aug 5, 2021 •

edited

pereyra-m commented Aug 6, 2021 •

edited

damarisg commented Aug 10, 2021 •

edited

snaow commented Aug 17, 2021 •

edited