Not all agents get disconnected in test_shutdown_message #5199

Closed
juliamagan opened this issue Apr 10, 2024 · 5 comments · Fixed by #5298

Comments

@juliamagan
Member

Description

During the system tests launched for 4.8.0 Beta 5 at wazuh/wazuh#22824, not all agents went offline. Only 33 of the 40 agents were reported as Disconnected:

E       AssertionError: assert 33 == 40
E        +  where 33 = len(['Disconnected', 'Disconnected', 'Disconnected', 'Disconnected', 'Disconnected', 'Disconnected', ...])

This test was modified recently, so we should check whether the waiting time before the check is as expected. If it is, then even though no errors appear in the managers, the failure could indicate some kind of performance problem, since it appears consistently across several executions.

@juliamagan
Member Author

After talking to @TomasTurina, it was found that when an agent stops, it sends HC_SHUTDOWN to the manager, which immediately shows the agent as Disconnected. However, the logs show that the manager receives 50 to 52 shutdown messages when there are only 40 agents. We need to check whether some of these are old messages or whether some messages are being duplicated. Also, the test uses thread.join() to wait until all the agents have been stopped, so every agent should appear as Disconnected by the time the check runs.
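
To tell old messages apart from duplicated ones, the shutdown messages can be grouped by agent. The following is a minimal sketch, not code from the test suite: `classify_shutdowns` is a hypothetical helper, and it assumes the agent ID of each HC_SHUTDOWN message has already been extracted from the manager logs (the log format itself is not shown in this issue).

```python
from collections import Counter


def classify_shutdowns(shutdown_agent_ids, expected_agent_ids):
    """Group shutdown messages by agent (hypothetical helper).

    shutdown_agent_ids: one agent ID per shutdown message seen in the logs.
    expected_agent_ids: IDs of the agents that belong to the current test run.
    """
    counts = Counter(shutdown_agent_ids)
    expected = set(expected_agent_ids)

    # Agents of this run that sent more than one shutdown message.
    duplicated = {a: n for a, n in counts.items() if a in expected and n > 1}
    # Messages from agents that are not part of this run (possibly old messages).
    unexpected = {a: n for a, n in counts.items() if a not in expected}
    # Agents of this run with no shutdown message at all.
    missing = expected - set(counts)

    return duplicated, unexpected, missing


if __name__ == "__main__":
    # Made-up IDs, only to show the shape of the result.
    seen = ["001", "002", "002", "003", "099"]
    duplicated, unexpected, missing = classify_shutdowns(seen, ["001", "002", "003", "004"])
    print("duplicated:", duplicated)
    print("unexpected:", unexpected)
    print("missing:", missing)
```

If `unexpected` is non-empty, the extra messages come from agents outside the run. If only `duplicated` is non-empty, the same agents are being counted more than once.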

@juliamagan
Member Author

By monitoring the logs and the agent statuses, we saw that the test started while some agents were not yet Active, which could affect the results. Logic has been added to avoid this, and it is now being tested to determine how long it takes for all the agents to become Active.
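
The kind of guard described here can be sketched as a polling loop with a timeout. This is an illustrative sketch only, not the logic added in the fix: `wait_until_all` and the `get_statuses` callable are hypothetical names, and the timeout and interval values are placeholders, since the time actually needed is still being measured.

```python
import time


def wait_until_all(get_statuses, expected_status, expected_count,
                   timeout=300.0, interval=5.0):
    """Poll until every agent reports expected_status, or the timeout expires.

    get_statuses: hypothetical callable returning the current status string
    of each agent (for example, as reported by the manager).
    Returns True if all expected_count agents reached expected_status in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = list(get_statuses())
        if len(statuses) == expected_count and all(
            s == expected_status for s in statuses
        ):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated status source: the second agent becomes Active on the third poll.
    polls = iter([
        ["Active", "Pending"],
        ["Active", "Pending"],
        ["Active", "Active"],
    ])
    ready = wait_until_all(lambda: next(polls), "Active", 2, timeout=5.0, interval=0.0)
    print("all agents active:", ready)
```

The same helper could be reused after stopping the agents, with "Disconnected" as the expected status, so that the final assertion does not depend on a fixed sleep.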

@juliamagan
Member Author

On hold due to Beta 6 testing

@juliamagan
Member Author

With the proposed solution, the test passes without problems when launched individually, but it fails when all the tests in the environment are launched. We are checking whether the environment is left dirty by the previous tests, but the full run takes about 1 h 40 min, which makes debugging very slow.

@juliamagan
Member Author

Finally, it was found that the environment was dirty and was not registered in the expected manager. What remains is to upload the results of the complete test set to confirm that it no longer fails.

Status: Done