Investigate a performance issue in test_cluster_performance #23331

Closed
4 tasks done
nico-stefani opened this issue May 7, 2024 · 2 comments
Assignees
Labels
level/task type/bug Something isn't working

Comments

@nico-stefani
Member

nico-stefani commented May 7, 2024

Description

During #23268 we detected that the thresholds of test_cluster_performance were exceeded.

test_cluster_performance.zip

image

We need to investigate the root cause of this problem before continuing to the next RC.

Checks

The following elements have been updated or reviewed (should also be checked if no modification is required):

  • Tests (unit tests, API integration tests).
  • Changelog.
  • Documentation.
  • Integration test mapping (using api/test/integration/mapping/_test_mapping.py).
@nico-stefani
Member Author

nico-stefani commented May 8, 2024

Update

Analyzing the artifacts, we detected a performance degradation in wazuh-db after a cluster restart.

We assume authd couldn't completely empty client.keys after the restart. As a result, once the agent DELETE was invoked, the agent-sync tasks that followed were significantly slower.

For example:

  • agent-sync before delete

2024/02/22 19:09:42 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B448_manager_25] [Agent-info sync] 32/32 chunks updated in wazuh-db in 0.085s.

  • DELETE /agents

2024/02/22 19:11:46 INFO: wazuh 172.31.59.143 "DELETE /agents" with parameters {"agents_list": "all", "status": "all", "older_than": "0s"} and body {} done in 6.961s: 200

  • agent-sync after delete

2024/05/03 19:31:38 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B504_manager_2] [Agent-info sync] 24/24 chunks updated in wazuh-db in 2.258s.
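
To repeat this comparison across a whole worker log, the sync lines can be filtered by duration. The following is a minimal sketch, not part of the test suite: it assumes the `[Agent-info sync]` line format is exactly the one quoted above, and the function names and the 1-second threshold are hypothetical choices made for illustration.

```python
import re

# Matches the "[Agent-info sync] N/M chunks updated in wazuh-db in Xs." lines
# in the format quoted in this comment (assumed to be stable).
SYNC_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) DEBUG: "
    r"\[Worker (?P<worker>[^\]]+)\] \[Agent-info sync\] "
    r"(?P<done>\d+)/(?P<total>\d+) chunks updated in wazuh-db "
    r"in (?P<secs>\d+(?:\.\d+)?)s\."
)


def parse_sync_line(line):
    """Return the fields of one Agent-info sync line, or None if it does not match."""
    match = SYNC_RE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": match.group("ts"),
        "worker": match.group("worker"),
        "chunks": int(match.group("total")),
        "seconds": float(match.group("secs")),
    }


def slow_syncs(lines, threshold_s=1.0):
    """Return parsed sync entries slower than threshold_s (hypothetical 1 s default)."""
    parsed = (parse_sync_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["seconds"] > threshold_s]


# The two sync lines quoted above, used as sample input.
SAMPLE = [
    '2024/02/22 19:09:42 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B448_manager_25] '
    '[Agent-info sync] 32/32 chunks updated in wazuh-db in 0.085s.',
    '2024/05/03 19:31:38 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B504_manager_2] '
    '[Agent-info sync] 24/24 chunks updated in wazuh-db in 2.258s.',
]

if __name__ == "__main__":
    for entry in slow_syncs(SAMPLE):
        per_chunk = entry["seconds"] / entry["chunks"]
        print(entry["timestamp"], entry["worker"], f"{per_chunk:.4f} s/chunk")
```

Dividing the duration by the chunk count helps separate a slower wazuh-db from a sync that simply carried more chunks.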

#23347 was opened to address the issue.

@fdalmaup
Member

fdalmaup commented May 9, 2024

Review

The analyzed logs show that, due to some internal behavior of wazuh-db, the Agent-info sync took less than a second until a large modification was made (the deletion of the 50k agents used in the test). After that, the sync began to take longer than expected, which increased the duration of the task and degraded the cluster's performance.

LGTM

Projects
Status: Done

3 participants