Investigate a performance issue in `test_cluster_performance` #23331

nico-stefani · 2024-05-07T14:31:12Z

Description

During #23268 we detected that the thresholds of `test_cluster_performance` were exceeded.

test_cluster_performance.zip

We need to investigate the root cause of this problem before continuing to the next RC.

Checks

The following elements have been updated or reviewed (should also be checked if no modification is required):

Tests (unit tests, API integration tests).
Changelog.
Documentation.
Integration test mapping (using api/test/integration/mapping/_test_mapping.py).

nico-stefani · 2024-05-08T17:00:02Z

Update

Analyzing the artifacts, we detected a performance degradation in wazuh-db after a cluster restart.

We assume authd could not empty client.keys completely after the restart, so when an agent DELETE was invoked, the agent-sync tasks that followed took significantly longer.

For example:

agent-sync before delete

2024/02/22 19:09:42 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B448_manager_25] [Agent-info sync] 32/32 chunks updated in wazuh-db in 0.085s.

DELETE /agents

2024/02/22 19:11:46 INFO: wazuh 172.31.59.143 "DELETE /agents" with parameters {"agents_list": "all", "status": "all", "older_than": "0s"} and body {} done in 6.961s: 200

agent-sync after delete

2024/05/03 19:31:38 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B504_manager_2] [Agent-info sync] 24/24 chunks updated in wazuh-db in 2.258s.
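To compare the two agent-sync entries above, the durations can be pulled out of the log lines with a small script. This is only a sketch that assumes the log format matches the two lines quoted in this comment; `SYNC_RE` and `parse_sync_line` are hypothetical helper names, not part of Wazuh.

```python
import re

# Hypothetical pattern, written against the two "Agent-info sync" lines quoted above.
SYNC_RE = re.compile(
    r"\[Agent-info sync\] (?P<done>\d+)/(?P<total>\d+) chunks "
    r"updated in wazuh-db in (?P<secs>\d+(?:\.\d+)?)s"
)


def parse_sync_line(line):
    """Return (total_chunks, seconds) for an Agent-info sync line, or None."""
    match = SYNC_RE.search(line)
    if match is None:
        return None
    return int(match.group("total")), float(match.group("secs"))


before = (
    "2024/02/22 19:09:42 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B448_manager_25] "
    "[Agent-info sync] 32/32 chunks updated in wazuh-db in 0.085s."
)
after = (
    "2024/05/03 19:31:38 DEBUG: [Worker CLUSTER-Workload_benchmarks_metrics_B504_manager_2] "
    "[Agent-info sync] 24/24 chunks updated in wazuh-db in 2.258s."
)

chunks_before, secs_before = parse_sync_line(before)
chunks_after, secs_after = parse_sync_line(after)

# Normalize by chunk count, since the two syncs handled different numbers of chunks.
per_chunk_before = secs_before / chunks_before
per_chunk_after = secs_after / chunks_after
slowdown = per_chunk_after / per_chunk_before

print(f"before: {per_chunk_before * 1000:.2f} ms/chunk")
print(f"after:  {per_chunk_after * 1000:.2f} ms/chunk")
print(f"per-chunk slowdown: {slowdown:.1f}x")
```

Normalizing per chunk matters here because the later sync processed fewer chunks (24 vs. 32) yet took much longer overall.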

#23347 was opened to address the issue.

fdalmaup · 2024-05-09T07:49:15Z

Review

The analyzed logs show that, due to some internal behavior of wazuh-db, the Agent-info sync took less than a second before a large modification was made, such as the deletion of the 50k agents in the test. After that modification, the sync began to take longer than expected, which increased the duration of the task and degraded the cluster's performance.

LGTM

nico-stefani added the team/framework label May 7, 2024

Selutario added the type/bug and level/task labels and removed the team/framework label May 7, 2024

nico-stefani mentioned this issue May 7, 2024

Release 4.8.0 - Release Candidate 1 - Workload benchmarks metrics #23268

Closed

2 tasks

nico-stefani self-assigned this May 8, 2024

javiersanchz mentioned this issue May 8, 2024

Investigate fail test in /overview/agents/ endpoint #23319

Closed

4 tasks

juliamagan mentioned this issue May 8, 2024

Release 4.8.0 - RC 1 #23246

Closed

Selutario mentioned this issue May 8, 2024

Slower wazuh-db in 4.8.0 #23347

Closed

Selutario closed this as completed May 9, 2024
