Performance for Vulnerability Detection module in clustered environments #5313

Rebits · 2024-04-30T09:47:01Z

Description

This issue is dedicated to conducting a thorough performance analysis of two proposed development approaches:

@wazuh/devel-framework: Create consolidated vulnerability state wazuh#23058
@wazuh/devel-core2 development: Duplicated vulnerabilities when changing agent manager wazuh#22867

The objective is to perform performance tests and compare the results of both approaches. This comparative analysis will provide a comprehensive understanding of the potential impact on the product.

Test environment

Component	Quantity	Operating System	CPU (cores)	RAM (GB)	Disk (GB)
Master	1	Ubuntu 22	4	8	50
Workers	2	Ubuntu 22	4	8	50
Agent 1	1	Ubuntu 22	2	4	30
Agent 2	1	Windows 11	2	4	30
Load Balancer	1	Ubuntu 22	4	8	50
Indexers	2	Ubuntu 22	2	4	30

Note

The load balancer is located on the master node.

23058 Development Packages

Architecture	Framework development package URL
DEB	4.8.0-python.vd.spike.deb.1
RPM	4.8.0-python.vd.spike.rpm.1

22867 Development Packages

Architecture	Core development package URL
DEB	4.8.0-0.commitd31b277
RPM	4.8.0-0.commitd31b277

Test Cases

Testing

Automatic

Methodology

The CLUSTER-Workload_benchmarks_metrics pipeline will be used to execute the specified test cases automatically. Results will be analyzed manually and shared with the development team for validation and adjustments.

Test Cases

Case	Description	Number of Agents	EPS	Frequency	Number of Vulnerable Packages	Time
Minimum Activity	Simulate a small, stable environment with low activity	10	10	60	100	3h
Medium Activity	Simulate a medium-sized environment with moderate activity	50	10	60	100	3h
High Activity	Simulate a large-scale environment with significant activity	200	50	60	100	3h
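For a rough sense of the load each case puts on the cluster, the table above can be turned into an aggregate event rate. The following is a minimal sketch that assumes the EPS column is a per-agent rate and that the load balancer spreads agents evenly across the two workers. Both are assumptions, and `ActivityCase` and `aggregate_eps` are illustrative names, not part of the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityCase:
    name: str
    agents: int
    eps_per_agent: int  # assumption: the EPS column is a per-agent rate


# Values copied from the automatic test case table.
CASES = [
    ActivityCase("Minimum Activity", 10, 10),
    ActivityCase("Medium Activity", 50, 10),
    ActivityCase("High Activity", 200, 50),
]

WORKERS = 2  # from the test environment table


def aggregate_eps(case: ActivityCase) -> int:
    """Total events per second the whole cluster receives for a case."""
    return case.agents * case.eps_per_agent


for case in CASES:
    total = aggregate_eps(case)
    # Assumption: agents are balanced evenly between the workers.
    print(f"{case.name}: {total} events/s total, ~{total / WORKERS:.0f} per worker")
```

Under these assumptions, the Medium Activity case works out to 50 × 10 = 500 events per second for the cluster.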

Manual

Methodology

Customizing the set of vulnerable packages is not feasible in automatic testing. Therefore, manual testing will utilize a larger set of 10,000 vulnerabilities to identify any potential instability in environments with a high vulnerability count. The following Wazuh-QA tools will be employed for manual performance analysis:

Monitor class for resource measurement of Wazuh central components
Statistics class for Wazuh data analysis
Simulate agents script for Wazuh agent simulation

Test Cases

Case	Description	Number of Agents	EPS	Frequency	Number of Vulnerable Packages	Time
High Vulnerability Environment	Simulate an intermediate-sized environment with high vulnerability	10	10	60	10,000	3h

Conclusion 🔴

New Issues

Known issues

https://github.com/wazuh/wazuh-jenkins/issues/6203

Note

The manual performance tests and the Minimum Activity and High Activity automatic tests have not been performed. More information in #5313 (comment)

Rebits · 2024-05-06T17:09:11Z

Automatic

Minimum Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/510/
Medium Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
High Activity: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/512/

Rebits · 2024-05-08T08:40:45Z

The Minimum Activity and High Activity performance tests failed due to a "No space left on device" error. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6475

22:03:52  
22:03:52  TASK [Copy ossec.log file to data files] ***************************************
22:03:52  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_2]: UNREACHABLE! => {
22:03:52      "changed": false,
22:03:52      "unreachable": true
22:03:52  }
22:03:52  
22:03:52  MSG:
22:03:52  
22:03:52  Warning: Permanently added '172.31.3.110' (ECDSA) to the list of known hosts.

22:03:52  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.7137516-30912-167679972105845’: No space left on device
22:03:52  
22:03:53  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_1]: UNREACHABLE! => {
22:03:53      "changed": false,
22:03:53      "unreachable": true
22:03:53  }
22:03:53  
22:03:53  MSG:
22:03:53  
22:03:53  Warning: Permanently added '172.31.4.31' (ECDSA) to the list of known hosts.

22:03:53  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.724964-30911-242038256013694’: No space left on device

Only Medium Activity performance tests finished successfully
Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
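The failure above happens when Ansible tries to create its temporary directory on a manager whose disk is already full. One way a job could fail earlier, with a clearer message, is a free-space check before the collection step. This is a minimal sketch and not how the pipeline is actually implemented; the function name and the 1 GiB threshold are hypothetical.

```python
import shutil
import tempfile


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free bytes."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


# Hypothetical threshold: require 1 GiB free before copying log files.
MIN_FREE_BYTES = 1 * 1024**3

tmp_dir = tempfile.gettempdir()
if has_free_space(tmp_dir, MIN_FREE_BYTES):
    print(f"{tmp_dir}: enough free space to collect data files")
else:
    print(f"{tmp_dir}: not enough free space, skipping data collection")
```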

Rebits · 2024-05-08T08:50:54Z

Medium Activity 🔴

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
Report: Artifact.zip

Logs 🔴

Summary

Worker logs indicate the same database error reported in Broken database during Vulnerability Detector tests wazuh#22847
No errors present in the master node
No errors present in the indexer nodes

Master 🟡

The master node was started before the correct indexer configuration was set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Worker 1 🔴

The worker node was started before the correct indexer configuration was set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Multiple database errors appear, matching those reported in Broken database during Vulnerability Detector tests wazuh#22847:

2024/05/07 21:24:24 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2024/05/07 21:24:24 wazuh-remoted: INFO: (1410): Reading authentication keys file.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
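When reviewing logs like the ones above, it can help to count entries per daemon and level instead of reading them line by line. The following is a minimal sketch that assumes every line follows the `date time daemon: LEVEL: message` layout seen in these excerpts. The regular expression and the `count_by_daemon` helper are illustrative and not part of the Wazuh-QA tooling.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpts in this report:
#   2024/05/07 21:24:48 wazuh-db: ERROR: <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<daemon>[\w-]+): (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def count_by_daemon(lines, level="ERROR"):
    """Count log lines of the given level, grouped by daemon name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["level"] == level:
            counts[match["daemon"]] += 1
    return counts


# Sample lines copied from the worker log excerpts.
SAMPLE = [
    "2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.",
    "2024/05/07 21:24:24 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.",
    "2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo",
    "2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.",
]

print(dict(count_by_daemon(SAMPLE)))
print(dict(count_by_daemon(SAMPLE, level="WARNING")))
```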

Worker 2 🟡

The worker node was started before the correct indexer configuration was set, so the following warnings are expected:

2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Indexer 1 🟢

No warnings or errors

Indexer 2 🟢

No warnings or errors

Metrics 🔴

Summary

Low resource usage in the master node
Possible file descriptor leaks. Reported in Possible leak in the Vulnerability Detection register stress test wazuh#23202
Worker nodes show high CPU and memory usage. This is unsurprising given the unrealistic level of activity: an expected influx of 500 syscollector messages per second in a two-node cluster environment

Master 🟢

Metrics

Worker 1 🔴

Metrics

Worker 2 🔴

Metrics

Indexer 1 🟢

No abnormal behavior detected

Metrics

Indexer 2 🟢

No abnormal behavior detected

Metrics

Statistics 🟢

Vulnerabilities State 🟢

The vulnerability generator module, used by the simulate agents script, is designed to send 100 vulnerable packages to the manager and then confirm their removal. This shows up as a wave-shaped plot that peaks on each repetition, once all vulnerabilities have been processed.

In the plot, the indexer connector does not match the ideal expected curve, although the simulator is clearly behaving as intended.

Implementing various testing methods to determine if the final number of vulnerabilities aligns with expectations at specific points during the test could be highly beneficial.
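One possible shape for such a check is to compare the observed vulnerability count against an expected value at chosen checkpoints. The following is a minimal sketch under stated assumptions: samples are `(seconds, count)` pairs sorted by time, and the checkpoint times, counts, and tolerance are made-up illustration values rather than data from this run.

```python
from bisect import bisect_right


def value_at(samples, t):
    """Return the last observed count at or before time t, or None."""
    times = [sample[0] for sample in samples]
    index = bisect_right(times, t)
    return samples[index - 1][1] if index else None


def check_checkpoints(samples, checkpoints, tolerance=0):
    """Return the checkpoints whose observed count is missing or off."""
    failures = []
    for t, expected in checkpoints:
        observed = value_at(samples, t)
        if observed is None or abs(observed - expected) > tolerance:
            failures.append((t, expected, observed))
    return failures


# Hypothetical data: the count rises to 100 and drops back to 0 each cycle.
observed_samples = [(0, 0), (60, 100), (120, 0), (180, 90)]
expected_checkpoints = [(60, 100), (120, 0), (180, 100)]

for t, expected, observed in check_checkpoints(
    observed_samples, expected_checkpoints, tolerance=5
):
    print(f"t={t}s: expected {expected}, observed {observed}")
```

With the sample data, only the last checkpoint is reported, because 90 differs from the expected 100 by more than the tolerance of 5.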

Alerts 🟢

We anticipate that the alerts generated by both the workers and the manager should correspond with the indexed alert values. Nonetheless, there appears to be a discrepancy:

Due to the high activity levels, some variance between the written alerts and the indexed alerts is expected. However, it would be useful to add test methods that track this gap and confirm that it shrinks as the environment stabilizes over time.
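The gap between written and indexed alerts can be expressed as a single number and compared against a threshold. The following is a minimal sketch; the `alert_gap` helper, the per-node counts, and the indexed total are hypothetical and not taken from this run.

```python
def alert_gap(written_by_node, indexed_total):
    """Compare alerts written on all nodes with alerts found in the indexer.

    Returns (written_total, missing, missing_ratio).
    """
    written_total = sum(written_by_node.values())
    missing = written_total - indexed_total
    ratio = missing / written_total if written_total else 0.0
    return written_total, missing, ratio


# Hypothetical counts for a master and two workers.
written = {"master": 100, "worker_1": 450, "worker_2": 450}
indexed = 950

total, missing, ratio = alert_gap(written, indexed)
print(f"written={total} indexed={indexed} missing={missing} ({ratio:.1%})")
```

A test could then assert that the ratio stays below an agreed limit, or that it decreases between two points in the run.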

Evidence collection 🔴

The following errors were detected in the evidence-collection capabilities of the pipeline:

The indexed vulnerabilities and alerts metrics do not contain timestamps. Including timestamps would make it easy to compare these values with the rest of the graphs. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6474
Indexer statistics were present in the logcollector directory. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6473
Statistics values for analysis are not correctly plotted. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6203

Rebits · 2024-05-08T11:20:41Z

Following a discussion with @juliamagan, we have decided not to re-run the failed High Activity and Minimum Activity performance tests. Instead, these tests will be re-launched in RC2.

MARCOSD4 · 2024-05-08T12:39:49Z

GJ, but the graphs of the indexer 1 metrics cannot be displayed, perhaps because of an error in writing the comment.

MARCOSD4 · 2024-05-08T13:41:51Z

LGTM

juliamagan · 2024-05-09T09:03:30Z

LGTM

Rebits added level/task Task issue type/test labels Apr 30, 2024

Rebits self-assigned this Apr 30, 2024

Rebits mentioned this issue Apr 30, 2024

Vulnerability state testing for clustered environments #5300

Closed

4 tasks

juliamagan mentioned this issue May 6, 2024

Release 4.8.0 - RC 1 - Vulnerability detection tests wazuh/wazuh#23298

Closed

1 task

Rebits changed the title ~~Performance tests and comparison of both developments~~ Performance for Vulnerability Scab module in clustered environments May 6, 2024

Rebits changed the title ~~Performance for Vulnerability Scab module in clustered environments~~ Performance for Vulnerability Detection module in clustered environments May 7, 2024

juliamagan closed this as completed May 9, 2024

juliamagan mentioned this issue Jun 5, 2024

Packages not scanned and vulnerabilities not triggered in simulated agents wazuh/wazuh#23926

Closed
