Performance for Vulnerability Detection module in clustered environments #5313

Closed
Tracked by #5300
Rebits opened this issue Apr 30, 2024 · 7 comments
@Rebits
Member

Rebits commented Apr 30, 2024

Description

This issue is dedicated to a thorough performance analysis of two proposed development approaches, identified in this issue by their development packages: 23058 and 22867.

The objective is to perform performance tests and compare the results of both approaches. This comparative analysis will provide a comprehensive understanding of the potential impact on the product.

Test environment

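| Component | Quantity | Operating System | CPU (cores) | RAM (GB) | Disk (GB) |
|---|---|---|---|---|---|
| Master | 1 | Ubuntu 22 | 4 | 8 | 50 |
| Workers | 2 | Ubuntu 22 | 4 | 8 | 50 |
| Agent 1 | 1 | Ubuntu 22 | 2 | 4 | 30 |
| Agent 2 | 1 | Windows 11 | 2 | 4 | 30 |
| Load Balancer | 1 | Ubuntu 22 | 4 | 8 | 50 |
| Indexers | 2 | Ubuntu 22 | 2 | 4 | 30 |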

Note

The load balancer is located on the master node.

23058 Development Packages

| Architecture | Framework development package URL |
|---|---|
| DEB | 4.8.0-python.vd.spike.deb.1 |
| RPM | 4.8.0-python.vd.spike.rpm.1 |

22867 Development Packages

| Architecture | Core development package URL |
|---|---|
| DEB | 4.8.0-0.commitd31b277 |
| RPM | 4.8.0-0.commitd31b277 |

Test Cases

Testing

Automatic

Methodology

The CLUSTER-Workload_benchmarks_metrics pipeline is used to run the specified test cases automatically. The results are analyzed manually and shared with the development team for validation and adjustments.

Test Cases

| Case | Description | Number of Agents | EPS | Frequency | Number of Vulnerable Packages | Time |
|---|---|---|---|---|---|---|
| Minimum Activity | Simulate a small, stable environment with low activity | 10 | 10 | 60 | 100 | 3h |
| Medium Activity | Simulate a medium-sized environment with moderate activity | 50 | 10 | 60 | 100 | 3h |
| High Activity | Simulate a large-scale environment with significant activity | 200 | 50 | 60 | 100 | 3h |

Manual

Methodology

Customizing the set of vulnerable packages is not feasible in automatic testing. Therefore, manual testing will utilize a larger set of 10,000 vulnerabilities to identify any potential instability in environments with a high vulnerability count. The following Wazuh-QA tools will be employed for manual performance analysis:

  • Monitor class for resource measurement of Wazuh central components
  • Statistics class for Wazuh data analysis
  • Simulate agents script for Wazuh agent simulation

Test Cases

| Case | Description | Number of Agents | EPS | Frequency | Number of Vulnerable Packages | Time |
|---|---|---|---|---|---|---|
| High Vulnerability Environment | Simulate an intermediate-sized environment with high vulnerability | 10 | 10 | 60 | 10,000 | 3h |

Conclusion 🔴

New Issues

Known issues

Note

The manual performance tests and the Minimum Activity and High Activity automatic tests have not been performed. More information in #5313 (comment)

@Rebits Rebits self-assigned this Apr 30, 2024
@Rebits Rebits changed the title Performance tests and comparison of both developments Performance for Vulnerability Scab module in clustered environments May 6, 2024
@Rebits
Member Author

Rebits commented May 6, 2024

@Rebits Rebits changed the title Performance for Vulnerability Scab module in clustered environments Performance for Vulnerability Detection module in clustered environments May 7, 2024
@Rebits
Member Author

Rebits commented May 8, 2024

The Minimum Activity and High Activity performance tests failed with a "No space left on device" error. Reported in https://github.com/wazuh/wazuh-jenkins/issues/6475

22:03:52  
22:03:52  TASK [Copy ossec.log file to data files] ***************************************
22:03:52  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_2]: UNREACHABLE! => {
22:03:52      "changed": false,
22:03:52      "unreachable": true
22:03:52  }
22:03:52  
22:03:52  MSG:
22:03:52  
22:03:52  Warning: Permanently added '172.31.3.110' (ECDSA) to the list of known hosts.

22:03:52  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.7137516-30912-167679972105845’: No space left on device
22:03:52  
22:03:53  fatal: [CLUSTER-Workload_benchmarks_metrics_B510_manager_1]: UNREACHABLE! => {
22:03:53      "changed": false,
22:03:53      "unreachable": true
22:03:53  }
22:03:53  
22:03:53  MSG:
22:03:53  
22:03:53  Warning: Permanently added '172.31.4.31' (ECDSA) to the list of known hosts.

22:03:53  mkdir: cannot create directory ‘/tmp/ansible-tmp-1715115832.724964-30911-242038256013694’: No space left on device

Only the Medium Activity performance test finished successfully.
Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
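
The "No space left on device" failure above appeared while Ansible was creating a temporary directory under `/tmp`. As a rough illustration only, a pre-flight check like the one below could flag a nearly full filesystem before evidence collection starts. The function name `has_free_space` and the 1 GiB threshold are hypothetical and are not part of the pipeline.

```python
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` bytes available."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes


# Hypothetical threshold: 1 GiB of free space before copying log files.
REQUIRED_BYTES = 1 * 1024**3
```

Calling `has_free_space("/tmp", REQUIRED_BYTES)` on each manager before the "Copy ossec.log file to data files" task would turn the unreachable-host failure into an explicit, earlier error.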

@Rebits
Member Author

Rebits commented May 8, 2024

Medium Activity 🔴

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/511/
Report: Artifact.zip

Logs 🔴

Summary

Master 🟡

  • The master node is started before the correct indexer configuration is set, so the following warnings are expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Worker 1 🔴

  • The worker node is started before the correct indexer configuration is set, so the initial indexer-connector warnings are expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.
2024/05/07 21:24:24 wazuh-remoted: INFO: (1409): Authentication file changed. Updating.
2024/05/07 21:24:24 wazuh-remoted: INFO: (1410): Reading authentication keys file.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_osinfo
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.
2024/05/07 21:24:48 wazuh-db: ERROR: DB(004) sqlite3_prepare_v2() : no such table: sys_programs
2024/05/07 21:24:48 wazuh-db: ERROR: (5214): Null statement on internal cache.

Worker 2 🟡

  • The worker node is started before the correct indexer configuration is set, so the following warnings are expected:
2024/05/07 21:14:30 indexer-connector: WARNING: No username and password found in the keystore, using default values.
2024/05/07 21:14:30 indexer-connector: WARNING: IndexerConnector initialization failed for index 'wazuh-states-vulnerabilities', retrying until the connection is successful.
2024/05/07 21:16:52 indexer-connector: WARNING: Failed to sync agent '000' with the indexer.

Indexer 1 🟢

No warnings or errors

Indexer 2 🟢

No warnings or errors
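
The log review above separates warnings that are expected at startup from genuine errors such as the `wazuh-db` failures on Worker 1. The sketch below shows one way this triage could be automated. It assumes the `date time component: LEVEL: message` layout seen in the excerpts above; the names `LINE_RE`, `EXPECTED_PATTERNS`, and `classify` are hypothetical and do not come from Wazuh-QA.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts in this report:
# "2024/05/07 21:14:30 indexer-connector: WARNING: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<component>[\w-]+): (?P<level>[A-Z]+): (?P<message>.*)$"
)

# Warnings this report treats as expected while the indexer is not yet configured.
EXPECTED_PATTERNS = (
    "No username and password found in the keystore",
    "IndexerConnector initialization failed",
    "Failed to sync agent",
)


def classify(lines):
    """Count expected and unexpected WARNING/ERROR lines.

    Returns a Counter with 'expected' and 'unexpected' keys, plus the
    list of unexpected lines for manual review.
    """
    counts = Counter()
    unexpected = []
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match is None or match["level"] not in ("WARNING", "ERROR"):
            continue  # skip INFO lines and anything that does not parse
        is_expected = match["level"] == "WARNING" and any(
            pattern in match["message"] for pattern in EXPECTED_PATTERNS
        )
        if is_expected:
            counts["expected"] += 1
        else:
            counts["unexpected"] += 1
            unexpected.append(line)
    return counts, unexpected
```

Feeding it the Worker 1 excerpt would leave only the `wazuh-db` errors in the unexpected list, which matches the manual assessment.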


Metrics 🔴

Summary

  • Low resource usage in the master node
  • Possible file descriptor leaks, reported in wazuh#23202 ("Possible leak in the Vulnerability Detection register stress test")
  • Worker nodes show high CPU and memory usage. This is unsurprising given the unrealistic level of activity: an expected influx of 500 syscollector messages per second in a two-node cluster environment
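
The 500 messages per second mentioned in the summary is consistent with the Medium Activity parameters (50 agents at 10 EPS each). The short calculation below makes that arithmetic explicit. The even split across the two workers is an assumption about the load balancer, not a measured value, and the function names are illustrative.

```python
def expected_rate(agents: int, eps_per_agent: int) -> int:
    """Total events per second sent by all simulated agents."""
    return agents * eps_per_agent


def rate_per_worker(total_rate: int, workers: int) -> float:
    """Per-worker rate, assuming the load balancer splits traffic evenly."""
    return total_rate / workers


# Medium Activity case: 50 agents, 10 EPS each, 2 worker nodes.
medium_total = expected_rate(agents=50, eps_per_agent=10)
medium_per_worker = rate_per_worker(medium_total, workers=2)
```

Under these assumptions, `medium_total` is 500 and each worker receives about 250 messages per second.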

Master 🟢

Metrics plots: CPU, Disk_Read, Disk_Read_Speed, Disk_Write_Speed, Disk_Written, FD, PSS, Read_Ops, RSS, SWAP, USS, VMS, Write_Ops

Worker 1 🔴

Metrics plots: CPU, Disk_Read, Disk_Read_Speed, Disk_Write_Speed, Disk_Written, FD, PSS, Read_Ops, RSS, SWAP, USS, VMS, Write_Ops

Worker 2 🔴

Metrics plots: CPU, Read_Ops, RSS, SWAP, USS, VMS, Write_Ops, Disk_Read, Disk_Read_Speed, Disk_Write_Speed, Disk_Written, FD, PSS
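
The possible file descriptor leak noted in the metrics summary is the kind of behavior that can be screened for from the FD samples. The sketch below is a minimal example: it fits a least-squares line to the samples and flags sustained growth. The function names and the `min_slope` threshold are hypothetical, and a positive slope only suggests a leak; it does not prove one.

```python
def fd_slope(samples):
    """Least-squares slope of FD counts against sample index.

    `samples` is a sequence of open file descriptor counts taken at a
    fixed interval. The result is in descriptors per sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(samples, min_slope=0.1):
    """Flag a steady upward trend; `min_slope` is an illustrative threshold."""
    return fd_slope(samples) >= min_slope
```

A series that oscillates around a constant value yields a slope near zero, while a series that grows throughout the 3h run yields a clearly positive one.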

Indexer 1 🟢

No abnormal behavior detected

Metrics plots: CPU, Disk_Read, Disk_Read_Speed, Disk_Write_Speed, Disk_Written, FD, PSS, Read_Ops, RSS, SWAP, USS, VMS, Write_Ops

Indexer 2 🟢

No abnormal behavior detected

Metrics plots: CPU, Disk_Read, Disk_Read_Speed, Disk_Write_Speed, Disk_Written, FD, PSS, Read_Ops, RSS, SWAP, USS, VMS, Write_Ops


Statistics 🟢

Vulnerabilities State 🟢

The vulnerability generator module, used by the simulate agents script, sends 100 vulnerable packages to the manager and then confirms their removal. This behavior appears as a wave-shaped plot that reaches a peak in each repetition, once all vulnerabilities have been processed.

The plot shows that the values reported by the indexer connector do not match the ideal expected curve, although the simulator itself is performing as intended.

total_vulnerabilities

It would be useful to add test checks that verify whether the number of vulnerabilities matches the expected value at specific points during the test.
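
A check of this kind could look like the sketch below. It takes the observed series as `(timestamp, total)` pairs and a list of `(timestamp, expected_total)` checkpoints, and reports every checkpoint whose observed value is outside a tolerance. The function names, the data layout, and the tolerance are assumptions made for illustration; the expected totals would have to come from the simulator configuration.

```python
from bisect import bisect_right


def value_at(samples, timestamp):
    """Return the latest observed value at or before `timestamp`.

    `samples` must be a list of (timestamp, value) pairs sorted by timestamp.
    """
    times = [t for t, _ in samples]
    index = bisect_right(times, timestamp) - 1
    if index < 0:
        raise ValueError("no sample at or before the requested timestamp")
    return samples[index][1]


def failed_checkpoints(samples, checkpoints, tolerance=0):
    """List the (timestamp, expected, observed) checkpoints outside tolerance."""
    failures = []
    for timestamp, expected in checkpoints:
        observed = value_at(samples, timestamp)
        if abs(observed - expected) > tolerance:
            failures.append((timestamp, expected, observed))
    return failures
```

An empty result means the indexed totals followed the expected curve at every checkpoint. Any entry points to a moment where the indexer state and the simulator disagree.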


Alerts 🟢

The alerts generated by the workers and the manager are expected to match the indexed alert values. However, there is a discrepancy:

combined_and_new_total_alerts

Given the high activity levels, some variance between written and indexed alerts is expected. Even so, it would be useful to add test checks that track this gap, so that it can be reduced and the environment stabilized over time.
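
One simple way to quantify the difference between written and indexed alerts is the fraction of written alerts that never reached the indexer. The sketch below computes that fraction and compares it with a maximum allowed value. The names `indexing_gap` and `within_tolerance` and the threshold are hypothetical.

```python
def indexing_gap(written: int, indexed: int) -> float:
    """Fraction of written alerts that are missing from the indexer.

    Returns 0.0 when nothing was written, to avoid dividing by zero.
    """
    if written == 0:
        return 0.0
    return (written - indexed) / written


def within_tolerance(written: int, indexed: int, max_gap: float) -> bool:
    """True if the missing fraction does not exceed `max_gap`."""
    return indexing_gap(written, indexed) <= max_gap
```

Evaluating the gap at the end of each run, and tightening `max_gap` between releases, would make the expected variance explicit instead of leaving it to visual inspection of the plot.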


Evidence collection 🔴

The following errors were detected in the evidence-collection capabilities of the pipeline:

@Rebits
Member Author

Rebits commented May 8, 2024

Following a discussion with @juliamagan, we have decided not to rerun the failed High Activity and Minimum Activity performance tests. Instead, these tests will be relaunched in RC2.

@MARCOSD4
Member

MARCOSD4 commented May 8, 2024

GJ, but the graphs of the indexer 1 metrics cannot be displayed, perhaps because of an error in writing the comment.

@MARCOSD4
Member

MARCOSD4 commented May 8, 2024

LGTM

@juliamagan
Member

LGTM
