
Syscollector does not check if there's a scan in progress before starting a new one. #22440

Closed
2 tasks done
MiguelazoDS opened this issue Mar 11, 2024 · 4 comments · Fixed by #23145 or wazuh/wazuh-documentation#7247
Labels
level/task type/enhancement New feature or request

Comments

@MiguelazoDS
Member

MiguelazoDS commented Mar 11, 2024

Description

As part of the exploratory analysis for the vulnerability detector carried out in #22355 (comment), we found that the Syscollector interval option affects the reliability of the vulnerability detector scan, since Syscollector does not skip a scan if there is already one in progress. This situation is easily replicated with a low interval time (1m).

With a 30s scan interval on the manager, we can see many duplicate sync messages.
(screenshot: duplicated sync messages with the 30s interval)

For a 1h interval, the result is as expected.
(screenshot: expected sync messages with the 1h interval)

DoD

  • Syscollector shouldn't start a scan if there is already one in progress. Propose a fix (see the sketch after this list).
  • Add UTs or QA tests to validate the solution.
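
A minimal sketch of the kind of guard the first DoD item asks for, assuming a simple try-lock around the scan entry point (names and structure are illustrative, not the actual Syscollector code):

#include <functional>
#include <mutex>

std::mutex scanMutex;   // hypothetical guard shared by every scan trigger

void tryScan(const std::function<void()>& scan)
{
    // try_to_lock fails if a previous scan still holds the mutex, so the
    // new cycle is skipped instead of overlapping with the running one.
    std::unique_lock<std::mutex> lock(scanMutex, std::try_to_lock);
    if (!lock.owns_lock())
    {
        return;   // a scan is already in progress; skip this one
    }
    scan();
}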
@MiguelazoDS MiguelazoDS added level/task type/enhancement New feature or request labels Mar 11, 2024
@MiguelazoDS MiguelazoDS changed the title Syscollector does not check the if there's a scan in progress before starting another one. Syscollector does not check if there's a scan in progress before starting a new one. Mar 11, 2024
@MiguelazoDS
Copy link
Member Author

MiguelazoDS commented Apr 19, 2024

Analysis

There are also some extra seconds spent between scans due to the time it takes to perform the scan for all providers.

For an interval of 80 seconds, the timestamp at which each scan starts is not deterministic. This can be desirable, since the module always waits for the interval time plus the time it takes to perform the scan. For short intervals, this guarantees the new execution won't interfere with a scan that is still running.

2024/04/19 17:11:21 wazuh-modulesd:syscollector[22023] logging_helper.c:31 at taggedLogFunction(): INFO: Starting evaluation.
2024/04/19 17:12:43 wazuh-modulesd:syscollector[22023] logging_helper.c:31 at taggedLogFunction(): INFO: Starting evaluation.
2024/04/19 17:14:09 wazuh-modulesd:syscollector[22023] logging_helper.c:31 at taggedLogFunction(): INFO: Starting evaluation.
2024/04/19 17:15:33 wazuh-modulesd:syscollector[22023] logging_helper.c:31 at taggedLogFunction(): INFO: Starting evaluation.
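
The drift in the timestamps above is consistent with a sleep-after-scan loop; a minimal sketch of that scheduling pattern, assumed here for illustration rather than taken from the Syscollector sources:

#include <chrono>
#include <functional>
#include <thread>

// The interval only starts counting once the scan finishes, so the start
// timestamps drift by the scan duration, as seen in the log above.
void scanLoop(const std::chrono::seconds interval, const std::function<void()>& scan)
{
    while (true)
    {
        scan();                                 // takes a few seconds per provider
        std::this_thread::sleep_for(interval);  // then wait the full interval
    }
}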

The issue appears during the synchronization, because the sync function returns immediately while the synchronization itself keeps running in the background.
Here we can see that the method returns at 11:43:30, but the synchronization is still in progress (syscollector_packages messages at 11:43:31).

2024/04/22 11:43:29 wazuh-modulesd:syscollector[6093] logging_helper.c:37 at taggedLogFunction(): DEBUG: Starting syscollector sync
2024/04/22 11:43:30 wazuh-modulesd:syscollector[6093] logging_helper.c:37 at taggedLogFunction(): DEBUG: Ending syscollector sync
2024/04/22 11:43:31 wazuh-modulesd:syscollector[6093] logging_helper.c:40 at taggedLogFunction(): DEBUG: Sync sent: {"component":"syscollector_packages","data":{"attributes":{"architecture":"noarch","checksum":"3f7d0b06562fe41a2a2cc416fab8651b5e40076a","description":"OpenBLAS architecture macros.","format":"rpm","groups":"Unspecified","install_time":"1698874956","item_id":"03952d34db4cd691229b8855b62216335fecd475","location":" ","name":"openblas-srpm-macros","priority":" ","scan_time":"2024/04/22 14:43:31","size":104,"source":" ","vendor":"CentOS","version":"2-11.el9"},"index":"03952d34db4cd691229b8855b62216335fecd475","timestamp":""},"type":"state"}
2024/04/22 11:43:31 wazuh-modulesd:syscollector[6093] logging_helper.c:40 at taggedLogFunction(): DEBUG: Sync sent: {"component":"syscollector_packages","data":{"attributes":{"architecture":"noarch","checksum":"3d8c5a95fbba203af67e2a46fb62e89ee1c4b060","description":"File::Which is a portable implementation (in Perl) of 'which', and can\nbe used to get the absolute filename of an executable program\ninstalled somewhere in your PATH, or just check for its existence.","format":"rpm","groups":"Unspecified","install_time":"1698874954","item_id":"051bc2b2c6fd97bd7e29b4fa971a9a97b128f25e","location":" ","name":"perl-File-Which","priority":" ","scan_time":"2024/04/22 14:43:31","size":30250,"source":" ","vendor":"CentOS","version":"1.23-10.el9"},"index":"051bc2b2c6fd97bd7e29b4fa971a9a97b128f25e","timestamp":""},"type":"state"}

So the synchronization may still be in progress when a new sync starts. With an 80-second interval there is enough time for the sync to finish, but with 30 seconds the issue is reproducible.

2024/04/22 11:44:55 wazuh-modulesd:syscollector[6093] logging_helper.c:37 at taggedLogFunction(): DEBUG: Starting syscollector sync

We need to achieve a behavior like the one we have with deltas:

2024/04/22 11:49:40 wazuh-modulesd:syscollector[6093] logging_helper.c:40 at taggedLogFunction(): DEBUG: Starting hardware scan
2024/04/22 11:49:40 wazuh-modulesd:syscollector[6093] logging_helper.c:40 at taggedLogFunction(): DEBUG: Delta sent: {"data":{"board_serial":"0","checksum":"c4c0e39d41621edd82e5ee6b890d6ce26cd509b7","cpu_cores":10,"cpu_mhz":2592.0,"cpu_name":"Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz","ram_free":3101700,"ram_total":3745708,"ram_usage":18,"scan_time":"2024/04/22 14:49:40"},"operation":"MODIFIED","type":"dbsync_hwinfo"}
2024/04/22 11:49:40 wazuh-modulesd:syscollector[6093] logging_helper.c:40 at taggedLogFunction(): DEBUG: Ending hardware scan
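
A hedged sketch of the difference between the two paths described above (an assumed simplification, not the actual implementation): the delta path sends its messages before returning, while the sync path only enqueues them and returns immediately.

#include <functional>
#include <queue>
#include <string>

// Delta-like behavior: the call does not return until every message
// produced by the scan has been sent.
void runDeltaScan(const std::function<void(const std::string&)>& send)
{
    send("Starting hardware scan");
    send("Delta sent: {...}");      // sent synchronously
    send("Ending hardware scan");   // logged only after everything was sent
}

// Observed sync behavior: the call enqueues the work and returns, so
// "Ending syscollector sync" is logged while another thread is still
// flushing the pending messages.
void runSync(std::queue<std::string>& pendingMessages)
{
    pendingMessages.push("Sync sent: {...}");   // flushed later by a dispatcher
}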

@sebasfalcone
Member

Moved the ETA due to the high complexity of the solution.

@MiguelazoDS
Member Author

Update

Thanks to @pereyra-m for mentioning #19385, although I was not able to understand how that fixes this issue.
I set a dummy log in the synchronizationController.hpp#checkId() implementation

(screenshot: dummy log added to checkId())

I don't see any case where the values satisfy the condition to throw the exception.

2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:23 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993022, old value: 1713993022
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:58 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:59 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:10:59 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993057, old value: 1713993057
2024/04/24 18:11:32 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:32 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:32 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:32 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:32 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091
2024/04/24 18:11:33 rsync[23169] synchronizationController.hpp:65 at checkId(): INFO: new value: 1713993091, old value: 1713993091

The values are always the same because the ID is a timestamp set every time Syscollector starts a new scan:

checksumCtx.rightCtx.id = std::time(nullptr);
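
A hypothetical illustration (not the wazuh implementation) of why a check that only rejects outdated IDs never fires here: every sync message of a scan carries the same scan-start timestamp, so the new and old values always match.

#include <ctime>
#include <stdexcept>

// Hypothetical guard, for illustration only: discard messages whose ID is
// older than the last one registered.
struct SyncIdGuard
{
    std::time_t lastId{0};

    void checkId(const std::time_t newId)
    {
        if (newId < lastId)
        {
            throw std::runtime_error("Message discarded: outdated sync ID");
        }
        lastId = newId;
    }
};

int main()
{
    SyncIdGuard guard;
    const std::time_t scanId = std::time(nullptr);  // one ID per scan
    for (int i = 0; i < 8; ++i)
    {
        guard.checkId(scanId);  // new value == old value, never throws
    }
    return 0;
}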

@MiguelazoDS
Member Author

MiguelazoDS commented Apr 26, 2024

Update 04/2024

A solution was added for the first scan, when the first message pushed is of type "no_data". There is still an issue when the interval time is small and the first operation is a "checksum_fail": there is no way to differentiate whether the following messages belong to the first scan or to later ones.
Also, with an old agent, the new "full_data" message is not handled, so an "Invalid message operation." exception is thrown every time that message arrives at the agent.

Update 05/2024

The solution was implemented. Evaluating the checksum_fail messages, as well as the new full_data messages from the manager, was discarded as a solution for this issue since neither was necessary.

The final implementation uses the state messages sent from the agent to the manager to determine when the synchronization has finished, so a new scan is not started while a synchronization is still in progress.
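
A hedged sketch of that idea (illustrative names and structure, not the actual Syscollector code): a flag stays set while the synchronization's state messages are still being sent, and the scheduler skips a scan cycle while the flag is set.

#include <atomic>

// Illustrative only: the real implementation tracks the state messages sent
// to the manager; here a simple atomic flag stands in for that bookkeeping.
class ScanScheduler
{
    std::atomic<bool> m_syncInProgress{false};

public:
    void onSyncStarted()  { m_syncInProgress = true; }   // first state message queued
    void onSyncFinished() { m_syncInProgress = false; }  // last state message sent

    // Returns true if a new scan may start now; false means skip this cycle.
    bool tryStartScan() const
    {
        return !m_syncInProgress;
    }
};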
