-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rootcheck database inconsistencies #23477
Comments
@GGP1 I've checked that all agents have logs like:
On the other hand, I see the tables populated. Manual checkAgent 000echo 'agent 000 sql select * from pm_event' | ~/wdb-query.py Output[
{
"id": 1,
"date_first": 1715937625,
"date_last": 1715937625,
"log": "Starting rootcheck scan."
},
{
"id": 2,
"date_first": 1715937626,
"date_last": 1715937626,
"log": "Trojaned version of file '/bin/diff' detected. Signature used: 'bash|^/bin/sh|file\\.h|proc\\.h|/dev/[^n]|^/bin/.*sh' (Generic)."
},
{
"id": 3,
"date_first": 1715937626,
"date_last": 1715937626,
"log": "Trojaned version of file '/usr/bin/diff' detected. Signature used: 'bash|^/bin/sh|file\\.h|proc\\.h|/dev/[^n]|^/bin/.*sh' (Generic)."
},
{
"id": 4,
"date_first": 1715937661,
"date_last": 1715937661,
"log": "Ending rootcheck scan."
}
] Agent 002On the other hand, there are agents that just have the "Starting/Ending rootcheck scan.", like this one. echo 'agent 000 sql select * from pm_event' | ~/wdb-query.py Output[
{
"id": 1,
"date_first": 1715938455,
"date_last": 1715938461,
"log": "Starting rootcheck scan."
},
{
"id": 2,
"date_first": 1715938469,
"date_last": 1715938469,
"log": "Ending rootcheck scan."
}
] wazuh-api -XGET https://localhost:55000/rootcheck/002 Output{
"data": {
"affected_items": [],
"total_affected_items": 0,
"total_failed_items": 0,
"failed_items": []
},
"message": "No rootcheck information was returned",
"error": 0
} However, the API still reports the last scan data: wazuh-api -XGET https://localhost:55000/rootcheck/002/last_scan Output{
"data": {
"affected_items": [
{
"start": "2024-05-17T09:34:21Z",
"end": "2024-05-17T09:34:29Z"
}
],
"total_affected_items": 1,
"total_failed_items": 0,
"failed_items": []
},
"message": "Last rootcheck scan of the agent was returned",
"error": 0
} Test implementationIf I'm correct, the Rootcheck integration test is accessing /var/ossec/queue/db/*.db manually: test_rootcheck.py This may provide wrong results or even corrupt the databases. ConclusionThis might be caused by:
I rule out load-related event dropping, as I've seen no logs about full buffer. However, I'm a bit confused as the test is accessing Next stepsPlease avoid accessing database files manually. We need a fix in the test. Until then, we should disregard this test. WorkaroundWait 1:30 minutes (30 second margin + 60 seconds for commit time). |
The tests were this behavior was observed are the API integration tests where we open a connection to the wdb socket (ref). I tried the workaround you suggested by adding 90 seconds to the agents' healtcheck script but the tests continue failing sporadically. Logs: logs.zip |
@GGP1 gotcha, sorry for the mistake. At first glance, it may seem that the scan data is missing from the database since the /rootcheck test fails but /rootcheck/last_scan works. However, after analyzing this case with Selu, we found that the expected values for the latter case are: end: !anything
start: !anystring This would validate a case where the Rootcheck scan has not finished: {
"data": {
"affected_items": [
{
"start": "2024-05-20T08:01:06Z",
"end": null
}
],
"total_affected_items": 1,
"total_failed_items": 0,
"failed_items": []
},
"message": "Last rootcheck scan of the agent was returned",
"error": 0
} ConclusionI believe it is very likely that the test failed simply because the scan has not finished. Suggestions
That will help determine whether the scan data is missing from the DB or the scan did not finish. Next stepsI have agreed with @Selutario to close this issue and leave it to your team to apply the suggested changes above. Best! |
Description
During the investigation in #23389, we found that in 4.9.0 our test cases are failing sporadically because the agents' rootcheck databases are empty even though their logs indicate that the scans have finished.
I tried sleeping for two minutes before starting the test to see if it was a race condition but the database remained empty after multiple scans. In some other executions, even without a sleep, the database did contain the scan results.
We haven't noticed this behavior in previous versions, so we should investigate if there is any issue while populating the database or with the scan itself not finding results.
Logs: rootcheck_logs.zip
The text was updated successfully, but these errors were encountered: