wazuh-agent crash randomly when RPCRT4.dll is loaded.
#18591
Comments
Work in progress. The issue is under revision:
|
I found several systems of mine were no longer running the agent. After checking, I had the same crash and upon searching for it, found this page. |
This probably has nothing to do with the crash, but I found three items in the event viewer a few minutes before wazuh-agent crashed, referring to DFSR replication:
Also, it did not crash right away; it took at least a week. And it only crashed on 2 out of 8 IDENTICAL systems. |
Hi @Brain2000 ! Could you provide more details?
This issue is hard to reproduce indeed. I can also provide instructions for enabling the core dumps if you have direct access to those agents. A backtrace will allow us to determine if we are in the presence of the same issue or not. Regards. |
@pereyra-m yes, if you send me how to enable core dumps, I'll add it to all the servers running the wazuh-agent. It may take me a few weeks to get one to crash, so please be patient. But when it does, I'll have a core dump I can upload.
The OS is Windows 2019 [10.0.17763.4737].
I may have found something. After looking at the ossec.log, I found that it is looking for cis_win2012r2.yml. When I installed the wazuh-agent, the OS was Windows 2012R2. However, I upgraded it to Windows 2019 last month! Regardless of the crash, those yml files should ALL be installed and it should autodetect which one to use. Is it possible the wrong yml file could cause the crash? It seems doubtful since it was random and took several weeks.
ossec.log:
ossec.conf:
|
Thank you @Brain2000 ! I don't see any problem in the shared logs.
They will be stored in
It doesn't seem probable, but the dump will tell us the exact point where the agent is crashing. Regards. |
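For readers following along: the exact dump instructions from this comment did not survive in the thread, so the following is only a minimal sketch of one common way to collect user-mode crash dumps on Windows, the Windows Error Reporting `LocalDumps` registry key. The executable name `wazuh-agent.exe` and the folder `C:\CrashDumps` are assumptions used for illustration, not values confirmed by the maintainers. The script only builds the text of a `.reg` file; it does not touch the registry itself.

```python
# Sketch: generate a .reg file that enables WER "LocalDumps" for one executable.
# ASSUMPTIONS (not from the thread): the executable name and the dump folder
# passed in by the caller are illustrative placeholders.

LOCALDUMPS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows"
    r"\Windows Error Reporting\LocalDumps"
)


def build_localdumps_reg(exe_name: str, dump_folder: str,
                         dump_count: int = 10, full_dump: bool = True) -> str:
    """Return .reg file text with per-application LocalDumps settings."""
    dump_type = 2 if full_dump else 1  # DumpType: 1 = mini dump, 2 = full dump
    # DumpFolder is documented as REG_EXPAND_SZ, which a .reg file encodes as
    # hex(2): the UTF-16LE bytes of the string plus a terminating NUL.
    raw = (dump_folder + "\0").encode("utf-16-le")
    folder_hex = ",".join(f"{b:02x}" for b in raw)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LOCALDUMPS_KEY}\\{exe_name}]",
        f'"DumpFolder"=hex(2):{folder_hex}',
        f'"DumpCount"=dword:{dump_count:08x}',
        f'"DumpType"=dword:{dump_type:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; adjust to the real process name and a folder
    # that exists on the affected host.
    print(build_localdumps_reg("wazuh-agent.exe", r"C:\CrashDumps"))
```

The generated text would be saved as a `.reg` file and imported with administrator rights on the affected host. Whether these settings match what the maintainers actually recommended here is unknown.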
Got it, will set this up today... will return when one crashes in a week or two |
@Brain2000 any news? |
@Dwordcito It may take a few weeks. |
@pereyra-m I have an update. I have a crash dump related to this issue! I tried to upload it here, but it didn't accept the extension. I have uploaded it here instead. |
Thank you @joseraeiro ! Let me upload the dump file here so it doesn't get deleted. The analysis of the dump shows a similar trace to the ones we've seen around the Details
Can you confirm the agent version? |
Yes, it's version 4.5.0. Thank you very much for all your help so far. This is a critical issue: I have coverage of barely 60% in a client's network. All the other agents are crashing. I'll try to provide more dumps whenever the client sends them my way. |
This is good news. Mine have not crashed again yet, but I'm checking daily. If/when they do, I'll double check that it's the same stack trace. |
Good morning @pereyra-m . After trying to analyze the dump provided by @joseraeiro, I've assumed that when the Wazuh agent tries to launch itself in a Windows environment, there is a checksum mismatch causing an access violation exception at the address "74b5823b" of the rpcrt4.dll library. We have already considered automatically restarting the Wazuh agent service after a crash, but we were advised not to do it until we get further feedback from you. Do you have any updates on this, or has your team ever dealt with a similar issue before? We really need to have this issue sorted out asap, as we have had many agents down for the last month and a half. |
Hello again everyone, I apologize for the late response. We've been analyzing this issue for a while now, but we still haven't found the exact cause. It wasn't possible to reproduce the problem in our testing environments.
Please post an update after applying the steps I suggested above. |
@pereyra-m I have some agents in my environment that are also crashing, faulting module RPCRT4.dll. This is occurring for us on ossec agent versions 4.5.0 and 4.5.3, but only affecting a small percentage of total agents (probably less than 10%). I just upgraded some to 4.5.3 today to try to resolve it. I collected a dump from one. Crash Dump Exception Analysis
Edit: Also, it's crashing on 4.5.3 with the syscollector wodle disabled. OSSEC.log w/ syscollector disabled
|
Hello @EdwardsCP ! Thank you for all the information provided. Considering you are able to reproduce the issue, would it be possible to run some tests?
I'll be waiting for the results of the proposed tests. |
@pereyra-m, I've tried to reproduce the crash with the 4.5.3-2 installer you provided, but so far have not been successful. The server I'm using to test has most consistently had the service crash when it starts automatically after the system boots. It will usually start without crashing if I manually start the service at some other point in time. I've rebooted it twice with 4.5.3-2 installed, and the service started automatically and continued to run both times. |
@pereyra-m , |
Hello again @EdwardsCP ! Those tests seem really solid, thank you very much. In the meantime, do you have some extra details that could help us to reproduce the crash? Regards. |
I can't identify any reason for some of our hosts behaving this way and others not. In addition to seeing it on some Server 2022 hosts, I'm also seeing it on Windows 11 22H2. We have about 60 Win11 hosts that are about as close to identical as you can get in a production environment. They are identical make/model hardware, all imaged with the same OS/Drivers/Apps/etc deployment about 5-6 months ago, have identical AD Group Policies applied, same Endpoint Security software (with identical policies), and are managed by the same patch management system. On any given day, if I check for hosts that are online but disconnected from Wazuh (service crashed), we generally have about 2/3 of the total hosts online (40/60) and roughly 10% (4 or 5) of them are disconnected from Wazuh because the service crashed. |
Hi @EdwardsCP . I understand that those hosts are almost identical then, thank you. I've been working on some reliability improvements around the signature verification feature, and it'll be really helpful for us if you could try another test package. If the tests are successful, we'll be able to release an official patch as soon as possible. Please, install it on the same machine you used for the last test. Regards. |
@pereyra-m, no crashes for my test server on 4.5.3-3 after 2 reboots. On the first reboot, sca, syscheck, and rootcheck were all still disabled. On the second reboot, I reenabled those modules. |
Thank you for all your help @EdwardsCP !! You've greatly contributed to improving Wazuh! We'll be posting any updates here. |
Thanks @EdwardsCP , merged in 4.7.0 |
@pereyra-m and/or @Dwordcito , |
When do you guys expect to release version 4.7.0? |
Hi @EdwardsCP and @dfoux ! We are working to speed up the release process and it may be available next month. https://documentation.wazuh.com/current/deployment-options/wazuh-from-sources/wazuh-agent/index.html You could take the last stable branch and apply these changes, or simply compile the package without the signature verification feature until v4.7.0 is released.
Regards. |
Hello @pereyra-m ! I see that Wazuh 4.7.0 was released but see no mention of this bug having been fixed: https://documentation.wazuh.com/current/release-notes/release-4-7-0.html Could you please confirm that this issue is solved on this version? |
Hello @joseraeiro ! There is an entry under Resolved Issues named |
Collected information about the issue
First
Second
WER data mentioned in Second
Community User Thread
Issue related:
ntdll.dll
when restarting due to dll load. #18122