XCP-ng 8.0 / CH 8.0 coalesce issues #298
Comments
- Fixes "army of zombies" and never ending coalesce - xcp-ng/xcp#298
A test package is available:
Hello,
Please reboot, yes. Nothing else to do.
Maybe not related to this issue, but my problem is still happening as before. This block keeps looping:
The zombie process issue is solved by the patch. There might still be a problem on the last leaf, but not at further depth (in short, it will work perfectly until reaching a depth of 1; the final child can't be merged for reasons we are investigating). This seems to happen only on LVM-based storage, not file-based.
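For anyone who wants to see whether the garbage collector is still looping on that last leaf, here is a minimal sketch of watching coalesce activity, assuming the usual /var/log/SMlog location on an XCP-ng/XenServer host:

```sh
# Follow the storage manager log and keep only GC/coalesce related lines.
# /var/log/SMlog is the usual location on XCP-ng / XenServer hosts.
tail -f /var/log/SMlog | grep -iE 'coalesce|gc|leaf'
```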
So I will wait for your investigation. I'm ready to help if you need any testing or logs.
Olivier, what is the feedback from XCP users on this coalesce problem? We are evaluating migrating our hosts to XCP (for its development pace and closer proximity to customers). As reported in https://bugs.xenserver.org/browse/XSO-966, the backup process has become a difficult task for our customers due to the coalesce failing after the snapshot is created. We would move 20 hosts from CH8 to XCP8; is there any recommendation after the host update? Do xenserver-tools work properly, or do we need to install the XCP agent itself?
I upgraded XenServer to XCP-ng and didn't touch the HV tools; it works perfectly.
@danieldemoraisgurgel we have a patch on XCP-ng 8.0. Please open a support ticket if you want assistance on that. Sadly, Citrix won't make a patch on CH 8.0.
We are updating one of our clusters (migrating from CH8 to XCP8). Next we will test the available update and see if we have any positive results with the coalesce process. I believe that if we succeed, it will be the first step in migrating our Citrix framework to XCP and we will soon be closing a support contract! ;-)
The patch should fix all zombie processes; we checked that with our customers. There's still the final leaf that can't coalesce in all cases, but the impact is almost invisible.
Update pushed to XCP-ng 8.0 |
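For reference, here is a minimal sketch of pulling that update onto a host, assuming the package names mentioned later in this thread (sm and sm-rawhba) and the reboot advised above:

```sh
# Install the updated storage manager packages from the standard update repos.
yum update sm sm-rawhba
# Reboot so the running SMAPI / garbage collector picks up the new code.
reboot
```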
Thank you Stormi.
@stormi the update does solve the zombie process problem, but as for coalesce, after backup/snapshot (removal) every disk still ends up with 1 disk stuck in the leaf of the chain. I am also testing the update made available at https://support.citrix.com/article/CTX265619 in an XS 7.1 pool.
I've played with LIVE_LEAF_COALESCE_MAX_SIZE in /opt/xensource/sm/cleanup.py (on XS 7.1) and it helps a bit. If you want to play with this, please be sure to stop coalescing gracefully first using /opt/xensource/sm/cleanup.py -a -u SR-UUID, and then kick it off again with xe sr-scan uuid=SR-UUID
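As a concrete illustration of that suggestion, here is a minimal sketch, with SR-UUID as a placeholder to replace with your own SR's UUID:

```sh
SR_UUID="<SR-UUID>"   # placeholder; find yours with: xe sr-list
# Stop any running coalesce for this SR gracefully, as suggested above.
/opt/xensource/sm/cleanup.py -a -u "$SR_UUID"
# (edit LIVE_LEAF_COALESCE_MAX_SIZE in /opt/xensource/sm/cleanup.py here)
# Trigger a new scan so the garbage collector / coalesce starts again.
xe sr-scan uuid="$SR_UUID"
```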
@danieldemoraisgurgel we know about the "last leaf". We are making XO backup less strict so it can proceed even though the final leaf is uncoalesced. In the meantime, we'll experiment with @BogdanRudas's interesting suggestions. Thanks everyone!
My perception on XCP8 is as follows:
For CH 7.1 with the XS71ECU2020 update, the coalesce process completed 100% by pausing the VMs. We will now run the backup again and see if the coalesce completes 100% once more. I used the standard timeout, LIVE_LEAF_COALESCE_TIMEOUT=10.
The strange thing is, I had to turn off the VMs, rescan the SR, and then turn them on again. The coalesce process then ran on the linked VMs (in production) and completed successfully. The following value was changed in /opt/xensource/sm/cleanup.py: LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024 # bytes. Well, apparently everything is OK... we will see on our next backup whether it is necessary to turn off the VMs for the coalesce to start and complete correctly.
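For anyone wanting to reproduce that change, here is a minimal sketch; the 1 GiB value is simply the one reported above, keep a backup so you can revert, and note that the file may be overwritten by a later sm package update:

```sh
# Keep a copy of the stock file so the change can be reverted.
cp -n /opt/xensource/sm/cleanup.py /opt/xensource/sm/cleanup.py.orig
# Locate the leaf-coalesce tunables, then edit the MAX_SIZE line so it reads:
#   LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024  # bytes (1 GiB)
grep -n 'LIVE_LEAF_COALESCE' /opt/xensource/sm/cleanup.py
```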
Okay, please keep us posted 👍 Thanks for your report!
After the aforementioned change, the backup completed with 100% success, with no disk left stuck in the coalesce chain. We're migrating another cluster to XCP-ng 8!
So to recap:
Do you confirm you also had to change
@olivierlambert I changed these values in sm/cleanup.py:
The coalesce process completed successfully in all cases, without the need to shut down the servers. I can't tell you exactly how much the LIVE_LEAF_COALESCE_MAX_SIZE value affects the process, but since we had been stuck with this problem for several months, I suspect the default value simply wasn't enough for the number of bytes the process needs to complete. After the patch and these changes, I can see that the coalesce process is finally working properly (our environment has more than 350 VMs and about 98 TB). We are migrating to XCP-ng 8 :-)
Nice! We'll see if we can raise those values "by default" in XCP-ng.
Hello, I am on: local storage here (the server uses 2x SSD in mdadm RAID 1). I didn't install anything from the testing repo, so my versions are:
4 VMs on XCP-ng with disk sizes of 500 GB, 80 GB, 50 GB and 20 GB. I got a report of a failed backup due to an SR error about not enough space. I checked the chains with (one possible command is sketched after this comment):
And saw chains only on the largest disk, 500 GB, which were not being removed. I reconfigured with the following solution:
Then I re-ran the scan and all chains were deleted, so now it looks like this:
I guess I do not need sm and sm-rawhba from the testing repo?
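(Regarding "checked the chains with" above: the exact command isn't preserved in the thread. One common way to list VHD parent chains, assuming vhd-util and the usual SR layouts, is sketched below; adapt the SR UUID and SR type to your setup.)

```sh
# LVM-based SR: list VHD chains in the SR's volume group (replace <SR-UUID>).
vhd-util scan -f -m "VHD-*" -l "VG_XenStorage-<SR-UUID>" -p
# File-based (ext) SR: scan the .vhd files under the SR mount point instead.
vhd-util scan -f -m "/run/sr-mount/<SR-UUID>/*.vhd" -p
```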
Thanks @danieldemoraisgurgel, I have been trying to fix this for weeks 🥇.
New logic for leaf coalesce has been backported from upstream into XCP-ng 8.1 beta: see https://xcp-ng.org/forum/post/22794. Feedback highly welcome!
Hi, could I run into issues on XenServer 7.1 if I just change the two values below and re-run coalescing? It works on XCP-ng 8.0 without the patches from 8.1, so could it be that it will also work on 7.1? I am in production with over 340 VMs. LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024 We do not have these patches on 7.1, as we are not using paid LTS support:
You might try, but we obviously can't tell you more about it, because we have no experience with the result on 7.1. I strongly suggest you consider migrating to XCP-ng at some point if you can 👍
Tnx 👍
If I understood correctly, XCP-ng 8.0 inherited a regression from Citrix Hypervisor 8.0 regarding the coalesce process.
Try to backport the fixes (one to fix the army of zombies, another to fix never-ending coalesce) from the upstream sm repository to fix them in XCP-ng.
More about the issues: https://bugs.xenserver.org/browse/XSO-966