Migration can fail especially for loaded VMs #72
Comments
Did you find any clue? Background (what I can recall):
Steps we did:
I'll try to reproduce this situation on our "3 host different CPU" testpool as soon as I can. |
emu-manager throws an error: https://xcp-ng.org/forum/post/4961 |
could also reproduce with XCP-ng 7.6 :-| |
A patched xcp-emu-manager is now available in the |
The patched xcp-emu-manager has been made available as an update for XCP-ng 7.6. However, it only mitigates the bug for now: instead of failing near the end of the migration, it can now wait for the VM to be less loaded so that it can resume processing the migration. It may still fail after some time if the VM's activity never decreases enough. We're still working on a proper fix for this situation. |
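To illustrate the mitigation described above (wait for the load to drop, give up if it never does), here is a minimal sketch. It is not the xcp-emu-manager code: the function name, the idea of polling a "dirty pages" counter, the threshold and the timeout are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_quiet(
    read_dirty_pages: Callable[[], int],  # hypothetical probe, not a real emu-manager API
    threshold: int,
    timeout_s: float,
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the guest dirties few enough pages, or the deadline passes.

    Returns True if activity dropped to `threshold` or below (the migration
    could proceed), False if the guest stayed busy for the whole timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if read_dirty_pages() <= threshold:
            return True
        if clock() >= deadline:
            # The VM never calmed down: this is the remaining failure case.
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated guest whose activity decreases over three polls.
    samples = iter([500, 300, 40])
    print(wait_until_quiet(lambda: next(samples), threshold=50,
                           timeout_s=5.0, poll_s=0.0))
```

With a permanently busy guest the function returns False once the timeout expires, which matches the caveat that the migration can still fail if the VM's activity never decreases enough.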
I can confirm that the issue still exists. None of the VMs in use can migrate; they hang at 100% for a few days. I have to restart the toolstack and then reboot the VM (which is still on the original host). Is there anything I can do to help, such as capturing specific log keywords? Our server logs contain a lot of messages, and some of them cannot be shared here. |
I didn't mean to close it yet. Testing still in progress. |
An update candidate is being tested by the community and available in the
|
Still no luck with the update candidate |
Can you be more specific? |
Test with XCP-ng 7.6 (fully patched) and -> VM is stuck and doesn't migrate :-/ ... waiting almost 10 minutes Edit: |
What's your OS? It's working for me on a Debian VM that previously failed but works now with the latest patches. That's frustrating 😞 @johnelse will introduce more debug output in the next iteration, so we should be able to get the details from everyone without needing to reproduce it internally. This way, we'll detect at least all the potential edge cases! |
@olivierlambert Ubuntu 16.04.4 LTS |
Okay thanks, I'll try on Ubuntu. |
Is a reboot needed after applying the emu-manager update? I tried without a reboot on a heavily loaded VM (Elasticsearch data node on Debian), and the migration resulted in a hung VM, with no connectivity or console access. |
Reboot shouldn't be needed. edit: thanks for your feedback by the way, we are continuing to investigate |
OK, for testing I replaced emu-manager with the XS one without a reboot and it worked right away, so yes, no reboot seems to be required ^^' |
Yes, the way emu-manager is called changed so you need the version from 7.5 |
I've also tested it on our pool with XCP-ng 7.6 (fully patched) and xcp-emu-manager (version 0.0.7) from xcp-ng-updates_testing. The VM (latest CentOS 7.5) runs stress --cpu 1 --vm 1 --io 1. The VM is stuck at 100% and no longer responsive. On the positive side: a VM without load moves smoothly. |
I have experienced this bug on Kubernetes host machines (Ubuntu 18.04 LTS) on XCP-ng 7.5 and 7.6. Migration causes the VM to use too much CPU until it becomes unresponsive. A VM with less load migrates smoothly. |
We should have a fix on Monday or so, if we have finished working out the patch by then. |
@stormi , could you please add an updated package to the 7.5 testing repo? |
I'll see if the fixes can be backported safely. |
The update has been pushed to the |
cheers guys for your effort! merry xmas btw :) |
I did a little test with a loaded VM ( I have to do more tests, but it seems not to corrupt the loaded VM! 🎉 |
Hi, I upgraded a two-host pool from XS 7.2 to XCP-ng 7.6 and have a number of VMs that won't migrate and require a toolstack restart to recover. Both hosts were yum updated. One particular VM (guest OS CentOS release 6.3 (Final), 4 GB of memory, no memory ballooning) is under zero load and still refuses to migrate, even after the tools were updated to 7.4 from XCP, the VM was rebooted and both toolstacks were restarted. Daemon logs available here: https://www.proweb.net/xcp/daemon.log.source.2019-01-03_04-30-52.txt With XS7.6 emu-manager-1.0.5-1 installed, the VM migrates without issue: https://www.proweb.net/xcp/daemon.source.xs-emu.log.2019-01-03_05-40-56.txt |
Thanks for the feedback, we'll take a look ASAP :) |
I had a look: our emu-manager exits before the migration can actually start. Logs from the source host:
The last two lines display the issue: xenguest loses contact with emu-manager, because the latter exited. The problem is, there's no information about the nature of the error. We need to find a way to reproduce and get more information from When you look back at your logs ( |
Hi @stormi All failed with the same issue: Jan 2 22:12:32 pw-im-xen-2 forkexecd: [error|pw-im-xen-2|0 ||forkexecd] 25928 (/usr/lib64/xen/bin/emu-manager -controloutfd 7 -controlinfd 8 -fd 9 -mode hvm...) exited with code 2 No exceptions in dmesg. |
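For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that scans daemon log lines for forkexecd reports of emu-manager exiting. The regular expression is derived only from the single log line quoted above, so treating it as the general log format is an assumption; the function and type names are made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class EmuManagerExit(NamedTuple):
    timestamp: str
    host: str
    pid: int
    exit_code: int


# Pattern modelled on the one forkexecd line quoted in this thread;
# other log layouts are not handled (assumption).
_EXIT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+forkexecd:"
    r".*?\]\s+(?P<pid>\d+)\s+\(\S*emu-manager\b.*\)\s+"
    r"exited with code\s+(?P<code>\d+)"
)


def find_emu_manager_exits(lines: Iterable[str]) -> Iterator[EmuManagerExit]:
    """Yield one record per log line reporting an emu-manager exit."""
    for line in lines:
        match = _EXIT_RE.search(line)
        if match:
            yield EmuManagerExit(
                timestamp=match["ts"],
                host=match["host"],
                pid=int(match["pid"]),
                exit_code=int(match["code"]),
            )


# The log line reported above, used as sample input.
SAMPLE_LINE = (
    "Jan 2 22:12:32 pw-im-xen-2 forkexecd: [error|pw-im-xen-2|0 ||forkexecd] "
    "25928 (/usr/lib64/xen/bin/emu-manager -controloutfd 7 -controlinfd 8 "
    "-fd 9 -mode hvm...) exited with code 2"
)

if __name__ == "__main__":
    for hit in find_emu_manager_exits([SAMPLE_LINE]):
        print(hit)
```

To scan a real file, pass an open file object instead of the list, for example `find_emu_manager_exits(open(path))`, where `path` points to your own copy of the daemon log.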
@prowebuk thanks. Did they all fail right after these messages?
|
They hung; the second-to-last migration (Jan 2 22:28:20) hung for 5 hours, until I restarted the toolstack |
I suppose the toolstack does not anticipate a crashing |
--
--
|
Thanks. The first one is interesting because it differs from the others and may be when it started breaking. I'd be interested in the full logs for that migration, if it is possible :) |
I have managed to reproduce the issue locally. My assumption is that it happens only for PV guests. |
Bug identified, and think we've got a fix! I managed to live migrate a PV VM that wouldn't migrate before. We'll clean-up the code and issue an update candidate. Thanks a lot for the feedback and the logs. |
@prowebuk I've pushed an update candidate, in case you are willing to test it. Install it with:
|
|
I think we can close this, what do you think @stormi ? |
The update for 7.5 is still awaiting tests from the community, so I'll keep it open until we have released it to everyone. |
Understood, let's wait a week or so then :) |
I just tested three migrations of unloaded VMs during a 7.4 => 7.6 upgrade: two VMs hung (PVHVM) and one succeeded (PV) |
Hi, this is not an emu-manager issue, but another problem that also exists in XenServer, see https://bugs.xenserver.org/browse/XSO-924 |
Closing now that the update for XCP-ng 7.5 has been pushed too. |
@stormi so we should close this one, right? |
Indeed, looks like I forgot to push the close button with my last comment :) |
Might be related to emu-manager.
See https://xcp-ng.org/forum/topic/522/unable-to-migrate-live-vms-after-upgrading-from-xcp-ng-7-4-to-7-5