Emergency shutdown race condition? #622

Closed
olivierlambert opened this issue Dec 29, 2015 · 3 comments

Comments

@olivierlambert
Member

From @pacohope on Disqus:

It got everything suspended, but the system didn't shut down. (I think I didn't wait long enough. That will be obvious after you read the rest.) So I logged in to XenServer as root via ssh. I did a vm-list and saw that all the VMs were suspended. I shut down the XenServer and did my stuff.
When I started it back up, everything came up normally. Great. Within a few minutes of coming up, the entire system suspended all VMs and shut down again. Somehow, the XO VM does not know that the suspend-and-shutdown has already happened. As soon as I start that VM, it suspends all the VMs and shuts down the dom0. So I'm going to delete that XO VM and install another XO VM. That's pretty easy and quick. I haven't lost any work. But this was REALLY annoying to troubleshoot. I'm not sure how XO records "I have been asked to suspend-and-shutdown". I suspect there is a race condition where my XO VM was suspended before it recorded the fact that it had fulfilled my request to suspend-and-shutdown. I don't know how to repair the XO VM, so I won't. I'll just install a new copy.

@julien-f
Member

Could you summarize the issue?

Current algorithm:

  1. iterate over all running VMs and suspend them
  2. disable the host
  3. shutdown the host
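
The three steps above can be sketched roughly as follows. This is only an illustration in Python against a hypothetical in-memory `FakeHost`; the class, its fields, and the `emergency_shutdown` function name are assumptions made for the example, not XO's actual code or the XenServer API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHost:
    """Hypothetical in-memory stand-in for a host (not a real API)."""
    vms: dict = field(default_factory=dict)  # VM name -> power state
    enabled: bool = True
    powered_on: bool = True


def emergency_shutdown(host: FakeHost) -> list:
    """Suspend every running VM, disable the host, then shut it down."""
    suspended = []
    # 1. iterate over all running VMs and suspend them
    for name, state in host.vms.items():
        if state == "running":
            host.vms[name] = "suspended"
            suspended.append(name)
    # 2. disable the host
    host.enabled = False
    # 3. shut down the host
    host.powered_on = False
    return suspended


host = FakeHost(vms={"web": "running", "db": "running", "old": "halted"})
suspended = emergency_shutdown(host)
```

Note that nothing in this sequence records that the order has already been carried out, so receiving the same order a second time runs all three steps again.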

If I understand correctly, I think it might be due to xo-lib automatically retrying a failed call on connection errors.
This is not the first time this behaviour has caused issues. I just pushed an update to disable it; hopefully it will fix more issues than it will create.

@olivierlambert
Member Author

Yes, it's related to shutting down a host that has XOA running on it. It means:

  • cutting the connection between xo-server and xo-web
  • when XOA is back online, the browser reconnects to xo-server and re-sends the order
  • ... and it triggers this issue!

Your fix will solve this :)
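
To make the replay scenario concrete, here is a small simulation, sketched in Python. `FakeServer`, `FakeClient`, the `retry_on_connection_error` flag, and `simulate` are all hypothetical names invented for this illustration; the thread does not show how xo-lib implements its retry, so treat this as a model of the behaviour described above rather than the real code.

```python
class FakeServer:
    """Hypothetical stand-in for xo-server; counts the orders it receives."""

    def __init__(self):
        self.shutdown_orders = 0
        self.online = True

    def emergency_shutdown(self):
        if not self.online:
            raise ConnectionError("server unreachable")
        self.shutdown_orders += 1
        # The host goes down and takes the server with it, so the
        # client never receives a response to this call.
        self.online = False
        raise ConnectionError("connection lost before the response")


class FakeClient:
    """Hypothetical client that may replay a failed call after reconnecting."""

    def __init__(self, server, retry_on_connection_error):
        self.server = server
        self.retry = retry_on_connection_error
        self.pending = None

    def call_emergency_shutdown(self):
        try:
            self.server.emergency_shutdown()
        except ConnectionError:
            if self.retry:
                # Remember the failed call so it can be re-sent later.
                self.pending = self.server.emergency_shutdown

    def on_reconnect(self):
        if self.pending is not None:
            call, self.pending = self.pending, None
            try:
                call()  # the order is sent a second time
            except ConnectionError:
                pass


def simulate(retry):
    server = FakeServer()
    client = FakeClient(server, retry)
    client.call_emergency_shutdown()
    server.online = True  # the host has been started again
    client.on_reconnect()
    return server.shutdown_orders


orders_with_retry = simulate(retry=True)
orders_without_retry = simulate(retry=False)
```

In this model the server receives the order twice when retry is enabled and once when it is disabled, which matches the symptom reported above: the second shutdown fires shortly after the XO VM comes back up.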

@olivierlambert olivierlambert added this to the 4.12 milestone Dec 30, 2015
@olivierlambert
Member Author

Fixed by vatesfr/xo-lib@2f29408
