-
Notifications
You must be signed in to change notification settings - Fork 296
CA-419908: Update xenstore watcher to refresh domains when VM is renamed #6773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CA-419908: Update xenstore watcher to refresh domains when VM is renamed #6773
Conversation
Signed-off-by: Ming Lu <ming.lu@cloud.com>
ba900d4 to
ccbb428
Compare
|
The diff view of the first commit is not easy for reviewers to get the change clearly as it is only for moving some existing code snippets. |
last-genius
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this fixing a real-world issue? If so, could you describe when it happens, and if there are any symptoms of missed events?
The VM renaming usually happens for VM migrations during which, on the migration destination host, the domain of the VM is created with a temporary UUID like One case is the PV driver in VM writes feature-poweroff into xenstore after being migrated. The event for this xenstore writing may get lost due to the issue. This causes the migrated VM can't be cleanly shutdown due to lack of the feature in xapi. There may be other symptoms but this is the one we know surely so far. |
| look_for_different_domains () ; | ||
| Atomic.set need_refresh_domains false | ||
| ) else ( | ||
| if Atomic.exchange need_refresh_domains false then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about wrap the atomic operations in functions:
set_need_refresh_domains
clear_need_refresh_domains
consume_need_refresh_domains
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are too simple that I don't think it is worth wrapping them within the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be simpler and correct, too?
if path = _introduceDomain || path = _releaseDomain || Atomic.get need_refresh_domains then (
look_for_different_domains () ;
Atomic.set need_refresh_domains false
) else ..There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to call watch_fired when it is neither _introduceDomain nor _introduceDomain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to call watch_fired in this rename case? I thought just updating the mapping would be enough.
Does look_for_different_domains detect uuid changes? Looking at domain_looks_different, it may not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to call watch_fired in this rename case?
The fix doesn't make it call watch_fired in the rename case. Instead, it just marks that the watcher should refresh the domain list later when an event is popped up.
Does look_for_different_domains detect uuid changes?
The look_for_different_domains calls list_domains in which Xenctrl.domain_getinfolist is called to get the current domain list with UUIDs. The UUIDs are just the ones written in xenstore by Toolstack. And the renaming is exactly to rewrite the UUIDs in xenstore.
I thought just updating the mapping would be enough.
The map could be updated directly in rename case. But I think it would result in updating from external side on the data structure which is generated from a single source Xenctrl.domain_getinfolist within the watcher. This seems a change more than enough unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't split exchange into get and set, that is not atomic anymore, and if some other thread sets the value inbetween you'll miss it.
E.g. while you are busy refreshing the domains another remain request could've come in, that the refresh loop already went past and won't notice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be simpler and correct, too?
if path = _introduceDomain || path = _releaseDomain || Atomic.get need_refresh_domains then ( look_for_different_domains () ; Atomic.set need_refresh_domains false ) else ..
I think a variation of this might work. You have to do the Atomic.exchange first thing:
if Atomic.exchange need_refresh_domains false || path = _introduceDomain || path = _releaseDomain then
look_for_different_domains ();
if path = _introduceDomain || path = _releaseDomain then ()
elseAnd no atomic or look_for_different_domains calls on the else branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would suggest:
if Atomic.exchange need_refresh_domains false || path = _introduceDomain || path = _releaseDomain then
look_for_different_domains ();
if path <> _introduceDomain && path <> _releaseDomain then (
)|
Aside from what was described, this also might happen on the sending side where the UUID gets renamed to end in |
| module RRDD = Rrd_client.Client | ||
| module StringSet = Set.Make (String) | ||
|
|
||
| module IntMap = Map.Make (struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
module IntMap = Map.Make(Int)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blindly copied in the moving-only commit. Will append another commit for this.
| look_for_different_domains () ; | ||
| Atomic.set need_refresh_domains false | ||
| ) else ( | ||
| if Atomic.exchange need_refresh_domains false then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be simpler and correct, too?
if path = _introduceDomain || path = _releaseDomain || Atomic.get need_refresh_domains then (
look_for_different_domains () ;
Atomic.set need_refresh_domains false
) else ..|
The rerun jobs passed with the fix. |
9c0b20e to
ab77d49
Compare
The xenstore watcher maintains a map from domid to VM UUID. This map is used to dispatch the xenstore events. When the VM is renamed, its UUID changes. Hence this map needs to refresh. Otherwise, the xenstore events could not be dispatched to renamed VM. Signed-off-by: Ming Lu <ming.lu@cloud.com>
Signed-off-by: Ming Lu <ming.lu@cloud.com>
ab77d49 to
1b7f149
Compare
(Two commits are included in this PR. The first one is only to move the position of existing code within the same file to allow the use of module Watcher in module VM in the second one.)
The xenstore watcher maintains a map from domid to VM UUID. This map is used to dispatch the xenstore events. When the VM is renamed, its UUID changes. Hence this map needs to refresh. Otherwise, the xenstore events could not be dispatched to renamed VM.