
Conversation


@minishrink minishrink commented May 21, 2018

This PR cleans up some of the breakages exposed by the recent enabling of bonding (CP-26912).

Changes made to:

startup sequence
in order:

  • create networking objects
  • create cluster_host object (if necessary)
  • wait for clustering IP
  • resync cluster_host

cluster_host operations

  • add joined field to represent cluster membership of host
  • block all but Cluster_host.destroy (leave) if not joined
  • separate Cluster_host.create into internal db and client calls

Consequences:

  • pool-join should now work for bonded networks as well as VLANs


@edwintorok edwintorok left a comment


This looks good from a clustering point of view; however, I'd like a second opinion on whether it is safe to resync VM state before the storage.
(And we can't resync VM state after storage, due to the comment about CA-175353.)

Could you create a Jenkins branch and run a BST on it?

@edwintorok edwintorok requested a review from jonludlam May 21, 2018 16:38
@coveralls

coveralls commented May 21, 2018

Coverage Status

Coverage increased (+0.008%) to 20.781% when pulling b6372f2 on minishrink:feature/REQ477/CA-290237 into 78b5dd7 on xapi-project:feature/REQ477/master.

@minishrink
Contributor Author

Submitted a Jenkins branch with the same name


@edwintorok edwintorok left a comment


Discussed with @robhoes that changing the startup sequence is risky. The log_and_ignore part is good (the NIC may not exist at all) and we have to avoid getting into a restart loop, but we shouldn't move the storage initialisation around if we can avoid it.

Instead we should look up the primary slave of a bond if we can't find the PIF, and try to use that for clustering.

@jonludlam
Contributor

Yeah, the startup ordering stuff is really really fragile. Don't change it unless you must, and if you do, be prepared to spend a while debugging nasty corner cases :-(

@edwintorok
Contributor

edwintorok commented May 22, 2018

It might be better to fix this in another way that should also fix another bug with clustering not working on startup when the bond gets an IP too late:

  • call Cluster_host.create_as_necessary after the bonds/vlans/etc. get synchronized
  • drop the resync_host call on startup
  • in wait_for_management_ip just before plugging the unplugged PBDs add a maybe_wait_for_clustering_ip
  • this would look up the PIF of the cluster host if any, and then wait until it has an IP address and the PIF is connected
  • once that happens, call resync_host which enables clustering
  • then return from the function which would then do plug_unplugged_pbds
  • make sure no exceptions escape from the wait function: wrap it in log_and_ignore_exn

We could also consider adding a second step after 'wait for management IP' that uses Startup.OnThread and does the maybe_wait_for_clustering_ip and then plug_unplugged_pbds, in order not to block the startup.
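For illustration, the proposed maybe_wait_for_clustering_ip could be sketched roughly as below. This is an illustrative sketch only: the module and function names (Xapi_clustering.find_cluster_host, Xapi_mgmt_iface.wait_for_clustering_ip, Xapi_cluster_host.resync_host, log_and_ignore_exn) are taken from the discussion, and the xapi internals are assumed, not verified.

```ocaml
(* Sketch of the proposal above; not the actual patch. *)
let maybe_wait_for_clustering_ip ~__context =
  log_and_ignore_exn (fun () ->
    let host = Helpers.get_localhost ~__context in
    match Xapi_clustering.find_cluster_host ~__context ~host with
    | None -> ()  (* no cluster_host on this host: nothing to wait for *)
    | Some self ->
      (* block until the clustering PIF is attached and has an IP *)
      Xapi_mgmt_iface.wait_for_clustering_ip ~__context ~self;
      (* enabling clustering then happens via resync_host *)
      Xapi_cluster_host.resync_host ~__context ~host)
```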

@edwintorok edwintorok removed the request for review from jonludlam May 22, 2018 14:10
@edwintorok edwintorok assigned minishrink and unassigned edwintorok May 22, 2018
@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 3 times, most recently from 2fafad8 to e89cb1f Compare May 22, 2018 17:19
@minishrink minishrink changed the title CA-290237: Synchronise network objects before resyncing cluster_hosts CA-290237: Create cluster_hosts after network objects and safely resync hosts May 22, 2018
@minishrink
Contributor Author

Will run a BVT and BST.

"Synchronising VLANs on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_vlans_from_master ~__context;
"Synchronising tunnels on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_tunnels_from_master ~__context;
(* CA-290237: Create cluster objects after network objects so PIFs are available *)
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ], (fun () -> Xapi_cluster_host.create_as_necessary __context (Helpers.get_localhost ~__context));
Contributor

Please use log_and_ignore_exn here, so if something goes wrong XAPI doesn't go into a restart loop.
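A minimal sketch of the requested wrapping, assuming a `log_and_ignore_exn : (unit -> unit) -> unit` helper as used elsewhere in xapi:

```ocaml
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ],
  (fun () ->
    (* if creation fails, log the exception instead of letting it
       propagate and send XAPI into a restart loop *)
    log_and_ignore_exn (fun () ->
      Xapi_cluster_host.create_as_necessary __context
        (Helpers.get_localhost ~__context)));
```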

(fun () -> Pool_db_backup.fetch_database_backup ~master_address:(Pool_role.get_master_address())
~pool_secret:!Xapi_globs.pool_secret ~force:None);
- "wait management interface to come up", [ Startup.NoExnRaising ], wait_management_interface;
+ "wait management interface to come up, resync any cluster_hosts created", [ Startup.NoExnRaising ], wait_management_interface;
Contributor

... and plug any unplugged PBDs.

  if List.mem (Db.Cluster.get_cluster_stack ~__context ~self) required_cluster_stacks
- then Xapi_cluster_host.resync_host ~__context ~host
+ then begin
+   Xapi_cluster_host.create_as_necessary ~__context ~host;
Contributor

I don't think the object can be missing here: it was created during the startup sequence or during join. This one probably doesn't need to be changed; calling just resync_host is fine.

begin match Xapi_clustering.find_cluster_host ~__context ~host with
| Some self -> ignore
Xapi_mgmt_iface.(wait_for_clustering_ip ~__context ~self);
Xapi_cluster_host.resync_host ~__context ~host
Contributor

Plugging the PBD already calls resync_host, so the only thing we need here is to wait for an IP.
Rename this to maybe_wait_for_clustering_ip.

(fun () -> Pool_db_backup.fetch_database_backup ~master_address:(Pool_role.get_master_address())
~pool_secret:!Xapi_globs.pool_secret ~force:None);
- "wait management interface to come up", [ Startup.NoExnRaising ], wait_management_interface;
+ "wait management interface to come up, resync any cluster_hosts created", [ Startup.NoExnRaising ], wait_management_interface;
Contributor

@edwintorok edwintorok May 23, 2018


Since this is a blocking action, if something goes wrong and the clustering network interface doesn't come up, it would be nice if we didn't block the whole of XAPI.
So perhaps introduce a second entry here below 'wait for management interface' that says "wait for clustering IP if any, and plug unplugged PBDs", calls your maybe_wait_for_clustering_ip function above, and uses Startup.OnThread.

Contributor Author

So move PBD plugging to the wait_for_clustering_ip function?


@edwintorok edwintorok May 23, 2018


No, keep it both where it is now and add it inside the new step as well; we should not change existing behaviour. Other PBDs may successfully plug even if we don't have clustering.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 2 times, most recently from 5cb789a to 8754ea3 Compare May 23, 2018 15:50
@edwintorok
Contributor

I think the logic should work now; the commit message should probably describe the order as:

  • create network objects (unmodified)
  • create cluster host if need be
    ...
  • wait for management IP (unmodified)
  • wait for clustering IP, plug any unplugged PBDs which would also resync the cluster host if needed

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 2 times, most recently from 4463739 to 058ddad Compare May 23, 2018 15:57

@edwintorok edwintorok left a comment


Let's wait for tests to finish running before merging.

(fun () -> while !iP="" || not !attached do
Condition.wait ip_cond ip_mutex;
iP := Db.PIF.get_IP ~__context ~self:pIF;
attached := Db.PIF.get_currently_attached ~__context ~self:pIF
Member

This does the same as get_management_iface_is_connected, which checks the PIF carrier (link state). You can use PIF_metrics.carrier for that.
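A hedged sketch of the suggested check; the PIF_metrics field access is assumed from the comment above, not verified against the xapi datamodel:

```ocaml
(* Read link state via PIF_metrics rather than re-deriving connectivity *)
let pif_has_carrier ~__context ~self =
  let metrics = Db.PIF.get_metrics ~__context ~self in
  Db.PIF_metrics.get_carrier ~__context ~self:metrics
```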

"Synchronising VLANs on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_vlans_from_master ~__context;
"Synchronising tunnels on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_tunnels_from_master ~__context;
(* CA-290237: Create cluster objects after network objects so PIFs are available *)
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ],
Contributor

@minishrink we'll need to move this until after we have an IP. Creating a cluster host object checks for the presence of an IP, and you get a NO_COMPATIBLE_CLUSTER_HOST error caused by the create failing with PIF_HAS_NO_NETWORK_CONFIGURATION.

Contributor

Creating the object not only writes it to the DB but also tries to enable it, and the DB entry is only created if enabling succeeds.

Contributor

Please add CA-290473 in the commit message when this is fixed.

Contributor

Actually it might be better to have a function here that just creates the database object, and have the actual join/enable done elsewhere (in resync_host).
For that to work:

  • add a new field to cluster_host called joined, defaulting to true for backwards compatibility
  • refactor Cluster_host.create into a create_db and a create (where create calls create_db and then resync_host)
  • resync_host will look at the joined field: if false, it calls the daemon's join and sets joined to true on success; otherwise it calls enable if the cluster_host is marked enabled
  • in the startup sequence here, only call create_db
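The proposed split could look roughly like this. Illustrative only: the real field list of Db.Cluster_host.create is elided and resync_host's body is simplified.

```ocaml
(* create_db only writes the row; join/enable happen in resync_host *)
let create_db ~__context ~cluster ~host ~pIF =
  let ref = Ref.make () in
  (* further Db fields elided for brevity *)
  Db.Cluster_host.create ~__context ~ref ~cluster ~host ~pIF
    ~joined:false ~enabled:false;
  ref

let create ~__context ~cluster ~host ~pIF =
  let self = create_db ~__context ~cluster ~host ~pIF in
  resync_host ~__context ~host;  (* joins, then enables as needed *)
  self
```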


@edwintorok edwintorok left a comment


Some more problems found when testing VLANs with a new xenrt test.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 058ddad to 47b03ab Compare May 25, 2018 11:20
warn "Error occurred during Cluster_host.create";
handle_error error
end;
if not (Db.Cluster_host.get_joined ~__context ~self)
Contributor

This might look odd, but I think it's better for ordering to keep it like this. The logical sequence is: join, then later enable if not already enabled.

@edwintorok
Contributor

The cleanup of create_storage seems to be a no-op and isn't really needed for this PR; can we keep it for a future PR?

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from fe0bd2f to dfbbb4c Compare June 4, 2018 14:04
Cluster_host object exists for the host. If one does not exist, the call
will return the reference to the Cluster object wrapped in 'Some' *)

val create_internal : __context:Context.t -> cluster:API.ref_Cluster ->
Contributor

Why do you expose internal functions?

host:API.ref_host -> pIF:API.ref_PIF -> API.ref_Cluster_host
(** [create_internal ~__context ~cluster ~host ~pif] creates cluster_host in DB *)

val join_internal : __context:Context.t -> self:API.ref_Cluster_host -> unit
Contributor

Same question.

  (** [resync_host ~__context ~host] checks for the existence of a cluster_host.
-     If one is found and enabled in the db, it is enabled in the Client layer
-     too. Otherwise, nothing happens. *)
+     If one exists but hasn't joined the cluster, [join_internal] is called,
Contributor

You can probably move the docs of join_internal inline here.

  (** [create ~__context ~cluster ~host] is implementation of the XenAPI call
-     'Cluster_host.create'. It is the Cluster_host object constructor *)
+     'Cluster_host.create'. It is the Cluster_host object constructor, and works by
+     calling `create_internal`, then `resync_host`, which either joins the host to
Contributor

create_internal and resync_host are implementation details; you should probably copy/move their descriptions here.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from b4a2cbb to cfc5364 Compare June 4, 2018 15:28
| _ -> ()

let assert_operation_valid ~__context ~self ~op =
assert_joined_cluster ~__context ~self op;
Contributor

This should be part of get_operation_error, because that is also used when updating the allowed operations below.
I know the allowed operations are not currently updated properly, but let's avoid introducing more bugs :)

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from cfc5364 to 169158d Compare June 4, 2018 15:35
then raise Api_errors.(Server_error (cluster_host_not_joined, [ Ref.string_of self ]))
| None -> () end
end;
assert_allowed_during_rpu __context op;
Contributor

Same comment as below: now that you've refactored it, it's obvious that this is in the wrong place. Please move it inside get_operation_error.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 3 times, most recently from 6e8a5af to 2f30ce8 Compare June 5, 2018 08:25

@edwintorok edwintorok left a comment


Found why join fails: one of the asserts is wrong now that we've already created the cluster.

with_clustering_lock (fun () ->
assert_operation_host_target_is_localhost ~__context ~host;
let host = Db.Cluster_host.get_host ~__context ~self in
assert_cluster_host_can_be_created ~__context ~host;
Contributor

Please delete this line; it will always fail because we've created it already.

Contributor Author

I'll move it to create_internal


let host = Db.Cluster_host.get_host ~__context ~self in
assert_operation_host_target_is_localhost ~__context ~host;
assert_cluster_host_can_be_created ~__context ~host;
Contributor

Delete this one too. Also, this whole sequence of asserts seems to duplicate asserts that already existed above; are they redundant?

Contributor Author

No, but some are in the wrong place.

Helpers.call_api_functions ~__context (fun rpc session_id ->
try
api_func rpc session_id
with err ->
Contributor

Please add a Backtrace.is_important err right after the with, to capture the backtrace as soon as possible.
If you do anything else first, the backtrace will get overwritten.
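For illustration, the handler with the capture added first. This is a sketch based on the quoted hunk; `Backtrace.is_important` is from the backtrace library xapi uses, and `warn`/`handle_error` are taken from the surrounding code.

```ocaml
Helpers.call_api_functions ~__context (fun rpc session_id ->
  try
    api_func rpc session_id
  with err ->
    (* capture immediately: any other call made first would
       overwrite the recorded backtrace *)
    Backtrace.is_important err;
    warn "Error occurred during Cluster_host.create";
    handle_error err)
```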

assert_cluster_host_can_be_created ~__context ~host;
assert_pif_attached_to ~host ~pif ~__context;
let ref = Ref.make () in
assert_pif_attached_to ~host ~pIF ~__context;
Contributor

We need to query the PIF again: the fix_pif_prerequisites function called earlier has done its job of setting the PIF to disallow unplug, but we still hold the old PIF record.
It would be better to move the PIF query inside fix_pif_prerequisites and give it just the PIF ref, not the PIF record (which can be out of date).

Contributor Author

In that case, should all the PIF helpers use the ref and not the record?

Contributor

I think the code that modifies the PIF should come before the code that fetches the PIF record, so any code that modifies a PIF must not take the record as input; that way we can move the record fetch after all the code that modifies it. Currently the only code that modifies it is fix_pif_prerequisites.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from b903770 to e94655e Compare June 5, 2018 14:03
assert_cluster_host_can_be_created ~__context ~host;
assert_pif_attached_to ~host ~pIF ~__context;
assert_pif_prerequisites (pIF,pifrec);
(* Re-acquire PIF record after it has updated *)
Contributor Author

What kind of delay do we need here in order to refresh the record?

Contributor Author

Or should we just use pif_of_host here again?

Contributor

No delay is needed; just fetch it again.

Contributor

It would be good if fix_pif_prerequisites fetched the record itself, to avoid accidentally reusing a stale record in the caller.
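A sketch of that shape, with fix_pif_prerequisites taking only the ref and fetching the record itself (the disallow_unplug update is an assumed example of what the function does, based on the discussion above):

```ocaml
let fix_pif_prerequisites ~__context (self : API.ref_PIF) =
  (* fetch the record here, after any caller-side mutation,
     so the caller can never hand us a stale record *)
  let pif = Db.PIF.get_record ~__context ~self in
  if not pif.API.pIF_disallow_unplug then
    Db.PIF.set_disallow_unplug ~__context ~self ~value:true
```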

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 7bb6da6 to 050fe0d Compare June 5, 2018 17:38
@minishrink minishrink changed the title [WIP] CA-290237, CA-290473: Create cluster_hosts after network objects and safely resync hosts CA-290237, CA-290473: fix pool-join on bonds and VLANs Jun 5, 2018
@minishrink minishrink changed the title CA-290237, CA-290473: fix pool-join on bonds and VLANs CA-290237, CA-290473: fix pool-join on bonded networks and VLANs Jun 5, 2018

@edwintorok edwintorok left a comment


I think the overall diff is good now.
Some minor comments on the structuring of individual commits.

Akanksha Mathur added 2 commits June 6, 2018 10:31
- don't call Cluster_host.enable before cluster join finishes
- attempt to plug PBDs after resyncing host

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>
- separate Cluster_host.create into internal db and client calls
- refactor assertions, helpers, tests, and locks
- block Cluster_host operations if not joined, except leave

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>
@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 050fe0d to b6372f2 Compare June 6, 2018 10:44

@edwintorok edwintorok left a comment


Thanks for the update, this looks good now.

@edwintorok edwintorok merged commit 1f7a322 into xapi-project:feature/REQ477/master Jun 6, 2018
@minishrink minishrink deleted the feature/REQ477/CA-290237 branch June 6, 2018 14:10
5 participants