
Conversation


@minishrink minishrink commented May 21, 2018

This PR cleans up some of the breakages exposed by the recent enabling of bonding (CP-26912).

Changes made to:

startup sequence
in order:

  • create networking objects
  • create cluster_host object (if necessary)
  • wait for clustering IP
  • resync cluster_host

cluster_host operations

  • add joined field to represent cluster membership of host
  • block all but Cluster_host.destroy (leave) if not joined
  • separate Cluster_host.create into internal db and client calls

Consequences:

  • pool-join should now work for bonded networks as well as VLANs


@edwintorok edwintorok left a comment


This looks good from a clustering point of view; however, I'd like a second opinion on whether it is safe to resync VM state before the storage.
(And we can't resync VM state after storage, due to the comment about CA-175353.)

Could you create a Jenkins branch and run a BST on it?

@edwintorok edwintorok requested a review from jonludlam May 21, 2018 16:38
@coveralls

coveralls commented May 21, 2018

Coverage Status

Coverage increased (+0.008%) to 20.781% when pulling b6372f2 on minishrink:feature/REQ477/CA-290237 into 78b5dd7 on xapi-project:feature/REQ477/master.

@minishrink
Contributor Author

Submitted a Jenkins branch with the same name


@edwintorok edwintorok left a comment


Discussed with @robhoes that changing the startup sequence is risky. The log_and_ignore part is good (the NIC may not exist at all) and we have to avoid getting into a restart loop, but we shouldn't move the storage initialisation around if we can avoid it.

Instead we should look up the primary slave of a bond if we can't find the PIF, and try to use that for clustering.

@jonludlam
Contributor

Yeah, the startup ordering stuff is really really fragile. Don't change it unless you must, and if you do, be prepared to spend a while debugging nasty corner cases :-(

@edwintorok
Contributor

edwintorok commented May 22, 2018

It might be better to fix this in another way that should also fix another bug with clustering not working on startup when the bond gets an IP too late:

  • call Cluster_host.create_as_necessary after the bonds/vlans/etc. get synchronized
  • drop the resync_host call on startup
  • in wait_for_management_ip just before plugging the unplugged PBDs add a maybe_wait_for_clustering_ip
  • this would look up the PIF of the cluster host if any, and then wait until it has an IP address and the PIF is connected
  • once that happens, call resync_host which enables clustering
  • then return from the function which would then do plug_unplugged_pbds
  • make sure no exceptions escape from the wait function: wrap it in log_and_ignore_exn

We could also consider adding a second step after 'wait for management IP' that uses Startup.OnThread and does the maybe_wait_for_clustering_ip and then plug_unplugged_pbds, in order not to block the startup.
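For illustration, the proposed maybe_wait_for_clustering_ip could be sketched roughly as below. This is an illustrative sketch only: the module and function names (Xapi_clustering.find_cluster_host, Xapi_mgmt_iface.wait_for_clustering_ip, Xapi_cluster_host.resync_host, log_and_ignore_exn) are taken from the discussion, and the xapi internals are assumed, not verified.

```ocaml
(* Sketch of the proposal above; not the actual patch. *)
let maybe_wait_for_clustering_ip ~__context =
  log_and_ignore_exn (fun () ->
    let host = Helpers.get_localhost ~__context in
    match Xapi_clustering.find_cluster_host ~__context ~host with
    | None -> ()  (* no cluster_host on this host: nothing to wait for *)
    | Some self ->
      (* block until the clustering PIF is attached and has an IP *)
      Xapi_mgmt_iface.wait_for_clustering_ip ~__context ~self;
      (* enabling clustering then happens via resync_host *)
      Xapi_cluster_host.resync_host ~__context ~host)
```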

@edwintorok edwintorok removed the request for review from jonludlam May 22, 2018 14:10
@edwintorok edwintorok assigned minishrink and unassigned edwintorok May 22, 2018
@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 3 times, most recently from 2fafad8 to e89cb1f Compare May 22, 2018 17:19
@minishrink minishrink changed the title CA-290237: Synchronise network objects before resyncing cluster_hosts CA-290237: Create cluster_hosts after network objects and safely resync hosts May 22, 2018
@minishrink
Contributor Author

Will run a BVT and BST.

"Synchronising VLANs on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_vlans_from_master ~__context;
"Synchronising tunnels on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_tunnels_from_master ~__context;
(* CA-290237: Create cluster objects after network objects so PIFs are available *)
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ], (fun () -> Xapi_cluster_host.create_as_necessary __context (Helpers.get_localhost ~__context));
Contributor

Please use log_and_ignore_exn here, so if something goes wrong XAPI doesn't go into a restart loop.
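A minimal sketch of the requested wrapping, assuming a `log_and_ignore_exn : (unit -> unit) -> unit` helper as used elsewhere in xapi:

```ocaml
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ],
  (fun () ->
    (* if creation fails, log the exception instead of letting it
       propagate and send XAPI into a restart loop *)
    log_and_ignore_exn (fun () ->
      Xapi_cluster_host.create_as_necessary __context
        (Helpers.get_localhost ~__context)));
```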

(fun () -> Pool_db_backup.fetch_database_backup ~master_address:(Pool_role.get_master_address())
~pool_secret:!Xapi_globs.pool_secret ~force:None);
- "wait management interface to come up", [ Startup.NoExnRaising ], wait_management_interface;
+ "wait management interface to come up, resync any cluster_hosts created", [ Startup.NoExnRaising ], wait_management_interface;
Contributor

... and plug any unplugged PBDs.

  if List.mem (Db.Cluster.get_cluster_stack ~__context ~self) required_cluster_stacks
- then Xapi_cluster_host.resync_host ~__context ~host
+ then begin
+   Xapi_cluster_host.create_as_necessary ~__context ~host;
Contributor

I don't think the object can be missing here: it was created during the startup sequence or during join. This one probably doesn't need to be changed; calling just resync_host is fine.

begin match Xapi_clustering.find_cluster_host ~__context ~host with
| Some self -> ignore
Xapi_mgmt_iface.(wait_for_clustering_ip ~__context ~self);
Xapi_cluster_host.resync_host ~__context ~host
Contributor

Plugging the PBD already calls resync_host, so the only thing we need here is to wait for an IP.
Rename this to maybe_wait_for_clustering_ip.

(fun () -> Pool_db_backup.fetch_database_backup ~master_address:(Pool_role.get_master_address())
~pool_secret:!Xapi_globs.pool_secret ~force:None);
- "wait management interface to come up", [ Startup.NoExnRaising ], wait_management_interface;
+ "wait management interface to come up, resync any cluster_hosts created", [ Startup.NoExnRaising ], wait_management_interface;
Contributor

@edwintorok edwintorok May 23, 2018


Since this is a blocking action, if something goes wrong and the clustering network interface doesn't come up, it would be nice if we didn't block the whole of XAPI.
So perhaps introduce a second entry here below 'wait for management interface' that says "wait for clustering IP if any, and plug unplugged PBDs", calls your maybe_wait_for_clustering_ip function above, and uses Startup.OnThread.

Contributor Author

So move PBD plugging to the wait_for_clustering_ip function?


@edwintorok edwintorok May 23, 2018


No, keep it both where it is now and add it inside the new step as well; we should not change existing behaviour. Other PBDs may successfully plug even if we don't have clustering.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 2 times, most recently from 5cb789a to 8754ea3 Compare May 23, 2018 15:50
@edwintorok
Contributor

I think the logic should work now; the commit message should probably describe the order as:

  • create network objects (unmodified)
  • create cluster host if need be
    ...
  • wait for management IP (unmodified)
  • wait for clustering IP, plug any unplugged PBDs which would also resync the cluster host if needed

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 2 times, most recently from 4463739 to 058ddad Compare May 23, 2018 15:57

@edwintorok edwintorok left a comment


Let's wait for tests to finish running before merging.

(fun () -> while !iP="" || not !attached do
Condition.wait ip_cond ip_mutex;
iP := Db.PIF.get_IP ~__context ~self:pIF;
attached := Db.PIF.get_currently_attached ~__context ~self:pIF
Member

This does the same as get_management_iface_is_connected, which checks the PIF carrier (link state). You can use PIF_metrics.carrier for that.
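A hedged sketch of the suggested check; the PIF_metrics field access is assumed from the comment above, not verified against the xapi datamodel:

```ocaml
(* Read link state via PIF_metrics rather than re-deriving connectivity *)
let pif_has_carrier ~__context ~self =
  let metrics = Db.PIF.get_metrics ~__context ~self in
  Db.PIF_metrics.get_carrier ~__context ~self:metrics
```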

"Synchronising VLANs on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_vlans_from_master ~__context;
"Synchronising tunnels on slave with master", [Startup.OnlySlave; Startup.NoExnRaising], Sync_networking.copy_tunnels_from_master ~__context;
(* CA-290237: Create cluster objects after network objects so PIFs are available *)
"Create any necessary cluster_host objects", [ Startup.NoExnRaising ],
Contributor

@minishrink we'll need to move this until after we have an IP. Creating a cluster host object checks for the presence of an IP, and you get a NO_COMPATIBLE_CLUSTER_HOST error caused by the create failing with PIF_HAS_NO_NETWORK_CONFIGURATION.

Contributor

Creating the object not only writes it to the DB but also tries to enable it, and the DB entry is only created if enabling succeeds.

Contributor

Please add CA-290473 in the commit message when this is fixed.

Contributor

Actually it might be better to have a function here that just creates the database object, and have the actual join/enable done elsewhere (in resync_host).
For that to work:

  • add a new field to cluster_host called joined, defaulting to true for backwards compatibility
  • refactor Cluster_host.create into a create_db and a create (where create calls create_db and then resync_host)
  • resync_host will look at the joined field: if false, it calls the daemon's join and sets joined to true on success; otherwise it calls enable if the cluster_host is marked enabled
  • in the startup sequence here, only call create_db
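The proposed split could look roughly like this. Illustrative only: the real field list of Db.Cluster_host.create is elided and resync_host's body is simplified.

```ocaml
(* create_db only writes the row; join/enable happen in resync_host *)
let create_db ~__context ~cluster ~host ~pIF =
  let ref = Ref.make () in
  (* further Db fields elided for brevity *)
  Db.Cluster_host.create ~__context ~ref ~cluster ~host ~pIF
    ~joined:false ~enabled:false;
  ref

let create ~__context ~cluster ~host ~pIF =
  let self = create_db ~__context ~cluster ~host ~pIF in
  resync_host ~__context ~host;  (* joins, then enables as needed *)
  self
```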


@edwintorok edwintorok left a comment


Some more problems found when testing VLANs with a new xenrt test.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 058ddad to 47b03ab Compare May 25, 2018 11:20
warn "Error occurred during Cluster_host.create";
handle_error error
end;
if not (Db.Cluster_host.get_joined ~__context ~self)
Contributor

This might look odd, but I think it's better for ordering to keep it like this. The logical sequence is: join, then later enable if not already enabled.

@edwintorok
Contributor

The cleanup of create_storage seems to be a no-op and isn't really needed for this PR; can we keep it for a future PR?

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from fe0bd2f to dfbbb4c Compare June 4, 2018 14:04
Cluster_host object exists for the host. If one does not exist, the call
will return the reference to the Cluster object wrapped in 'Some' *)

val create_internal : __context:Context.t -> cluster:API.ref_Cluster ->
Contributor

Why do you expose internal functions?

host:API.ref_host -> pIF:API.ref_PIF -> API.ref_Cluster_host
(** [create_internal ~__context ~cluster ~host ~pif] creates cluster_host in DB *)

val join_internal : __context:Context.t -> self:API.ref_Cluster_host -> unit
Contributor

Same question.

  (** [resync_host ~__context ~host] checks for the existence of a cluster_host.
-     If one is found and enabled in the db, it is enabled in the Client layer
-     too. Otherwise, nothing happens. *)
+     If one exists but hasn't joined the cluster, [join_internal] is called,
Contributor

You can probably move the docs of join_internal inline here.

  (** [create ~__context ~cluster ~host] is implementation of the XenAPI call
-     'Cluster_host.create'. It is the Cluster_host object constructor *)
+     'Cluster_host.create'. It is the Cluster_host object constructor, and works by
+     calling `create_internal`, then `resync_host`, which either joins the host to
Contributor

create_internal and resync_host are implementation details; you should probably copy/move their descriptions here.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from b4a2cbb to cfc5364 Compare June 4, 2018 15:28
| _ -> ()

let assert_operation_valid ~__context ~self ~op =
assert_joined_cluster ~__context ~self op;
Contributor

This should be part of get_operation_error, because that is also used when updating the allowed operations below.
I know the allowed operations are not currently updated properly, but let's avoid introducing more bugs :)

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from cfc5364 to 169158d Compare June 4, 2018 15:35
then raise Api_errors.(Server_error (cluster_host_not_joined, [ Ref.string_of self ]))
| None -> () end
end;
assert_allowed_during_rpu __context op;
Contributor

Same comment as below: now that you've refactored it, it's obvious that this is in the wrong place. Please move it inside get_operation_error.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch 3 times, most recently from 6e8a5af to 2f30ce8 Compare June 5, 2018 08:25

@edwintorok edwintorok left a comment


Found why join fails: one of the asserts is wrong now that we've already created the cluster.

with_clustering_lock (fun () ->
assert_operation_host_target_is_localhost ~__context ~host;
let host = Db.Cluster_host.get_host ~__context ~self in
assert_cluster_host_can_be_created ~__context ~host;
Contributor

Please delete this line; it will always fail because we've created it already.

Contributor Author

I'll move it to create_internal


let host = Db.Cluster_host.get_host ~__context ~self in
assert_operation_host_target_is_localhost ~__context ~host;
assert_cluster_host_can_be_created ~__context ~host;
Contributor

Delete this one too. Also, this whole sequence of asserts seems to duplicate asserts that already existed above; are they redundant?

Contributor Author

No, but some are in the wrong place.

Helpers.call_api_functions ~__context (fun rpc session_id ->
try
api_func rpc session_id
with err ->
Contributor

Please add a Backtrace.is_important err right after the with, to capture the backtrace as soon as possible.
If you do anything else first, the backtrace will get overwritten.
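For illustration, the handler with the capture added first. This is a sketch based on the quoted hunk; `Backtrace.is_important` is from the backtrace library xapi uses, and `warn`/`handle_error` are taken from the surrounding code.

```ocaml
Helpers.call_api_functions ~__context (fun rpc session_id ->
  try
    api_func rpc session_id
  with err ->
    (* capture immediately: any other call made first would
       overwrite the recorded backtrace *)
    Backtrace.is_important err;
    warn "Error occurred during Cluster_host.create";
    handle_error err)
```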

assert_cluster_host_can_be_created ~__context ~host;
assert_pif_attached_to ~host ~pif ~__context;
let ref = Ref.make () in
assert_pif_attached_to ~host ~pIF ~__context;
Contributor

We need to query the PIF again: the fix_pif_prerequisites function called earlier has done its job of setting the PIF to disallow unplug, but we still hold the old PIF record.
It would be better to move the PIF query inside fix_pif_prerequisites and give it just the PIF ref, not the PIF record (which can be out of date).

Contributor Author

In that case, should all the PIF helpers use the ref and not the record?

Contributor

I think the code that modifies the PIF should come before the code that fetches the PIF record, so any code that modifies a PIF must not take the record as input; that way we can move the record fetch after all the code that modifies it. Currently the only code that modifies it is fix_pif_prerequisites.

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from b903770 to e94655e Compare June 5, 2018 14:03
assert_cluster_host_can_be_created ~__context ~host;
assert_pif_attached_to ~host ~pIF ~__context;
assert_pif_prerequisites (pIF,pifrec);
(* Re-acquire PIF record after it has updated *)
Contributor Author

What kind of delay do we need here in order to refresh the record?

Contributor Author

Or should we just use pif_of_host here again?

Contributor

No delay is needed; just fetch it again.

Contributor

It would be good if fix_pif_prerequisites fetched the record itself, to avoid accidentally reusing a stale record in the caller.
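A sketch of that shape, with fix_pif_prerequisites taking only the ref and fetching the record itself (the disallow_unplug update is an assumed example of what the function does, based on the discussion above):

```ocaml
let fix_pif_prerequisites ~__context (self : API.ref_PIF) =
  (* fetch the record here, after any caller-side mutation,
     so the caller can never hand us a stale record *)
  let pif = Db.PIF.get_record ~__context ~self in
  if not pif.API.pIF_disallow_unplug then
    Db.PIF.set_disallow_unplug ~__context ~self ~value:true
```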

@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 7bb6da6 to 050fe0d Compare June 5, 2018 17:38
@minishrink minishrink changed the title [WIP] CA-290237, CA-290473: Create cluster_hosts after network objects and safely resync hosts CA-290237, CA-290473: fix pool-join on bonds and VLANs Jun 5, 2018
@minishrink minishrink changed the title CA-290237, CA-290473: fix pool-join on bonds and VLANs CA-290237, CA-290473: fix pool-join on bonded networks and VLANs Jun 5, 2018

@edwintorok edwintorok left a comment


I think the overall diff is good now.
Some minor comments on the structuring of individual commits.

Akanksha Mathur added 2 commits June 6, 2018 10:31
- don't call Cluster_host.enable before cluster join finishes
- attempt to plug PBDs after resyncing host

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>
- separate Cluster_host.create into internal db and client calls
- refactor assertions, helpers, tests, and locks
- block Cluster_host operations if not joined, except leave

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>
@minishrink minishrink force-pushed the feature/REQ477/CA-290237 branch from 050fe0d to b6372f2 Compare June 6, 2018 10:44

@edwintorok edwintorok left a comment


Thanks for the update, this looks good now.

@edwintorok edwintorok merged commit 1f7a322 into xapi-project:feature/REQ477/master Jun 6, 2018
@minishrink minishrink deleted the feature/REQ477/CA-290237 branch June 6, 2018 14:10
5 participants