CP-26717: Port Gpumon to PPX-based RPCs #196

minishrink · 2018-01-28T14:12:34Z

This is the first of 3 PRs to port the GPU monitoring daemon gpumon to use PPX-based code generation for RPCs. This PR updates the Gpumon interface and client to use PPX, and adds a dev/debugging CLI.
(Core BVT and Ring3 BST have passed)

PRs involved:

1. Port Gpumon IDL to PPX (this one)
2. Update Xapi to use upgraded Gpumon interface
3. Port Gpumon daemon

mseri · 2018-01-29T10:38:13Z

gpumon/gpumon_client.ml

+      xml_url
+      call
+module Client = RPC_API(Idl.GenClientExnRpc(struct let rpc=rpc end))
+include Client


Why did you decide to include the client? Just to avoid doing Gpumon_client.Client?

Yes, although it looks like that's redundant since it's only called in xapi 4 times and always as Gpumon_client.Client anyway, so I can change that.

It's fine either way
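
For readers unfamiliar with the trade-off being discussed, here is a minimal sketch of what `include Client` changes for callers. `Make_client`, `ping` and `Gpumon_client_sketch` are hypothetical stand-ins invented for illustration; they are not the generated `RPC_API` client or the real `Gpumon_client` module.

```ocaml
(* Hypothetical stand-in for a generated client functor; not the real RPC_API. *)
module Make_client (R : sig val rpc : string -> string end) = struct
  let ping () = R.rpc "ping"
end

(* Stand-in transport that simply echoes the call name. *)
let rpc call = "reply:" ^ call

module Gpumon_client_sketch = struct
  module Client = Make_client (struct let rpc = rpc end)

  (* With this include, [ping] is also reachable without the [Client] prefix. *)
  include Client
end

(* Both spellings resolve to the same function. *)
let qualified = Gpumon_client_sketch.Client.ping ()
let unqualified = Gpumon_client_sketch.ping ()
```

Dropping the `include` leaves only the qualified form, which is the form the reply above says xapi already uses.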

mseri · 2018-01-29T10:39:09Z

gpumon/jbuild

@@ -38,7 +37,7 @@ let () = Printf.ksprintf Jbuild_plugin.V1.send {|
    threads
    xcp))
  (wrapped false)
-  %s))


Please keep the coverage_rewriters otherwise this would break @edwintorok coverage work

So, keep %s and coverage_rewriter in lines 41 and 56? Or just line 56?

mseri · 2018-01-29T10:48:24Z

gpumon/gpumon_interface.ml


-type incompatibility_reason = Host_driver | Guest_driver | GPU | Other
-type compatibility = Compatible | Incompatible of incompatibility_reason list
+(** Boolean: compatible? *)


This docstring is a bit weird. This is the compatibility information for vGPU migrations returned from the Nvidia library. A vGPU and a pGPU can be compatible, or they can be incompatible for a number of reasons. This type encodes that behaviour.

Yeah, I wasn't sure what to write there.
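
As a possible starting point, a sketch of how the docstring could read, based on the description in the comment above. It uses the variant types from the removed lines of the diff, which may not match the type the PR ends up with; the docstring wording and the helpers `string_of_reason` and `describe` are suggestions made up for illustration.

```ocaml
(** Reasons the Nvidia library may give for a vGPU/pGPU pair being
    incompatible. *)
type incompatibility_reason = Host_driver | Guest_driver | GPU | Other

(** Compatibility information for vGPU migrations: a vGPU and a pGPU are
    either compatible, or incompatible for one or more reasons. *)
type compatibility =
  | Compatible
  | Incompatible of incompatibility_reason list

(* Hypothetical helper: human-readable name for a reason. *)
let string_of_reason = function
  | Host_driver -> "host driver"
  | Guest_driver -> "guest driver"
  | GPU -> "GPU"
  | Other -> "other"

(* Hypothetical helper: summarise a compatibility result. *)
let describe = function
  | Compatible -> "compatible"
  | Incompatible reasons ->
    "incompatible: " ^ String.concat ", " (List.map string_of_reason reasons)
```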

mseri · 2018-01-29T10:49:54Z

gpumon/gpumon_interface.ml

+      ;"in the form `domain:bus:device.function` PCI identifier."]
+      pgpu_address
+
+  let nvidia_pgpu_metadata_p = param ~description:


It could make sense, now that it is possible, to move all these nvidia_ types and errors to the Nvidia submodule dropping the prefix and encapsulating things more appropriately. They had to be extracted only due to camlp4 generator limitations.

mseri

Looks good, but we need to avoid breaking coverage, and it makes sense to make the interface more coherent now that we can.

minishrink · 2018-01-29T11:03:03Z

gpumon/gpumon_interface.ml

+
+
+(** Error wrapper *)
+type gpu_errors =


One thing I'm uncertain about is whether we need an error for compatibility issues. I assume these are already handled, but is it worth including one just in case?

The compatibility errors are only for the Nvidia module at the moment, we will need new ones for ATI or Intel if ever needed. How to deal with it I think is open for discussion:

It is probably not worth splitting:

type nvml_error =
  | NvmlInterfaceNotAvailable
  (** Exception raised when gpumon is unable to load the nvml nvidia library *)
  | NvmlFailure of string
  (** Exception raised by the c bindings to the nvml nvidia library*)
[@@deriving rpcty]

(** Error wrapper *)
type gpu_errors =
  | Nvml of nvml_error
  (** Error raised by the Nvml library bindings *)
  | Gpumon_failure
  (** Default exception raised upon daemon failure *)
[@@default Gpumon_failure]
[@@deriving rpcty]
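
To illustrate how a caller might consume the wrapped errors, here is a small sketch. It repeats the two types without the `[@@deriving rpcty]` and `[@@default ...]` attributes so that it compiles without the PPX; `string_of_gpu_error` is a hypothetical helper, not part of the interface.

```ocaml
(* Same shape as the types above, minus the PPX attributes. *)
type nvml_error =
  | NvmlInterfaceNotAvailable
  | NvmlFailure of string

type gpu_errors =
  | Nvml of nvml_error
  | Gpumon_failure

(* Hypothetical helper: the nested variant lets one match handle both the
   vendor-specific and the generic daemon errors. *)
let string_of_gpu_error = function
  | Nvml NvmlInterfaceNotAvailable -> "NVML interface not available"
  | Nvml (NvmlFailure msg) -> "NVML failure: " ^ msg
  | Gpumon_failure -> "gpumon failure"
```

Under this layout, errors for another vendor would presumably become a further constructor of `gpu_errors` alongside `Nvml`, leaving existing matches on `Nvml` untouched apart from the new case.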

mseri

Don't merge until all the necessary testing has been performed

lindig · 2018-01-29T14:05:27Z

gpumon/gpumon_client.ml

@@ -17,14 +17,13 @@ open Xcp_client

 let xml_url () = "file:" ^ xml_path

-module Client = Gpumon_interface.Client(struct


With this gone, do we still need open Gpumon_interface?

lindig · 2018-01-29T14:06:52Z

gpumon/gpumon_client.ml

-end)
-
+let rpc call =
+  if !use_switch


I'd rather see if !Xcp_service.use_switch and no open Xcp_client. This module is so small that it should not need open.

Without opening modules:

let xml_url () = "file:" ^ Gpumon_interface.xml_path

let rpc call =
  if !Xcp_client.use_switch
  then Xcp_client.json_switch_rpc Gpumon_interface.queue_name call
  else Xcp_client.xml_http_rpc
      ~srcstr:(Xcp_client.get_user_agent ())
      ~dststr:"gpumon"
      xml_url
      call

module Client = Gpumon_interface.RPC_API(Idl.GenClientExnRpc(struct let rpc=rpc end))

I think there's a case to be made either way, what do you think?

let rpc call =
  let open Xcp_client in
  if !use_switch
  then json_switch_rpc Gpumon_interface.queue_name call
  else xml_http_rpc
      ~srcstr:(get_user_agent ())
      ~dststr:"gpumon"
      xml_url
      call

I think it is better to be explicit. You can still do it inside rpc:

let rpc call =
  let open Xcp_client in

or

module C = Xcp_client
let rpc call =
  if !C.use_switch then ..
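
The two options can be compared side by side in a self-contained sketch. `Xcp_client_stub` is a hypothetical stand-in with simplified signatures (no `~srcstr`, no URL argument); it is not the real `Xcp_client` API, and `"queue"` is a placeholder for the real queue name.

```ocaml
(* Hypothetical stub with simplified signatures; NOT the real Xcp_client. *)
module Xcp_client_stub = struct
  let use_switch = ref false
  let json_switch_rpc queue call = "switch:" ^ queue ^ ":" ^ call
  let xml_http_rpc ~dststr call = "http:" ^ dststr ^ ":" ^ call
end

(* Option 1: local open, visible only inside the body of [rpc_local_open]. *)
let rpc_local_open call =
  let open Xcp_client_stub in
  if !use_switch then json_switch_rpc "queue" call
  else xml_http_rpc ~dststr:"gpumon" call

(* Option 2: a short module alias, with every use explicitly qualified. *)
module C = Xcp_client_stub

let rpc_alias call =
  if !C.use_switch then C.json_switch_rpc "queue" call
  else C.xml_http_rpc ~dststr:"gpumon" call
```

Both versions behave identically; the difference is only in how much of the opened module's namespace is visible, and where.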

lindig · 2018-01-29T14:08:02Z

gpumon/gpumon_interface.ml

+              ; description =
+                  [ "This interface is used by Xapi and Gpumon to monitor "
+                  ; "physical and virtual GPUs."]
+              ;


Join to ; version = (1,0,0)

lindig · 2018-01-29T14:10:19Z

gpumon/gpumon_interface.ml

+    let get_pgpu_vm_compatibility =
+      declare "get_pgpu_vm_compatibility"
+        ["Checks compatibility between a VM's vGPU(s) and another pGPU."]
+        (debug_info_p @->  pgpu_address_p @-> domid_p @-> nvidia_pgpu_metadata_p @-> returning compatibility_p gpu_err )


Very long line; how about

( debug_info_p
  @-> pgpu_address_p
  @-> domid_p
  @-> nvidia_pgpu_metadata_p
  @-> returning compatibility_p gpu_err
)

or similar? This also provides room for comments on parameters when needed.

lindig · 2018-01-29T14:10:52Z

gpumon/gpumon_interface.ml

+    let get_pgpu_vgpu_compatibility =
+      declare "get_pgpu_vgpu_compatibility"
+        ["Checks compatibility between a pGPU (on a host) and a list of vGPUs (assigned to a VM). Note: A VM may use several vGPUs."]
+        ( debug_info_p @->  nvidia_pgpu_metadata_p @-> nvidia_vgpu_metadata_list_p @-> returning compatibility_p gpu_err )


Again, a very long line. Break it up into at least two lines.

mseri

Do not merge until all testing is complete and all the related PRs are approved

coveralls · 2018-01-30T12:36:51Z

Coverage remained the same at 11.014% when pulling 8a007fa on minishrink:gpu_ppx into 8ab1ac7 on xapi-project:master.

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

mseri · 2018-02-01T14:32:51Z

Please squash the fixup commits

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

fixup! CP-26717: Minor edits and refactoring
- no global opens in Gpumon_client
- reformatted Gpumon_interface
- rewrote compatibility type docstring in Gpumon_interface
Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

This was referenced Jan 28, 2018

CP-26717: Update Xapi to use PPX-based Gpumon IDL xapi-project/xen-api#3418

Merged

CP-26717: Port Gpumon to PPX-based RPCs xenserver/gpumon#29

Merged

mseri reviewed Jan 29, 2018

mseri requested changes Jan 29, 2018

minishrink commented Jan 29, 2018

mseri approved these changes Jan 29, 2018

lindig reviewed Jan 29, 2018

lindig approved these changes Jan 29, 2018

mseri approved these changes Jan 29, 2018

minishrink force-pushed the gpu_ppx branch from 34f3e1c to 55fef37 Compare January 30, 2018 11:30

minishrink force-pushed the gpu_ppx branch 2 times, most recently from 9e0eaef to 6000ed6 Compare January 31, 2018 14:07

Akanksha Mathur added 3 commits February 1, 2018 11:11

CP-26717: Port Gpumon interface from Camlp4 to PPX-based RPC scheme

e97fd11

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

CP-26717: Port Gpumon client generation to PPX

1d98b2c

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

Add Gpumon CLI tool for debugging

e65473a

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

minishrink force-pushed the gpu_ppx branch from 6000ed6 to 8d1c99c Compare February 1, 2018 11:12

mseri approved these changes Feb 1, 2018

mseri closed this Feb 1, 2018

mseri reopened this Feb 1, 2018

Akanksha Mathur added 2 commits February 1, 2018 14:40

CP-26717: Runtest builds and runs Gpumon CLI

8a007fa

Signed-off-by: Akanksha Mathur <akanksha.mathur@citrix.com>

minishrink force-pushed the gpu_ppx branch from 9f99089 to 8a007fa Compare February 1, 2018 14:40

mseri approved these changes Feb 1, 2018

mseri merged commit bda1536 into xapi-project:master Feb 2, 2018

minishrink deleted the gpu_ppx branch February 15, 2018 15:40
