-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CP-20676: Add gpumon interface #151
Conversation
gpumon/gpumon_interface.ml
Outdated
type vgpu_compatibility = unit (* temporarily unit, until real type is determined *) | ||
|
||
external get_pgpu_metadata: debug_info -> pgpu_address -> pgpu_metadata = "" | ||
external get_pgpu_vm_compatibility: debug_info -> pgpu_metadata -> dom_id -> vgpu_compatibility list = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need the address of the current pGPU as well (@mseri).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the return type, we essentially just need a Boolean, which give the answer to the question "can this VM be live migrated to this pGPU", plus a reason in case the answer is "no". Possible reasons are "host driver", "guest driver", "gpu", "other", or a combination of those. How about this type:
type incompatibility_reason = Host_driver | Guest_driver | GPU | Other
type compatibility = Compatible | Incompatible of incompatibility_reason list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have extended the bindings in a way that may require some changes here. To get the pGPU handle, and then the metadata, you will need the pci.bus_id
(a string) as pgpu_address
.
Compatibility here can be any type useful to xapi, we can always reshape it inside gpumon
when calling the idl to expose the correct informations. Seems good to simplify it as suggested by Rob
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For specifying the PCI device, I had imagined we might be able to use the xapi database's (misleadingly-named) pci_id
field of a pci
db entry, provided by xapi with code like Db.PCI.get_pci_id ~__context ~self:pci
so it would be of format "%04x:%02x:%02x.%01x"
See http://wiki.xen.org/wiki/Bus:Device.Function_%28BDF%29_Notation
and/or xen-api.git/ocaml/xapi/pciops.ml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood the comment there. We do need the pgpu_address
only to get the metadata. Fr the compatibility we need the current vgpu (we get it using the domid) and the destination pgpu metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so:
external get_pgpu_vm_compatibility: debug_info -> pgpu_address -> pgpu_metadata -> dom_id -> compatibility
...where pgpu_address
is for the pGPU that is currently in use, and pgpu_metadata
is for the one we want to migrate the VM to. We can add labels or a comment to clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yes, I see what you mean. Indeed, we need the current pgpu and domid to get the current vgpu metadata, and we need the destination pgpu metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought we could get the metadata about the VM's (current) vGPUs from the domid without needing to specify which pgpu (or pgpus) the VM is currently using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomassa Rob told me we can but it's not as straight-forward as we like (IIRC it involves looping through the pGPUs to check which one we're looking for), so having it here makes it simpler.
In any case, this needs to go onto a feature branch. |
gpumon/gpumon_interface.ml
Outdated
|
||
type pgpu_address = string | ||
type pgpu_metadata = string | ||
type dom_id = int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called domid
in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that domid
is a string for the bindings and the underlying library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can translate it to a string inside gpumon where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I felt int
was better for this interface, so that the translation to string will be done in gpumon rather than in xapi.
gpumon/gpumon_client.ml
Outdated
open Gpumon_interface | ||
open Xcp_client | ||
|
||
let json_url () = "file:" ^ json_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look like a file URL. I expected to see "file://"^json_path
. https://en.wikipedia.org/wiki/File_URI_scheme#Unix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_path
is set in gpumon/gpumon_interface.ml
:
let json_path = "/var/xapi/" ^ service_name ^ ".json"
so that json_url
ends up as file:/var/xapi/gpu.json
(as a side note, i'm changing the service_name to gpumon
instead of gpu
). I'm not sure if file:/...
also works as the recommended file://...
, but memory/memory_interface.ml does the same thing, and my assumption was that it worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying it out at http://www.freeformatter.com/url-parser-query-string-splitter.html
suggests that it is accepted. I still find it dubious but would have to read rules for URLs again whether this is syntactically valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wikipedia says:
The // after the file: is part of the general syntax of URLs. (The double slash // should always appear in a file URL according to the specification, but in practice many Web browsers allow it to be omitted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, seeing as we seem to have had no problem with the memory interface (AFAIK) then maybe it's better to conform to what we're already doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue the other way round: do the right thing and check that it works. If not, do the wrong thing but document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The xenops interface appears to do the same (uses file:/...
).
Edit: All the other interfaces also do the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have a separate commit to change them all to use //
but I see no reason to if it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonludlam, @robhoes we need your opinion, here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, I don't see json_url
used anywhere, so it looks like dead code that should just be removed.
However, the same question applies to xml_url
below. The place where this is used is in the function http_rpc
in lib/xcp_client.ml, which passes it to Uri.of_string
, which is provided by a library. I assume that this is the format that is accepted by this library, because indeed we use this format everywhere. For this PR, IMO, we should just comply with the other interfaces in xcp-idl, which is what the current patch does.
gpumon/gpumon_interface.ml
Outdated
* GNU Lesser General Public License for more details. | ||
*) | ||
|
||
let service_name = "gpu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client is called "gpumon" (if that is what ~dststr:"gpumon"
means) but the service here is "gpu", is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right; I'm changing it to gpumon
now. (we previously had everything as gpu
and forgot to change this one)
gpumon/gpumon_interface.ml
Outdated
let service_name = "gpu" | ||
let queue_name = Xcp_service.common_prefix ^ service_name | ||
let json_path = "/var/xapi/" ^ service_name ^ ".json" | ||
let xml_path = "/var/xapi/" ^ service_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the client, do we use these paths anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xml_path
is used by the xml_url
function in gpumon/gpumon_client.ml
, which is used in the xml_http_rpc
call below that in the same file. json_path
, as @robhoes pointed out above, isn't used, so I'm removing it.
^ New commit with all discussed changes. |
Looks good now. Please squash. |
gpumon/gpumon_interface.ml
Outdated
type incompatibility_reason = Host_driver | Guest_driver | GPU | Other | ||
type compatibility = Compatible | Incompatible of incompatibility_reason list | ||
|
||
(* Get the metadata for a pGPU, given its address (PCI bus ID). *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be (** text *)
so that it is read as a docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in new commit!
gpumon/gpumon_interface.ml
Outdated
(* Get the metadata for a pGPU, given its address (PCI bus ID). *) | ||
external get_pgpu_metadata: debug_info -> pgpu_address -> pgpu_metadata = "" | ||
|
||
(* Check compatibility between a VM's vGPU(s) and another pGPU. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I think this whole block of comments should be in
(** text ...
...
... *)
to be interpreted as a docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it.
Commit 1 of 2 for new gpumon interface. Signed-off-by: Frederico Mazzone <frederico.mazzone@citrix.com>
Commit 2 of 2 for new gpumon interface. Signed-off-by: Frederico Mazzone <frederico.mazzone@citrix.com>
Squashed! |
No description provided.