CA-395174: Try to unarchive VM's metrics when they aren't running #5855

psafont · 2024-07-18T12:31:39Z

Non-running VMs' metrics are stored in the coordinator. When the coordinator is
asked about the metrics, try to unarchive them instead of failing while trying
to fetch the coordinator's IP address.

This requires forcing the HTTP method of the query to be POST.

It also returns a Service Unavailable (503) when the host is marked as Broken.

A build with these changes has passed the automated tests that discovered the issue: Job 4055228
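A minimal sketch of the two request-level behaviours described above. Every type and function here (`request`, `host_status`, `outcome`, `force_post`, `serve`) is hypothetical and is not xapi's real `Http` module: a host marked as Broken answers 503 before touching any metrics, and the query that unarchives a halted VM's metrics has its method overridden to POST.

```ocaml
(* Hypothetical stand-ins for the HTTP types; not xapi's real Http module. *)
type meth = Get | Post

type request = { meth : meth; uri : string }

(* Hypothetical flag mirroring a host marked as Broken. *)
type host_status = Healthy | Broken

type outcome =
  | Unavailable          (* reply with 503 Service Unavailable *)
  | Unarchive of request (* query that unarchives a halted VM's metrics *)
  | Proxy of request     (* pass the original query on unchanged *)

(* The unarchive query is always sent as POST, whatever method the
   client originally used. *)
let force_post (req : request) : request = { req with meth = Post }

(* Broken hosts refuse first; halted VMs are unarchived; running VMs
   are handled as before. *)
let serve ~(status : host_status) ~(vm_running : bool) (req : request) :
    outcome =
  match (status, vm_running) with
  | Broken, _ -> Unavailable
  | Healthy, false -> Unarchive (force_post req)
  | Healthy, true -> Proxy req
```

Checking the Broken status first means no lookup is attempted on a host that cannot serve the request.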

ocaml/xapi/rrdd_proxy.ml

edwintorok

Looks good, some minor comments.

Non-running VMs' metrics are stored in the coordinator. When the coordinator is asked about the metrics, try to unarchive them instead of failing while trying to fetch the coordinator's IP address. This needs to force the HTTP method of the query to be POST. Also returns a Service Unavailable when the host is marked as Broken.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>

Forces callers to pass an address explicitly instead of relying on an implicit one. This avoids the underlying cause of the issue fixed in the previous commit: it allowed a coordinator to call Pool_role.get_master_address, which always fails on a coordinator.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
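A sketch of the idea behind this commit, with hypothetical names throughout: `master_address` only imitates the failure mode of `Pool_role.get_master_address` and is not its implementation, and `query`, `make_query`, `query_for` and `local_address` are illustrations. The target address becomes a required argument, so the coordinator never goes looking for a master address it does not have.

```ocaml
(* Hypothetical model of a host's pool role. *)
type role = Master | Slave of string (* payload: the coordinator's address *)

(* Implicit lookup in the spirit of Pool_role.get_master_address: it only
   makes sense on a member and fails when run on the coordinator. *)
let master_address (role : role) : string =
  match role with
  | Slave addr -> addr
  | Master -> failwith "master_address: this host is the coordinator"

type query = { address : string; vm_uuid : string }

(* The target address is a required labelled argument, so nothing is
   looked up behind the caller's back. *)
let make_query ~(address : string) ~(vm_uuid : string) : query =
  { address; vm_uuid }

(* The caller decides which host to contact; the coordinator never asks
   for a master address and targets itself instead. *)
let query_for ~(role : role) ~(local_address : string) ~(vm_uuid : string) :
    query =
  let address =
    match role with Master -> local_address | Slave coordinator -> coordinator
  in
  make_query ~address ~vm_uuid
```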

This makes the selection of the action obvious. Previously the two booleans made the decision hard to follow, and they were part of the reason the coordinator tried to get the coordinator address from the pool_role file (and failed badly).

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
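One way to picture this refactor: instead of branching on two booleans (is the VM owned by this host? is it running?) in nested `if`s, match once on a tuple and return a variant that names the action. The constructors and the exact branching are assumptions, loosely modelled on the `Slave _, true, _ | Slave _, _, false` arm quoted in the review; they are not the code in `rrdd_proxy.ml`.

```ocaml
(* Hypothetical pool role; Slave carries the coordinator's address. *)
type role = Master | Slave of string | Broken

(* Each possible decision gets its own constructor. *)
type action =
  | Unavailable               (* host is Broken: answer 503 *)
  | Unarchive_here            (* halted VM: metrics archived on the coordinator *)
  | Fetch_locally             (* running VM owned by this host *)
  | Forward_to_owner          (* running VM owned by another host *)
  | Ask_coordinator of string (* delegate to the coordinator at this address *)

let select_action ~(role : role) ~(owned_locally : bool) ~(running : bool) :
    action =
  match (role, owned_locally, running) with
  | Broken, _, _ -> Unavailable
  | Master, _, false -> Unarchive_here
  | Master, true, true -> Fetch_locally
  | Master, false, true -> Forward_to_owner
  | Slave coordinator, true, _ | Slave coordinator, _, false ->
      Ask_coordinator coordinator
  | Slave _, false, true -> Forward_to_owner
```

Because the match is exhaustive, the compiler flags any combination of role and flags that is left unhandled, which two independent booleans cannot do.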

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>

Currently List.assoc is used, which raises an unhandled Not_found exception.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
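A minimal sketch of this kind of fix. `List.assoc` raises `Not_found` when the key is absent, while `List.assoc_opt` (OCaml 4.05 or later) returns an option, so a missing value can become a 400 Bad Request instead of an unhandled exception. The `uuid` parameter name and the `(string, int) result` return type are assumptions for illustration.

```ocaml
(* Query parameters as an association list, e.g. [("uuid", "...")]. *)
type query = (string * string) list

(* Before: raises Not_found when "uuid" is absent. *)
let vm_uuid_unsafe (q : query) : string = List.assoc "uuid" q

(* After: a missing parameter is reported as HTTP 400 Bad Request. *)
let vm_uuid (q : query) : (string, int) result =
  match List.assoc_opt "uuid" q with
  | Some uuid -> Ok uuid
  | None -> Error 400
```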

lindig

Without being familiar with the domain, I can't really review this. I did not spot anything, though.

Vincent-lau · 2024-07-19T11:39:14Z

ocaml/xapi/rrdd_proxy.ml

              unarchive ()
-            else
+          | Slave _, true, _ | Slave _, _, false ->


If the current host is a member (slave) and the owner is the local host (the Slave _, true, _ case), why do we need to ask the coordinator here? That said, it seems that line 69 already checks that the metrics are not available locally.

When a host reboots and starts running a VM again, it needs to fetch the VM's metrics, which leads to this convoluted flow:
member ----get_metrics---> coordinator ----get_metrics---> member ---unarchive---> coordinator

edwintorok reviewed Jul 19, 2024

ocaml/xapi/rrdd_proxy.ml (outdated)

edwintorok reviewed Jul 19, 2024

ocaml/xapi/rrdd_proxy.ml (outdated)

edwintorok approved these changes Jul 19, 2024

psafont added 5 commits July 19, 2024 13:08

http-lib: avoid double-queries to the radix tree

110c112

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
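The commit title above names a common pattern: testing membership and then fetching the value walks the structure twice, whereas a single optional lookup does both at once. The sketch below uses the standard library's `Map` keyed by strings as a stand-in for http-lib's radix tree, whose API is not shown here; `lookup_twice` and `lookup_once` are hypothetical names.

```ocaml
module SMap = Map.Make (String)

(* Two traversals: one for [mem], then another for [find]. *)
let lookup_twice (key : string) (handlers : 'a SMap.t) : 'a option =
  if SMap.mem key handlers then Some (SMap.find key handlers) else None

(* One traversal: [find_opt] returns the binding if it is present. *)
let lookup_once (key : string) (handlers : 'a SMap.t) : 'a option =
  SMap.find_opt key handlers
```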

rrdd_proxy: Return 400 on bad vm request

3658806

Currently List.assoc is used, which raises an unhandled Not_found exception.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>

psafont force-pushed the rrd-unpack branch from 4dc3793 to 3658806 July 19, 2024 13:55

lindig approved these changes Jul 22, 2024

Vincent-lau approved these changes Jul 22, 2024

psafont merged commit 54abab8 into xapi-project:master Jul 22, 2024
15 checks passed

psafont deleted the rrd-unpack branch July 22, 2024 15:21
