Skip to content

Conversation

psafont
Copy link
Member

@psafont psafont commented Sep 12, 2025

Now the HA shared SR and network constraint violations are used when plugging and unplugging.

When unplugging a pbd or pif, enabling a host, or adding a vbd or vif, the shared SR or network constraint violations could be violated, but the error used in these cases was that the operation blocked the failover planning. This was confusing because the main reason was not mentioned in the error. Instead use the
shared constraint violation error, and log a more descriptive message in the logs as info, because these can happen during normal operation and there's nothing dodgy going on.

I previously wanted to know Tina's opinion on how we change the reason in a way that can be better treated by clients and internationalized, but saw that the error used was simply not the right one.

@psafont psafont requested a review from kc284 September 12, 2025 10:45
Comment on lines +270 to +272
( if
true
&& Db.Pool.get_ha_enabled ~__context ~self:pool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is pre-existing, but true && is surely redundant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this before in the codebase, it seems a style choice, maybe to keep all the conditions at the same indentation level.

if List.mem vdi vdis then
let reason =
Printf.sprintf
"PBD.unplug will make protected VM %s not agile since it has \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with this is that the refs in question are within the string and it makes it more difficult for the client to extract them. A separate error code with the refs as parameters which would be somehow thrown nested in or concatenated with ha_operation_would_break_failover_plan would be better. I think we do something similar with the authentication errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at those and see what I can do. The issue is not all reasons have the same structure, although most of them follow "operation X on object Y makes VM Z non-agile"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be nice to formalize the reason somehow. But if that's not possible, a plain string is already an improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the erros to use a more structured format, but there are 3 cases, depending on the number of parameters, is this acceptable, or should I create new error messages?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perhaps also insert an error subcode? Because if the client wants to get the object record (so as to show the user the name instead of a reference, as we usually do in XC) they wouldn't know the difference between [explanation; vm; vdi] and [explanation; vm; vif]

Copy link
Member Author

@psafont psafont Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think all these complex cases should raise
Api_errors.ha_constraint_violation_network_not_shared and Api_errors.ha_constraint_violation_sr_not_shared, with an info logline that prints the pif, network, and pbd and vbd

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client will split the error parameter and may want to show the object's name, e.g. VM MyCoolVM has a problem instead of VM OpaqueRef:xyz has a problem. We do know the 2nd parameter is always a VM, so we can call VM.get_record (or resolve it among cache VMs). For the 3rd parameter we can't know whether to call VDI.get_record or VIF.get_record, but if there were a parameter like [error_subcode_vif; explanation; vm; vif] [error_subcode_vdi ; explanation; vm; vdi] the client would check the subcode. Distinct errors like ha_operation_would_break_failover_plan_disk ha_operation_would_break_failover_plan_network may be simpler, unless you have a specific reason to want only one ha_operation_would_break_failover_plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I hadn't seen that pattern before, I'll stick to the already-existing errors for this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API error reporting is an awkward spot.

  • It's kind of complicated to add, yet not very powerful
  • A string would often be more descriptive but makes i18n complicated - would require matching.
  • Modern APIs return JSON which has much more structure and could provide strings but also parameters

As a result we have weak error reporting that is neither comprehensive nor quick to add.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also note that the errors specified in the datamodel aren't verified at compile time like objects are, so users aren't consistent, see #6588 for an example

@psafont psafont changed the title idl/datamodel_errors: add reason to ha_operation_would_break_failover_plan Replace ha_operation_would_break_failover_plan with constraint errors Sep 12, 2025
…lugging

When unplugging a pbd, enabling a host, or adding a vbd, the shared SR
constraint violation could be violated, but the error used in these cases
was that the operation blocked the failover planning. This was confusing
because the main reason was not mentioned in the error. Instead use the
SR constraint violation error, and log a more descriptive message in the
logs as info, because these can happen during normal operation and
there's nothing dodgy going on.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
…d unplugging

When unplugging a PIF, enabling a host, or adding a VIF, the shared
network constraint violation could be violated, but the error used in
these cases was that the operation blocked the failover planning. This
was confusing because the main reason was not mentioned in the error.
Instead use the network constraint violation error, and log a more
descriptive message in the logs as info, because these can happen during
normal operation and there's nothing dodgy going on.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
It didn't add any useful information to the error. Also cleaned up the
formatting of some comments found during the patch series.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
@psafont psafont added this pull request to the merge queue Sep 15, 2025
Merged via the queue into xapi-project:master with commit 0010c74 Sep 15, 2025
15 checks passed
@psafont psafont deleted the dev/pau/ha_errs branch September 16, 2025 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants