[Merge lcm to lcm feature] For trusted certificates feature #6940

Merged
minglumlu merged 9 commits into feature/26.1-lcm/trusted-certs from 26.1-lcm on Mar 10, 2026
@minglumlu
Member

No description provided.

edwintorok and others added 9 commits February 27, 2026 16:20
This still has race conditions (as the domain is built the amount of free
memory on the node will decrease, so we're essentially double counting during
that time), but the effects are confined to the race window.
Previously free memory on a NUMA node would only decrease, so after a certain
number of guest reboots or migrations the NUMA optimization would stop working
completely (until a toolstack restart).

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Versions of Xen without per-node claim support won't allow using up
all the memory from node0 (it'll move some of its memory to other nodes).
On versions of Xen with per-node claim support it'll respect the claim,
but it is undesirable to completely run out of memory on node0, since some
devices may require the low 4GiB of RAM for DMA.

Xen would reserve 2^32 bytes (4GiB) on node0, and only use it as a last resort.
However, we don't know how much of that has been used up already.
For now reserve just 2^31 bytes (2GiB) by default, and make this configurable in
xenopsd.conf.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
…abled

(cherry-pick from commit b6e2a9c)

It assumes that the switching will not cause any differences in the
availability of features required by HA.

Signed-off-by: Ming Lu <ming.lu@cloud.com>
#6930)

…abled

(cherry-pick from commit b6e2a9c)

It assumes that the switching will not cause any differences in the
availability of features required by HA.

PR against the master branch:
#6880
The `check_references` function needs to skip null
references before calling `get_snapshot`, or it will
try to fetch a database record for Ref.null, leading
to an exception.

Signed-off-by: Changlei Li <changlei.li@cloud.com>
…t renaming the interface

In some use cases the user wants to perform a network reset while
keeping the interface's name intact (that is, without resetting the
interface naming rules). This commit lets the user request that
behavior when running the xe-reset-networking script.

This is a backport of #6852

Signed-off-by: Stefanos Gerangelos <stefanos.gerangelos@vates.tech>
The `check_references` function needs to skip null references before
calling `get_snapshot`, or it will try to fetch a database record for
Ref.null, leading to an exception.

This is the backport of PR #6933.
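
The null-reference guard described above can be illustrated with a minimal Python sketch. This is not the actual xapi code (which is OCaml): the `db` dictionary, the `NULL_REF` value, and the shape of both functions are assumptions made purely for illustration.

```python
# Hypothetical stand-in for Ref.null; the real value lives in xapi, not here.
NULL_REF = "OpaqueRef:NULL"


def get_snapshot(db, ref):
    """Look up a record; raises KeyError when the reference has no record."""
    return db[ref]


def check_references(db, refs):
    """Fetch a snapshot for every non-null reference.

    Null references are skipped before the lookup, because looking them
    up would raise an exception (there is no record behind a null ref).
    """
    snapshots = []
    for ref in refs:
        if ref == NULL_REF:
            continue  # nothing to fetch for a null reference
        snapshots.append(get_snapshot(db, ref))
    return snapshots
```

Without the `continue`, the lookup for the null reference would raise, which mirrors the exception the commit message describes.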
…eset without renaming the management interface (#6937)

In some use cases the user wants to reconfigure the network of the
management interface while keeping the interface's name intact. This
commit lets the user request that behavior when running the
xe-reset-networking script.

This is a backport of PR #6852.
…d VMs are not placed in a single NUMA node (#6929)

Xenopsd takes the minimum between the actual free memory on a NUMA node,
and an estimate.
The estimate never got implemented, so it just took the last free memory
value, which resulted in the free memory only ever decreasing on a NUMA
node, and once the host memory is fully used NUMA optimization would
stop working for newly (re)booted VMs, even if meanwhile other VMs were
stopped.

Master has a different fix (it uses claims, and tracks memory usage
accurately), but that requires a newer version of Xen.
Initially it was decided that this bug wouldn't get fixed on the LCM
branch, but that decision has changed now.

Instead, the estimate should track how much memory the pending VM
starts are using.
This estimate can become inaccurate fairly quickly on old versions of
Xen without per-node claims. To reduce that window, we only use the
estimate while other domain builds are pending.
When there are 0 pending domain builds, we reset the estimate to the
actual free memory on the node (note: reset, *not* take the minimum).

Then finally we take the minimum between the actual free memory on a
node and the above estimate.
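
The bookkeeping described above can be sketched roughly as follows. This is a Python illustration only, not xenopsd's OCaml implementation; the `NodeEstimate` class, its method names, and the way a pending build is subtracted from the estimate are all assumptions.

```python
class NodeEstimate:
    """Hypothetical per-NUMA-node bookkeeping (names are illustrative)."""

    def __init__(self):
        self.pending_builds = 0  # domain builds currently in flight
        self.estimate = 0        # estimated free bytes on this node

    def available(self, actual_free):
        """Free memory to assume for placement decisions on this node."""
        if self.pending_builds == 0:
            # No builds in flight: reset to the real value (not min),
            # so the estimate can grow again after VMs are stopped.
            self.estimate = actual_free
        return min(actual_free, self.estimate)

    def begin_build(self, actual_free, vm_bytes):
        """Account for a VM that is about to be built on this node."""
        usable = self.available(actual_free)
        self.estimate = max(0, usable - vm_bytes)
        self.pending_builds += 1

    def end_build(self):
        """Call once the domain build has finished (or failed)."""
        self.pending_builds = max(0, self.pending_builds - 1)
```

While a build is pending, the estimate protects against double counting memory that Xen has not allocated yet. Once the last build finishes, the next call to `available` snaps back to the real free memory, so the value is no longer monotonically decreasing.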

In practice this is not as simple as that, because Xen doesn't always
listen to us when we give it a CPU affinity hint, and sometimes it
places the memory on other nodes. This happens particularly when the
last 4GiB (the 32-bit DMA zone) is used up on node0: Xen would only use
this as a last resort (some devices can only work if you have memory
available here).
To fix this xenopsd considers node0 to have less memory available than
it actually does (so it'll prefer other nodes instead of using up the
last 4GiB). And finally it'll spread the last VMs across all nodes, when
we can't accurately estimate how much memory ends up where (we don't
know how much of the DMA32 heap is already used or not by devices).
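
As a rough illustration of the node0 handling, the sketch below subtracts a reserve from node0's free memory before choosing a node, and returns `None` when no single node fits (standing in for "spread across all nodes"). It is Python rather than xenopsd's OCaml; the function names are hypothetical, the 2^31 default is taken from the commit message above, and picking the node with the most usable memory is an assumed tie-break, not necessarily what xenopsd does.

```python
# Default reserve on node0, per the commit message (2^31 bytes).
# The real value is configurable in xenopsd.conf; the option name is not shown here.
NODE0_RESERVE = 2**31


def usable_free(node, free_bytes, reserve=NODE0_RESERVE):
    """Treat node0 as having less free memory than it really does."""
    if node == 0:
        return max(0, free_bytes - reserve)
    return free_bytes


def pick_node(free_by_node, vm_bytes, reserve=NODE0_RESERVE):
    """Return a node that can hold the whole VM, or None to spread it."""
    candidates = [
        node
        for node, free in free_by_node.items()
        if usable_free(node, free, reserve) >= vm_bytes
    ]
    if not candidates:
        return None  # no single node fits: fall back to all nodes
    # Assumed policy: prefer the node with the most usable memory.
    return max(candidates, key=lambda n: usable_free(n, free_by_node[n], reserve))
```

With a 2GiB reserve, a 4GiB VM on a host with 6GiB free on node0 and 5GiB free on node1 would land on node1, since node0 only counts as having 4GiB usable.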

This bug also affects master in a different way: there due to the claims
Xen does *exactly* what we ask it to do, and will completely deplete the
memory on node0. But that could cause failures with some devices, so we
might need to include the 2nd commit in some form on master too (maybe
don't reserve the full 4GiB, but only a few hundred MiB?).

Some manual testing shows promising results: previously VMs would get
all/all affinity, and after this fix they get the proper affinity again.
Getting the automated tests to pass is still an open problem due to the
4GiB handling, which will require some test changes too to not be so
strict, so opening this as a draft.
minglumlu merged commit 7b7a701 into feature/26.1-lcm/trusted-certs on Mar 10, 2026
63 checks passed