Skip to content

Conversation

changlei-li
Copy link
Contributor

No description provided.

snwoods and others added 30 commits May 27, 2025 11:22
Signed-off-by: Steven Woods <steven.woods@cloud.com>
Besides the errors, Xapi_cluster and Xenopsd use the exact same Observer
RPC definitions. Add a new Observer error (as the unique errors for
cluster/xenops are not applicable to the Observer functions anyway) and
use common code to remove this duplication.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
Currently, xapi-storage-script uses the presence/absence of a smapi
observer config file to determine whether it should create traces. This
only happens on startup which means smapiv3 traces will often not be
created when they should be.

This commit updates the Smapi Observer forwarder to use an RPC client to
send messages to xapi-storage-script, updating it on any relevant
changes to the Observer.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
…ross modules

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Host.cpu_info list no longer contains a value associated with a "features" key,
but the CLI implementation was hardcoded to expect it. Instead use the
cpu_info_features keys from xapi-consts.

Add the pool version of the command.

Additionally document their output format.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
on every toolstack restart on hosts without Nvidia GPUs, xapi complains about a
non-existent directory:

xapi: [error||0 |dbsync (update_env) R:733fc2551767|xapi_vgpu_type]
Failed to create NVidia compat config_file: Sys_error("/usr/share/nvidia/vgx: No such file or directory")

Handle the directory's absence without propagating the error.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
…_line

Otherwise this quite frequently logs something like:

```
xcp-networkd: [error||22 |dbsync (update_env)|network_utils]
Error in read one line of file: /sys/class/net/eth0/device/sriov_totalvfs,
exception Unix.Unix_error(Unix.ENOENT, "open", "/sys/class/net/eth0/device/sriov_totalvfs")
Backtrace ...
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
non_debug_receive will dump an error after reading the last bits of the header,
which is expected and handled by the caller appropriately:

```
xenopsd-xc: [error||67 |Async.VM.resume R:beac7be348f1|xenguesthelper]
Memory F 6019464 KiB S 0 KiB T 8183 MiB <--- dumping error

xenopsd-xc: [debug||67 |Async.VM.resume R:beac7be348f1|mig64]
Finished emu-manager result processing <---- End_of_file expected and handled
```

Don't pollute the logs and instead just log the same info with 'debug' when the
error is End_of_file.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
It's an in-house joke that most of `xapi`'s errors are "expected"
(either because they're handled by the caller after being logged or
because they are frequent and benign) and knowing which ones are unusual
and actually critical takes a lot of learning. Try to remove at least
some of the most obvious ones from the toolstack startup/VM resume,
either removing the logs completely or downgrading it to `debug` when
appropriate.

Manually tested in the appropriate scenarios.
…ions

While one could potentially filter for this "value", I don't think it's that
useful and adds noise to the completions, like here:

```
$ xe vm-list resident-on=
64c11cad-2c52-4dea-aea6-5fae0e720699
\<not\ in\ database\>
7f566729-0ee7-47c4-853d-2c5f3a195ad4
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
log needs to be moved one line below the first assignment into the
"description" variable, otherwise it's always going to be an empty string

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
We used to rely on Bash's completion removing duplicate entries from the
suggestions, but processing them in the first place is unnecessary (and will
slow down completion since there's usually an 'xe' command run for each entry
in the wordlist).

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Provide a helpful description for some parameters of 'xe vm-list', compare
before:
```
$ xe vm-list resident-on=
64c11cad-2c52-4dea-aea6-5fae0e720699
7f566729-0ee7-47c4-853d-2c5f3a195ad4
```

with after:
```
$ xe vm-list resident-on=
64c11cad-2c52-4dea-aea6-5fae0e720699  - hpmc30
7f566729-0ee7-47c4-853d-2c5f3a195ad4  - hpmc29
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
No completion was provided before, and it's handled properly now:
```
$ xe vm-list suspend-SR-uuid=
08906228-cbf6-dad4-720d-e581df11510a  - SR1
37b734f0-e594-0e48-2114-cd063241dd36  - SR2
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
…starting up

Otherwise, errors like this can be a little bit confusing in the logs:

```
[dispatch:session.login_with_password |backtrace]
Raised Db_exn.Read_missing_uuid("host", "", "236acc01-0f95-4af1-8b35-f5a2fb51c354")
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
It is a XenServer-specific expectation that these certificates should always be
present on the host (they are not provided on XCP-ng by default, for example,
due to licensing restrictions).

The error log is not followed by any exception, and the missing UEFI
certificates do not interrupt any operation, they just mean the host is set up
differently (which can be verified by the clients with appropriate API calls
like pool-get-guest-secureboot-readiness):

```
xapi: [error||Sync UEFI certificates on host with XAPI db |xapi_host]
check_valid_uefi_certs: missing KEK.auth in /var/lib/varstored

xapi: [error||Sync UEFI certificates on host with XAPI db |xapi_host]
check_valid_uefi_certs: missing db.auth in /var/lib/varstored
```

These warrant a warning instead of an error log.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Handle a few more cases, fix logging, reduce slowdown
from https://en.wikipedia.org/wiki/Passwd#Password_file
uid_info as following format

username:password:uid:gid:gecos:homedir:shell

Regarding gecos, it is recommended as follows

Typically, this is a set of comma-separated values including the user's
full name and contact details.

However, this information comes form AD and user may mis-configure
it with `:`, which is used as seperator. In such case, the parse
would failed.

Enhance the parse function to support `:` in gecos, other fields does
not likely contain it.

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
from https://en.wikipedia.org/wiki/Passwd#Password_file uid_info as
following format

username:password:uid:gid:gecos:homedir:shell

Regarding gecos, it is recommended as follows

Typically, this is a set of comma-separated values including the user's
full name and contact details.

However, this information comes form AD and user may mis-configure it
with `:`, which is used as seperator. In such case, the parse would
failed.

Enhance the parse function to support `:` in gecos, other fields does
not likely contain it.
Instruments:
- `VM.add`,
- `VM.stat`,
- `VM.exists`,
- `VM.list`.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Instruments `switch_rpc` according to OpenTelemetry standard on
instrumenting rpc calls.
- `server.address` is the name of the message queue.

Intruments sending the message on a queue according to OpenTelemetry
standard on instrumenting messaging.
- `destination` is the name of the message queue.

`Tracing.with_tracing` now accepts an optional argument to set the Span
Kind.

Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
The regex removed parameters with a particular suffix instead of checking for
the whole name.

For example, after providing a uuid parameter to xe vif-move, network-uuid
would no longer be suggested:
```
$ xe vif-move <TAB><TAB>
network-uuid=  uuid=

$ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df <TAB><TAB>
```

This is fixed now:
```
$ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df <TAB><TAB>
$ xe vif-move uuid=0af7619c-0798-c5be-5a0e-20813a48c7df network-uuid=
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
There were several optional boolean parameters that were checked and used in
cli_operations but were not included in cli_frontend (therefore would not be
shown in suggestions or 'xe help command').

Add these to cli_frontend. is_unique is a ... unique case because it does not
follow the style of the CLI parameters (which use dashes, not underscores, to
separate words), so add 'is-unique' to cli_frontend but handle both in
cli_operations.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This code hasn't been used for 10+ years.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Instrument xenops vm non-atomic functions.
Instruments:
- `VM.add`,
- `VM.stat`,
- `VM.exists`,
- `VM.list`.

Instruments `switch_rpc` according to OpenTelemetry standard on
instrumenting rpc calls.
- `server.address` is the name of the message queue.

Intruments sending the message on a queue according to OpenTelemetry
standard on instrumenting messaging.
- `destination` is the name of the message queue.

`Tracing.with_tracing` now accepts an optional argument to set the Span
Kind.

![image](https://github.com/user-attachments/assets/b58a3309-6e31-4e40-84a5-c053bc00d5c8)
Maintenance mode is entered by running Host.evacuate, followed
by promoting a new pool coordinator and shutting down XAPI.

We only export spans every 30s, so we may miss exporting the span for Host.evacuate.
Ensure that we at least trigger the export when XAPI is about to shutdown.
Do not wait for the export to finish, because this could take a long time
(e.g. when exporting to a remote Jaeger instance).

After this change I now see Host.evacuate properly in the exported trace.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
json_reformat cannot handle newline delimited json, it is easier if we have a command to reformat it ourselves.

This can be useful when debugging why a trace is missing elements. Traces are stored as newline-delimited JSON
in /var/log/dt/zipkinv2/json, however json_reformat cannot process them directly, and the lines can be very long and difficult to read otherwise.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
#6525)

Soon after Host.evacuate XAPI could be restarted (e.g. on coordinator
promotion).
But we only export traces every 30s, so we lose the spans from the last
30s, including the toplevel Host.evacuate span (which although long
running is only emitted on completion).

After this change I'm now able to see Host.evacuate and all the migrate
calls in the exported distributed trace.
Christian Lindig and others added 26 commits July 2, 2025 16:37
Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
In addition to the file name, write the VDI UUID to xenstore for the
guest agent to pick up.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Mostly for development: when inserting a CD we wait before we expect the
VM to have recognised it such that we can start sysprep. Make this
configuratble for easier experimentation.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Remove VDIs only when the feature is enabled.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
As per the commit message for fbda87d (12 years ago): "[ECONNREFUSED
retry logic] probably should be pushed into the generic XCP function,
since the same behaviour will probably be expected for other
interfaces." and David was right!

Moving the functionality to Xcp_client.ml itself so that it isn't
duplicated across 3 different clients.
A new API call that works in collaboration with a guest agent on a
Windows VM to pass a `unattend.xml` to Windows sysprep.

* Receives the XML content as a parameter. The XE interface supports
defining this from a file but an API client has to pass the content as a
parameter.
* The API call creates a local ISO SR; creates an ISO with
`unattended.xml` and inserts the ISO into the VMs CD-ROM, then notifies
the VM via xenstore.
* Outside of scaffolding for the API, everything in `vm_sysprep.ml`.
* Exceptions raised are handled in `Xapi_vm.sysprep()`
When data/cant_suspend_reason is present in xenstore (renamed
data-cant-suspend-reason in "other" guest metrics), it would exclude
operations involving suspend from the allowed operations list, but still allow
actually suspending (strict=true during allowed operations checks but could be
false otherwise. This was additionally overriden by assuming that PV driver
presence guarantees feature-suspend).

Suspending in such a situation would always result in a crash, usually with
the following error:

```
$ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2
The server failed to handle your request, due to an internal error.
The given message may give details useful for debugging the problem.
message: xenopsd internal error:
Device_common.QMP_Error(8,
"{\"error\":
            {\"class\":\"GenericError\",
             \"desc\":\"State blocked by non-migratable device '0000:00:07.0/nvme'\",
             \"data\":{}
             },
  \"id\":\"qmp-000006-8\"}")
```

So disallow suspending altogether when cant_suspend_reason is present.
Log the QEMU error directly in the XAPI error as well:

```
$ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2
You attempted an operation on a VM which lacks the feature.
vm: d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2 (Windows 10 (64-bit))
reason: {"error":{"class":"GenericError","desc":"State blocked by non-migratable device '0000:00:07.0/nvme'","data":{}},"id":"qmp-000012-9"}
```

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This is a partial revert of 1a46f33. It retains the deactivate and
detach functions introduced but restores the original unplug function so
that the VBD_unplug atom is completely unchanged when
xenops_vbd_plug_unplug_legacy=true instead of running deactivate
followed by detach.

This will fix the S(Does_not_exist) Xenopsd errors we are seeing in
some VBD_unplug calls, until a fix for the split functions is found.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
This was missing from previous commit to control the time waited for a
VM to recognise a CD. In the long run we would like to replace this with
a protocol that tells us when the guest is ready.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
This was missing from previous commit to control the time waited for a
VM to recognise a CD. In the long run we would like to replace this with
a protocol that tells us when the guest is ready.
Don't assume the presence of PV drivers magically solves everything,
actually honour `feature-suspend` and make `data/cant_suspend_reason`
critical.

----

Manually tested to not break the following cases (thanks to @dinhngtu):

- [x] Test on migratable Windows (shouldn't block suspend) - OK

- [x] Test on Windows with a non-migratable device (should block
suspend) - OK (VM lacks feature)

- [x] Test on migratable Linux with unplug - OK

- [x] Test on Linux with a non-migratable device - OK (VM lacks feature)

- [x] Test nopv Windows/Linux without unmigratable devices with manual
`feature-suspend` xenstore entry forced

    - [x] Windows BIOS: nopv OK

    - [x] Linux BIOS: nopv OK

    - [x] Linux UEFI: nopv OK
This script serves as one half of a `vhd-tool`-like solution for QCOW,
only offering the export side (converting a raw file to a qcow2 stream).
We can't use `qemu-img` or some other ready-made library because these
create a temporary file (seeking in both the source and the
destination), which we want to avoid.

The script is taken from upstream QEMU, with some modifications for our
use-case:
* we want the script to only operate on "raw" files
* it needs to work on block special files as opposed to regular files
* and it needs to export a "difference" between two files, just like
`vhd-tool` does

The script is currently unused.

The import half (`qcow2 stream->raw file`) was merged in
[`ocaml-qcow`](mirage/ocaml-qcow#127) and will
later be integrated into xapi together with this script through a
`qcow_tool_wrapper`.
From strace/gdb, XS9 qemu requires /proc/self/fd/<fd> to work well
This is due to systemd/libudev update.

Just bind mount /proc/self/ to the chroot to permit qemu access

```
1047 openat(AT_FDCWD, "/proc/self/fd/46", O_RDONLY|O_NOCTTY|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
1048 openat(AT_FDCWD, "/proc/", O_RDONLY|O_NOCTTY|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)

    ../sysdeps/unix/sysv/linux/fstatfs64.c:30
    out>, dir_fd=<optimized out>) at ../src/basic/stat-util.c:566
    magic_value=1650812274) at ../src/basic/stat-util.c:369
    fd=<optimized out>) at ../src/basic/stat-util.h:66
    verify=<optimized out>) at
../src/libsystemd/sd-device/sd-device.c:221
    (ret=0x7ffc67ebba20, syspath=0x7ffc67ebb950
"/sys/bus/usb/devices/usb1", strict=true)
    at ../src/libsystemd/sd-device/sd-device.c:271
    (syspath=0x7ffc67ebb950 "/sys/bus/usb/devices/usb1",
ret=0x7ffc67ebba20)
    at ../src/libsystemd/sd-device/sd-device.c:280
```

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
This is a partial revert of 1a46f33. It retains the deactivate and
detach functions introduced but restores the original unplug function so
that the VBD_unplug atom is completely unchanged when
xenops_vbd_plug_unplug_legacy=true instead of running deactivate
followed by detach.

This will fix the S(Does_not_exist) Xenopsd errors we are seeing in some
VBD_unplug calls, until a fix for the split functions is found (I have
created CA-413304 to track that issue)
On very busy systems, the wait may take much longer than expected. Instead of
hard-coding the expected value, wait once to estimate the time aded to the
delays, and then use it to compare the times.

Also change to use Mtime.Spans instead of using integers.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
On very busy systems, the wait may take much longer than expected.
Instead of hard-coding the expected value, wait once to estimate the
time aded to the delays, and then use it to compare the times.

Also change to use Mtime.Spans instead of using integers.

I've tested this locally on unit-tests, and @ydirson has tested this on
his system that usually needs a patch to disable these tests because
they run on an cpu-overprovisioned xen system
Previously, encountering unknown features such as ATOMIC_PAUSE or SR_CACHING
in SM.feature would trigger an error in xapi. However, these features can be
used internally by SM and are not necessarily indicative of a misconfiguration.

This change downgrades such cases from error to warning, allowing normal
operation while still notifying the user that an unrecognized feature is
present.

Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
Replace code that loops waiting for an update in xenstore with a watch.
This should eliminate the chance of a race condition.

Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
Previously, encountering unknown features such as ATOMIC_PAUSE or
SR_CACHING in SM.feature would trigger an error in xapi. However, these
features can be used internally by SM and are not necessarily indicative
of a misconfiguration.

This change downgrades such cases from error to warning to still notify
the user that an unrecognized feature is present but can be expected. It
fixes #5353.
Replace code that loops waiting for an update in xenstore with a watch.
This should eliminate the chance of a race condition. We are introducing
a small delay to give sysprep the chance to read its file before we
eject the CD.
- Add unitest for usb_reset for coverage
- Move mock to unittest.mock as python3 only now
- exit -> sys.exit

Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>
…h for XS9 (#6572)

The approach work for both XS8 and XS9

- 4357061 for xs9
- 4357062 for xs8

```
Date:   Fri Jul 4 15:40:00 2025 +0800

    CA-393417: Bind mount /proc/<pid> into chroot

    From strace/gdb, XS9 qemu requires /proc/self/fd/<fd> to work well
    This is due to systemd/libudev update.

    Just bind mount /proc/self/ to the chroot to permit qemu access

    ```
    1047 openat(AT_FDCWD, "/proc/self/fd/46", O_RDONLY|O_NOCTTY|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
    1048 openat(AT_FDCWD, "/proc/", O_RDONLY|O_NOCTTY|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)

        ../sysdeps/unix/sysv/linux/fstatfs64.c:30
        out>, dir_fd=<optimized out>) at ../src/basic/stat-util.c:566
        magic_value=1650812274) at ../src/basic/stat-util.c:369
        fd=<optimized out>) at ../src/basic/stat-util.h:66
        verify=<optimized out>) at
    ../src/libsystemd/sd-device/sd-device.c:221
        (ret=0x7ffc67ebba20, syspath=0x7ffc67ebb950
    "/sys/bus/usb/devices/usb1", strict=true)
        at ../src/libsystemd/sd-device/sd-device.c:271
        (syspath=0x7ffc67ebb950 "/sys/bus/usb/devices/usb1",
    ret=0x7ffc67ebba20)
        at ../src/libsystemd/sd-device/sd-device.c:280
    ```

    Signed-off-by: Lin Liu <Lin.Liu01@cloud.com>

commit fc5f98b
Author: Lin Liu <Lin.Liu01@cloud.com>
Date:   Tue Jul 1 15:56:18 2025 +0800

    CA-393417: Drop device controller of cgroup v1

    For deprivileged qemu, following ops are performed
    - bind mount /dev/ to qemu chroot, so qemu can access it
    - cgroup controller deny all devices, except the target usb device

    However, new XS updated to cgroup v2 and the devices controller
    available anymore.

    Instead of bind mount all /dev folder, only the permitted usb
    devices are created into the chroot. Thus, the cgroup controller
    is no longer necessary.

    Besides, there are following updates accordingly
    - qemu pid is no longer necessary as command line args, as cgroup
    is dropped.
    - save and restore system /etc/ devices file ownership is no longer
    necessary. New file is cloned into chroot instead of bind mount system
    device file, so only need to set ownership of chroot file directly
    ```
Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
The user attempted to designate a new master, but the operation failed.

The root cause is as follows:
After the new proposed master successfully sent the `commit_new_master`
API call to the old master, it attempted to send a `logout` request.
However, at this point, the old master was already rebooting its xapi
service, causing the `logout` to fail. As a result, the process of
designating the new master was marked as failed, and the status changed
to `broken`. In high-load environments, there can be a delay in sending
the logout request, increasing the likelihood that it is sent after the
old master has already started rebooting.

If `commit_new_master` has already been successful, the success of the
subsequent `logout` operation should not be considered critical.
Therefore, the solution is to ignore the result of the `logout` request
if `commit_new_master` was successful.

Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
The user attempted to designate a new master, but the operation failed.

The root cause is as follows:
After the new proposed master successfully sent the `commit_new_master`
API call to the old master, it attempted to send a `logout` request.
However, at this point, the old master was already rebooting its xapi
service, causing the `logout` to fail. As a result, the process of
designating the new master was marked as failed, and the status changed
to `broken`. In high-load environments, there can be a delay in sending
the logout request, increasing the likelihood that it is sent after the
old master has already started rebooting.

If `commit_new_master` has already been successful, the success of the
subsequent `logout` operation should not be considered critical.
Therefore, the solution is to ignore the result of the `logout` request
if `commit_new_master` was successful.
@changlei-li changlei-li merged commit 37053b8 into feature/host-network-device-ordering Jul 10, 2025
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.