Conversation

@last-genius (Contributor)

Following #6769, add a read_headers command to vhd-tool (following qcow-tool's JSON format). This allows stream_vdi to determine which clusters are allocated in QCOW- and VHD-backed VDIs, reading and writing only the allocated blocks (previously it read the whole raw disk, checking whether blocks contained only zeros).

If there are any issues during header parsing, it falls back to the slow path (we don't handle errors during XVA export well, embedding HTTP 500 errors inside streams that have already returned 200).
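For illustration, here is a minimal OCaml sketch of that fallback, assuming a hypothetical JSON shape of `{"clusters": [[offset, length], ...]}` (the actual format follows qcow-tool's and may differ) and using Yojson for parsing; any failure maps to None, which the caller treats as "take the slow path":

```ocaml
let allocated_clusters_of_json (s : string) : (int64 * int64) list option =
  try
    let json = Yojson.Safe.from_string s in
    let clusters = Yojson.Safe.Util.(member "clusters" json |> to_list) in
    Some
      (List.map
         (function
           | `List [`Int off; `Int len] ->
               (Int64.of_int off, Int64.of_int len)
           | _ ->
               failwith "unexpected cluster entry"
           )
         clusters
      )
  with _ -> None (* any parsing issue: caller falls back to the slow path *)
```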

This greatly speeds up XVA export for VMs with sparse VDIs:

5 GB empty VDI: 19s -> 3s
5 GB empty VDI + 2 MB filled VDI: 22s -> 6s
5 GB empty VDI + half-empty 10 GB VDI (~4 GB allocated): 89s -> 49s

Note: If the block size of the VDI is larger than the size of the XVA chunks (currently the case for VHD, which has 2 MB blocks while stream_vdi splits XVAs into 1 MB files), stream_vdi can overestimate the amount of allocated data (say, if only the first half of a VHD block holds data), but in testing with real VDIs this impact was within the margin of error.
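A small sketch of that granularity mismatch, with the sizes from the note above (chunks_of_block is a hypothetical helper, not code from this PR):

```ocaml
let vhd_block_size = 2 * 1024 * 1024 (* 2 MB VHD block *)

let xva_chunk_size = 1 * 1024 * 1024 (* 1 MB XVA chunk *)

(* XVA chunk indices that overlap an allocated VHD block. *)
let chunks_of_block block_idx =
  let first = block_idx * vhd_block_size / xva_chunk_size in
  let last = (((block_idx + 1) * vhd_block_size) - 1) / xva_chunk_size in
  List.init (last - first + 1) (fun i -> first + i)

let () =
  (* VHD block 3 covers XVA chunks 6 and 7: both get written even when
     only the first half of the block actually holds data *)
  chunks_of_block 3 |> List.iter (Printf.printf "chunk %d\n")
```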

Currently, vhd-tool provides several "hybrid" modes where it exports to VHD from raw, using the information in the VHD bitmaps to determine which blocks and sectors contain data (to avoid reading zero blocks).

Other tools also handle VHD-backed VDIs (we export them as part of XVA export, and now they can also be exported to QCOW), and currently they have to read the whole raw disk.

Instead, provide a read_headers command that reports which clusters are allocated, letting other tools speed up their handling of sparse VDIs. It uses a new blocks_json function in Vhd_format.
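As a rough sketch of what such a helper could do (the real Vhd_format.blocks_json signature isn't shown in this PR; the BAT representation below is a simplified assumption, with 0xFFFFFFFF, i.e. -1l, marking unallocated entries):

```ocaml
(* Emit [[virtual_offset, length], ...] JSON for each allocated BAT entry. *)
let blocks_json ~(block_size : int) (bat : int32 array) : string =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "[" ;
  Array.iteri
    (fun i entry ->
      if entry <> -1l then (
        if Buffer.length buf > 1 then Buffer.add_string buf "," ;
        Buffer.add_string buf
          (Printf.sprintf "[%d,%d]" (i * block_size) block_size)
      )
    )
    bat ;
  Buffer.add_string buf "]" ;
  Buffer.contents buf
```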

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
The body has less indentation this way

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This allows using it in stream_vdi and qcow_tool_wrapper without introducing a
dependency cycle.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Qcow_tool_wrapper and Vhd_tool_wrapper expect a particular driver to be backing the VDI, and fall back to handling the VDI as raw otherwise - they will use backing_file_of_device_with_driver.

Stream_vdi, however, needs to branch on the type of the driver, so it will use backing_info_of_device (which also returns the driver type).
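An interface sketch of that split, with hypothetical signatures (the real functions live elsewhere in xapi and may differ):

```ocaml
type driver = Vhd | Qcow

module type DEVICE_INFO = sig
  (* Wrappers tied to one driver ask for that driver explicitly and
     treat None as "handle the VDI as raw". *)
  val backing_file_of_device_with_driver :
    device:string -> driver:driver -> string option

  (* Stream_vdi must branch on the driver, so it gets both back. *)
  val backing_info_of_device : device:string -> (driver * string) option
end
```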

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Split common code used by {Vhd,Qcow}_tool_wrapper into a new vhd_qcow_parsing
module.

Since Vhd_tool_wrapper.run_vhd_tool is hardcoded to read the progress
percentage printed by vhd-tool, we have to use the more generic
Vhd_qcow_parsing.run_qcow_tool to run vhd-tool.

Since vhd-tool and qcow-tool emit the same JSON format, use the same parse_header
function.
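A hedged sketch of that shared shape - one generic runner plus a common parse_header. The names, signatures, and command-line form here are illustrative, not the real Vhd_qcow_parsing interface:

```ocaml
(* Run a tool and capture its stdout; the real runner would use
   Forkhelpers and report progress, which run_vhd_tool hardcodes. *)
let run_tool (binary : string) (args : string list) : string =
  let ic = Unix.open_process_in (Filename.quote_command binary args) in
  let out = In_channel.input_all ic in
  ignore (Unix.close_process_in ic) ;
  out

(* Both tools print the same JSON, so one parser serves both. *)
let parse_header (json : string) = Yojson.Safe.from_string json

let read_headers binary device =
  (* argument form is illustrative *)
  run_tool binary ["read_headers"; device] |> parse_header
```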

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Read the bitmaps for VHD- and QCOW-backed VDIs, determine which clusters are
allocated, and read and write only those to the resulting XVA.

This avoids the need for the "timeout workaround", which is needed when no data
has been sent for an extended period of time (stream_vdi writes a "packet" that
doesn't carry any data, just a checksum of an empty body; in the case of a
compressed export, however, the compressor binary buffers output and this
timeout workaround does not work).

This also greatly speeds up export of VMs with sparse VDIs.
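A minimal sketch of this fast path under assumed names (read_chunk and write_chunk stand in for stream_vdi's actual I/O; the allocated list comes from read_headers):

```ocaml
(* Copy only the allocated (offset, length) ranges to the XVA; unallocated
   ranges are never read, so no empty keep-alive packet is needed. *)
let copy_allocated ~(read_chunk : int64 -> int -> bytes)
    ~(write_chunk : int64 -> bytes -> unit)
    (allocated : (int64 * int64) list) =
  List.iter
    (fun (offset, length) ->
      let data = read_chunk offset (Int64.to_int length) in
      write_chunk offset data
    )
    allocated
```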

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Some users of the function did not handle its exceptions correctly - make
the "not found" case explicit with an option.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>