Conversation

@last-genius (Contributor)

Following #6769, add a read_headers command to vhd-tool (following qcow-tool's JSON format). This allows stream_vdi to determine which clusters are allocated in QCOW- and VHD-backed VDIs, reading and writing only the allocated blocks (previously it read the whole raw disk, checking whether blocks contained only zeros).

If there are any issues during header parsing, it falls back to the slow path (we don't handle errors during XVA export well, embedding HTTP 500 errors inside streams that have already returned 200).
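For illustration, here is a minimal OCaml sketch of that fallback, assuming a hypothetical JSON shape of `{"clusters": [[offset, length], ...]}` (the actual format follows qcow-tool's and may differ) and using Yojson for parsing; any failure maps to None, which the caller treats as "take the slow path":

```ocaml
let allocated_clusters_of_json (s : string) : (int64 * int64) list option =
  try
    let json = Yojson.Safe.from_string s in
    let clusters = Yojson.Safe.Util.(member "clusters" json |> to_list) in
    Some
      (List.map
         (function
           | `List [`Int off; `Int len] ->
               (Int64.of_int off, Int64.of_int len)
           | _ ->
               failwith "unexpected cluster entry"
           )
         clusters
      )
  with _ -> None (* any parsing issue: caller falls back to the slow path *)
```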

This greatly speeds up XVA export for VMs with sparse VDIs:

5 GB empty VDI: 19s -> 3s
5 GB empty VDI + 2 MB filled VDI: 22s -> 6s
5 GB empty VDI + half-empty 10 GB VDI (~4 GB allocated): 89s -> 49s

Note: If the block size of the VDI is larger than the size of the XVA chunks (currently the case for VHD, which has 2 MB blocks while stream_vdi splits XVAs into 1 MB files), stream_vdi can overestimate the amount of allocated data (say, if only the first half of a VHD block holds data), but in testing with real VDIs this impact was within the margin of error.
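A small sketch of that granularity mismatch, with the sizes from the note above (chunks_of_block is a hypothetical helper, not code from this PR):

```ocaml
let vhd_block_size = 2 * 1024 * 1024 (* 2 MB VHD block *)

let xva_chunk_size = 1 * 1024 * 1024 (* 1 MB XVA chunk *)

(* XVA chunk indices that overlap an allocated VHD block. *)
let chunks_of_block block_idx =
  let first = block_idx * vhd_block_size / xva_chunk_size in
  let last = (((block_idx + 1) * vhd_block_size) - 1) / xva_chunk_size in
  List.init (last - first + 1) (fun i -> first + i)

let () =
  (* VHD block 3 covers XVA chunks 6 and 7: both get written even when
     only the first half of the block actually holds data *)
  chunks_of_block 3 |> List.iter (Printf.printf "chunk %d\n")
```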

Currently, vhd-tool provides several "hybrid" modes where it exports to VHD from raw, using the information in the VHD bitmaps to determine which blocks and sectors contain data (to avoid reading zero blocks).

Other tools also handle VHD-backed VDIs (we export them as part of XVA export, and now they can also be exported to QCOW), and currently they have to read the whole raw disk.

Instead, provide a read_headers command that reports which clusters are allocated, letting other tools speed up their handling of sparse VDIs. It uses a new blocks_json function in Vhd_format.
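As a rough sketch of what such a helper could do (the real Vhd_format.blocks_json signature isn't shown in this PR; the BAT representation below is a simplified assumption, with 0xFFFFFFFF, i.e. -1l, marking unallocated entries):

```ocaml
(* Emit [[virtual_offset, length], ...] JSON for each allocated BAT entry. *)
let blocks_json ~(block_size : int) (bat : int32 array) : string =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "[" ;
  Array.iteri
    (fun i entry ->
      if entry <> -1l then (
        if Buffer.length buf > 1 then Buffer.add_string buf "," ;
        Buffer.add_string buf
          (Printf.sprintf "[%d,%d]" (i * block_size) block_size)
      )
    )
    bat ;
  Buffer.add_string buf "]" ;
  Buffer.contents buf
```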

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
The body has less indentation this way

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This allows using it in stream_vdi and qcow_tool_wrapper without introducing a
dependency cycle.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Qcow_tool_wrapper and Vhd_tool_wrapper expect a particular driver to be backing the VDI, and fall back to handling the VDI as raw otherwise - they will use backing_file_of_device_with_driver.

Stream_vdi, however, needs to branch on the type of the driver, so it will use backing_info_of_device (which also returns the driver type).
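An interface sketch of that split, with hypothetical signatures (the real functions live elsewhere in xapi and may differ):

```ocaml
type driver = Vhd | Qcow

module type DEVICE_INFO = sig
  (* Wrappers tied to one driver ask for that driver explicitly and
     treat None as "handle the VDI as raw". *)
  val backing_file_of_device_with_driver :
    device:string -> driver:driver -> string option

  (* Stream_vdi must branch on the driver, so it gets both back. *)
  val backing_info_of_device : device:string -> (driver * string) option
end
```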

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Split common code used by {Vhd,Qcow}_tool_wrapper into a new vhd_qcow_parsing
module.

Since Vhd_tool_wrapper.run_vhd_tool is hardcoded to read the progress
percentage printed by vhd-tool, we have to use the more generic
Vhd_qcow_parsing.run_qcow_tool to run vhd-tool.

Since vhd-tool and qcow-tool emit the same JSON format, use the same parse_header
function.
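A hedged sketch of that shared shape - one generic runner plus a common parse_header. The names, signatures, and command-line form here are illustrative, not the real Vhd_qcow_parsing interface:

```ocaml
(* Run a tool and capture its stdout; the real runner would use
   Forkhelpers and report progress, which run_vhd_tool hardcodes. *)
let run_tool (binary : string) (args : string list) : string =
  let ic = Unix.open_process_in (Filename.quote_command binary args) in
  let out = In_channel.input_all ic in
  ignore (Unix.close_process_in ic) ;
  out

(* Both tools print the same JSON, so one parser serves both. *)
let parse_header (json : string) = Yojson.Safe.from_string json

let read_headers binary device =
  (* argument form is illustrative *)
  run_tool binary ["read_headers"; device] |> parse_header
```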

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Read the bitmaps for VHD- and QCOW-backed VDIs, determine which clusters are
allocated, and read and write only those to the resulting XVA.

This avoids the need for the "timeout workaround", which is needed when no data
has been sent for an extended period of time (stream_vdi writes a "packet" that
doesn't carry any data, just a checksum of an empty body; in the case of a
compressed export, however, the compressor binary buffers output and this
timeout workaround does not work).

This also greatly speeds up export of VMs with sparse VDIs.
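A minimal sketch of this fast path under assumed names (read_chunk and write_chunk stand in for stream_vdi's actual I/O; the allocated list comes from read_headers):

```ocaml
(* Copy only the allocated (offset, length) ranges to the XVA; unallocated
   ranges are never read, so no empty keep-alive packet is needed. *)
let copy_allocated ~(read_chunk : int64 -> int -> bytes)
    ~(write_chunk : int64 -> bytes -> unit)
    (allocated : (int64 * int64) list) =
  List.iter
    (fun (offset, length) ->
      let data = read_chunk offset (Int64.to_int length) in
      write_chunk offset data
    )
    allocated
```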

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Some users of the function did not handle its exceptions correctly - make
the "not found" case explicit with an option.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>