stream_vdi: Only process allocated clusters for VHD and QCOW on XVA export #6786
Open
last-genius wants to merge 9 commits into xapi-project:master from last-genius:asv/vhd-read-header
Conversation
Force-pushed from d1c869b to 92c11d8
psafont reviewed Dec 9, 2025
Force-pushed from cfb4a5f to 56b2575
psafont reviewed Dec 10, 2025
psafont reviewed Dec 10, 2025
Currently, vhd-tool provides several "hybrid" modes where it exports to vhd from raw, using the information from the VHD bitmaps to determine which blocks and sectors contain data (to avoid reading zero blocks). Other tools also handle VHD-backed VDIs (we export them as part of XVA export, and now they can also be exported to QCOW), and currently they have to read the whole raw disk.

Instead, provide a read_headers command that reports which clusters are allocated, for other tools to use, allowing them to speed up handling of sparse VDIs. It uses a new blocks_json function in Vhd_format.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
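The PR doesn't show the read_headers output itself, but the idea behind a blocks_json-style function can be sketched as follows. This is a minimal illustration, not the actual Vhd_format API: it assumes a Block Allocation Table represented as an int array in which 0xFFFFFFFF marks an unallocated block, and emits the allocated block indices as a JSON array string.

```ocaml
(* Hypothetical sketch: enumerate allocated blocks from a BAT-like array.
   0xFFFFFFFF is the conventional "unallocated" marker in VHD BATs. *)
let unallocated = 0xFFFFFFFF

let blocks_json (bat : int array) : string =
  let allocated =
    Array.to_list bat
    |> List.mapi (fun i entry -> (i, entry))
    |> List.filter (fun (_, entry) -> entry <> unallocated)
    |> List.map (fun (i, _) -> string_of_int i)
  in
  "[" ^ String.concat ", " allocated ^ "]"
```

A consumer such as stream_vdi could then skip every block index absent from this list instead of scanning the whole raw disk.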
The body has less indentation this way Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
This allows using it in stream_vdi and qcow_tool_wrapper without introducing a dependency cycle. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Qcow_tool_wrapper and Vhd_tool_wrapper expect a particular driver to be backing the VDI and fall back to handling the VDI as raw otherwise; they will use backing_file_of_device_with_driver. Stream_vdi, however, needs to branch on the type of the driver, so it will use backing_info_of_device (which also returns the type of the driver). Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Split common code used by {Vhd,Qcow}_tool_wrapper into a new vhd_qcow_parsing
module.
Since Vhd_tool_wrapper.run_vhd_tool is hardcoded to read the progress
percentage printed by vhd-tool, we have to use the more generic
Vhd_qcow_parsing.run_qcow_tool to run vhd-tool.
Since VHD and QCOW follow the same JSON format, use the same parse_header
function.
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Read the bitmaps for VHD- and QCOW-backed VDIs, determine which clusters are allocated, and only read and write those to the resulting XVA.

This avoids the need for the "timeout workaround", used when no data has been sent for an extended period of time: stream_vdi writes a "packet" that carries no data, just a checksum of an empty body. In the case of a compressed export, however, the compressor binary buffers output and the timeout workaround does not work.

This also greatly speeds up export of VMs with sparse VDIs.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
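The cluster-to-chunk mapping this commit needs can be sketched as below. The helper is hypothetical (not the actual stream_vdi code) and assumes the cluster size is an exact multiple of the XVA chunk size: given the indices of allocated clusters, it returns the indices of the fixed-size XVA chunks that must actually be read and written.

```ocaml
(* Illustrative: a 2 MiB cluster always covers two 1 MiB XVA chunks,
   so every allocated cluster expands to [per_cluster] chunk indices. *)
let xva_chunks_of_clusters ~cluster_size ~chunk_size clusters =
  let per_cluster = cluster_size / chunk_size in
  List.concat_map
    (fun c -> List.init per_cluster (fun k -> (c * per_cluster) + k))
    clusters
```

With 2 MiB clusters and 1 MiB chunks, allocated clusters [0; 3] map to chunks [0; 1; 6; 7]; every chunk index not produced here can be skipped entirely.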
Some of the users of the function did not handle exceptions correctly - make the "not found" case explicit with an option. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
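The pattern this commit describes can be sketched like so (function and return values are illustrative, not the actual xapi API): wrap an exception-raising lookup so the "not found" case is explicit in the type, forcing every caller to handle it.

```ocaml
(* Wrap a lookup that raises Not_found into an option-returning one. *)
let backing_file_opt lookup device =
  match lookup device with
  | file -> Some file
  | exception Not_found -> None

(* Callers now match on the option instead of remembering to catch
   Not_found at every call site. *)
let describe lookup device =
  match backing_file_opt lookup device with
  | Some file -> "backed by " ^ file
  | None -> "no backing file"
```

The compiler's exhaustiveness checking then flags any caller that ignores the None case, which a forgotten `try ... with Not_found` never would.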
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Force-pushed from 56b2575 to 2adb82e
psafont approved these changes Dec 10, 2025
Following #6769, add a read_headers command to vhd-tool (following qcow-tool's JSON format). This allows stream_vdi to determine which clusters are allocated in QCOW- and VHD-backed VDIs, only reading and writing allocated blocks (previously it read the whole raw disk, verifying whether blocks contain only zeros). If there are any issues during header parsing, it falls back to the slow path (we don't handle errors during XVA export well, embedding 500 packets inside 200s).
This greatly speeds up XVA export for VMs with sparse VDIs:
5 GB empty VDI: 19s -> 3s
5 GB empty VDI + 2 MB filled VDI: 22s -> 6s
5 GB empty VDI + half-empty VDI (~4 GB out of 10): 89s -> 49s
Note: If the block size of the VDI is larger than the size of the XVA blocks (currently the case for VHD: it has 2 MiB blocks, while stream_vdi splits XVAs into 1 MiB files), stream_vdi can overestimate the size of allocated data (say, if only the first half of a VHD block contains data), but in testing with real VDIs the impact was within the margin of error.
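The worst case of this overestimation is easy to make concrete (numbers illustrative): a 2 MiB VHD block whose data sits entirely in its first half still forces both of the 1 MiB XVA chunks covering it to be written, because allocation is only tracked at VHD-block granularity.

```ocaml
(* Worked example of the overestimation described above. *)
let chunk = 1024 * 1024        (* 1 MiB XVA chunk *)
let vhd_block = 2 * chunk      (* 2 MiB VHD block *)

(* One allocated VHD block containing only 1 MiB of real data: *)
let data_bytes = 1 * chunk
(* ...but both chunks covering the allocated block are emitted: *)
let written_bytes = vhd_block

let overestimate = written_bytes - data_bytes  (* 1 MiB extra per such block *)
```

Per-block the overhead is bounded by (vhd_block - chunk), which is why real VDIs, where most allocated blocks are mostly full, showed no measurable slowdown.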