
Adding design updates to handle block device with kopia #6590

Merged
merged 1 commit into vmware-tanzu:main from add-design-block-vol on Aug 17, 2023

Conversation

shawn-hurley
Contributor

Thank you for contributing to Velero!

Please add a summary of your change

This will add the design elements for handling block devices. We are making a small change to the VGDP provider interface so that the data type can be surfaced to and used by data movers.

I don't believe this is an API change that is user-facing or even plugin-facing. Can someone help me make sure?

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@kaovilai
Contributor

kaovilai commented Aug 2, 2023

/kind changelog-not-required

@github-actions github-actions bot added the kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes label Aug 2, 2023
@Lyndon-Li Lyndon-Li added kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes and removed kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes labels Aug 3, 2023
@dzaninovic
Contributor

dzaninovic commented Aug 8, 2023

I got the backup and restore to work in the implementation this design is based on:
catalogicsoftware#1

There is still a lot of work remaining to be done in the implementation.

@shawn-hurley
Contributor Author

shawn-hurley commented Aug 8, 2023

@Lyndon-Li Please comment on the PR where I attempted your proposal, if you would like:

catalogicsoftware#2

I think that @dzaninovic brings up good points about the reader, but we can use the virtualfs streaming file to ensure that Kopia handles it like a streaming file.

I think that for restoration, the performance impact of writing zeros could be a real concern. I think we should really consider making a wrapper for the output, like here:

https://github.com/catalogicsoftware/velero_block/pull/1/files#diff-5fce85fc3f9b0abc12ba749f6318d311e9177847e1a7e026abd6a8ec90a43827R16-R20

Like I said in the comment on the PR, I am planning on updating the design with the above. If it makes sense, please let me know so that I can adjust when updating tomorrow :)

@Lyndon-Li
Contributor

Lyndon-Li commented Aug 9, 2023

@shawn-hurley Let me comment on your changes from catalogicsoftware#2 below, so that we can keep the discussion centralized.
Generally speaking, if both the backup and restore work through virtualfs, it will be a good choice, because it involves fewer potential breaks from future Kopia versions and also makes things simpler.

@Lyndon-Li
Contributor

1. For the IO optimization, like DIO:
As I mentioned here, the current Kopia file system uploader doesn't support this, so we need a completely separate implementation.
For this implementation, we need to implement not only a reader but also a writer (because writes also benefit from DIO).
Moreover, in order to use DIO, the IO pattern will be different, because reads & writes have to be aligned to the sector size.

For all these complications, a new uploader -- a block uploader -- is more suitable.
Therefore, together with the other requirements as listed here, I categorized this under the future block-level backup instead of the current Phase-0.

On the other hand, I am not saying that the current virtualfs implementation technically cannot support these IO optimizations. As you can see, StreamingFileFromReader accepts an io.ReadCloser interface, so we can implement the interface and open & read the block device through DIO.

In conclusion:

  1. I don't suggest we have these IO optimizations in a patch release. Regarding data consistency, there are more potential problems with block IO, so we need more time for testing & verification.
  2. If we want to do it for Phase-0 in a minor release, we need to wrap the implementation well enough that the future block-level backup can reuse it.

@Lyndon-Li
Contributor

Lyndon-Li commented Aug 9, 2023

2. For skipping zero bytes during restore:
I don't think it is a good choice.
Zero bytes are not always meaningless and should always be applied to the target disk.
Once a new volume/PV is provisioned, we cannot assume that the entire disk is all-zero. Storage systems usually don't clear disk data unless explicitly requested to, e.g., for strong security requirements (consider how long it takes to write all zeros to a large disk). As a result, if some area of the disk should be zero but we skip writing zeros to it, and that area of the provisioned disk is not zero, data corruption will happen.

Moreover, detecting zero bytes is not an easy task either. Simply enumerating bytes one by one is not a good choice; think about how much CPU it takes for a large disk.

Actually, detecting and applying zero bytes involves backup repo and storage-side support. We don't need to consider it in Phase-0, and even the initial version of block-level backup doesn't need to support it.
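For illustration only (this is not part of the design): even the cheaper detection, comparing whole blocks against a zero block with bytes.Equal instead of enumerating bytes in a Go loop, still has to read and compare every block of the disk:

```go
package main

import (
	"bytes"
	"fmt"
)

const blockSize = 4096

var zeroBlock = make([]byte, blockSize)

// isZeroBlock reports whether a block is all zero, using a single
// bytes.Equal comparison for full-size blocks instead of a per-byte
// loop. Every block of the disk still has to be read and compared.
func isZeroBlock(b []byte) bool {
	if len(b) == blockSize {
		return bytes.Equal(b, zeroBlock)
	}
	for _, c := range b {
		if c != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isZeroBlock(make([]byte, blockSize))) // all-zero block
	b := make([]byte, blockSize)
	b[blockSize-1] = 1
	fmt.Println(isZeroBlock(b)) // one non-zero byte
}
```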

@Lyndon-Li
Contributor

3. Check that the target file is a block device at the OS level
This is good to have. We get the volume mode from Kubernetes first and then double-check it from the uploader in a different way.
Moreover, I would suggest we wrap this check in a separate function, as it will be useful for future modules/functionalities.

@shawn-hurley
Contributor Author

I don't suggest we have these IO optimizations in a patch release. Regarding data consistency, there are more potential problems with block IO, so we need more time for testing & verification.
If we want to do it for Phase-0 in a minor release, we need to wrap the implementation well enough that the future block-level backup can reuse it.

I believe the answer here is to just use Golang's built-in reader from os.File. This seems to work. @dzaninovic, if we need to add the performance bits, I think it makes sense to make that part of a more full-featured block uploader that can also handle CBT.

@Lyndon-Li just making sure, but you are ok with this? @dzaninovic does this work for you?

@dzaninovic
Contributor

I believe the answer here is to just use Golang's built-in reader from os.File. This seems to work. @dzaninovic, if we need to add the performance bits, I think it makes sense to make that part of a more full-featured block uploader that can also handle CBT.

@Lyndon-Li just making sure, but you are ok with this? @dzaninovic does this work for you?

I am fine with having optimizations added later in separate PRs and using the simple way for now, if that is acceptable.

@dzaninovic
Contributor

2. For skipping zero bytes during restore: I don't think it is a good choice. Zero bytes are not always meaningless and

This can be a non-default option that users can set if they know that disks are provisioned with all zeroes.

@shawn-hurley shawn-hurley force-pushed the add-design-block-vol branch 2 times, most recently from c466d34 to 572d601 Compare August 9, 2023 16:59
@shawn-hurley
Contributor Author

@sseago @shubham-pampattiwar @Lyndon-Li @dzaninovic @weshayutin

Quick ping to inform folks that the design is updated based on the conversations. Please take a look and let me know what y'all think!

@weshayutin
Contributor

I've read through this design, which looks good, and the interface change PR 6608, which is also looking good.

sseago
sseago previously approved these changes Aug 9, 2023
@Lyndon-Li
Contributor

The current design looks good to me.

Lyndon-Li
Lyndon-Li previously approved these changes Aug 10, 2023
@Lyndon-Li
Contributor

@dzaninovic

2. For skipping zero bytes during restore: I don't think it is a good choice. Zero bytes are not always meaningless and

This can be a non-default option that users can set if they know that disks are provisioned with all zeroes.

Personally, I don't think we can offer this as an optional method. It is too dangerous: all-zero provisioning is hard to configure and detect, not all upper-level scenarios lead to a fresh provision, and detecting zero data by per-byte enumeration is not a good bargain.
Anyway, as mentioned above, detecting and applying zero bytes is an advanced feature; let's see what we can deliver in the future once we have a more mature backup, backup repo, and restore.

@dzaninovic
Contributor

I reverted catalogicsoftware#2, where virtualfs.StreamingFileFromReader() was attempted, because it was causing a backup failure.

time="2023-08-09T21:31:47Z" level=error msg="Async fs backup data path failed" dataupload=backup1-cv4mr error="Failed to run kopia backup: Failed to upload the kopia snapshot for si default@default:block1/pvc-raw: unsupported source: default@default:block1/pvc-raw" error.file="/go/pkg/mod/github.com/kopia/kopia@v0.13.0/snapshot/snapshotfs/upload.go:1291" error.function="github.com/kopia/kopia/snapshot/snapshotfs.(*Uploader).Upload" logSource="pkg/controller/data_upload_controller.go:328"

By looking at the Kopia code, I can see that it fails because fs.StreamingFile is not handled in the switch statement.
https://github.com/kopia/kopia/blob/v0.13.0/snapshot/snapshotfs/upload.go#L1291

@shawn-hurley will attempt to use this instead, so if that is successful and preferable to my original solution, we will use that:
https://github.com/kopia/kopia/blob/master/fs/virtualfs/virtualfs.go#L96

In any case, the design will have to change to account for this, so don't merge it yet.

Signed-off-by: Shawn Hurley <shawn@hurley.page>
@codecov

codecov bot commented Aug 14, 2023

Codecov Report

Merging #6590 (572d601) into main (bb96c21) will decrease coverage by 0.02%.
Report is 31 commits behind head on main.
The diff coverage is n/a.

❗ Current head 572d601 differs from pull request most recent head ebaf316. Consider uploading reports for the commit ebaf316 to get more accurate results

@@            Coverage Diff             @@
##             main    #6590      +/-   ##
==========================================
- Coverage   60.18%   60.17%   -0.02%     
==========================================
  Files         242      242              
  Lines       25640    25679      +39     
==========================================
+ Hits        15432    15452      +20     
- Misses       9135     9150      +15     
- Partials     1073     1077       +4     

see 8 files with indirect coverage changes

@shawn-hurley
Contributor Author

@Lyndon-Li @sseago @shubham-pampattiwar

I have once again updated the design. We have to do some specific things during the restore to handle the block device and how things work from the node.

We still use the virtualfs to save the data into the kopia repository.

Please let me know what y'all think!

@dzaninovic My new virtualfs PR should work for restore now!

@dzaninovic
Contributor

@dzaninovic My new virtualfs PR should work for restore now!

I merged your code and confirmed that md5 of restored data is correct.

@Lyndon-Li
Contributor

Lyndon-Li commented Aug 15, 2023

@shawn-hurley
I checked the code. The restore part looks fine, as it adds some more checks on the block device. I am just curious whether the restore-part code is a must-have (vs. nice to have)?

from your statement:

do some specific things during the restore to handle the block device and how things work from the node

It looks like this code has fixed some problems, without which the restore would fail. I didn't get what the problems are. So if that is true, could you share some more details?

@shawn-hurley
Contributor Author

It looks like this code has fixed some problems, without which the restore would fail. I didn't get what the problems are. So if that is true, could you share some more details?

The most significant change is that the built-in output tries to create a directory where the block device is. This fails, so we check whether the file is a block device, and if it is, we skip creating the directory because it already exists.

Another change is in the writing of the blocks. The default FileSystemOutput.WriteFile was changing the data in some way (the SHA sums of the resulting data were inconsistent). I did not look into why, because even if we understood it, we would either need to write the code that now exists or make some change upstream.

@kaovilai
Contributor

#6608 merged

@sseago
Collaborator

sseago commented Aug 17, 2023

"codecov/project — 60.18% (-0.01%) compared to bb96c21"
This failure isn't relevant for this PR since it only includes design documentation and no actual new code that can be tested.

@sseago sseago merged commit 0e7c417 into vmware-tanzu:main Aug 17, 2023
7 of 8 checks passed
@dzaninovic dzaninovic mentioned this pull request Aug 18, 2023