Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for block volumes #6680

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

dzaninovic
Copy link
Contributor

Thank you for contributing to Velero!

Please add a summary of your change

Add support for backup and restore of CSI block volumes with Kopia uploader.

Design: #6590
Interface change: #6608

Testing:

  • Automated tests
  • Filesystem and Block PVC backup and restore with md5 verification

Does your change fix a particular issue?

Fixes #6548

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@codecov
Copy link

codecov bot commented Aug 21, 2023

Codecov Report

Merging #6680 (aa14f83) into main (c3ec7b7) will decrease coverage by 0.13%.
Report is 6 commits behind head on main.
The diff coverage is 55.00%.

@@            Coverage Diff             @@
##             main    #6680      +/-   ##
==========================================
- Coverage   60.78%   60.66%   -0.13%     
==========================================
  Files         245      249       +4     
  Lines       26256    26464     +208     
==========================================
+ Hits        15961    16055      +94     
- Misses       9164     9266     +102     
- Partials     1131     1143      +12     
Files Coverage Δ
pkg/datapath/file_system.go 32.14% <100.00%> (-2.03%) ⬇️
pkg/exposer/generic_restore.go 71.31% <100.00%> (-0.12%) ⬇️
pkg/install/daemonset.go 100.00% <100.00%> (ø)
pkg/uploader/provider/kopia.go 83.44% <100.00%> (-0.64%) ⬇️
pkg/exposer/csi_snapshot.go 79.92% <75.00%> (-0.64%) ⬇️
pkg/controller/data_upload_controller.go 68.44% <85.71%> (+0.08%) ⬆️
pkg/install/resources.go 76.28% <0.00%> (-0.92%) ⬇️
pkg/install/deployment.go 88.46% <0.00%> (-1.54%) ⬇️
pkg/exposer/host_path.go 80.00% <45.45%> (+3.80%) ⬆️
pkg/uploader/kopia/snapshot.go 82.86% <81.81%> (+0.93%) ⬆️
... and 5 more

... and 8 files with indirect coverage changes

@dzaninovic
Copy link
Contributor Author

I fixed the reported issues and rebased.

@reasonerjt
Copy link
Contributor

Could you also add more UT? The coverage has dropped 0.2%

@dzaninovic
Copy link
Contributor Author

Could you also add more UT? The coverage has dropped 0.2%

I will work on increasing the coverage.

@dzaninovic
Copy link
Contributor Author

I added more unit tests.

@dzaninovic
Copy link
Contributor Author

CI Run failure seems unrelated:
2023-08-24T14:00:17.0705891Z [2023-08-24T14:00:17.069Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')} 2023-08-24T14:00:17.0709047Z [2023-08-24T14:00:17.070Z] ['verbose'] The error stack is: Error: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')} 2023-08-24T14:00:17.0710931Z at main (/snapshot/repo/dist/src/index.js) 2023-08-24T14:00:17.0711849Z at processTicksAndRejections (node:internal/process/task_queues:96:5) 2023-08-24T14:00:17.0713279Z [2023-08-24T14:00:17.070Z] ['verbose'] End of uploader: 1579 milliseconds 2023-08-24T14:00:17.0871109Z ##[error]Codecov: Failed to properly upload: The process '/home/runner/work/_actions/codecov/codecov-action/v3/dist/codecov' failed with exit code 255

@dzaninovic
Copy link
Contributor Author

I rebased and squashed the commits.

I will be on vacation next week so if there is anything else that needs to be done I can do that after the vacation or @draghuram can assign somebody else to continue while I am away.

@anshulahuja98
Copy link
Collaborator

@dzaninovic / @draghuram - will this still support kopia's dedup? Or will each backup be full copies.

@draghuram
Copy link
Contributor

@anshulahuja98, Kopia dedup will occur just like for regular files. As far as Kopia is concerned, the device is being presented as a file.

@anshulahuja98
Copy link
Collaborator

@anshulahuja98, Kopia dedup will occur just like for regular files. As far as Kopia is concerned, the device is being presented as a file.

what about dedup performance, should that also be same? Any idea? Since we are streaming device as file

@shawn-hurley
Copy link
Contributor

@anshulahuja98 Dedup happens at a layer below these changes. This means that everything else, from Kopia's perspective will be the same as a normal filesystem IIUC.

@draghuram
Copy link
Contributor

That is correct. There should be no difference in behavior, including performance related, between devices and files. But it is possible to optimize device backup further (using direct IO etc) but I think we decided not to do those in this iteration.

@anshulahuja98
Copy link
Collaborator

Got it, thanks for the inputs @shawn-hurley / @draghuram
One last question to connect the dots for my understanding the next best thing after this approach is using CBT based diff correct? That would basically have further better dedup by only copying diffs and storing them. And for that the current hope is mainly on the dataprotection sig's CBT design? Or is there any other industry standard in K8s context which is being used by other vendors.

I am just further trying to see to confirm my understanding that we'll have to write another uploader in velero to interface with the CBT APIs/ interface and then upload them

@Lyndon-Li
Copy link
Contributor

@anshulahuja98
Yes, we will create a block data uploader to support CBT as well as other feature, see the discussion here.

Besides Kubernetes SIG's CBT mechanism, we may also integrate the block uploader with storage/computing platform APIs directly, which will be used to solve problems like CSI is unavailable or inefficient.

@anshulahuja98
Copy link
Collaborator

@anshulahuja98 Yes, we will create a block data uploader to support CBT as well as other feature, see the discussion here.

Besides Kubernetes SIG's CBT mechanism, we may also integrate the block uploader with storage/computing platform APIs directly, which will be used to solve problems like CSI is unavailable or inefficient.

got it, thanks for sharing this @Lyndon-Li

@dzaninovic
Copy link
Contributor Author

I rebased, resolved conflicts and retested.

On @sseago's request in the last Velero Community meeting I tested block PVC backup without the mover and PVC was skipped.

time="2023-09-07T14:47:41Z" level=warning msg="volume vol is declared in pod block1/pod-raw but not mounted by any container, skipping" backup=velero/backup2 logSource="pkg/podvolume/backupper.go:270" name=pod-raw namespace=block1 resource=pods

Code is only checking for mounted volumes but not for attached devices:

	for _, container := range pod.Spec.Containers {
		for _, volumeMount := range container.VolumeMounts {
			mountedPodVolumes.Insert(volumeMount.Name)
		}
	}
...
	for _, volumeName := range volumesToBackup {
...
		// volumes that are not mounted by any container should not be backed up, because
		// its directory is not created
		if !mountedPodVolumes.Has(volumeName) {
			msg := fmt.Sprintf("volume %s is declared in pod %s/%s but not mounted by any container, skipping", volumeName, pod.Namespace, pod.Name)
			log.Warn(msg)
			pvcSummary.addSkipped(volumeName, msg)
			continue
		}

Do we want to try to support this code path in this pull request?

@sseago
Copy link
Collaborator

sseago commented Sep 7, 2023

@dzaninovic Oh, right, I'd forgotten about that. We actually had someone try to use fs backup for a block volume and hit that exact same error message (it didn't have the block mode support, since this was velero 1.11, so it still wouldn't have worked) -- but yes, if we can support the fs-backup code path, that would be great. I don't know whether the code you pasted above is the only thing needed to fix this, but it may be worth trying. I'd say that if you can get it working with fs-backup with minimal extra work, then feel free to do it here -- if it's a lot more work, then a follow-on PR for that later might be better.

@sseago
Copy link
Collaborator

sseago commented Sep 26, 2023

@dzaninovic I think "privileged-node-agent" is clearer, so consider this another vote for that change.

@dzaninovic
Copy link
Contributor Author

@dzaninovic I think "privileged-node-agent" is clearer, so consider this another vote for that change.

Ok, I will rename it.

@dzaninovic
Copy link
Contributor Author

I renamed privileged-agent installation option to privileged-node-agent.

@Lyndon-Li
Copy link
Contributor

Lyndon-Li commented Sep 27, 2023

@dzaninovic
Copy link
Contributor Author

@dzaninovic I see below comments related to doc changes are not resolved yet:

  1. Add support for block volumes #6680 (comment)
  2. point 2 at Add support for block volumes #6680 (comment)

For 2, specifically, we need to change below doc places:

Would you help to check them? Or if you want our helps, just let us know.

@Lyndon-Li, thank you for summarizing what needs to be changed in the documentation. I will take some time today to make the changes.

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
@dzaninovic
Copy link
Contributor Author

dzaninovic commented Sep 27, 2023

I made the documentation changes as requested and marked comments as resolved that I think are now resolved.
https://github.com/vmware-tanzu/velero/compare/4cf2fc0654217d11b3f4ad199f23b912a22126a7..aa14f836b4472e78e5d288300bfbf97ad80cff4c

@dzaninovic
Copy link
Contributor Author

I don't have write access to the repository so somebody else will need to merge.

@weshayutin
Copy link
Contributor

@dzaninovic well done sir, thank you!

@shubham-pampattiwar
Copy link
Collaborator

@dzaninovic Waiting on @Lyndon-Li's final review on this PR.

@sseago sseago merged commit 8e01d1b into vmware-tanzu:main Sep 28, 2023
22 of 24 checks passed
@sseago
Copy link
Collaborator

sseago commented Sep 28, 2023

@dzaninovic Could you cherry-pick the merged commit into a PR against the release-1.12 branch? We also want to get this into 1.12.1.

@dzaninovic dzaninovic deleted the block_volume branch September 28, 2023 14:20
dzaninovic added a commit to catalogicsoftware/velero_block that referenced this pull request Sep 28, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
(cherry picked from commit 8e01d1b)
@dzaninovic dzaninovic mentioned this pull request Sep 28, 2023
3 tasks
@dzaninovic
Copy link
Contributor Author

@dzaninovic Could you cherry-pick the merged commit into a PR against the release-1.12 branch? We also want to get this into 1.12.1.

@sseago, sure, here is the PR: #6897.

dzaninovic added a commit to catalogicsoftware/velero_block that referenced this pull request Sep 29, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
(cherry picked from commit 8e01d1b)
dzaninovic added a commit to catalogicsoftware/velero_block that referenced this pull request Sep 29, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
(cherry picked from commit 8e01d1b)
dzaninovic added a commit to catalogicsoftware/velero_block that referenced this pull request Sep 29, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
(cherry picked from commit 8e01d1b)
dzaninovic added a commit to catalogicsoftware/velero_block that referenced this pull request Sep 29, 2023
(cherry picked from commit 8e01d1b)

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
sseago pushed a commit that referenced this pull request Sep 29, 2023
(cherry picked from commit 8e01d1b)

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
kaovilai pushed a commit to kaovilai/velero that referenced this pull request Oct 2, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
kaovilai pushed a commit to kaovilai/velero that referenced this pull request Oct 2, 2023
Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
sseago pushed a commit to openshift/velero that referenced this pull request May 23, 2024
(cherry picked from commit 8e01d1b)

Signed-off-by: David Zaninovic <dzaninovic@catalogicsoftware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support backup on volumeMode block via Data mover (Kopia)
10 participants