
Allow backup list requests to be chunked #3823

Merged
merged 1 commit into vmware-tanzu:main on Aug 18, 2021

Conversation

@dharmab (Contributor) commented May 24, 2021

Thank you for contributing to Velero!

Please add a summary of your change

Add a --client-page-size flag to the Velero server. This flag causes LIST calls for resources other than Namespaces to be split into pages of the given size during backups (i.e. using the limit query parameter on the apiserver request). The default value of 0 disables chunking, matching the behavior of kubectl's --chunk-size flag.

In the case of a ResourceExpired error caused by the object list being modified during pagination, the server attempts to fall back to a non-chunked list. If that request also fails, the backup fails.
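To make the paging-and-fallback behavior concrete, here is a minimal sketch of the flow described above (the helper name and signature are illustrative only; the merged change goes through client-go's ListPager rather than a hand-rolled loop):

	// Assumes: apierrors = k8s.io/apimachinery/pkg/api/errors,
	// metav1 = k8s.io/apimachinery/pkg/apis/meta/v1,
	// unstructured = k8s.io/apimachinery/pkg/apis/meta/v1/unstructured,
	// and client is a dynamic.ResourceInterface scoped to the resource being backed up.
	func listResourceChunked(ctx context.Context, client dynamic.ResourceInterface, pageSize int64) ([]unstructured.Unstructured, error) {
		var items []unstructured.Unstructured
		opts := metav1.ListOptions{Limit: pageSize} // Limit == 0 disables chunking
		for {
			page, err := client.List(ctx, opts)
			if apierrors.IsResourceExpired(err) {
				// The continue token went stale because the list changed mid-pagination;
				// fall back to a single non-chunked list. If that also fails, this backup fails.
				full, fullErr := client.List(ctx, metav1.ListOptions{})
				if fullErr != nil {
					return nil, fullErr
				}
				return full.Items, nil
			}
			if err != nil {
				return nil, err
			}
			items = append(items, page.Items...)
			if page.GetContinue() == "" {
				return items, nil
			}
			opts.Continue = page.GetContinue()
		}
	}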

I have tested this in my own cluster using client-page-size of both 0 and 500 successfully.

Note: I was originally unable to run most of the make commands, as noted in #262 (comment), and thought I might need help from a maintainer to complete this pull request. Update: this worked after a rebase.

Does your change fix a particular issue?

Fixes #262 (except for Namespace objects)


@dharmab dharmab force-pushed the list-chunking-1.6.0 branch 2 times, most recently from 8c2a58c to 75a1f12 Compare May 24, 2021 20:51
@dharmab dharmab marked this pull request as ready for review May 24, 2021 21:32
@github-actions github-actions bot requested a review from a-mccarthy May 24, 2021 21:32
@dharmab (Contributor, Author) commented May 25, 2021

It might be better to use https://pkg.go.dev/k8s.io/client-go/tools/pager#ListPager.List, but with the layer of abstraction from the dynamic client it didn't seem like the most trivial option.

@alaypatel07 (Contributor) commented:

Using ListPager could be a good option; kube controllers used to use this before a refactor to the informers system. One late adopter still being worked on right now is the cronjob controller. An example of the list pager in that controller is here: https://github.com/kubernetes/kubernetes/blob/cf59c68e15f35a074e46ba6eae704702d3833e5d/pkg/controller/cronjob/cronjob_controller.go#L113

Note: a minor difference between the cronjob example and this PR is that we might have to handle ResourceExpired-type errors in the page function, as in the sketch below, but otherwise +1 to using the abstraction.

	resourceListFunc := func(opts metav1.ListOptions) (runtime.Object, error) {
		obj, listErr := resourceClient.Namespace(namespace).List(context.Background(), opts)
		switch {
		case listErr == nil:
			// success: fall through and return the page
		case errors.IsResourceExpired(listErr):
			// log the warning, then return the page we did get
		default:
			return nil, listErr
		}
		return obj, nil
	}

In fact, I would think this could be another configurable parameter: the user could choose to tolerate an expired list, or give up the performance benefit in exchange for an accurate snapshot taken directly from the apiserver, if they care.
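If that configurability were built on ListPager's List() path, client-go already exposes a knob for it. A minimal sketch, where tolerateExpiredList is a hypothetical user-facing option and listFunc is a pager.ListPageFunc like the one in the diff excerpt further down:

	// FullListIfExpired is an existing ListPager field: when set, List() falls back
	// to a full, non-paginated list if a continue token expires mid-pagination.
	listPager := pager.New(listFunc)
	listPager.FullListIfExpired = tolerateExpiredList // hypothetical toggle, not part of this PR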

@dharmab (Contributor, Author) commented May 25, 2021

I've pushed a new commit using ListPager and updated the PR description.

@github-actions github-actions bot added the Area/Design (Design Documents), Dependencies (Pull requests that update a dependency file), has-e2e-tests, has-unit-tests, and Website (non-docs changes for the website) labels May 25, 2021
@dharmab (Contributor, Author) commented May 25, 2021

Hmm, trying to resolve conflicts seems to have caused some problems, probably because I branched from 1.6.0 instead of master. Time to rebase entirely...

@github-actions github-actions bot removed the Area/Design (Design Documents), Dependencies (Pull requests that update a dependency file), Documentation, has-unit-tests, and has-e2e-tests labels May 25, 2021
@dharmab dharmab force-pushed the list-chunking-1.6.0 branch 2 times, most recently from a030f3c to 1452c45 Compare May 25, 2021 18:15
@github-actions github-actions bot added the Documentation label and removed the Website (non-docs changes for the website) label May 25, 2021
// If limit is positive, use a pager to split list over multiple requests
// Use Velero's dynamic list function instead of the default
listFunc := pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
list, err := resourceClient.List(listOptions)
A Contributor commented on this diff:

If you are not planning on checking the type of error, simply returning resourceClient.List(listOptions) might work as well.

Also, can you move the TODO on line 320 here? I think the ResourceExpired errors will be handled here in the future.

The PR author replied:

Page buffer size would be handled by setting listPager.PageBufferSize.

I double checked how ResourceExpired errors need to be handled and it looks like ListPager only handles them for us if we use List() instead of EachListItem() :( So this needs to be changed to handle the error after all.
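As a rough illustration of the point above, here is a sketch of what the EachListItem path with explicit ResourceExpired handling might look like (variable names are assumptions, not the PR's exact code):

	// Assumes: pager = k8s.io/client-go/tools/pager, apierrors = k8s.io/apimachinery/pkg/api/errors.
	listPager := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
		return resourceClient.List(opts) // Velero's dynamic client wrapper, as in the diff excerpt above
	}))
	listPager.PageSize = int64(clientPageSize)
	// listPager.PageBufferSize could also be tuned here, as mentioned above.

	var items []*unstructured.Unstructured
	err := listPager.EachListItem(context.Background(), metav1.ListOptions{}, func(obj runtime.Object) error {
		u, ok := obj.(*unstructured.Unstructured)
		if !ok {
			return fmt.Errorf("expected *unstructured.Unstructured, got %T", obj)
		}
		items = append(items, u)
		return nil
	})
	if apierrors.IsResourceExpired(err) {
		// Unlike ListPager.List, EachListItem does not fall back for us on a stale
		// continue token, so retry here with a full, non-paginated list.
	}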

@alaypatel07 (Contributor) left a review:

Overall this looks good to me, added a few suggestions inline. Thanks @dharmab

cc @sseago

@dharmab (Contributor, Author) commented May 25, 2021

Updated to handle ResourceExpired errors (since EachListItem doesn't do that for us), explicitly handle a casting error, and invert the page size condition to close the negative page size loophole.

pkg/backup/backup.go (outdated review thread, resolved)
pkg/backup/item_collector.go (outdated review thread, resolved)
@dharmab (Contributor, Author) commented Jun 7, 2021

Could this PR get an updated review, following the squashed commit/header updates?

@dsu-igeek (Contributor) left a review:

Thanks, this looks pretty good. Let's default to a chunk size of 500; we don't want people to have to go looking for this option when they run into issues.

pkg/cmd/server/server.go (outdated review thread, resolved)
site/content/docs/main/backup-reference.md (outdated review thread, resolved)
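Following up on the default-to-500 suggestion, a rough sketch of how the server flag wiring might look; the --client-page-size name comes from the PR, but the constant, field, and variable names here are assumptions rather than the actual contents of pkg/cmd/server/server.go:

	// Hypothetical excerpt; actual names in pkg/cmd/server/server.go may differ.
	const defaultClientPageSize = 500

	flags.IntVar(
		&config.clientPageSize,
		"client-page-size",
		defaultClientPageSize,
		"Page size of requests made against the Kubernetes API when listing items during a backup; set to 0 to disable paging",
	)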
@dsu-igeek (Contributor) commented:

Also, please run E2E tests against this. Ping me if you need help.

Signed-off-by: Dharma Bellamkonda <bellamko@adobe.com>
@a-mccarthy a-mccarthy removed their request for review June 14, 2021 16:12
@dharmab (Contributor, Author) commented Jun 14, 2021

I'm seeing a lot of errors when running make test-e2e similar to:

Namespace/: attempting to create resource
Namespace/: attempting to create resource client

• Failure in Spec Setup (BeforeEach) [2.455 seconds]
[Restic] Velero tests on cluster using the plugin provider for object storage and Restic for volume backups
/Users/bellamko/go/src/github.com/vmware-tanzu/velero/test/e2e/backup_test.go:36
  when kibishii is the sample workload [BeforeEach]
  /Users/bellamko/go/src/github.com/vmware-tanzu/velero/test/e2e/backup_test.go:82
    should be successfully backed up and restored to the default BackupStorageLocation
    /Users/bellamko/go/src/github.com/vmware-tanzu/velero/test/e2e/backup_test.go:83

    Expected success, but got an error:
        <*errors.withMessage | 0xc000331b40>: {
            cause: {
                error: {
                    cause: {
                        error: {
                            cause: {
                                ErrStatus: {
                                    TypeMeta: {Kind: "Status", APIVersion: "v1"},
                                    ListMeta: {
                                        SelfLink: "",
                                        ResourceVersion: "",
                                        Continue: "",
                                        RemainingItemCount: nil,
                                    },
                                    Status: "Failure",
                                    Message: "Namespace \"\" is invalid: metadata.name: Required value: name or generateName is required",
                                    Reason: "Invalid",
                                    Details: {
                                        Name: "",
                                        Group: "",
                                        Kind: "Namespace",
                                        UID: "",
                                        Causes: [
                                            {Type: ..., Message: ..., Field: ...},
                                        ],
                                        RetryAfterSeconds: 0,
                                    },
                                    Code: 422,
                                },
                            },
                            msg: "Error creating resource Namespace/",
                        },
                        stack: [0x21e3d53, 0x21e48e5, 0x22342ab, 0x22360ff, 0x223f805, 0x21f4963, 0x21f457c, 0x21f4a47, 0x21f75ad, 0x21f70f2, 0x2205d71, 0x2205887, 0x2205077, 0x2207786, 0x22143b8, 0x2214067, 0x22389a9, 0x111d84f, 0x1073d81],
                    },
                    msg: "\n\nError installing Velero. Use `kubectl logs deploy/velero -n velero` to check the deploy logs",
                },
                stack: [0x2234505, 0x22360ff, 0x223f805, 0x21f4963, 0x21f457c, 0x21f4a47, 0x21f75ad, 0x21f70f2, 0x2205d71, 0x2205887, 0x2205077, 0x2207786, 0x22143b8, 0x2214067, 0x22389a9, 0x111d84f, 0x1073d81],
            },
            msg: "Failed to install Velero in cluster",
        }
        Failed to install Velero in cluster:

        Error installing Velero. Use `kubectl logs deploy/velero -n velero` to check the deploy logs: Error creating resource Namespace/: Namespace "" is invalid: metadata.name: Required value: name or generateName is required

    /Users/bellamko/go/src/github.com/vmware-tanzu/velero/test/e2e/backup_test.go:66

@sseago (Collaborator) commented Jun 16, 2021

@dharmab When running e2e tests, it looks like you're missing the velero namespace env var. Try adding VELERO_NAMESPACE=velero to your make test-e2e call.

@dharmab (Contributor, Author) commented Jun 23, 2021

A quick update here: I've continued to have some issues running the e2e tests, even after setting some of the undocumented variables in the e2e-test Makefile. I suspect my test cluster may differ from the standard ones used by the project. I'm also out on extended time off for much of June/July without much internet access, so this PR may linger for a few weeks.

@adriananeci commented:

What is needed for this PR to be merged? We are currently running a forked version (v1.6.3 plus this PR) and want to align back with the upstream version.

@reasonerjt (Contributor) left a review:

LGTM, thanks @dharmab

@dsu-igeek (Contributor) left a review:

Looks good, thanks for the changes

@dsu-igeek dsu-igeek merged commit dc1f179 into vmware-tanzu:main Aug 18, 2021
danfengliu pushed a commit to danfengliu/velero that referenced this pull request Jan 25, 2022
Signed-off-by: Dharma Bellamkonda <bellamko@adobe.com>
gyaozhou pushed a commit to gyaozhou/velero-read that referenced this pull request May 14, 2022
Signed-off-by: Dharma Bellamkonda <bellamko@adobe.com>
Successfully merging this pull request may close: Support retrieving chunked lists of items when backing up (#262)