Allow backup list requests to be chunked #3823
8c2a58c
to
75a1f12
Compare
Might be better to use ListPager.List (https://pkg.go.dev/k8s.io/client-go/tools/pager#ListPager.List); with the layer of abstraction from the dynamic client, it didn't seem like the most trivial option.
Using ListPager could be a good option; kube controllers used to use it before the refactor to the informer system. One late bloomer being worked on right now is the cronjob controller. Example of a list pager in that controller here: https://github.com/kubernetes/kubernetes/blob/cf59c68e15f35a074e46ba6eae704702d3833e5d/pkg/controller/cronjob/cronjob_controller.go#L113 Note: a trivial difference between the cronjob example and here is that we might have to handle ResourceExpired-type errors in the page function, but otherwise +1 to using the abstraction.
In fact, I would think this could be another configurable parameter: the user could choose to tolerate an expired list, or pay the performance cost of an accurate snapshot taken directly from the apiserver if they so care.
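For readers following the thread, the chunking idea being discussed can be sketched with the standard library alone. This is not the client-go ListPager API; listPage and listAll are hypothetical names, and an in-memory slice stands in for the apiserver. It only illustrates the limit/continue-token loop that a pager runs on the caller's behalf.

```go
package main

import (
	"fmt"
	"strconv"
)

// listPage is a hypothetical page function: it returns up to limit items
// starting at the position encoded in cont, plus the next continue token
// ("" once the list is exhausted). A non-positive limit disables chunking.
func listPage(data []string, limit int, cont string) ([]string, string, error) {
	start := 0
	if cont != "" {
		n, err := strconv.Atoi(cont)
		if err != nil || n < 0 || n > len(data) {
			return nil, "", fmt.Errorf("invalid continue token %q", cont)
		}
		start = n
	}
	end := start + limit
	if limit <= 0 || end > len(data) {
		end = len(data)
	}
	next := ""
	if end < len(data) {
		next = strconv.Itoa(end)
	}
	return data[start:end], next, nil
}

// listAll drains every page and reports how many requests were needed.
func listAll(data []string, limit int) ([]string, int, error) {
	var all []string
	cont, requests := "", 0
	for {
		items, next, err := listPage(data, limit, cont)
		requests++
		if err != nil {
			return nil, requests, err
		}
		all = append(all, items...)
		if next == "" {
			return all, requests, nil
		}
		cont = next
	}
}
```

With five items and a limit of 2, listAll issues three requests; with a limit of 0 it issues one, which mirrors the "0 disables chunking" behavior described in this PR.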
75a1f12
to
3a055d3
Compare
I've pushed a new commit using ListPager and updated the PR description.
3a055d3
to
43e7afb
Compare
Hmm, trying to resolve conflicts seems to have caused some problems, probably because I branched from 1.6.0 instead of master. Time to rebase entirely...
43e7afb
to
273f6e0
Compare
a030f3c
to
1452c45
Compare
// If limit is positive, use a pager to split list over multiple requests
// Use Velero's dynamic list function instead of the default
listFunc := pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
	list, err := resourceClient.List(listOptions)
If you are not planning on checking the type of error, a plain return resourceClient.List(listOptions) might work as well.
Also, can you move the TODO on line 320 here? I think the ResourceExpired errors will be handled here in the future.
Page buffer size would be handled by setting listPager.PageBufferSize.
I double checked how ResourceExpired errors need to be handled and it looks like ListPager only handles them for us if we use List() instead of EachListItem() :( So this needs to be changed to handle the error after all.
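The fallback being described can be sketched roughly as below. This is a standard-library illustration under stated assumptions, not the client-go implementation: errExpired stands in for the apiserver's ResourceExpired (HTTP 410) error, and pageFn and listWithFallback are hypothetical names. The point is that when a continue token expires partway through, the caller has to discard the partial pages and retry as one unchunked list.

```go
package main

import "errors"

// errExpired is a stand-in for the apiserver's ResourceExpired (HTTP 410) error.
var errExpired = errors.New("continue token expired")

// pageFn is a hypothetical page function; a limit of 0 means "no chunking".
type pageFn func(limit int, cont string) (items []string, next string, err error)

// listWithFallback pages through fn. If a continue token expires midway,
// it drops the pages collected so far and retries once as a single full list.
// Any other error, or a failure of the full list, is returned to the caller.
func listWithFallback(fn pageFn, limit int) ([]string, error) {
	var all []string
	cont := ""
	for {
		items, next, err := fn(limit, cont)
		if errors.Is(err, errExpired) && cont != "" {
			// Earlier pages may no longer be consistent with the server state.
			full, _, err := fn(0, "")
			return full, err
		}
		if err != nil {
			return nil, err
		}
		all = append(all, items...)
		if next == "" {
			return all, nil
		}
		cont = next
	}
}
```

If the full list also fails, the error propagates, which matches the PR description's statement that the backup fails in that case.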
Updated to handle ResourceExpired errors (since EachListItem doesn't do that for us), explicitly handle a casting error, and invert the page size condition to close the negative page size loophole.
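A rough illustration of the last two points, assuming the inverted condition is a strictly-positive check (consistent with the "If limit is positive" comment in the diff) and that the cast is a checked type assertion. itemList, shouldPage, and asItemList are hypothetical names, not Velero's.

```go
package main

import "fmt"

// itemList is a hypothetical concrete list type returned by a list call.
type itemList struct {
	items []string
}

// shouldPage reports whether listing should be chunked. Testing for a
// strictly positive size means zero and negative values both disable paging,
// whereas a "size != 0" test would let negative values through.
func shouldPage(pageSize int) bool {
	return pageSize > 0
}

// asItemList converts an untyped result using a checked type assertion,
// returning an error instead of panicking on an unexpected type.
func asItemList(obj interface{}) (*itemList, error) {
	list, ok := obj.(*itemList)
	if !ok {
		return nil, fmt.Errorf("unexpected list type %T", obj)
	}
	return list, nil
}
```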
bc68a05
to
9961081
Compare
May this PR have an updated review following the squashed commit/header updates?
Thanks, looks pretty good. Let's default to a chunk size of 500; we don't want people to have to go looking for this option when they run into issues.
Also, please run E2E tests against this. Ping me if you need help.
Signed-off-by: Dharma Bellamkonda <bellamko@adobe.com>
9961081
to
470e5ea
Compare
I'm seeing a lot of errors when running
@dharmab When running e2e tests, it looks like you're missing the velero namespace env var. Try adding |
A quick update here: I've continued to have some issues running the e2e tests, even after setting some of the undocumented variables in the e2e-test Makefile. I suspect my test cluster may differ from the standard ones used by the project. I'm also out on extended time off for much of June/July without much internet access, so this PR may linger for a few weeks.
What is needed for this PR to be merged? We are currently running a forked version( |
LGTM, thanks @dharmab
Looks good, thanks for the changes
Thank you for contributing to Velero!
Please add a summary of your change
Add a --client-page-size flag to the Velero server. This flag causes LIST calls for resources other than Namespaces to be chunked into chunks of the given size during backups (i.e. using the limit query parameter on the apiserver request). The default value of 0 disables chunking, similar to the behavior of the --chunk-size flag for kubectl.
In the case of a ResourceExpired error due to modification of the object list during pagination, the server will attempt to fall back on a non-chunked list. If this request fails, the backup will fail.
I have tested this in my own cluster using client-page-size of both 0 and 500 successfully.
Note: I was unable to run most of the make commands as noted in #262 (comment). I may need help from a maintainer to complete this pull request. This worked after a rebase.
Does your change fix a particular issue?
Fixes #262 (except for Namespace objects)
Please indicate you've done the following:
/kind changelog-not-required
.site/content/docs/main
.
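As a footnote to the summary above, here is a small sketch of how a page size could translate into the limit and continue query parameters on a LIST request. limit and continue are the Kubernetes API's parameter names; listQuery itself is a hypothetical helper and not part of Velero or client-go, which build these parameters internally.

```go
package main

import (
	"net/url"
	"strconv"
)

// listQuery builds the query string for one LIST request. A page size of
// zero or less omits the limit parameter, so the server returns the whole
// list in one response.
func listQuery(pageSize int64, continueToken string) string {
	q := url.Values{}
	if pageSize > 0 {
		q.Set("limit", strconv.FormatInt(pageSize, 10))
	}
	if continueToken != "" {
		q.Set("continue", continueToken)
	}
	// Encode sorts parameters by key, so "continue" precedes "limit".
	return q.Encode()
}
```

For example, listQuery(500, "") yields "limit=500", and listQuery(0, "") yields an empty query string.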