High memory usage and OOM killed during maintenance tasks #7510
Duplicate of #7291
I don't think this is really a duplicate 🤔
@hbollon Also, please keep the data in the repo; we may need it for troubleshooting and fix verification, since not all environments can reproduce the issue.
Hello @Lyndon-Li |
Observing this on one of our project clusters as well. The Velero pod keeps crashing with OOM. Version 1.13.0. We have roughly 1 TB of files, mostly images 5–15 MB in size, so there are many files. Current resources configured:
Will try increasing.
Yep, bumping those solved it for now!
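For reference, with the vmware-tanzu Velero Helm chart the server pod's memory can be raised through the top-level `resources` values. This is an illustrative fragment only; the exact figures are examples, not recommendations, and should be sized to your repository:

```yaml
# Illustrative values for the Velero Helm chart (server deployment).
# Figures are examples only; tune to your environment.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 4Gi
```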
@Lyndon-Li @hbollon any updates on this issue? I'm experiencing a similar problem in a specific environment where the Velero pod crashes with a 2 GB memory limit but somehow works with 4 GB. On the other hand, on multiple (even bigger) environments there's no need to increase the memory limit beyond 1 GB. Is this specific to 1.13.0, and is there any chance it's fixed in 1.13.1/2? Thanks
@hbollon @contributorr Please note that not all of the memory usage is irrational; depending on the state of the file system (e.g., more files, smaller files), it may take more memory than in other environments.
No, it is not specific to 1.13. The improvements will only land in 1.14.
@hbollon @contributorr
The problem in @hbollon's environment is reproduced locally. Here are the details:
1.14 (which integrates Kopia 0.17) doesn't ultimately solve this problem, but it does handle it better:
The problem still happens in 1.14 when a huge number of indexes is generated in one backup, or in consecutive backups within a short time (e.g., 24 hours). So there will be follow-up fixes after 1.14. The plan is to find a way to reduce the number of indexes compacted each time, so that a controllable amount of memory is used.
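The follow-up plan above (bounding how many indexes are compacted per pass so memory stays controllable) can be sketched roughly as below. This is an illustrative sketch only, not Kopia's or Velero's actual code; `batchIndexes` is a hypothetical helper:

```go
package main

import "fmt"

// batchIndexes splits a list of index blob IDs into fixed-size batches so
// that each maintenance pass only needs to hold a bounded number of index
// blobs in memory at once. Hypothetical sketch, not Kopia's implementation.
func batchIndexes(indexes []string, batchSize int) [][]string {
	var batches [][]string
	for start := 0; start < len(indexes); start += batchSize {
		end := start + batchSize
		if end > len(indexes) {
			end = len(indexes)
		}
		batches = append(batches, indexes[start:end])
	}
	return batches
}

func main() {
	// Simulate 10 index blobs compacted in batches of 4.
	idx := []string{"i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8", "i9"}
	for _, b := range batchIndexes(idx, 4) {
		fmt.Println(len(b), b)
		// Each pass would compact only this batch, then release the memory
		// before moving on to the next one.
	}
}
```

The point of the batching is that peak memory is proportional to the batch size rather than to the total number of indexes accumulated since the last maintenance run.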
Hello team, we are using Velero on a new on-premise k8s platform (using k3s) to back up some of our mounted PVCs using the FSB feature. We have deployed Velero using the Helm chart.
We're using it with the Kopia uploader so that we can use a `.kopiaignore` file to configure paths to ignore during backups. The backup storage is located on Scaleway Object Storage and the bucket size is about ~850 GB of backup data (38,723 files).
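For context, a `.kopiaignore` file placed at the root of the backed-up volume uses gitignore-style patterns. The paths below are hypothetical examples, not the reporter's actual configuration:

```text
# .kopiaignore — gitignore-style patterns (hypothetical examples)
*.tmp
cache/
logs/**
```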
The first backup is successful, but after that one the Velero pod starts to crashloop due to OOM during maintenance tasks (we have configured a 6 GB memory limit for this Velero pod, which should be more than sufficient, no?)
The last logs I have before the OOM:
I tried to give as much context and information as possible, but if you need any other details don't hesitate to ping me; this is quite an urgent issue for us...
What did you expect to happen:
I don't think it's normal that Velero takes so much memory in just a minute during maintenance tasks.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.
bundle-2024-03-08-09-47-09.tar.gz
Environment:
- Velero version (use `velero version`): v1.13.0
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`): v1.28.3
- OS (e.g. from `/etc/os-release`):