Feature to create a schedule for maintenance #5307

Open
navilg opened this issue Sep 7, 2022 · 11 comments
Assignees
Labels
Enhancement/User End-User Enhancement to Velero

Comments

@navilg

navilg commented Sep 7, 2022

Describe the problem/challenge you have
[A description of the current limitation/problem/challenge that you are experiencing.]

I have multiple backups running on my cluster, a few hours apart. When maintenance runs, it often clashes with a backup that is in progress (judging by the error message), and we then get the error that restic prune failed since the repo is locked by another process.

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

It would be useful if we could schedule maintenance the same way we schedule backups. That way, users would have the flexibility to choose the maintenance time and frequency as needed. The schedule could be set either at installation time or as a post-installation feature.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Velero version (use velero version): 1.6.2
  • Kubernetes version (use kubectl version): 1.17.9
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "The project would be better with this feature added"
  • 👎 for "This feature will not enhance the project in a meaningful way"
@Lyndon-Li
Contributor

Do you have multiple BackupStorageLocations that all point to the same bucket in the same object store?

@navilg
Author

navilg commented Sep 8, 2022

Do you have multiple BackupStorageLocations that all point to the same bucket in the same object store?

No. I have only one BackupStorageLocation (default), pointing to an S3 bucket.

@navilg
Author

navilg commented Sep 8, 2022

@Lyndon-Li Can a Velero backup and restic maintenance run concurrently? Or is there any check in place to see whether a backup is in progress before restic prune runs automatically every week?

@cleverhu
Contributor

cleverhu commented Sep 9, 2022

Before each backup, check whether another backup is in progress. If there is an ongoing task, try again later. Another way is to let users set how many goroutines can run backups at the same time. What do you think? @Lyndon-Li
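
The second idea (a user-configurable cap on concurrent backup goroutines) could be sketched roughly as below. This is only an illustration of the general pattern, not Velero code: `runWithLimit` and the placeholder tasks are hypothetical names, and a buffered channel stands in for the user-set limit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWithLimit is a hypothetical helper: it runs every task, but never more
// than limit of them at once, and returns the highest concurrency observed.
func runWithLimit(limit int, tasks []func()) int32 {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are already running
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot once the task has finished

			n := atomic.AddInt32(&running, 1)
			// Record the peak concurrency with a compare-and-swap loop.
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	// Five placeholder "backups" that each take a little time.
	tasks := make([]func(), 5)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println("peak concurrency:", runWithLimit(2, tasks))
}
```

Note that a cap like this only limits backups running inside one process; it does not by itself stop a maintenance run from starting while a backup holds the repo.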

@cleverhu
Contributor

cleverhu commented Sep 9, 2022

I opened a pull request for this, but I'm not sure whether it's appropriate. Is there a more elegant way to achieve it?

@navilg
Author

navilg commented Sep 9, 2022

@cleverhu I was asking whether any check is in place before the maintenance task runs. The maintenance task (restic prune) fails if the restic repo is already locked by another process (a backup task or something else). So there should be a check before running restic prune, so that it starts only when no ongoing task has locked the repo; otherwise it should retry the maintenance task after some time.
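
To make the request concrete, here is a minimal sketch of the retry behavior I have in mind. It is not Velero's or restic's actual code: `errRepoLocked`, `pruneWithRetry`, and the fake `prune` function are all hypothetical, and a real implementation would have to detect the lock from restic's own error output.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRepoLocked is a hypothetical sentinel standing in for restic's
// "repository is locked" failure.
var errRepoLocked = errors.New("repository is locked by another process")

// pruneWithRetry calls prune until it succeeds, fails with an error other
// than errRepoLocked, or maxAttempts (expected to be >= 1) is reached.
// It sleeps for wait between attempts and returns the number of attempts made.
func pruneWithRetry(prune func() error, maxAttempts int, wait time.Duration) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = prune()
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errRepoLocked) {
			return attempt, err // not a lock problem, so retrying will not help
		}
		if attempt < maxAttempts {
			time.Sleep(wait) // give the ongoing backup time to release the lock
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	// Fake prune: the repo stays locked for the first two calls.
	prune := func() error {
		calls++
		if calls < 3 {
			return errRepoLocked
		}
		return nil
	}
	attempts, err := pruneWithRetry(prune, 5, 10*time.Millisecond)
	fmt.Println(attempts, err) // 3 <nil>
}
```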

@cleverhu
Contributor

cleverhu commented Sep 9, 2022

I found an issue that discusses this, #487. Its approach seems better.

@cleverhu
Contributor

cleverhu commented Sep 9, 2022

@cleverhu I was asking whether any check is in place before the maintenance task runs. The maintenance task (restic prune) fails if the restic repo is already locked by another process (a backup task or something else). So there should be a check before running restic prune, so that it starts only when no ongoing task has locked the repo; otherwise it should retry the maintenance task after some time.

The contributor said that only one backup can be processed at a time, so my code is probably superfluous. I'll keep working on how to detect that restic is occupied.

@qiuming-best qiuming-best added the Enhancement/User End-User Enhancement to Velero label Sep 14, 2022
@Lyndon-Li
Contributor

@navilg

Can a Velero backup and restic maintenance run concurrently?

This is decided not by Velero but by Restic. That is, Restic doesn't allow backup and maintenance to run concurrently. On the other hand, Velero is preparing to integrate with other repositories, such as Kopia, which may not have this restriction.

is there any check in place to see whether a backup is in progress before restic prune runs automatically every week

This kind of check won't work in all cases. For example, if the backup and the maintenance run in different processes, it is hard to synchronize them.
We do have a Velero server option, default-restic-prune-frequency, whose default value for Restic is 1 week. Therefore, I don't think the frequency causes the current problem, so a maintenance scheduler won't fix it.
On the other hand, with the current behavior, if maintenance fails, Velero tries to restart it as soon as possible, which means the failure recovery doesn't respect the frequency value. I think this is the problem you are facing. We already have an issue.
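
One way failure recovery could respect the frequency is to record the time of every attempt, not only the successful ones. The sketch below illustrates that idea under stated assumptions and is not Velero's implementation: `repoState`, `tryMaintain`, and `simulate` are hypothetical names, and the one-week value simply mirrors the default mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// repoState is a hypothetical per-repository record.
type repoState struct {
	lastAttempt time.Time // set on every attempt, whether it succeeded or failed
}

// due reports whether a full frequency interval has passed since the last attempt.
func (s *repoState) due(now time.Time, frequency time.Duration) bool {
	return s.lastAttempt.IsZero() || !now.Before(s.lastAttempt.Add(frequency))
}

// tryMaintain runs maintain only when it is due. The attempt time is recorded
// before maintain is called, so a failure does not trigger an immediate retry.
func (s *repoState) tryMaintain(now time.Time, frequency time.Duration, maintain func() error) (bool, error) {
	if !s.due(now, frequency) {
		return false, nil
	}
	s.lastAttempt = now
	return true, maintain()
}

// simulate replays three scheduler ticks (start, one hour later, one week
// later) against a maintenance call that always fails, and reports on which
// ticks an attempt was actually made.
func simulate() []bool {
	frequency := 7 * 24 * time.Hour
	start := time.Date(2022, time.September, 7, 0, 0, 0, 0, time.UTC)
	failing := func() error { return errors.New("repo is locked") }

	state := &repoState{}
	var ran []bool
	for _, offset := range []time.Duration{0, time.Hour, frequency} {
		attempted, _ := state.tryMaintain(start.Add(offset), frequency, failing)
		ran = append(ran, attempted)
	}
	return ran
}

func main() {
	fmt.Println(simulate()) // [true false true]
}
```

With this policy a failed run is simply skipped until the next interval, which trades a delayed prune for fewer repeated lock collisions.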

Furthermore, each repository has an ideal maintenance frequency, which gives the best performance for both the repository and the maintenance itself. Therefore, it's better to use the default than to let users control this freely, unless there is a particular reason.

@Lyndon-Li
Contributor

@cleverhu

Before each backup, check whether another backup is in progress. If there is an ongoing task, try again later. Another way is to let users set how many goroutines can run backups at the same time

I didn't get your meaning. Why do you want to check for an ongoing backup before another backup, given that we are talking about the backup-maintenance contention problem?
If you are talking about checking for an ongoing backup before a maintenance task or vice versa, two reasons indicate that it is not an urgent task:

  • Not all repositories have this restriction, so we would need to do this for Restic only
  • It is not a problem for a backup or maintenance run to fail once, as long as we can restart it within a reasonable time. This is especially true for maintenance, because we can even skip it once. For a backup, it would be better to restart a failed backup, but Velero currently cannot support this, and it would require a lot of work

@Lyndon-Li
Contributor

In conclusion, at present I don't think we need to change the behavior of the maintenance schedule, for the two reasons below:

  • We will have the Kopia repository in v1.10, and it brings a much friendlier maintenance experience
  • The frequent failures of Restic maintenance are caused not by the scheduler itself but by this issue


4 participants