Backup.Status.CSIVolumeSnapshotsCompleted < CSIVolumeSnapshotsAttempted or nil when backup phase is completed #7047

Closed
kaovilai opened this issue Nov 1, 2023 · 3 comments · Fixed by #7046
Labels: Area/CSI (Related to Container Storage Interface support), target/1.12.2

@kaovilai
Contributor

kaovilai commented Nov 1, 2023

What steps did you take and what happened:

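The runBackup function is the only place that updates

  • Backup.Status.CSIVolumeSnapshotsCompleted
  • backup.Status.VolumeSnapshotsCompleted

yet it can finish and advance the backup Phase up to and including completion without waiting for every snapshot counted by the XSnapshotsCompleted fields to complete. As a result, the completed counts can be lower than the attempted counts, or nil, on a backup whose phase is already completed.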

What did you expect to happen:

  • The backup should eventually update its backup.Status.VolumeSnapshotsCompleted. Updating it after the backup has reached the completed phase is acceptable.

Solutions:

  • Run another controller in the background to update the XCompleted status fields for snapshots.
  • Copy the following into the other backup_x_controller.go files, and keep processing until backup.Status.VolumeSnapshotsCompleted == backup.Status.VolumeSnapshotsAttempted or a timeout is reached:

backup.Status.CSIVolumeSnapshotsCompleted++

Anything else you would like to add:

Original issue: https://issues.redhat.com/browse/OADP-3005

Environment:

  • Velero version (use velero version):
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

@sseago
Collaborator

sseago commented Nov 1, 2023

This is a regression from 1.11. In 1.11, CSI snapshotting was synchronous, so every snapshot was complete by the end of backup_controller processing. Now that waiting for ReadyToUse is done asynchronously, in many cases none of the snapshots are complete when we calculate the number of complete snapshots. Recalculating this one more time in the finalizer controller is probably the easiest way to fix this.
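
As a rough sketch of the recalculation suggested here, the completed counter can be recomputed from the snapshots' readiness once the asynchronous wait has had time to finish. The types below are simplified, hypothetical stand-ins rather than the Velero or CSI API types, and this is not the change made in the linked fix.

```go
package main

// volumeSnapshot is a hypothetical, simplified stand-in for a CSI
// VolumeSnapshot; only the readiness flag matters for this sketch.
type volumeSnapshot struct {
	Name       string
	ReadyToUse *bool // nil until a status has been reported
}

// backupStatus is a hypothetical stand-in for the CSI counters on Backup.Status.
type backupStatus struct {
	CSIVolumeSnapshotsAttempted int
	CSIVolumeSnapshotsCompleted int
}

// recountCompleted recomputes the completed counter from scratch instead of
// incrementing it, so calling it again later (for example during
// finalization) cannot double count.
func recountCompleted(status *backupStatus, snapshots []volumeSnapshot) {
	completed := 0
	for _, vs := range snapshots {
		if vs.ReadyToUse != nil && *vs.ReadyToUse {
			completed++
		}
	}
	status.CSIVolumeSnapshotsCompleted = completed
}
```

Recomputing the total, rather than applying `++` a second time, keeps the operation idempotent, which matters if the same backup is reconciled more than once.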

Lyndon-Li added the Area/CSI (Related to Container Storage Interface support) label on Nov 2, 2023
Lyndon-Li added this to the v1.13 milestone on Nov 6, 2023
@allenxu404
Contributor

Hi @kaovilai, has the PR been merged into release-1.12 yet? If not, could you help cherry-pick it to release-1.12?

@kaovilai
Contributor Author

Done! Thanks.
