Follow-on fixes for BIAv2 controller work #5971
Branch updated from dc24acf to a4f0ea8
Codecov Report
@@             Coverage Diff             @@
##             main    #5971      +/-   ##
==========================================
- Coverage   39.75%   39.70%   -0.06%
==========================================
  Files         256      256
  Lines       23237    23135     -102
==========================================
- Hits         9239     9185      -54
+ Misses      13300    13260      -40
+ Partials      698      690       -8
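For reference, the percentages above can be reproduced from the Hits/Misses/Partials counts. This is a sketch under two assumptions: that Codecov's ratio is hits / (hits + misses + partials), and that it rounds down to two decimals by default. The function names are illustrative only. Under these assumptions the delta is taken from the unrounded ratios, which would explain why it reads -0.06 rather than 39.70 - 39.75 = -0.05.

```go
package main

import (
	"fmt"
	"math"
)

// ratio returns hits / (hits + misses + partials) as a percentage.
// Partial lines count as not covered (assumed to match Codecov's formula).
func ratio(hits, misses, partials int) float64 {
	return 100 * float64(hits) / float64(hits+misses+partials)
}

// floor2 rounds down to two decimal places (assumed Codecov default).
func floor2(x float64) float64 {
	return math.Floor(x*100) / 100
}

func main() {
	base := ratio(9239, 13300, 698) // main: 23237 lines
	head := ratio(9185, 13260, 690) // #5971: 23135 lines
	// The delta is computed from the unrounded values, then rounded down.
	fmt.Printf("%.2f%% -> %.2f%% (%+.2f%%)\n", floor2(base), floor2(head), floor2(head-base))
	// 39.75% -> 39.70% (-0.06%)
}
```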
@@ -334,7 +249,7 @@ func (c *asyncBackupOperationsReconciler) updateBackupAndOperationsJSON(
 	removeIfComplete = false
 	return errors.Wrap(err, "error uploading backup json")
 }
-if err := operations.uploadProgress(backupStore, backup.Name); err != nil {
+if err := operations.UploadProgress(backupStore, backup.Name); err != nil {
I feel there is still a race to upload the same object:
- itemOperationsMap.PutOperationsForBackup updates the cache when there are changes.
- When describing the same backup, it calls ItemOperationsMap.UpdateForBackup. If there are changes, the latter calls UploadProgress to upload the object.
- Meanwhile, another turn of the reconciler may see the operation reach a terminal phase and call UploadProgress here.
@Lyndon-Li It's not a problem, though. While progress may be slightly out-of-date due to the time spent uploading/downloading, the value will be correct as of the time the user issued the command. For 1.12 we want to add a new CRD to pull status directly from the velero pod, which will eliminate this bit, but that is too large a change to fit into the 1.11 time frame.
@sseago I think the race will cause a problem, though the chance is very small:
- In the backup_operation controller, a backup is seen to reach the terminal phase, so operations.UploadProgress is called here at line 252; therefore, the operation persisted in the backup store has a terminal status.
- Meanwhile, the operation should be removed because it has completed, but, as you can see, the deletion (itemOperationsMap.DeleteOperationsForBackup) happens at line 228 inside a defer function; that is, it is not part of the same atomic operation as operations.UploadProgress.
- Suppose that right before itemOperationsMap.DeleteOperationsForBackup is called, a backup describe is triggered, so it calls ItemOperationsMap.UpdateForBackup. The latter can still get the operation from the cache map because it has not been deleted yet, but the copy it gets is not in a terminal phase.
- Then itemOperationsMap.UpdateForBackup calls operations.UploadProgress and uploads the operation to the backup store with a non-terminal, out-of-date status.
- Then itemOperationsMap.DeleteOperationsForBackup is called.
- Finally, when users call backup describe again, nothing is uploaded, because the operation has already been deleted from the cache.
- As a result, the operation in the backup store stays in a non-terminal status forever.
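The interleaving above can be replayed deterministically in a few lines. This is a hypothetical, single-threaded model, not Velero code: the two maps stand in for the item operations cache and the backup store, the phase strings are placeholders, and deepCopy only mimics the fact that GetOperationsForBackup hands back a copy.

```go
package main

import "fmt"

// operation is a hypothetical stand-in for a backup item operation.
type operation struct {
	ID    string
	Phase string
}

// cache stands in for the in-memory item operations map (keyed by backup name).
var cache = map[string][]operation{}

// store stands in for the operations object persisted in the backup store.
var store = map[string][]operation{}

// deepCopy mimics getting a copy from the map: edits to the copy are not
// visible in the cache until they are explicitly put back.
func deepCopy(ops []operation) []operation {
	return append([]operation(nil), ops...)
}

// replayRace walks through the problematic interleaving step by step and
// returns the phase left in the backup store at the end.
func replayRace() string {
	cache["b1"] = []operation{{ID: "op1", Phase: "InProgress"}}

	// 1. Controller: works on a copy, sees the operation finish, uploads it.
	ops := deepCopy(cache["b1"])
	ops[0].Phase = "Completed"
	store["b1"] = deepCopy(ops) // persisted status: Completed
	// No put back, so the cache still holds the stale InProgress entry.

	// 2. Describe fires before the deferred delete: it reads the stale cache
	//    entry and uploads it, overwriting the terminal status.
	store["b1"] = deepCopy(cache["b1"])

	// 3. The controller's deferred delete finally runs.
	delete(cache, "b1")

	// 4. Later describes find nothing in the cache, so nothing is re-uploaded.
	if ops, ok := cache["b1"]; ok {
		store["b1"] = deepCopy(ops)
	}
	return store["b1"][0].Phase
}

func main() {
	fmt.Println(replayRace()) // InProgress
}
```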
The fix should be very small: it looks like we are missing a PutOperationsForBackup between operations.UploadProgress and itemOperationsMap.DeleteOperationsForBackup when the backup is detected to reach the terminal status. But this Put must somehow be atomic with either the Upload or the Delete. Of course, not every case needs this; only the last upload/deletion does.
@Lyndon-Li Ahh, I see the issue. While the map maintains pointers to OperationsForBackup structs, GetOperationsForBackup returns a deep copy (so we don't need to hold the lock for long), but this means there's a gap between uploading a list of operations and updating the map. I think the solution may be to create BackupItemOperationsMap.UploadProgressAndPutOperationsForBackup, which grabs the lock, uploads progress, and updates the map atomically. Then I can make OperationsForBackup.UploadProgress an unexported func and only call it from that method.
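A minimal sketch of the proposed method, assuming the describe path (UpdateForBackup) takes the same lock and performs its upload while holding it. Only the method names come from this discussion; the types, fields, and the map-backed "store" are hypothetical simplifications, not Velero's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// operation is a hypothetical stand-in for a backup item operation.
type operation struct {
	ID    string
	Phase string
}

// itemOperationsMap caches operations per backup. store stands in for the
// backup store; both are guarded by the same mutex (an assumption).
type itemOperationsMap struct {
	mu    sync.Mutex
	cache map[string][]operation
	store map[string][]operation
}

func newMap() *itemOperationsMap {
	return &itemOperationsMap{
		cache: map[string][]operation{"b1": {{ID: "op1", Phase: "InProgress"}}},
		store: map[string][]operation{},
	}
}

func copyOps(ops []operation) []operation { return append([]operation(nil), ops...) }

// GetOperationsForBackup returns a copy so callers do not hold the lock while working.
func (m *itemOperationsMap) GetOperationsForBackup(name string) []operation {
	m.mu.Lock()
	defer m.mu.Unlock()
	return copyOps(m.cache[name])
}

// UploadProgressAndPutOperationsForBackup uploads and refreshes the cache under
// one lock, so a describe call cannot run between the two steps.
func (m *itemOperationsMap) UploadProgressAndPutOperationsForBackup(name string, ops []operation) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.store[name] = copyOps(ops) // stand-in for uploading progress
	m.cache[name] = copyOps(ops)
}

// UpdateForBackup models the describe path: it re-uploads whatever is cached.
func (m *itemOperationsMap) UpdateForBackup(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ops, ok := m.cache[name]; ok {
		m.store[name] = copyOps(ops)
	}
}

// DeleteOperationsForBackup drops the cached entry for a finished backup.
func (m *itemOperationsMap) DeleteOperationsForBackup(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.cache, name)
}

// controllerTurn sees the operation finish, uploads it atomically, then deletes it.
func controllerTurn(m *itemOperationsMap) {
	ops := m.GetOperationsForBackup("b1")
	ops[0].Phase = "Completed"
	m.UploadProgressAndPutOperationsForBackup("b1", ops)
	m.DeleteOperationsForBackup("b1")
}

// replayConcurrent runs a controller turn and a describe call in parallel and
// returns the phase left in the store.
func replayConcurrent() string {
	m := newMap()
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); controllerTurn(m) }()
	go func() { defer wg.Done(); m.UpdateForBackup("b1") }()
	wg.Wait()
	return m.store["b1"][0].Phase
}

func main() {
	fmt.Println(replayConcurrent()) // Completed
}
```

Because the upload and the cache refresh share one critical section, a describe call either runs before them (its InProgress upload is then overwritten) or after them (it re-uploads Completed, or finds nothing once the entry is deleted), so the store cannot be left in a non-terminal phase under any scheduling.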
Updated. Now when the controller uploads progress, we also update the operations map in the same function call, and it's done while holding the operations map lock, so the describe call into the map can't happen in between these actions.
@sseago Reasons:
@Lyndon-Li I agree with your suggestions for the finalize phase name change. I'll fix that when I rebase this PR to deal with the new conflicts with the main branch.
Branch updated from 7a86eac to 53ddf1b
Signed-off-by: Scott Seago <sseago@redhat.com>
LGTM
Please add a summary of your change
Follow-on PR for BIAv2 controller implementation.
This includes the changes discussed but not implemented in #5849
This includes several items:
Please indicate you've done the following:
/kind changelog-not-required as a comment on this pull request.
site/content/docs/main.