[design] Proposal of triggering backups based on Kubernetes events. #2119
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Trigger backups based on Kubernetes events | ||
|
||
Trigger Velero backups based on Kubernetes API events (Terminating, Deleting, CrashLoopBackOff, etc.). | ||
|
||
## Goals | ||
|
||
- Create backups in response to events, rather than only manually or on a schedule. | ||
- Recover cluster components that were deleted accidentally. | ||
- Block the deletion of a cluster component when it was initiated by a human, and trigger a backup. | ||
This is an interesting idea - can you go into detail on how you'd determine whether the deletion was triggered by a human or not?
I am thinking about watching all the delete events and listening for kubectl delete commands; this way we could distinguish whether the action was performed by a human or not.
Ok. Do you know how to distinguish this via the events emitted on the Kubernetes API server? They may be labelled or annotated differently, but I'm not sure. If they are, I'd like to see that called out in the document. |
||
|
||
## Non Goals | ||
|
||
- N/A | ||
I think some non-goals here might be useful in providing constraints around the design. |
||
|
||
## Background | ||
|
||
## High-Level Design | ||
|
||
A custom controller lists and watches Kubernetes events and triggers backups based on them. | ||
|
||
## Detailed Design | ||
|
||
A custom controller with its own reflector lists and watches specific components using the Kubernetes watch API. | ||
How would users specify what components to watch? How would users specify what events to watch for?
This could be achieved through namespace selectors. As for which events to watch, the default should be to watch any delete events.
Namespaces are one way, sure. I was also thinking about specifying per resource type. I'd recommend providing explicit examples of what the user would type to list out events and namespaces. What would the configuration file or CRD look like? Would it be an addition to an existing CRD? These kinds of questions would be good to be answered in the proposal itself. |
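To make the question above concrete, here is a minimal sketch of what a trigger specification could look like as a Go type, combining the namespace selection from the reply with the per-resource-type selection the reviewer suggests. Everything in it is hypothetical: `BackupTrigger`, `ObservedEvent`, and their fields are illustrative names and do not exist in Velero. The only borrowed fact is that Kubernetes watch events carry a type such as `ADDED`, `MODIFIED`, or `DELETED`; the sketch defaults to `DELETED`, as the reply proposes.

```go
package main

import "fmt"

// BackupTrigger is a HYPOTHETICAL configuration shape for this proposal.
// None of these fields exist in Velero today.
type BackupTrigger struct {
	Namespaces []string // namespaces to watch; empty means all namespaces
	Resources  []string // resource types to watch, e.g. "deployments"; empty means all
	Events     []string // watch event types; empty defaults to "DELETED"
}

// ObservedEvent is a hypothetical, simplified view of one watch event.
type ObservedEvent struct {
	Namespace string
	Resource  string
	Type      string // Kubernetes watch event type: ADDED, MODIFIED, DELETED
}

// contains reports whether v is present in list.
func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// Matches reports whether the event should trigger a backup under this spec.
func (t BackupTrigger) Matches(e ObservedEvent) bool {
	events := t.Events
	if len(events) == 0 {
		events = []string{"DELETED"} // default taken from the author's reply
	}
	nsOK := len(t.Namespaces) == 0 || contains(t.Namespaces, e.Namespace)
	resOK := len(t.Resources) == 0 || contains(t.Resources, e.Resource)
	return nsOK && resOK && contains(events, e.Type)
}

func main() {
	trigger := BackupTrigger{
		Namespaces: []string{"prod"},
		Resources:  []string{"deployments"},
	}
	ev := ObservedEvent{Namespace: "prod", Resource: "deployments", Type: "DELETED"}
	fmt.Println(trigger.Matches(ev))
}
```

Whether such a spec would live in a new CRD or extend an existing one is left open here, as it is in the thread.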
||
It adds each monitored object, along with its status or current event, to a queue. | ||
It then pops objects from the queue based on their status (criticality and priority). | ||
How is the priority determined?
I would say based on labels, e.g. status: critical.
So each object would have to be labelled? Would this be a standard label, or are we introducing a new label just for this? You can incorporate your answers into the proposal doc. |
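The label-based ordering discussed above can be sketched with Go's standard `container/heap` package. This is only an illustration under stated assumptions: the `status: critical` label comes from the reply in this thread and is not a standard Kubernetes or Velero label, and `WatchedObject`, `backupQueue`, and `drainOrder` are hypothetical names. Objects labelled critical are popped first; ties are broken by arrival order.

```go
package main

import (
	"container/heap"
	"fmt"
)

// WatchedObject is a HYPOTHETICAL record of an object the controller observed.
type WatchedObject struct {
	Name   string
	Event  string            // e.g. "Deleting" or "CrashLoopBackOff"
	Labels map[string]string // labels copied from the object
}

// priority ranks an object using the illustrative "status" label.
// A higher value is popped earlier.
func priority(o WatchedObject) int {
	if o.Labels["status"] == "critical" {
		return 1
	}
	return 0
}

// item pairs an object with its arrival sequence number.
type item struct {
	obj WatchedObject
	seq int
}

// backupQueue implements heap.Interface.
type backupQueue []item

func (q backupQueue) Len() int { return len(q) }

// Less orders by priority first, then by arrival order.
func (q backupQueue) Less(i, j int) bool {
	pi, pj := priority(q[i].obj), priority(q[j].obj)
	if pi != pj {
		return pi > pj
	}
	return q[i].seq < q[j].seq
}

func (q backupQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }

func (q *backupQueue) Push(x any) { *q = append(*q, x.(item)) }

func (q *backupQueue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// drainOrder enqueues all objects and returns their names in pop order.
func drainOrder(objs []WatchedObject) []string {
	q := &backupQueue{}
	for i, o := range objs {
		heap.Push(q, item{obj: o, seq: i})
	}
	var names []string
	for q.Len() > 0 {
		names = append(names, heap.Pop(q).(item).obj.Name)
	}
	return names
}

func main() {
	objs := []WatchedObject{
		{Name: "cache", Event: "Deleting"},
		{Name: "db", Event: "Deleting", Labels: map[string]string{"status": "critical"}},
		{Name: "web", Event: "CrashLoopBackOff"},
	}
	fmt.Println(drainOrder(objs))
}
```

A two-level ranking keeps the sketch small; a real design would still need to answer the reviewer's question about whether the label is standard or newly introduced.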
||
Block the deletion if it was performed by a human and trigger a backup for that component. | ||
Try to catch errors occurring in the components, in order to predict when they will go down and trigger backups. | ||
I'm not sure how this part connects to backups - there's something I'm missing.
For example, we could use other monitoring tools (Promtail and Loki): based on metrics, we could predict whether the application is about to go down (e.g. because it is consuming too much memory or disk).
I think that's getting outside of Velero's scope and into machine learning. While I think it's useful functionality for sure, I don't think I would accept this portion into Velero core. |
||
|
||
## Alternatives Considered | ||
|
||
Apply the same design in a separate plugin as a new kind. | ||
|
||
## Security Considerations | ||
|
||
N/A |
Can you clarify this some? How is this different from what Velero already does?
With Velero, we create backups manually or by scheduling them. What differs from what Velero already does is that we trigger backups for components that are about to be deleted.
So you're envisioning that this would intercept a delete request and back up the components before allowing them to delete?