[design] Proposal of triggering backups based on Kubernetes events. #2119

Closed

ezzoueidi wants to merge 1 commit into vmware-tanzu:master from ezzoueidi:k8s-event

Contributor

ezzoueidi commented Dec 11, 2019 •

edited by carlisia

Closes #2111

Any new ideas/thoughts would be much appreciated.

Signed-off-by: Naeil Ezzoueidi <naeilzoueidi@ubuntu.com>

nrb added the Area/Design label


          Add proposal design of triggering backups based on Kubernetes events.

5d3a7b2

Signed-off-by: Naeil Ezzoueidi <naeilzoueidi@ubuntu.com>

Contributor

carlisia commented Dec 11, 2019

👀

Contributor

carlisia commented Dec 12, 2019

I'm still not quite sure that this belongs in Velero core. At the very least, I'd like to see more users requesting this feature.

My initial comment: This proposal has as a goal to trigger a backup in case of an accidental deletion, and to block a deletion of a component if it was performed by a human. I'd like to see it address how the system would distinguish an accidental from an intentional deletion.

The part that proposes blocking deletions could also be addressed with more details. It states it would happen in the case of human intervention (and trigger a backup), but how would the system distinguish this?

nrb suggested changes

Contributor

nrb left a comment

While I think triggering backups based on Kubernetes events is definitely an interesting feature, I think it's also a complicated one.

I'd like more details on how users define the particular resources and actions to watch for before accepting this proposal, and even if accepted, I don't believe that's a commitment from the core team to implement it any time soon.

That being said, I also believe this is something that could very well be implemented as an external controller by someone else, and deployed alongside Velero.

Also, the 3rd goal mentioned doesn't seem to fit to me - I would feel better about this proposal if it was excluded.

design/trigger-backups-api-events.md


		## Non Goals

		- N/A

Contributor

nrb Dec 16, 2019

I think some non-goals here might be useful in providing constraints around the design.

design/trigger-backups-api-events.md

+              ## Goals
+              - Rather than performing backups manually or by scheduling them, backups are created based on events.
+              - Recovering the components of the cluster when they were deleted accidently.

Contributor

nrb Dec 16, 2019

Can you clarify this some? How is this different from what Velero already does?

Contributor Author

ezzoueidi Dec 29, 2019

With Velero, we create backups manually or by scheduling them. What differs from what Velero already does is that we would trigger backups for components that are about to be deleted.

Contributor

nrb Jan 7, 2020

So you're envisioning that this would intercept a delete request and back up the components before allowing them to delete?

design/trigger-backups-api-events.md

+              - Rather than performing backups manually or by scheduling them, backups are created based on events.
+              - Recovering the components of the cluster when they were deleted accidently.
+              - Blocking the delete process of a component in the cluster if it was a human intervention and trigger a backup.

Contributor

nrb Dec 16, 2019

This is an interesting idea - can you go into detail on how you'd determine whether the deletion was triggered by a human or not?

Contributor Author

ezzoueidi Dec 29, 2019

I am thinking about watching all delete events and listening for `kubectl delete` commands; this way we could distinguish whether the action was performed by a human or not.

Contributor

nrb Jan 7, 2020

Ok. Do you know how to distinguish this via the events emitted on the Kubernetes API server? They may be labelled or annotated differently, but I'm not sure.

If they are, I'd like to see that called out in the document.
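
To make the question concrete: a watch event of type `DELETED` carries only the deleted object, not the identity of whoever requested the deletion. One place the requester is visible is the `userInfo` field of an admission request, and service accounts authenticate with usernames of the form `system:serviceaccount:<namespace>:<name>`. The sketch below is a heuristic built on that, not something the proposal specifies; the function names are hypothetical, and it cannot tell a person using a service account token from automation using a personal credential.

```python
# Heuristic sketch, not part of the proposal: classify who issued a request
# from the userInfo block of a Kubernetes admission request (given as a dict).
# Function names are hypothetical.

SERVICE_ACCOUNT_PREFIX = "system:serviceaccount:"
SYSTEM_PREFIX = "system:"


def classify_requester(admission_request: dict) -> str:
    """Return 'service-account', 'system', 'user', or 'unknown'."""
    username = admission_request.get("userInfo", {}).get("username", "")
    if not username:
        return "unknown"
    if username.startswith(SERVICE_ACCOUNT_PREFIX):
        return "service-account"
    if username.startswith(SYSTEM_PREFIX):
        # Built-in components such as controllers and nodes.
        return "system"
    return "user"


def is_probably_human_delete(admission_request: dict) -> bool:
    """True for DELETE requests whose username looks like a regular user."""
    return (
        admission_request.get("operation") == "DELETE"
        and classify_requester(admission_request) == "user"
    )
```

A cascading delete carried out by the garbage collector would be classified as `service-account` here, which is one of the cases the design document would need to address explicitly.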

design/trigger-backups-api-events.md


		## Detailed Design

		A custom controller where it have its own refelctor to list and watch a specific components using the kubernetes watch api.

Contributor

nrb Dec 16, 2019

How would users specify what components to watch? How would users specify what events to watch for?

Contributor Author

ezzoueidi Dec 29, 2019

This could be achieved through namespace selectors. As for specifying which events to watch, the default should be to watch all delete events.

Contributor

nrb Jan 7, 2020

Namespaces are one way, sure. I was also thinking about specifying per resource type.

I'd recommend providing explicit examples of what the user would type to list out events and namespaces. What would the configuration file or CRD look like? Would it be an addition to an existing CRD? These kinds of questions would be good to be answered in the proposal itself.
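
As one possible shape for such a configuration, the sketch below models a rule that selects by namespace, resource kind, and watch event type (`ADDED`, `MODIFIED`, `DELETED`), defaulting to delete events as suggested above. The `WatchRule` name and its fields are assumptions made for illustration; the proposal does not define a CRD or config format.

```python
# Illustrative sketch only: WatchRule and its fields are hypothetical and do
# not correspond to any existing Velero CRD.
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchRule:
    namespaces: frozenset = frozenset()              # empty = all namespaces
    kinds: frozenset = frozenset()                   # empty = all resource kinds
    event_types: frozenset = frozenset({"DELETED"})  # default: delete events only

    def matches(self, event: dict) -> bool:
        """Check a watch-style event: {'type': ..., 'object': {...}}."""
        obj = event.get("object", {})
        namespace = obj.get("metadata", {}).get("namespace", "")
        if event.get("type") not in self.event_types:
            return False
        if self.namespaces and namespace not in self.namespaces:
            return False
        if self.kinds and obj.get("kind") not in self.kinds:
            return False
        return True


rule = WatchRule(namespaces=frozenset({"prod"}), kinds=frozenset({"Deployment"}))
event = {
    "type": "DELETED",
    "object": {"kind": "Deployment", "metadata": {"name": "web", "namespace": "prod"}},
}
print(rule.matches(event))
```

Selecting per resource type, as well as per namespace, falls out of the same structure by filling in `kinds`.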

design/trigger-backups-api-events.md

+              A custom controller where it have its own refelctor to list and watch a specific components using the kubernetes watch api.
+              It will add the object mentored with its status/current event to a queue.
+              Then pop up the object based on its status (criticity and priority).

Contributor

nrb Dec 16, 2019

How is the priority determined?

Contributor Author

ezzoueidi Dec 29, 2019

I would say based on labels, e.g. `status: critical`.

Contributor

nrb Jan 7, 2020

So each object would have to be labelled? Would this be a standard label, or are we introducing a new label just for this?

You can incorporate your answers into the proposal doc.
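
For illustration, label-based prioritisation could look like the sketch below, which orders queued objects so that those labelled `status: critical` are popped first and everything else is handled in arrival order. The `BackupQueue` class and the numeric priority values are assumptions; only the `status: critical` label comes from the discussion above.

```python
# Sketch under assumptions: BackupQueue and the priority numbers are
# hypothetical; only the "status: critical" label comes from the thread.
import heapq
import itertools

PRIORITY_BY_STATUS = {"critical": 0}
DEFAULT_PRIORITY = 10


class BackupQueue:
    """Priority queue of Kubernetes objects (dicts) awaiting backup."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among equal priorities and
        # prevents heapq from ever comparing the object dicts themselves.
        self._counter = itertools.count()

    def push(self, obj: dict) -> None:
        labels = obj.get("metadata", {}).get("labels") or {}
        priority = PRIORITY_BY_STATUS.get(labels.get("status"), DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._counter), obj))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Whether `status` would be a standard label or a new one introduced for this feature remains the open question; the sketch works with either as long as the key is agreed on.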

design/trigger-backups-api-events.md

+              It will add the object mentored with its status/current event to a queue.
+              Then pop up the object based on its status (criticity and priority).
+              Block the delete process if it was performed from a human being and trigger a backup for that component.
+              Trying to catch the errors happening on the components so it predicts when they will be down and trigger backups.

Contributor

nrb Dec 16, 2019

I'm not sure how this part connects to backups - there's something I'm missing.

Contributor Author

ezzoueidi Dec 29, 2019

For example, we could use other monitoring tools (Promtail and Loki). Based on metrics, we could predict whether the application is about to go down (e.g. because it is consuming too much memory, disk, etc.).

Contributor

nrb Jan 7, 2020

I think that's getting outside of Velero's scope and into machine learning. While I think it's useful functionality for sure, I don't think I would accept this portion into Velero core.

nrb assigned ezzoueidi

Contributor Author

ezzoueidi commented Dec 23, 2019

Thank you @carlisia and @nrb for the review and the comments. Sorry for the delay; I will reply and push some modifications this week.

Contributor Author

ezzoueidi commented Dec 29, 2019

> I'm still not quite sure that this belongs in Velero core. At the very least, I'd like to see more users requesting this feature.

This could also be implemented as a plugin with a new kind. IMHO, this is a good feature, as it covers a real scenario that can happen.

> My initial comment: This proposal has as a goal to trigger a backup in case of an accidental deletion, and to block a deletion of a component if it was performed by a human. I'd like to see it address how the system would distinguish an accidental from an intentional deletion.

I would say that this is the role of a custom controller that uses the Kubernetes watch API and listens for any `kubectl delete` commands to distinguish whether the action was performed by a human or not.
Either way, I think backing up the components in both situations is good. What do you think?

nrb reviewed

Contributor

nrb left a comment

> This could also be implemented as a plugin with a new kind. IMHO, this is a good feature, as it covers a real scenario that can happen.

I honestly see this proposal as being far larger than a plugin. It's a new type of controller, possibly multiple controllers. These controllers could also easily live outside the Velero codebase, I think, triggering backups by submitting a Backup custom resource to the Kubernetes API server.

> Either way, I think backing up the components in both situations is good. What do you think?

I think that this is a complicated thing to achieve. I would like a lot more detail on exactly how a system would determine whether a deletion was from a human or automated - was it a cascading delete? Are there finalizers blocking deletion? And when I say detail, I think here I would really like examples of data structure fields or functions showing how the decision is made.

I'd also like to know - if this is based on deletion events, is the intention to force the delete to pause until the backup runs and completes? What happens in the case of a finalizer that isn't removed and the deletion never actually happens?

These scenarios can get pretty complicated, and I'd like to see them at least mentioned in the proposal.
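
As a rough idea of what an external controller would submit, the sketch below builds a Velero `Backup` object (`apiVersion: velero.io/v1`) scoped to the namespaces and resources affected by an event. The helper name, the `velero` install namespace, and the TTL value are assumptions for illustration; actually creating the object through the Kubernetes API is left out.

```python
# Sketch only: build_backup_manifest is a hypothetical helper. It produces the
# manifest but does not submit it to the API server.
import json


def build_backup_manifest(name, namespaces, resources=None,
                          velero_namespace="velero", ttl="720h0m0s"):
    """Return a dict describing a velero.io/v1 Backup."""
    spec = {"includedNamespaces": list(namespaces), "ttl": ttl}
    if resources:
        # Restrict the backup to specific resource types when given.
        spec["includedResources"] = list(resources)
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": velero_namespace},
        "spec": spec,
    }


manifest = build_backup_manifest(
    name="on-delete-prod-web",      # hypothetical backup name
    namespaces=["prod"],
    resources=["deployments"],
)
print(json.dumps(manifest, indent=2))
```

Because Velero's own controllers act on `Backup` objects once they exist, a component like this needs no changes to the Velero codebase, which is the point made above about it living outside core.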

Contributor

carlisia commented Jan 28, 2020

I'm putting this on the agenda for tomorrow's meeting. @nzoueidi if you could join it would be great: https://velero.io/community/.

Contributor

carlisia commented Jan 28, 2020

We discussed this in our community meeting today. The general consensus is that this request overall is useful but it is a great use-case for an operator, and not so much for inclusion into the Velero code base.

We will leave this PR open for a couple weeks with the intent of welcoming additional opinions in favor of this request, in case we are missing the needs of more users and how they would be using this. Else we'll close it but this conversation can always be picked up later.

Will post a link to the meeting recording once it's up on YT.

Contributor

carlisia commented Jan 28, 2020

Recording of our meeting: https://www.youtube.com/watch?v=SrGH2_ufWJ8&list=PL7bmigfV0EqQRysvqvqOtRNk4L5S7uqwM&index=2&t=0s

Member

skriss commented Feb 11, 2020

We haven't gotten any further input here; I'm going to close this out.

If someone works on this outside of core Velero, we'd love to see it!

skriss closed this

skriss mentioned this pull request

Trigger backups based on k8s events #2111

Closed
