This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Need advice on the way to achieve service monitoring using a daemon #379

Open
2 tasks done
gautamp8 opened this issue Jun 26, 2020 · 3 comments
Labels
question Further information is requested

Comments

@gautamp8
Contributor

Question

I want to monitor metrics coming from a Service object my handler has created. Based on certain logic, like a metric crossing a certain threshold, I want to update the number of replicas of a Deployment object (another resource created by the operator itself).

I read about Daemons in the documentation. I just want to know whether using a daemon to call the service's API at regular intervals to fetch that metric, and patch the deployment accordingly, is the right approach.

This approach looked right based on this excerpt of the documentation: What to use when?

FWIW - I'm trying to build a POC for a Celery operator using Kopf. I want to use the Flower service my operator has created to monitor the broker queue length and autoscale the worker deployments accordingly. I'm a beginner to the operator world right now, so any advice on specifics/gotchas I should take care of in this particular scenario would be appreciated.

Let me know if this question is not informative enough or needs changes to the title/description to make it helpful to others.

Checklist

Keywords

monitoring, service, daemon examples, daemon use-case

@gautamp8 gautamp8 added the question Further information is requested label Jun 26, 2020
@eshepelyuk

Quite an interesting topic. I'd like to hear the feedback as well.

@nolar
Contributor

nolar commented Jul 2, 2020

@Brainbreaker Sorry for the slightly late response, almost a week.

Yes, using either daemons or timers is the right way to do this task. They were actually designed for the purpose of monitoring something "outside", i.e. for this exact purpose.

Depending on whether the remote system can do "blocking" long-running requests (like K8s's own watching), it could be daemons. If it cannot — which I expect for the majority of the systems — timers are better, they save a little bit of RAM for you.
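The non-blocking polling variant can be sketched without Kopf itself. In the sketch below, a stdlib `threading.Event` stands in for the `stopped` flag that Kopf passes into daemon functions, and `fetch_queue_length` is a hypothetical placeholder for an HTTP call to the monitoring endpoint:

```python
import threading
import time

def fetch_queue_length() -> int:
    """Hypothetical stand-in for an HTTP call to the monitoring endpoint."""
    return 42

def monitor_queue(stopped: threading.Event, interval: float, samples: list) -> None:
    """Daemon-style loop: poll until asked to stop.

    A Kopf daemon handler receives a similar `stopped` flag; using
    stopped.wait(interval) rather than time.sleep(interval) lets the loop
    exit promptly when the resource is deleted or the operator shuts down.
    """
    while not stopped.is_set():
        samples.append(fetch_queue_length())
        stopped.wait(interval)  # returns early if a stop was requested

# Usage: run the loop briefly in a thread, then request a stop.
stopped = threading.Event()
samples: list = []
thread = threading.Thread(target=monitor_queue, args=(stopped, 0.01, samples))
thread.start()
time.sleep(0.05)  # let it collect a few samples
stopped.set()
thread.join()
```

The same pattern applies inside a real `@kopf.daemon` handler; only the stop flag and the polling call come from Kopf and the remote system instead.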

Timers are also better for another reason. When a remote system provides some metrics, you can put them into the resource's status via the patch kwarg (patch.status['smthng'] = value), and it will be applied to the resource once the timer function has finished, on every timer invocation. For daemons, you have to apply that yourself (because daemons never exit, kind of). Not a big deal, but it can be convenient.
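The patch mechanism described above can be shown with a plain nested dict standing in for the dict-like `patch` object that Kopf injects into timer handlers; `fetch_queue_length` is again a hypothetical call to the monitoring endpoint:

```python
def fetch_queue_length() -> int:
    """Hypothetical stand-in for querying the monitoring endpoint."""
    return 17

def poll_metrics(patch: dict) -> None:
    """Body of a timer handler: record the metric on the resource's status.

    With Kopf this would be decorated, e.g.:
        @kopf.timer('example.org', 'v1', 'celeryclusters', interval=30)
        def poll_metrics(patch, **_):
            patch.status['queue_length'] = fetch_queue_length()
    (The group/version/plural above are invented for illustration.)
    Kopf applies whatever was written to `patch` after the handler returns.
    """
    patch.setdefault('status', {})['queue_length'] = fetch_queue_length()

patch: dict = {}
poll_metrics(patch)
# patch now carries {'status': {'queue_length': 17}}
```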

There is one aspect you should work out in advance: the error handling. If updating a deployment fails, what should follow? In theory, timers are retried with backoff=60 (seconds; configurable; not the same as interval=…), so everything might be okay and as expected. But if you do not want to poll the remote system on every retry, you could store the retrieved metric on the resource in a field .status.m, and modify the attached deployment separately in the on-update/on-field(field=status.m) handlers. But these are low-level details, and possibly over-complication based on assumptions.
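The split suggested above, a timer that stores the metric in `.status.m` and a field handler that reacts to it, keeps the scaling decision itself as a small pure function. A sketch of such a decision follows; the thresholds and parameter names are invented for illustration:

```python
def desired_replicas(queue_length: int,
                     tasks_per_worker: int = 10,
                     min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Map a broker queue length to a worker replica count.

    In the timer/field-handler split, a handler registered on the status
    field (e.g. @kopf.on.field(..., field='status.m')) would call a
    function like this and patch the Deployment's spec.replicas only
    when the result differs from the current count.
    """
    if queue_length <= 0:
        return min_replicas
    wanted = -(-queue_length // tasks_per_worker)  # ceiling division
    return max(min_replicas, min(max_replicas, wanted))

# Usage:
# desired_replicas(0)   -> 1   (never below the floor)
# desired_replicas(25)  -> 3   (ceil(25 / 10))
# desired_replicas(999) -> 8   (capped at the ceiling)
```

Keeping this pure makes the scaling policy testable without a cluster, independent of whether the metric arrives via a timer or a daemon.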

Let me know if it works for you. If there are any confusing moments in using timers/daemons for this exact task, they'd better be fixed & clarified in the framework.

@gautamp8
Contributor Author

gautamp8 commented Jul 4, 2020

@nolar Thank you so much for the detailed response.

Yes, using either daemons or timers is the right way to do this task. They were actually designed for the purpose of monitoring something "outside", i.e. for this exact purpose.
Got it.

Depending on whether the remote system can do "blocking" long-running requests (like K8s's own watching), it could be daemons. If it cannot — which I expect for the majority of the systems — timers are better, they save a little bit of RAM for you.

Makes sense. Thanks for mentioning this.

Timers are also better for another reason. When a remote system provides some metrics, you can put them into the resource's status via the patch kwarg (patch.status['smthng'] = value), and it will be applied to the resource once the timer function has finished, on every timer invocation. For daemons, you have to apply that yourself (because daemons never exit, kind of). Not a big deal, but it can be convenient.

There is one aspect you should work out in advance: the error handling. If updating a deployment fails, what should follow? In theory, timers are retried with backoff=60 (seconds; configurable; not the same as interval=…), so everything might be okay and as expected. But if you do not want to poll the remote system on every retry, you could store the retrieved metric on the resource in a field .status.m, and modify the attached deployment separately in the on-update/on-field(field=status.m) handlers. But these are low-level details, and possibly over-complication based on assumptions.

These are actually good insights that I hadn't thought of on the first go. Thank you for sharing. I'm going to try out timers for this task and report back anything I'm confused about or need help with. I'd be willing to improve the documentation too if needed.

3 participants