This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Need advice on the way to achieve service monitoring using a daemon #379

Open
2 tasks done
gautamp8 opened this issue Jun 26, 2020 · 3 comments
Labels
question Further information is requested

Comments

@gautamp8
Contributor

Question

I want to monitor metrics coming from a Service object my handler has created. Based on certain logic, like a metric crossing a certain threshold, I want to update the number of replicas of a Deployment object (another resource created by the operator itself).

I read about Daemons in the documentation. I just want to know whether using a daemon to call the service's API at regular intervals to fetch that metric, and patch the deployment accordingly, is the right approach.

This approach looked right based on this excerpt of the documentation: What to use when?

FWIW - I'm trying to build a POC for a Celery operator using Kopf. I want to use the Flower service my operator has created to monitor the broker queue length and autoscale the worker deployments accordingly. I'm a beginner to the operator world right now, so any advice on specifics/gotchas I should take care of in this particular scenario would be appreciated.

Let me know if this question is not informative enough or needs changes to the title/description to make it helpful to others.

Checklist

Keywords

monitoring, service, daemon examples, daemon use-case

@gautamp8 gautamp8 added the question Further information is requested label Jun 26, 2020
@eshepelyuk

Quite an interesting topic. I'd like to hear the feedback as well.

@nolar
Contributor

nolar commented Jul 2, 2020

@Brainbreaker Sorry for the slightly late response, almost a week.

Yes, using either daemons or timers is the right way to do this task. They were actually designed for the purpose of monitoring something "outside", i.e. for this exact purpose.

Depending on whether the remote system can do "blocking" long-running requests (like K8s's own watching), it could be daemons. If it cannot — which I expect for the majority of the systems — timers are better, they save a little bit of RAM for you.
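The non-blocking polling variant can be sketched without Kopf itself. In the sketch below, a stdlib `threading.Event` stands in for the `stopped` flag that Kopf passes into daemon functions, and `fetch_queue_length` is a hypothetical placeholder for an HTTP call to the monitoring endpoint:

```python
import threading
import time

def fetch_queue_length() -> int:
    """Hypothetical stand-in for an HTTP call to the monitoring endpoint."""
    return 42

def monitor_queue(stopped: threading.Event, interval: float, samples: list) -> None:
    """Daemon-style loop: poll until asked to stop.

    A Kopf daemon handler receives a similar `stopped` flag; using
    stopped.wait(interval) rather than time.sleep(interval) lets the loop
    exit promptly when the resource is deleted or the operator shuts down.
    """
    while not stopped.is_set():
        samples.append(fetch_queue_length())
        stopped.wait(interval)  # returns early if a stop was requested

# Usage: run the loop briefly in a thread, then request a stop.
stopped = threading.Event()
samples: list = []
thread = threading.Thread(target=monitor_queue, args=(stopped, 0.01, samples))
thread.start()
time.sleep(0.05)  # let it collect a few samples
stopped.set()
thread.join()
```

The same pattern applies inside a real `@kopf.daemon` handler; only the stop flag and the polling call come from Kopf and the remote system instead.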

Timers are also better for another reason. When a remote system provides some metrics, you can put them into the resource's status via the patch kwarg (patch.status['smthng'] = value), and it will be applied to the resource once the timer function has finished, on every timer invocation. For daemons, you have to apply that yourself (because daemons never exit, kind of). Not a big deal, but it can be convenient.
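The patch mechanism described above can be shown with a plain nested dict standing in for the dict-like `patch` object that Kopf injects into timer handlers; `fetch_queue_length` is again a hypothetical call to the monitoring endpoint:

```python
def fetch_queue_length() -> int:
    """Hypothetical stand-in for querying the monitoring endpoint."""
    return 17

def poll_metrics(patch: dict) -> None:
    """Body of a timer handler: record the metric on the resource's status.

    With Kopf this would be decorated, e.g.:
        @kopf.timer('example.org', 'v1', 'celeryclusters', interval=30)
        def poll_metrics(patch, **_):
            patch.status['queue_length'] = fetch_queue_length()
    (The group/version/plural above are invented for illustration.)
    Kopf applies whatever was written to `patch` after the handler returns.
    """
    patch.setdefault('status', {})['queue_length'] = fetch_queue_length()

patch: dict = {}
poll_metrics(patch)
# patch now carries {'status': {'queue_length': 17}}
```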

There is one aspect you should work out in advance: the error handling. If updating a deployment fails, what should follow? In theory, timers are retried with backoff=60 (seconds; configurable; not the same as interval=…), so everything might be okay and as expected. But if you do not want to poll the remote system on every retry, you could store the retrieved metric on the resource in a field .status.m, and modify the attached deployment separately in the on-update/on-field(field=status.m) handlers. But these are low-level details, and possibly over-complication based on assumptions.
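The split suggested above, a timer that stores the metric in `.status.m` and a field handler that reacts to it, keeps the scaling decision itself as a small pure function. A sketch of such a decision follows; the thresholds and parameter names are invented for illustration:

```python
def desired_replicas(queue_length: int,
                     tasks_per_worker: int = 10,
                     min_replicas: int = 1,
                     max_replicas: int = 8) -> int:
    """Map a broker queue length to a worker replica count.

    In the timer/field-handler split, a handler registered on the status
    field (e.g. @kopf.on.field(..., field='status.m')) would call a
    function like this and patch the Deployment's spec.replicas only
    when the result differs from the current count.
    """
    if queue_length <= 0:
        return min_replicas
    wanted = -(-queue_length // tasks_per_worker)  # ceiling division
    return max(min_replicas, min(max_replicas, wanted))

# Usage:
# desired_replicas(0)   -> 1   (never below the floor)
# desired_replicas(25)  -> 3   (ceil(25 / 10))
# desired_replicas(999) -> 8   (capped at the ceiling)
```

Keeping this pure makes the scaling policy testable without a cluster, independent of whether the metric arrives via a timer or a daemon.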

Let me know if it works for you. If there are any confusing moments in using timers/daemons for this exact task, they'd better be fixed & clarified in the framework.

@gautamp8
Contributor Author

gautamp8 commented Jul 4, 2020

@nolar Thank you so much for the detailed response.

Yes, using either daemons or timers is the right way to do this task. They were actually designed for the purpose of monitoring something "outside", i.e. for this exact purpose.
Got it.

Depending on whether the remote system can do "blocking" long-running requests (like K8s's own watching), it could be daemons. If it cannot — which I expect for the majority of the systems — timers are better, they save a little bit of RAM for you.

Makes sense. Thanks for mentioning this.

Timers are also better for another reason. When a remote system provides some metrics, you can put them into the resource's status via the patch kwarg (patch.status['smthng'] = value), and it will be applied to the resource once the timer function has finished, on every timer invocation. For daemons, you have to apply that yourself (because daemons never exit, kind of). Not a big deal, but it can be convenient.

There is one aspect you should work out in advance: the error handling. If updating a deployment fails, what should follow? In theory, timers are retried with backoff=60 (seconds; configurable; not the same as interval=…), so everything might be okay and as expected. But if you do not want to poll the remote system on every retry, you could store the retrieved metric on the resource in a field .status.m, and modify the attached deployment separately in the on-update/on-field(field=status.m) handlers. But these are low-level details, and possibly over-complication based on assumptions.

These are actually good insights that I hadn't thought of on the first go. Thank you for sharing. I'm going to try out timers for this task and report back anything I'm confused about or need help with. I'd be willing to improve the documentation too if needed.

3 participants