simple stupid monitoring system
this system has been built to run a bunch of simple checks (standard nagios plugins for the most part) on a small bunch of servers.
As most existing monitoring systems seem to be designed for monitoring a huge amount of checks on a large amount of servers they usually are complicated to setup, require a lot of configuration and seem to be overkill for this simple use case. SSMS tries to solve this issue in a simpler (and possibly stupid) way.
It consists of 2 components - the agent and the server
the agent is the component running on the system which needs to be monitored. It's got a static yaml configuration file of the commands to run and in which interval these need to be run.
It runs one thread per check which alternatingly waits for the configured interval and runs the check command. The results of the checks are stored in memory and can be fetched via http as json.
---
password: HttpBasicAuthPassword
defaults:
interval: 120
checks:
- name: SSL example.com
command:
- /usr/lib/nagios/plugins/check_ssl_cert
- "-H"
- example.com
- "-P"
- https
nagios_check: True
- name: apache running
command:
- /bin/systemctl
- status
- apache2.service
The server is running a simple status web interface and fetches the check results from all agents in the configured interval. It's meant to do some alerting in the future as well but this hasn't been implemented yet.
---
frontend_users:
frontend_username: FrontendPassword
defaults:
interval: 120
password: HttpBasicAuthPassword
servers:
- name: Example-Server 1
url: http://exampleserver1:5001
- name: Example-Server 2
url: http://exampleserver2:5001
interval: 30 # you can overwrite interval and password per server
password: AnotherHttpAuthPassword