Monitoring Eucalyptus Infrastructure using Nagios

nagios-eucalyptus

Overview

These are some simple Nagios check scripts to monitor your Eucalyptus cloud. You should probably also monitor the usual components of a Linux host: CPU, RAM, disk, swap, I/O, and so on.

Prerequisite

The scripts use the Nagios Remote Plugin Executor, NRPE (http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details), to communicate between the Nagios server and the monitored hosts.

Installation on Clients

Check NRPE is working

You can check that NRPE is working by running check_nrpe from your Nagios server. The following command should return the version of the NRPE agent running on your client.


/your/nagios/plugins/directory/check_nrpe -H Client_IP
NRPE v2.12
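
If you have several clients, the same check can be looped over all of them. The following is a minimal sketch, not part of this repository: the `check_clients` function and the `CHECK_NRPE` variable are hypothetical names, and the plugin path is the placeholder used above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with these scripts).
# CHECK_NRPE is an assumed variable; point it at your own check_nrpe binary.
CHECK_NRPE="${CHECK_NRPE:-/your/nagios/plugins/directory/check_nrpe}"

# Run check_nrpe against every host given as an argument.
# Prints one line per host and returns 1 if at least one host did not answer.
check_clients() {
    failed=0
    for host in "$@"; do
        if output=$("$CHECK_NRPE" -H "$host" 2>&1); then
            echo "$host: $output"
        else
            echo "$host: FAILED ($output)"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `check_clients Client_IP_1 Client_IP_2` prints the NRPE version line for each client that answers, so an unreachable agent stands out before you add any service definition.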

Add the plugins on each node

Copy the contents of the nrpe directory to the Nagios plugin directory on each node. All scripts must be executable.

Then amend your NRPE configuration to declare the new scripts.


# check_loop.sh takes two arguments: it raises a warning when fewer than 10 loopback devices are available and a critical error when fewer than 5 are.
command[check_loopback]=/path/to/your/nagios/plugins/check_loop.sh  10 5

# Declare only the check_libvirtd line that matches the node's distribution.
# for Ubuntu
command[check_libvirtd]=/path/to/your/nagios/plugins/check_upstart_status.pl -j libvirt-bin

# for CentOS
command[check_libvirtd]=/path/to/your/nagios/plugins/check_exit_status.pl -s /etc/init.d/libvirtd

Check Upstart http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/Check-Upstart-Job-Status/details

Check Exit Status http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/Check--2Fetc-2Finit-2Ed-script-status/details
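
The two thresholds passed to check_loopback follow the usual Nagios plugin convention: the plugin prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below illustrates that convention only; it is not the repository's check_loop.sh, and the `loop_status` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a "fewer than N free" threshold check.
# Usage: loop_status FREE WARN CRIT
# Returns the Nagios plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
loop_status() {
    if [ "$#" -ne 3 ]; then
        echo "UNKNOWN - usage: loop_status FREE WARN CRIT"
        return 3
    fi
    free=$1; warn=$2; crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo "CRITICAL - $free loopback devices free"
        return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING - $free loopback devices free"
        return 1
    fi
    echo "OK - $free loopback devices free"
    return 0
}

# Assumed way to obtain FREE on a node (adapt to your system):
#   total=$(ls /dev/loop[0-9]* | wc -l)
#   used=$(losetup -a | wc -l)
#   loop_status $((total - used)) 10 5
```

With the thresholds `10 5` from the NRPE configuration, 7 free devices yields a WARNING and 3 free devices yields a CRITICAL.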

Install in Nagios

Declare your commands

Copy the scripts from the nagios directory into the plugin directory of your Nagios server.

Edit your commands.cfg and add the following:


define  command {
        command_name    check_nrpe_command
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define  command {
        command_name    check_nrpe_command_args
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
}

define command {
        command_name    check_euca_addresses
        command_line    $USER1$/check_euca_addresses.sh $ARG1$ $ARG2$
}

define command {
        command_name    check_tcp_service
        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}

define command {
        command_name    check_euca_capacity
        command_line    $USER1$/check_euca_capacity.sh $ARG1$ $ARG2$
}

define command {
        command_name    check_basic_test
        command_line    $USER1$/euca_basic_test.sh
}
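
Service definitions pass arguments to these commands by appending them to the command name, separated by `!`. Nagios then substitutes them for `$ARG1$`, `$ARG2$` and so on in command_line. The sketch below only illustrates that mapping; `split_check_command` is a hypothetical helper, not something Nagios or this repository provides.

```shell
#!/bin/sh
# Hypothetical illustration of how a check_command value is split.
# The function body runs in a subshell so the IFS change does not leak out.
split_check_command() (
    IFS='!'
    # Unquoted on purpose: the string is split on "!" into $1, $2, ...
    set -- $1
    echo "command_name=$1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
)
```

For example, `split_check_command 'check_euca_addresses!80!95'` lists `check_euca_addresses` as the command name with 80 as the first argument and 95 as the second, which is what check_euca_addresses.sh receives.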

Check Eucalyptus cloud service

The cloud service runs on your Cloud Controller, Storage Controller, and Walrus.


define service {
        use                             local-service         ; Name of service template to use
        hostgroup_name                  Cloud Controller, Walrus, Storage Controller
        service_description             Eucalyptus-Cloud Service TCP Listen
        check_command                   check_tcp_service!8773
}

Check Eucalyptus cluster controller service

The cluster controller service (eucalyptus-cc) runs on your Cluster Controller.


define service {
        use                             local-service         ; Name of service template to use
        hostgroup_name                  Cluster Controller
        service_description             Eucalyptus CC Service TCP Listen
        check_command                   check_tcp_service!8774
}

Check Eucalyptus node controller service

The node controller service (eucalyptus-nc) runs on all Node Controllers (NCs).


define service {
        use                             local-service         ; Name of service template to use
        hostgroup_name                  Node Controller
        service_description             Eucalyptus-nc Service TCP Listen
        check_command                   check_tcp_service!8775
}
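
The three TCP checks above differ only in the port they probe. When a check fails, it can help to run the plugin by hand from the Nagios server. The sketch below builds the matching command line; the `euca_port` and `tcp_check_command` functions and the component labels (cloud, cc, nc) are hypothetical, while the ports and the `-H` / `-p` options are the ones used in this README.

```shell
#!/bin/sh
# Hypothetical helpers mapping a Eucalyptus component to its TCP port.
euca_port() {
    case "$1" in
        cloud) echo 8773 ;;  # Cloud Controller, Walrus, Storage Controller
        cc)    echo 8774 ;;  # Cluster Controller
        nc)    echo 8775 ;;  # Node Controller
        *)     return 1 ;;
    esac
}

# Print the check_tcp invocation for a component and a host.
# Usage: tcp_check_command COMPONENT HOST
tcp_check_command() {
    port=$(euca_port "$1") || return 1
    echo "check_tcp -H $2 -p $port"
}
```

Run the printed command from your Nagios plugin directory, for example the output of `tcp_check_command cc Client_IP`.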

Check libvirtd

On a Xen- or KVM-based cloud, check that the libvirt daemon is running on your Node Controllers.


define service {
        use                             local-service         ; Name of service template to use
        hostgroup_name                  Eucalyptus NC
        service_description             Check libvirtd service
        check_command                   check_nrpe_command!check_libvirtd
}

Check available public IP

This checks the number of public IP addresses available. It issues a warning when 80% of the addresses are allocated and a critical error at 95%.

It should run only from one node.


define service{
        use                             local-service         ; Name of service template to use
        host_name                       localhost
        service_description             Eucalyptus IP Addresses
        check_command                   check_euca_addresses!80!95
}
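
The 80 and 95 passed above are percentages of allocated addresses. A minimal sketch of that kind of threshold logic follows, assuming integer arithmetic. It is not the repository's check_euca_addresses.sh, and the `allocation_status` function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a percentage-based threshold check.
# Usage: allocation_status ALLOCATED TOTAL WARN_PERCENT CRIT_PERCENT
# Returns the Nagios plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
allocation_status() {
    if [ "$#" -ne 4 ] || [ "$2" -le 0 ]; then
        echo "UNKNOWN - usage: allocation_status ALLOCATED TOTAL WARN CRIT"
        return 3
    fi
    allocated=$1; total=$2; warn=$3; crit=$4
    # Integer division: the percentage is truncated, not rounded.
    percent=$((allocated * 100 / total))
    if [ "$percent" -ge "$crit" ]; then
        echo "CRITICAL - $percent% allocated"
        return 2
    elif [ "$percent" -ge "$warn" ]; then
        echo "WARNING - $percent% allocated"
        return 1
    fi
    echo "OK - $percent% allocated"
    return 0
}
```

For a pool of 100 addresses, `allocation_status 80 100 80 95` reports a WARNING and `allocation_status 95 100 80 95` reports a CRITICAL.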

Check cloud capacity

This checks the number of cores available on your cloud. It issues a warning when 80% of your resources are allocated and a critical error at 95%.

It should run only from one node.



define service{
        use                             local-service         ; Name of service template to use
        host_name                       localhost
        service_description             Eucalyptus Available Capacity
        check_command                   check_euca_capacity!80!95
}

Check Loopback

This will check how many loopback devices are available. It should be used on the Storage Controller / Node Controller.


define service{
        use                             local-service         ; Name of service template to use
        hostgroup_name                  Eucalyptus Servers
        service_description             Check loopback device availability
        check_command                   check_nrpe_command!check_loopback
}

How to extend monitoring

You can also use eutester to extend Eucalyptus monitoring. Eutester checks not only that the components are available but also that they can be started and accessed.

eutester: https://github.com/eucalyptus/eutester