No description, website, or topics provided.
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
doc
.project
COPYING
README
changelog.txt
check_gpu_sensor
check_gpu_sensor.POD

README

                            check_gpu_sensor
                            =================

 check_gpu_sensor: Nagios/Icinga plugin to check GPU sensors via NVML

 Copyright (C) 2011-2013 Thomas-Krenn.AG,

 This program is free software; you can redistribute it and/or modify it under
 the terms of the GNU General Public License as published by the Free Software
 Foundation; either version 3 of the License, or (at your option) any later
 version.
 
 This program is distributed in the hope that it will be useful, but WITHOUT
 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
 details.
 
 You should have received a copy of the GNU General Public License along with
 this program; if not, see <http://www.gnu.org/licenses/>.

 HOWTO:
 ------
 http://www.thomas-krenn.com/en/wiki/GPU_Sensor_Monitoring_Plugin
 
 Current Updates:
 ----------------
 https://github.com/thomas-krenn/check_gpu_sensor_v1.git
 
 Example Output:
 ---------------
 Warning - Tesla K20 [persistenceMode = Warning]|ECCL2AggSgl=0;1;2;
 ECCTexAggSgl=0;1;2; memUtilRate=0 PWRUsage=31.23;150;200; ECCRegAggSgl=0;1;2;
 SMClock=705 ECCL1AggSgl=0;1;2; GPUTemperature=38;85;100; memClock=2600
 usedMemory=0.24;95;99; fanSpeed=30;80;95; graphicsClock=705 GPUUtilRate=0
 ECCMemAggSgl=0;1;2;

 Requirements:
 -------------
   o NVDIA display driver
      * Currently the proprietary NVIDIA driver from the official website
       (under Ubuntu 10.4, cf. http://www.thomas-krenn.com/de/wiki/CUDA_Installation)
       as well as the driver from the repositories (with Ubuntu 11.10 and 12.04)
       have been tested successfully.
   o NVML perl bindings
      * The bindigs are required to access the NVML library via Perl. Fetch
       a copy of the bindings from CPAN:
       * http://search.cpan.org/~nvbinding/
       For more information about the NVML library also visit
       http://developer.nvidia.com/nvidia-management-library-nvml.
   o Perl
   o Required perl modules (should be installed per default under Ubuntu)
     * strict
     * warnings
     * Getopt:Long
     * use feature qw(switch)
       feature switch is available with Perl 5.10.1. The ePN of icinga 1.6.1 showed up
       errors with Switch-Case statements. Therefore it was changed to given-when.
     NOT installed per default:
       * Config::IniFiles;
       On Ubuntu you can use
         $ sudo apt-get install libconfig-inifiles-perl
   o Nagios or Icinga

 Installation hints:
 -------------------
   On Ubuntu using the driver from the repositories the file "Makefile.PL" in the 
   nvml package must be modified. Add the parameter '-L/usr/lib/nvidia-current'
   or '-L/usr/lib/nvidia-experimental-304' (for experimental drivers) to 'LIBS'
   in order to get rid of the warning:
   
       Note (probably harmless): No library found for -lnvidia-ml
   
   when calling 'perl Makefile.PL'.

 Notes on sensor warnings:
 -------------------------
 * clocks_throttle_reason_unknown
   This throttle reason can also be reported during PState or clock change
   before driver version r19. Please recheck the throttle reasons with
   $ nvidia-smi -i 0 -q | grep -i throttle -A 5
    Clocks Throttle Reasons
        Idle                    : Not Active
        User Defined Clocks     : Active
        SW Power Cap            : Not Active
        HW Slowdown             : Not Active
        Unknown                 : Not Active
 * fanSpeed
   Currently the sensor "Fan speed" is the speed the fan algorithm tries to
   run the fan. Don't rely on this sensor if you want to know if the fan is
   really working properly.

 Configuration:
 --------------
 * Querying only specific sensors (for performance data):
   ./check_gpu_sensor -d 0 -T GPUTemperature,fanSpeed
   Warning - Tesla K20 [persistenceMode = Warning]|fanSpeed=30;80;95; GPUTemperature=38;85;100;