check_gpu_sensor
=================

check_gpu_sensor: Nagios/Icinga plugin to check GPU sensors via NVML

Copyright (C) 2011-2013 Thomas-Krenn.AG

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, see <http://www.gnu.org/licenses/>.

HOWTO:
------
http://www.thomas-krenn.com/en/wiki/GPU_Sensor_Monitoring_Plugin

Current Updates:
----------------
https://github.com/thomas-krenn/check_gpu_sensor_v1.git

Example Output:
---------------
Warning - Tesla K20 [persistenceMode = Warning]|ECCL2AggSgl=0;1;2; ECCTexAggSgl=0;1;2; memUtilRate=0 PWRUsage=31.23;150;200; ECCRegAggSgl=0;1;2; SMClock=705 ECCL1AggSgl=0;1;2; GPUTemperature=38;85;100; memClock=2600 usedMemory=0.24;95;99; fanSpeed=30;80;95; graphicsClock=705 GPUUtilRate=0 ECCMemAggSgl=0;1;2;

Requirements:
-------------
o NVIDIA display driver
  * The proprietary NVIDIA driver from the official website (under
    Ubuntu 10.04, cf. http://www.thomas-krenn.com/de/wiki/CUDA_Installation)
    as well as the driver from the repositories (with Ubuntu 11.10 and
    12.04) have been tested successfully.
o NVML perl bindings
  * The bindings are required to access the NVML library via Perl.
    Fetch a copy of the bindings from CPAN:
  * http://search.cpan.org/~nvbinding/
  For more information about the NVML library also visit
  http://developer.nvidia.com/nvidia-management-library-nvml.
o Perl
o Required perl modules (should be installed per default under Ubuntu)
  * strict
  * warnings
  * Getopt::Long
  * use feature qw(switch)
    The switch feature is available with Perl 5.10.1. The ePN (embedded
    Perl) of Icinga 1.6.1 reported errors with Switch-Case statements,
    so they were changed to given-when.
  NOT installed per default:
  * Config::IniFiles
    On Ubuntu you can use
    $ sudo apt-get install libconfig-inifiles-perl
o Nagios or Icinga

Installation hints:
-------------------
On Ubuntu, when using the driver from the repositories, the file
"Makefile.PL" in the nvml package must be modified. Add the parameter
'-L/usr/lib/nvidia-current' or '-L/usr/lib/nvidia-experimental-304' (for
experimental drivers) to 'LIBS' in order to get rid of the warning:
  Note (probably harmless): No library found for -lnvidia-ml
when calling 'perl Makefile.PL'.

Notes on sensor warnings:
-------------------------
* clocks_throttle_reason_unknown
  This throttle reason can also be reported during a PState or clock
  change before driver version r19. Please recheck the throttle reasons
  with
  $ nvidia-smi -i 0 -q | grep -i throttle -A 5
      Clocks Throttle Reasons
          Idle                : Not Active
          User Defined Clocks : Active
          SW Power Cap        : Not Active
          HW Slowdown         : Not Active
          Unknown             : Not Active
* fanSpeed
  Currently the sensor "Fan speed" is the speed at which the fan
  algorithm tries to run the fan. Don't rely on this sensor if you want
  to know whether the fan is really working properly.

Configuration:
--------------
* Querying only specific sensors (for performance data):
  ./check_gpu_sensor -d 0 -T GPUTemperature,fanSpeed
  Warning - Tesla K20 [persistenceMode = Warning]|fanSpeed=30;80;95; GPUTemperature=38;85;100;
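
Reading the performance data:
-----------------------------
The output lines shown above consist of a status text, a '|' separator,
and space-separated performance data entries. The following is a minimal
sketch (in Python, not part of the plugin) of how such a line could be
split up for further processing. It assumes the usual Nagios convention
that each entry reads label=value;warn;crit; and that values carry no
unit suffix, as in the samples above. The names PerfValue and
parse_plugin_output are hypothetical helpers introduced only for this
illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class PerfValue(NamedTuple):
    """One performance data entry (hypothetical helper type)."""
    value: Optional[float]
    warn: Optional[float]   # assumed: second field is the warning threshold
    crit: Optional[float]   # assumed: third field is the critical threshold


def parse_plugin_output(line: str) -> Tuple[str, Dict[str, PerfValue]]:
    """Split one plugin output line into (status text, perfdata dict)."""
    status_text, _, perf_text = line.partition("|")
    perfdata: Dict[str, PerfValue] = {}
    for token in perf_text.split():
        label, sep, fields = token.partition("=")
        if not sep:
            continue  # not a label=value entry, skip it
        # Empty fields (e.g. after a trailing ';') become None.
        numbers = [float(p) if p else None for p in fields.split(";")]
        # Entries such as "memUtilRate=0" have no thresholds; pad with None.
        numbers += [None] * (3 - len(numbers))
        perfdata[label] = PerfValue(numbers[0], numbers[1], numbers[2])
    return status_text.strip(), perfdata


sample = ("Warning - Tesla K20 [persistenceMode = Warning]|"
          "fanSpeed=30;80;95; GPUTemperature=38;85;100;")
status, perf = parse_plugin_output(sample)
print(status)
for name, entry in perf.items():
    print(name, entry.value, entry.warn, entry.crit)
```

Sensors without thresholds (for example memUtilRate=0 in the full
example output) are returned with warn and crit set to None.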