Once you've installed Icinga2 and got it performing some checks, you should install PNP4Nagios.
Without that, you have no check history, no trends, no way of seeing "what's normal for this check at this time of day / week / whatever?".
In my opinion PNP4Nagios should be a standard feature which can be installed alongside Icinga2.
Here's how to get it working under Debian / Devuan. The instructions below work as-is for Wheezy and Jessie. For Stretch / Ascii, you'll first need to configure both the Jessie and the Jessie-backports repositories, since pnp4nagios is only in Jessie-backports and it depends on some PHP5 packages which are only in Jessie, plus the Jessie version of rrdtool:
# aptitude install -t jessie rrdtool
Package: rrdtool
Pin: release n=jessie
Pin-Priority: 1001
# aptitude install -R pnp4nagios
# icinga2 feature enable perfdata
AuthName "Icinga Access"
AuthType Basic
AuthUserFile /etc/icinga/htpasswd.users
Require valid-user
# /etc/init.d/icinga2 restart
# /etc/init.d/npcd restart
# /etc/init.d/apache2 reload
# cd /usr/share/icingaweb2/modules
# wget https://github.com/Icinga/icingaweb2-module-pnp/archive/master.zip
# unzip master.zip
# mv icingaweb2-module-pnp-master pnp
Service checks should now start showing clickable PNP4Nagios graphs.
The default graphing template for PNP4Nagios uses the "average" function for calculating values over long time periods, and this can very often give highly misleading results. It works far better if you edit the file /usr/share/pnp4nagios/html/templates.dist/default.php and change:
$def[$KEY] = rrd::def ("var1", $VAL['RRDFILE'], $VAL['DS'], "AVERAGE");
to:
$def[$KEY] = rrd::def ("var1", $VAL['RRDFILE'], $VAL['DS'], "MAX");
The reason why "average" is not a good idea is neatly summed up in a quote from Stéphane Bortzmeyer:
Measuring average network latency is about as useful as measuring the mean temperature of patients in a hospital.
Suppose you are measuring network latency (ping round-trip times), and over a ten-minute period you get nine results of 5ms and one result of 100ms.
If you look at a graph of this showing 1-minute intervals, you will see the 100ms spike quite clearly.
If you then look at a longer timescale with 10-minute intervals, though, "average" would show this interval as having a value of 14.5ms (nine times 5ms plus one 100ms, divided by ten). You would probably prefer still to see that there was a 100ms peak sometime during that 10-minute window, and that is what "max" will show you.
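The arithmetic above can be checked with a short Python sketch. It is only an illustration of the two consolidation functions, not of how rrdtool stores data internally: the `consolidate` helper and the `latencies_ms` list are hypothetical names made up for this example, and the position of the spike within the window is arbitrary.

```python
from statistics import mean


def consolidate(samples, step, func):
    """Collapse each consecutive group of `step` samples into one value.

    `func` plays the role of the consolidation function
    (for example the built-in max, or statistics.mean).
    """
    return [func(samples[i:i + step]) for i in range(0, len(samples), step)]


# Ten one-minute ping round-trip times in ms: nine of 5ms and one 100ms spike.
latencies_ms = [5, 5, 5, 5, 5, 5, 100, 5, 5, 5]

# One 10-minute interval, consolidated two different ways.
avg_10min = consolidate(latencies_ms, 10, mean)
max_10min = consolidate(latencies_ms, 10, max)

print("average:", avg_10min)
print("max:    ", max_10min)
```

With "average" the spike is diluted to 14.5ms, while "max" keeps the 100ms peak visible in the 10-minute interval.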