Icinga2 - Master / Satellite / Agent setup

Unless stated otherwise, all the following commands should be run as root.

Steps which are not mentioned in the official documentation are shown in italics. Steps in the official documentation which you should not perform are also shown in italics.

Concept

Icinga2 supports the concept of a Master server which is (almost) the single point of configuration for an entire network of monitored systems. The Master communicates either directly with each of those systems (Agents), or with one or more Satellite servers, each of which then communicates with some of the Agents. Every Agent machine communicates with only one Master or Satellite, and Agents can be grouped under Satellites in any way which makes geographical, logical or organisational sense.

This documentation assumes that you do have all three levels; if you do not use a Satellite, then the Agents are simply connected directly to the Master, so these instructions work just as well for that scenario - simply skip anything to do with Satellite configuration.

It is also possible to have a four-layer setup with two levels of Satellites between the Master and the Agents. This is described further in the instructions below. I do not yet know whether four is the limit…

A note on versioning

Icinga2 has been developed to support all three layers running the same version of Icinga2 (of course), and also to support each layer being one major version lower than the layer above (this assumes that "Icinga2" is the name of the application, so that "Icinga2.10.6" is major version 10, minor version 6).

So, for example, you can do: Master 2.11.x ⇒ Satellite 2.10.x ⇒ Agent 2.9.x

Larger variations between layer versioning may work but are not officially supported.

I don't know what the rule is if you manage to make a four-layer (or more?) setup work.

Master setup

Based on https://icinga.com/docs/icinga-2/latest/doc/06-distributed-monitoring/#master-setup

It is assumed that you have already completed the basic setup on this machine.

Run the node wizard or the node setup command (see details below) once on the Master, before setting up any Satellites or Agents:

icinga2 node wizard
icinga2 node setup --parameters...
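
If you prefer the non-interactive route for the Master as well, a minimal sketch of the command might look like this ("Eyeball" is just an example CN, matching the example Master name used later in this document; the zone is given the same name, following the advice in the wizard section below; check icinga2 node setup --help for the flags available in your version):

icinga2 node setup --master --cn Eyeball --zone Eyeball --disable-confd

Then restart icinga2 as usual.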

Satellite setup

I'm assuming you will use "On-Demand CSR Signing", since this is a good deal simpler than "CSR Auto-signing".

It is also assumed that you have already completed the basic setup on this machine.

The same two methods can be used to set up a Satellite machine: the "wizard", which prompts you for answers to questions, or supplying all the answers as parameters to a single command which achieves the same thing - see below.

Obviously, the latter is ideal for any sort of automated setup (Puppet, Chef, Ansible, DIY scripts, etc.).

If you want two layers of Satellites between the Master and the Agents, simply repeat the instructions for the second (lower) level of Satellite, entering the details of its Parent Satellite instead of the Master.

I have not yet tried three Satellite layers, and I find it slightly hard to think of a plausible use case for this.

Agent setup

It is assumed that you have already completed the basic setup on this machine.

This is almost identical to the Satellite setup described above, except that the parent node is now the Satellite to which this Agent should connect, rather than the Master. All information in the setup of an Agent should relate to the Agent or its direct Satellite, not to the Master.

The final step, of signing the certificate request, is still performed on the Master, however.

You can use either the wizard or the parameter method of setting up the Agents - see below.

The node setup performs all necessary configuration on the Agent node; however, it does not add the Agent node to the upstream configuration (Master and Satellite). The Master needs to know about the Host and what Service Checks to perform on it, and the Satellite needs to know about the Zone and the Endpoint in order to communicate with it.

The wizard method

On the Master:

  1. Run the command:
    icinga2 node wizard
    • Answer n to the first question, since this is a master server
    • Choose the Common Name for the machine - the default is the FQDN, but you may wish to use something more familiar
      • Note that it is quite awkward to change this in future, and even more so once you have set up Satellite and/or Agent machines, so think carefully before pressing return!
    • Choose the Zone name for this machine (there is seldom any good reason to make it different from the Host name)
    • You can nearly always accept the default answer for the subsequent questions
  2. Restart icinga2:
    /etc/init.d/icinga2 restart

On Satellites or Agents:

  1. On each machine, run the same command as you did on the master:
    icinga2 node wizard
    • The answer to the first question is now the default Y instead of n
  2. Choose a Common Name for this Satellite / Agent
  3. Enter the Common Name which you configured on the parent (Master / Satellite)
  4. Assuming that this machine has network routing to reach its parent, answer Y to "establish a connection to the parent node"
  5. Enter the IP address or DNS name of the Master / Satellite which this Satellite / Agent should connect to
  6. Leave the port number at the default of 5665
  7. You almost certainly do not need to "add more master/satellite endpoints"
  8. After a few seconds, you should see Parent certificate information
    • Verify that the CN matches what you configured on the parent machine
  9. Confirm that the information is correct
  10. Leave the request ticket response empty
  11. Accept default responses for Bind Host and Bind Port
  12. Answer y to "accept config from parent"
  13. Answer y to "accept commands from parent"
  14. The local zone name should be correct - adjust it if not
  15. Enter the zone name you assigned on the Master / Satellite server
  16. There is no reason to "specify additional global zones"
  17. You generally do want to "disable the inclusion of the conf.d directory"

The parameter method (for Satellites or Agents)

You can find out what parameters can be supplied using the command:

icinga2 node setup --help

Setting up the node in this way is a two-step process:

  1. Retrieve the trusted certificate from the parent node
  2. Configure this node and send the certificate signing request to the master

The first step is accomplished with:

mkdir /var/lib/icinga2/certs
chown nagios: /var/lib/icinga2/certs
icinga2 pki save-cert --trustedcert /var/lib/icinga2/certs/ca.crt --host FQDN

Replace "FQDN" with the address or DNS hostname of the parent node (Master or Satellite).

It surprises me that you have to create the certificate directory yourself, rather than the pki command doing it for you, but no doubt there's a good reason for it somewhere. If you use the wizard command, it does create the directory.
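
If you want to double-check what you have just fetched (an optional sanity check, not part of the official procedure), openssl can print the certificate's subject so that you can confirm it matches the parent's CN - the same verification the wizard asks you to do:

openssl x509 -in /var/lib/icinga2/certs/ca.crt -noout -subject -issuer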

The second step is accomplished with:

icinga2 node setup --cn Thisnode --zone Thisnode --trustedcert /var/lib/icinga2/certs/ca.crt --parent_host Parent.FQDN --endpoint Parent.CN --parent_zone Parent.CN --accept-config --accept-commands --disable-confd

Replace "Thisnode" with the CN you wish to use for this node, "Parent.FQDN" with the IP address or DNS hostname of the parent machine, and "Parent.CN" with the CN you assigned to the parent.

Whichever method you used

  • Before restarting the icinga2 daemon as prompted, run the following commands on the master:
    icinga2 ca list
    icinga2 ca sign xxxxx
    • Replace xxxxx with the fingerprint displayed by the first command for the satellite or agent you are setting up
    • Then restart Icinga2 on the satellite or agent:
      /etc/init.d/icinga2 restart

Zones, Endpoints and Hosts

Icinga has a very strange definition of "Zone" - it is not a geographical region, and it is not a notional collection of machines. It is simply something which needs to be defined for each machine in the system (Masters, Satellites and Agents) and contains only that single machine (there is a small exception if you set up High Availability for Icinga, where you can place two Masters, or two Satellites, into a single zone, but two is the limit). I have no idea why the developers thought this concept was useful, or why they chose the name "Zone" for it. I find it especially misleading that they even use "Europe", "USA" and "Asia" as example zone names in the documentation.

The end result is that for each machine to be monitored, you need three definitions: Zone, Endpoint, and Host, for example:

object Zone "Website" { endpoints = [ "Website" ] }
object Endpoint "Website" { host = "Website" }
object Host "Website" { address = "www.example.com" }

There's a strict 1:1:1 correspondence (with a small exception for HA): a Zone contains one Endpoint (and an Endpoint exists in one Zone), and each monitored machine's Host lives in its own Zone, so every machine ends up with exactly one Zone, one Endpoint and one Host definition.

The single exception for High Availability is that a Zone may contain two Endpoints (but no more). This only makes sense for Masters or Satellites, and it means that the two machines share the work expected of that Zone.
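
For example, a High Availability pair of Masters (the names Eyeball-1 and Eyeball-2 and the shared zone name "master" are hypothetical) would share a single Zone like this:

object Endpoint "Eyeball-1" { host = "eyeball-1.example.com" }
object Endpoint "Eyeball-2" { host = "eyeball-2.example.com" }
object Zone "master" { endpoints = [ "Eyeball-1", "Eyeball-2" ] }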

This combination of Zones, Endpoints and Hosts does seem unnecessarily complicated to me.

There is also an absolutely crucial correspondence between the hierarchy of your network and the directory structure you create under /etc/icinga2/zones.d on the Master node, and (in my opinion) this is not well explained in the official documentation. I managed to find a helpful article by one of the Icinga developers which explains this better, but does not pretend to be a step-by-step guide to Getting it Right.

Under /etc/icinga2/zones.d on the Master server, you should have the following:

  • one directory for every node in your setup (Master, Satellites and Agents), named identically to the Zone for that machine
  • one file per Satellite, under the Master directory, containing a Zone definition and an Endpoint definition for the Satellite. This file can be named however you like, but using the Zone name is a good idea.
  • one file per Agent, inside the directory for the Satellite which that Agent is connected to, containing a Zone definition and an Endpoint definition for the Agent. This file can be named however you like.
  • one file per Agent, inside the directory for that Agent, containing a Host definition for the Agent. This file can be named however you like.

If you want to have two layers of Satellites, then the above directory structure becomes:

  • one directory for every node in your setup (Master, Satellites and Agents), named identically to the Zone for that machine
  • one file per top-level Satellite, under the Master directory, containing a Zone definition and an Endpoint definition for the Satellite. This file can be named however you like, but using the Zone name is a good idea.
  • one file per Agent and lower-level Satellite, inside the directory for the Satellite to which that Agent or lower-level Satellite connects, containing a Zone definition and an Endpoint definition for the Agent / lower-level Satellite. This file can be named however you like.
  • one file per Agent, inside the directory for that Agent, containing a Host definition for the Agent. This file can be named however you like.

Here is an example, based on a Master named Eyeball, a Satellite named Middleman, and two Agents named Fred and George which are connected to the Satellite:

  • Eyeball
    • Middleman.conf
      • contains Zone and Endpoint definitions for Middleman
  • Middleman
    • Fred.conf
      • contains Zone and Endpoint definitions for Fred
    • George.conf
      • contains Zone and Endpoint definitions for George
  • Fred
    • Fred.conf
      • contains Host definition for Fred
  • George
    • George.conf
      • contains Host definition for George

The Zone definitions for each top-level Satellite specify the Master as their parent node, and the Zone definitions for each Agent or lower-level Satellite specify the corresponding Satellite as their parent node.
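
To make this concrete, the contents of those files (all under /etc/icinga2/zones.d on the Master) might look something like the following sketch - the .example.com addresses and the hostalive check are placeholders, and Eyeball is assumed to be both the Master's CN and its Zone name:

# Eyeball/Middleman.conf
object Endpoint "Middleman" { host = "middleman.example.com" }
object Zone "Middleman" { endpoints = [ "Middleman" ]; parent = "Eyeball" }

# Middleman/Fred.conf
object Endpoint "Fred" { host = "fred.example.com" }
object Zone "Fred" { endpoints = [ "Fred" ]; parent = "Middleman" }

# Fred/Fred.conf
object Host "Fred" { address = "fred.example.com"; check_command = "hostalive" }

George's files would follow the same pattern.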

In addition to this, every machine also needs Zone and Endpoint definitions for itself and its directly-connected parent in /etc/icinga2/zones.conf. On Satellite and Agent machines, this is the only file which should contain configuration data (the rest is propagated down to the machine from the Master), and this file should automatically and correctly be created for you by the node wizard or node setup commands.
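
For reference, the zones.conf which the node wizard / node setup generates on an Agent such as Fred typically looks something like this (a sketch, assuming Fred connects to Middleman at middleman.example.com; the global-templates and director-global zones are part of the standard generated configuration):

object Endpoint "Middleman" { host = "middleman.example.com"; port = "5665" }
object Zone "Middleman" { endpoints = [ "Middleman" ] }
object Endpoint "Fred" { }
object Zone "Fred" { endpoints = [ "Fred" ]; parent = "Middleman" }
object Zone "global-templates" { global = true }
object Zone "director-global" { global = true }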

Hierarchy

Icinga2 supports two methods by which the Master tells the Satellites and the Agents what Service Checks to perform and when. There used to be three, but the one called Bottom-up was deprecated in version 2.6 (2016) and eliminated in 2.8 (2017).

The two methods now available are:

  • Top-Down Command Endpoint
    • The Master server performs all Service Check scheduling and tells the Satellites / Agents when to perform which Service Checks, and gets the responses back
    • If the communication between Master and Satellite, or between Satellite and Agent, breaks down, no Service Checks can be performed and no data is (ever) available about the state of the monitored systems for the duration of the communications breakdown
  • Top-Down Config Sync
    • The Master server sends its configuration to the Satellites, which then pass it on to the Agents, and the Agents then schedule and perform their own Service Checks, passing the results back to the Master
    • If the communication between Master and Satellite, or between Satellite and Agent, breaks down, the Agents continue performing their own checks, and send the results back to the Master once communications are restored
    • The Master has far less work to do, simply sending the configuration to the Satellites and Agents, and then collating the results. It does not have to schedule and send the Service Check commands to every machine being monitored.

I see no reason to use Top-Down Command Endpoint - it places more load on the Master, and if the connection between Master and Satellite, or between Satellite and Agent, breaks, no checks will be performed.

So, I would always use Top-Down Config Sync, and the instructions above for setting up the Satellites and Agents are based on that mechanism.

