Geo-diverse resources with pacemaker

Condensed summary - if you need more detail, follow the links:

  • Corosync is a cluster communication / co-ordination service
  • Pacemaker is a cluster resource management system
  • Booth is a cluster ticket manager

Corosync joins several machines together in a cluster, without specifying what they're supposed to do. Pacemaker specifies what they're supposed to do - what applications and services are supposed to be running where. Booth is a layer on top which allows multiple clusters to manage resources between them.

Rule number one

You must have an odd number of machines in a cluster.

You cannot call one machine a cluster, therefore the minimum number of machines needed to create a cluster is three.

If you want to go and try building a two-machine cluster, by all means go ahead, but don't complain to me when the two machines lose sight of each other (but keep talking to everything else) and it all goes horribly wrong™.
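
The reason is quorum arithmetic: with corosync's votequorum, a group of machines is only allowed to run resources if it holds a strict majority of the expected votes (one vote per machine by default), i.e. quorum = (number of machines ÷ 2, rounded down) + 1:

  • 3 machines: quorum = 2, so the cluster survives the loss of any one machine
  • 2 machines: quorum = 2, so losing either machine (or just the link between them) stops everything
  • 4 machines: quorum = 3, so a 2-2 split leaves neither half able to run anything

An even number of machines therefore tolerates no more failures than the next odd number down, and two machines is the worst of the lot.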

Basic corosync

Here's a simple corosync configuration file (you often don't actually need much more than this):

corosync.conf
totem {
    version: 2
    cluster_name: pleiades
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha1
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 198.51.100.1
        nodeid: 1
        name: sterope
    }
    node {
        ring0_addr: 198.51.100.2
        nodeid: 2
        name: merope
    }
    node {
        ring0_addr: 198.51.100.3
        nodeid: 3
        name: electra
    }
}
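
Once this file is in place (identical on all three machines) and corosync has been started on each of them, it's worth confirming that the nodes can see each other and that the cluster is quorate before going any further. Assuming the standard corosync command-line tools are installed, these show the quorum and ring status respectively:

  • corosync-quorumtool -s
  • corosync-cfgtool -s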

The configuration above sets up a 3-node cluster in which at least two of the machines need to be in communication and running corosync for the cluster to be "active" (quorate). They don't yet do anything useful; that comes next with pacemaker:

Basic pacemaker

Here's a simple pacemaker configuration file supporting a floating IP address which will be managed on one (and only one) of the three machines in the cluster. If that machine dies, another one takes over the IP address, and you have "a high-availability IP address" (on which you could run some application such as Apache or Asterisk if you wanted to):

cluster.cib
primitive IP-float4 IPaddr2 params ip=198.51.100.42 cidr_netmask=24 meta migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
group floater IP-float4 resource-stickiness=100
property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop start-failure-is-fatal=false cluster-recheck-interval=60s

You use the above configuration file with the command:

  • crm configure load replace cluster.cib
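
Once the configuration has loaded, pacemaker should bring the address up on one of the three machines within a few seconds. Assuming the usual pacemaker / crmsh tools are installed, you can check with:

  • crm status
  • ip address show (on the machine currently holding the address)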

Note that I did not really need to create the group definition "floater", since it only contains a single resource. In practice, though, it's unlikely that you want just one resource managed on its own, so I've included a group definition which can be extended by listing the resources one after another. Incidentally, the order of that list also determines the order in which the resources get started and stopped - they're started left to right, and stopped right to left.
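
For example, to run a web server on whichever machine currently holds the floating IP address, the group could be extended roughly like this. This is only a sketch - it assumes the ocf:heartbeat:apache resource agent from the resource-agents package, and the resource name "WebServer" and configuration file path are purely illustrative:

primitive WebServer apache params configfile=/etc/apache2/apache2.conf meta migration-threshold=3 failure-timeout=60 op monitor interval=10 timeout=20 on-fail=restart
group floater IP-float4 WebServer resource-stickiness=100

Because the members of a group are started left to right and stopped right to left, the IP address is always brought up before Apache starts, and Apache is stopped before the address is taken down.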

Taking things a step further

I had the following setup:

  • A three-node cluster in a data centre in Manchester, managing a floating IP address and several applications
  • Another three-node cluster in a data centre in York, managing the same set of resources, although based on a different floating IP address

I then had a requirement to run one more application either in Manchester or in York, but not both.

My system was already designed so that if the machines at Manchester, for example, became unavailable to the Internet, York would run everything I needed and I was happy. It was fine if both Manchester and York were running all their respective resources at the same time, too.

I asked on the ClusterLabs mailing list about how to achieve this extra resource which ran either in Manchester or in York but not both, and people started pointing me at booth, which when I looked at it seemed like a big over-complication for what I needed.

Fortunately some other people said "you don't need that, try location constraints", so I investigated that, and came up with the following solution, which works nicely:

  1. a single big cluster of seven machines, comprising the three in Manchester plus the three in York, plus one more somewhere else
    • remember Rule One - you can't just join two three-node clusters together and expect the six nodes to work properly, because six is not an odd number
    • I chose a data centre in London for the seventh machine, and it's just a cheap tiny virtual server which can communicate with Manchester and York. It runs corosync and pacemaker but never hosts any resources
  2. a "site" attribute for each of the Manchester and York machines, showing which city they are in (see the note after this list)
  3. a "location" preference for each of the existing resources, to ensure they continued running where they were supposed to
    • I can't have a floating IP address from the Manchester network assigned to a machine in York; it just doesn't work
  4. a "colocation" preference for the application I needed running somewhere, but only once
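
For reference, the "site" attribute from step 2 does not have to be written into the configuration file by hand - it can also be set at runtime with pacemaker's crm_attribute tool, along these lines (node names as in the configuration below):

  • crm_attribute --type nodes --node tom --name site --update Man
  • crm_attribute --type nodes --node fred --name site --update York

In my case I simply put the attributes into the configuration file, as shown next.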

The result is the following pacemaker configuration file:

cluster.cib
node tom    attribute site=Man
node dick   attribute site=Man
node harry  attribute site=Man

node fred   attribute site=York
node george attribute site=York
node ron    attribute site=York

primitive Man-float4  IPaddr2 params ip=198.51.100.42 cidr_netmask=24 meta migration-threshold=3 failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
primitive York-float4 IPaddr2 params ip=203.0.113.42  cidr_netmask=24 meta migration-threshold=3  failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart
primitive Asterisk    asterisk                                        meta migration-threshold=3  failure-timeout=60 op monitor interval=5 timeout=20 on-fail=restart

group Man  Man-float4  resource-stickiness=100
group York York-float4 resource-stickiness=100
group Any  Asterisk    resource-stickiness=100

location use_Man  Man  rule -inf: site ne Man
location use_York York rule -inf: site ne York

location not_Man  Man  resource-discovery=never -inf: bert
location not_York York resource-discovery=never -inf: bert

colocation once 100: Any [ Man York ]

property cib-bootstrap-options: stonith-enabled=no no-quorum-policy=stop start-failure-is-fatal=false cluster-recheck-interval=60s

Obviously you need to define all seven machines in your corosync.conf file as well (this file must be identical on all servers in the cluster), but that's just a matter of extending the "nodelist" section with more "node" definitions.
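
Something along these lines, for example - the IP addresses here are purely illustrative, and the Manchester and York entries follow the same pattern as the three-node example earlier:

nodelist {
    node {
        ring0_addr: 198.51.100.11
        nodeid: 1
        name: tom
    }
    # ... dick and harry (Manchester) and fred, george and ron (York) as nodeids 2 to 6 ...
    node {
        ring0_addr: 192.0.2.7
        nodeid: 7
        name: bert
    }
}

The nodeid values must be unique across the whole cluster, and the name of each node should match the name pacemaker knows it by.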

The line location use_Man Man rule -inf: site ne Man means "I have a location preference which I choose to call 'use_Man' for the resource group named 'Man' where the rule is 'definitely do not run if the site attribute is not Man'" ("-inf:" means "minus infinity" to pacemaker, and basically means "this is a no-no").

The line colocation once 100: Any [ Man York ] means "I have a co-location preference which I choose to call 'once' for the resource group named 'Any' such that I want it to be running alongside the resource group 'Man' or alongside the resource group 'York', but I do not care which". The square brackets are significant.

The line location not_Man Man resource-discovery=never -inf: bert means "bert (the seventh machine which runs no resources) should not even try to find out whether it is running any resources, just in case they should be turned off". In general, you won't even have the commands on this machine which are needed to check whether resources are running, so allowing pacemaker to do this simply results in complaining messages about "command not installed". Adding these lines keeps it quiet.

PS: Note also that in the location preference rules, "rule -inf: site ne Man" is apparently not the same as "rule inf: site eq Man". You might think so, and it might seem easier to read, but specifying an infinite preference for a resource to run where the site label is "Man" turns out not to be the same (for the people who wrote pacemaker, anyway) as an infinite preference for a resource not to run where the site label is not "Man". I have no idea why this is considered sensible, but the above works for me.

The result

All the resources which were previously defined on the 3-node Manchester cluster now run on one of the (same) three machines in Manchester - assuming that these machines are available. If those machines go down, then these resources do not run at all (just the same as before).

Same thing for the resources at York.

The resource which I want running just once, in either Manchester or York, runs on the same machine as the Manchester resources, or on the machine hosting the York resources, but never on both at once.

Bonuses

Previously I had to have at least two of the three machines at Manchester up and running in order for Manchester's resources to be running. Similarly for two out of three at York.

Under the new arrangement, I need only one machine at Manchester, plus one machine at York, plus any two other machines (one of which can be the one in London) for all resources to be running. That can mean one machine at Manchester and three at York, or one at Manchester, two at York and one in London.

Previously it was impossible to have the Manchester resources available with only one working machine at Manchester, and vice versa for York.
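
The arithmetic behind this, with one vote per machine:

  • old arrangement: two independent 3-node clusters, each needing a quorum of 2 out of its own 3 machines
  • new arrangement: one 7-node cluster needing a quorum of 4 out of 7 - so, for example, tom, fred, george and bert together are enough for every resource to be running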

The "colocation" requirement ensures that all the resources which are running in Manchester (possibly including the one "anywhere" resource) are running on a single machine. There are other ways of specifying where you want this "anywhere" resource to run which can result in it running on one of the other two machines in Manchester. This may or may not be a problem for you, but having everything running on one server is much neater for me.

