Condensed summary - if you need more detail, follow the links:
Corosync joins several machines together in a cluster, without specifying what they're supposed to do. Pacemaker specifies what they're supposed to do - what applications and services are supposed to be running where. Booth is a layer on top which allows multiple clusters to manage resources between them.
You must have an odd number of machines in a cluster.
You cannot call one machine a cluster, therefore the minimum number of machines needed to create a cluster is three.
If you want to go and try building a two-machine cluster, by all means go ahead, but don't complain to me when the two machines lose sight of each other (but keep talking to everything else) and it all goes horribly wrong™.
Here's a simple corosync configuration file (you often don't actually need much more than this):
totem {
    version: 2
    cluster_name: pleiades
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: aes256
    crypto_hash: sha1
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 198.51.100.1
        nodeid: 1
        name: sterope
    }
    node {
        ring0_addr: 198.51.100.2
        nodeid: 2
        name: merope
    }
    node {
        ring0_addr: 198.51.100.3
        nodeid: 3
        name: electra
    }
}
This sets up a 3-node cluster in which at least two of the machines need to be in communication and running corosync for the cluster to be "active" (quorate); the commands sketched below let you confirm this. The machines don't yet do anything useful; that comes next with pacemaker.
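Once corosync is running on all three machines, something along these lines (run on any one of the nodes, using the standard corosync command-line tools) should confirm that the machines can see each other and that the cluster is quorate:

    corosync-cfgtool -s
    corosync-quorumtool -s

The first shows the state of this node's network ring, the second shows the vote count, quorum state and current membership.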
Here's a simple pacemaker configuration file supporting a floating IP address which will be managed on one (and only one) of the three machines in the cluster. If that machine dies, another one takes over the IP address, and you have "a high-availability IP address" (on which you could run some application such as Apache or Asterisk if you wanted to):
primitive IP-float4 IPaddr2 \
    params ip=198.51.100.42 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
group floater IP-float4 \
    resource-stickiness=100
property cib-bootstrap-options: \
    stonith-enabled=no \
    no-quorum-policy=stop \
    start-failure-is-fatal=false \
    cluster-recheck-interval=60s
You load the above configuration file into the cluster with a command along the following lines (this assumes the crm shell from crmsh, and that the configuration has been saved in a file called, say, cluster.cfg - the filename is only an example):
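    crm configure load update cluster.cfg

("update" merges the file into whatever configuration is already there; "replace" would throw away the existing configuration first.) You can then see what pacemaker is doing with "crm status" or "crm_mon -1".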
Note that I did not really need to create the group definition "floater", since it only contains a single resource, but in practice I think it's unlikely that you will only ever want one resource to be managed on its own, so I've included a group definition which can then be extended by listing the resources one after another - see the sketch below. Incidentally, the order of that list also determines the order in which the resources get started and stopped: they're started left to right, and stopped right to left.
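As an illustration (not part of the configuration above - the Apache resource and its parameters are invented for the example), a group carrying the floating IP plus a web server might look like this:

    primitive WebServer apache \
        params configfile=/etc/apache2/apache2.conf \
        op monitor interval=10 timeout=20 on-fail=restart
    group floater IP-float4 WebServer resource-stickiness=100

Because IP-float4 comes first in the group, the IP address is brought up before Apache is started, and Apache is stopped before the IP address is taken down.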
I had the following setup: a three-node pacemaker cluster in Manchester running one set of resources, and a separate three-node cluster in York running another set.
I then had a requirement to run one more application either in Manchester or in York, but not both.
My system was already designed so that if the machines at Manchester, for example, became unavailable to the Internet, York would run everything I needed and I was happy. It was fine if both Manchester and York were running all their respective resources at the same time, too.
I asked on the ClusterLabs mailing list about how to achieve this extra resource which ran either in Manchester or in York but not both, and people started pointing me at booth, which when I looked at it seemed like a big over-complication for what I needed.
Fortunately some other people said "you don't need that, try location constraints", so I investigated that, and came up with the following solution, which works nicely: combine the two three-node clusters into a single seven-node cluster (the seventh machine, bert, sits in London and runs no resources at all - it's just there to make up the numbers for quorum), give each node a "site" attribute saying whether it lives in Manchester or York, and then use location constraints so that each set of resources can only run at its own site.
The result is the following pacemaker configuration file:
node tom attributes site=Man
node dick attributes site=Man
node harry attributes site=Man
node fred attributes site=York
node george attributes site=York
node ron attributes site=York

primitive Man-float4 IPaddr2 \
    params ip=198.51.100.42 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
primitive York-float4 IPaddr2 \
    params ip=203.0.113.42 cidr_netmask=24 \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart
primitive Asterisk asterisk \
    meta migration-threshold=3 failure-timeout=60 \
    op monitor interval=5 timeout=20 on-fail=restart

group Man Man-float4 resource-stickiness=100
group York York-float4 resource-stickiness=100
group Any Asterisk resource-stickiness=100

location use_Man Man rule -inf: site ne Man
location use_York York rule -inf: site ne York
location not_Man Man resource-discovery=never -inf: bert
location not_York York resource-discovery=never -inf: bert

colocation once 100: Any [ Man York ]

property cib-bootstrap-options: \
    stonith-enabled=no \
    no-quorum-policy=stop \
    start-failure-is-fatal=false \
    cluster-recheck-interval=60s
Obviously you need to define all seven machines in your corosync.conf file as well (this file must be identical on all servers in the cluster), but that's just a matter of extending the "nodelist" section with more "node" definitions.
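For example, the extended nodelist might look something like this (the addresses here are invented for the illustration - use your machines' real addresses, and make sure the names match the node names pacemaker uses):

    nodelist {
        node {
            ring0_addr: 198.51.100.1
            nodeid: 1
            name: tom
        }
        node {
            ring0_addr: 198.51.100.2
            nodeid: 2
            name: dick
        }
        node {
            ring0_addr: 198.51.100.3
            nodeid: 3
            name: harry
        }
        node {
            ring0_addr: 203.0.113.1
            nodeid: 4
            name: fred
        }
        node {
            ring0_addr: 203.0.113.2
            nodeid: 5
            name: george
        }
        node {
            ring0_addr: 203.0.113.3
            nodeid: 6
            name: ron
        }
        node {
            ring0_addr: 192.0.2.1
            nodeid: 7
            name: bert
        }
    }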
The line location use_Man Man rule -inf: site ne Man means "I have a location preference which I choose to call 'use_Man' for the resource group named 'Man' where the rule is 'definitely do not run if the site attribute is not Man'" ("-inf:" means "minus infinity" to pacemaker, and basically means "this is a no-no").
The line colocation once 100: Any [ Man York ] means "I have a co-location preference (with a score of 100 - a strong preference rather than an absolute requirement) which I choose to call 'once' for the resource group named 'Any', such that I want it to be running alongside the resource group 'Man' or alongside the resource group 'York', but I do not care which". The square brackets are significant.
The line location not_Man Man resource-discovery=never -inf: bert means "bert (the seventh machine, which runs no resources) should not even probe to find out whether it happens to be running any of these resources" - something pacemaker normally does on every node, so that it can turn off anything running where it shouldn't be. In general you won't even have installed, on this machine, the programs needed to check whether the resources are running, so allowing pacemaker to probe simply results in complaining messages about commands not being installed. Adding these lines keeps it quiet.
PS: Note also that in the location preference rules, "rule -inf: site ne Man" is apparently not the same as "rule inf: site eq Man". You might think so, and it might seem easier to read, but specifying an infinite preference for a resource to run where the site label is "Man" turns out not to be the same (for the people who wrote pacemaker, anyway) as an infinite preference for a resource not to run where the site label is not "Man". I have no idea why this is considered sensible, but the above works for me.
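The practical difference, as far as I can tell, is that a positive score - even an infinite one - only makes the matching nodes more attractive; it does not by itself forbid the resource from running anywhere else, whereas -inf actively bans the non-matching nodes. Comparing the constraint I use with the more "readable" alternative (prefer_Man is an invented name, just for this illustration):

    location use_Man Man rule -inf: site ne Man
    location prefer_Man Man rule inf: site eq Man

the first means the Man group can never run on a node whose site attribute is anything other than "Man", while the second merely prefers the Manchester nodes and still allows pacemaker to start the group somewhere else if none of them is available - which is exactly what I don't want.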
All the resources which were previously defined on the 3-node Manchester cluster now run on one of the (same) three machines in Manchester - assuming that these machines are available. If those machines go down, then these resources do not run at all (just the same as before).
Same thing for the resources at York.
The resource which I want running just once, in either Manchester or York, runs either on the machine hosting all the Manchester resources, or on the equivalent machine in York, but not on both.
Previously I had to have at least two of the three machines at Manchester up and running in order for Manchester's resources to be running. Similarly for two out of three at York.
Under the new arrangement, I need only one machine at Manchester, plus one machine at York, plus any two other machines (one of which can be the one in London) for all resources to be running - in other words any four of the seven, which is enough for quorum. That can mean one machine at Manchester and three at York, or one at Manchester, two at York and one in London.
Previously it was impossible to have the Manchester resources available with only one working machine at Manchester, and vice versa for York.
The "colocation" requirement ensures that all the resources which are running in Manchester (possibly including the one "anywhere" resource) are running on a single machine. There are other ways of specifying where you want this "anywhere" resource to run which can result in it running on one of the other two machines in Manchester. This may or may not be a problem for you, but having everything running on one server is much neater for me.