DRBD

DRBD (Distributed Replicated Block Device) is like RAID 1 (mirrored drives), but with the two halves of the mirror on separate computers, kept in sync over the network.

You could do this with something like iSCSI plus standard mdadm, but DRBD is designed for the job, so it's better to use that.

Depending on the age of the Linux system you set it up on, you will get either DRBD 8.4 or 9.x.

These notes discuss version 8.4.10 (for Debian 10 / Devuan Beowulf).

Getting started

Install the package drbd-utils. The main functional components of DRBD itself are already in the kernel.
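
On Debian 10 / Devuan Beowulf that amounts to something like the following (as root):

  apt install drbd-utils
  modprobe drbd     # the module ships with the stock kernel
  cat /proc/drbd    # confirms it is loaded and shows the version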

The documentation on how to define your DRBD devices is pretty good, and you should be able to get to the point where you have two machines which know about a DRBD device and about each other, but are not yet connected.

In my case that meant I had:

/etc/drbd.d/global_common.conf
global {
  usage-count no;
}

common {
  net {
    protocol C;
    allow-two-primaries yes;
  }
}

/etc/drbd.d/Share.res
resource "Share" {
  device /dev/drbd0;
  meta-disk internal;

  on "Stella" {
    disk "/dev/LVM/Share";
    address 203.0.113.20:7789;
  }

  on "Foster" {
    disk "/dev/LVM/Share";
    address 203.0.113.16:7789;
  }

  net {
    csums-alg sha1;
  }

  startup {
    become-primary-on both;
  }
}
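
One thing to watch: the names after on must match what each machine calls itself (uname -n), otherwise drbdadm will complain that the resource is not defined for this host. Once the files are in place on both machines, a quick sanity check is to ask drbdadm to dump the parsed configuration, which will report any syntax errors:

  drbdadm dump Share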

Avoiding tedious startup replication 1

The documentation on how to copy data between two machines and avoid DRBD trying to send the whole lot across the network is less good, however. In particular, the phrase "you have issued the commands for initial resource promotion on your local node" leaves a lot to be desired.

So, here is how to go about it. I'm assuming that the name of the shared resource is Share, and also that you want the resource to be primary on both nodes.

On the first node (as root):

  1. drbdadm create-md Share
  2. drbdadm up Share
  3. drbdadm disconnect Share
  4. drbdadm primary --force Share
  5. drbdadm new-current-uuid --clear-bitmap Share
  6. drbdadm down Share
  7. drbdadm up Share
  8. now copy the device which DRBD is being set up on to the equivalent device on the second node
    • in my case I used a 100Gbyte LVM partition, which had been filled with zeroes before any of the above commands were run
    • this meant I only had to copy the DRBD metadata from the end of the partition to the second node
    • dd bs=65536 skip=1638349 if=/dev/LVM/Share | ssh node2 dd bs=65536 seek=1638349 of=/dev/LVM/Share
    • that's just over 3Mbytes instead of the entire 100Gbyte partition (see the sketch below for how the offset is derived)
  9. drbdadm disconnect Share
  10. drbdadm new-current-uuid Share
  11. drbdadm connect Share
  12. drbdadm primary Share

On the second node (as root):

  1. drbdadm up Share
  2. drbdadm primary Share

On both nodes, cat /proc/drbd should now show cs:Connected, ro:Primary/Primary and ds:UpToDate/UpToDate:

version: 8.4.10 (api:1/proto:86-101)
srcversion: 473968AD625BA317874A57E 
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:512 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
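
For reference, the skip and seek values in step 8 mark where DRBD's internal metadata starts. Here is a rough sketch of deriving them for other device sizes; it assumes the 8.4 internal-metadata layout (a bitmap of roughly size/32768 bytes plus a small fixed overhead at the end of the device, rounded up here to one whole 64Kbyte block) and is no substitute for checking the numbers yourself:

  # run on the node with the data (as root); sketch only
  DEV=/dev/LVM/Share
  BYTES=$(blockdev --getsize64 "$DEV")   # device size in bytes
  MD=$(( BYTES / 32768 + 65536 ))        # approximate metadata size
  SKIP=$(( (BYTES - MD) / 65536 ))       # 64Kbyte blocks before the metadata
  echo $SKIP                             # 1638349 for the 100Gbyte device above

If in doubt, copying the whole device instead always works; it just takes longer.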

Avoiding tedious startup replication 2

The new documentation is much better, and gives very simple commands which can be used to get two machines in sync when you have no data on either which you care about. This is essentially the equivalent of buying a second-hand disk, partitioning and formatting it for yourself, without caring about what data was on it to start with.

On both nodes (as root):

  1. drbdadm create-md Share
  2. drbdadm up Share

On one node (as root):

  • drbdadm new-current-uuid --clear-bitmap Share/0 (the /0 addresses volume 0, the only volume in this resource)

On both nodes (as root):

  • drbdadm primary Share

On both nodes, cat /proc/drbd should now show cs:Connected, ro:Primary/Primary and ds:UpToDate/UpToDate:

version: 8.4.10 (api:1/proto:86-101)
srcversion: 473968AD625BA317874A57E 
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:512 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

These commands worked perfectly well on an 8.4.10 system - it's the documentation which has been improved, not the software (in this case).

Using the device

You now have a block device which can be written to on either node, and will replicate the changes to the other node.

Do NOT format this with an EXT3 or EXT4 file system: these do not understand being mounted on more than one machine at the same time, and mounting one on both nodes at once will corrupt it. In my case I used OCFS2, a cluster file system which does understand being mounted on multiple nodes simultaneously (and which is also built into the Linux kernel).
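
Setting OCFS2 up is beyond the scope of these notes (it needs its own cluster definition in /etc/ocfs2/cluster.conf and the o2cb stack running on both nodes first), but as a rough sketch of the final steps, with the label and mount point here being just for illustration:

  # on one node only: create the file system with one slot per machine
  mkfs.ocfs2 -L Share -N 2 /dev/drbd0
  # on both nodes: mount it (assumes the o2cb cluster stack is running)
  mkdir -p /srv/Share
  mount /dev/drbd0 /srv/Share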

