Pandora: Documentation en: DRBD


1 HA in Pandora FMS with DRBD

1.1 Introduction to DRBD

The Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution mirroring the content of block devices (hard disks, partitions, logical volumes etc.) between servers.

DRBD mirrors data:

  • In real time. Replication occurs continuously, while applications modify the data on the device.
  • Transparently. The applications that store their data on the mirrored device are oblivious to the fact that the data is in fact stored on several computers.
  • Synchronously or asynchronously. With synchronous mirroring, a writing application is notified of write completion only after the write has been carried out on both computer systems. Asynchronous mirroring means the writing application is notified of write completion when the write has completed locally, but before the write has propagated to the peer system.
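
In /etc/drbd.conf this choice is expressed through the replication protocol; a minimal sketch (the setup later in this document uses protocol C):

common {
  protocol C;   # synchronous: write completion is reported after both nodes have written
  # protocol A; # asynchronous: write completion is reported after the local write
}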



[Figure: Drbd.png]



On top of DRBD you can build a cluster for almost anything you can replicate on disk. In our specific case we want to "clusterize" only the database, but we could also replicate an entire Pandora FMS setup, including the server, local agents and, of course, the database.

DRBD is a kernel module implementing RAID-1 over TCP. It is easy to set up, fast and robust. You can get more information about DRBD on its website at http://www.drbd.org

DRBD is open source.

1.2 Initial environment

We want a MySQL cluster in an HA configuration based on a master (active) node and a slave (passive) node. Several Pandora FMS servers and the console will use a virtual IP address to connect to whichever node is currently running the MySQL server.

This is the network configuration for the two nodes running the MySQL cluster:

192.168.10.101 (castor) -> Master
192.168.10.102 (pollux) -> Slave
192.168.10.100 -> Virtual IP

In our scenario, the only Pandora FMS server is running here:

192.168.10.1 pandora -> mysql app

Each node has two hard disks:

/dev/sda with the standard Linux system.
/dev/sdb, an empty, unformatted disk, ready for the RAID-1 setup with DRBD.

We assume the time is synchronized between all nodes. This is extremely IMPORTANT; if it is not, please synchronize the clocks before continuing, using NTP or an equivalent mechanism.
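
For example, a quick one-shot synchronization could look like this (a sketch; it assumes the nodes can reach a public NTP server and that ntpdate is installed):

ntpdate pool.ntp.org
date

Run it on both nodes and check that the dates match.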

1.3 Install packages

Install the following packages (Debian):

apt-get install heartbeat drbd8-utils drbd8-modules-2.6-686 mysql-server

Install the following packages (SUSE):

drbd heartbeat heartbeat-resources resource-agents mysql-server
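
On SUSE you could install them, for example, with zypper (a sketch; on older versions use YaST instead):

zypper install drbd heartbeat heartbeat-resources resource-agents mysql-server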

1.4 DRBD setup

1.4.1 Initial DRBD setup

Edit /etc/drbd.conf

global {
  usage-count no;
}

common {
 protocol C;
}

resource mysql {
   on castor {
       device /dev/drbd1;
       disk /dev/sdb1;
       address 192.168.10.101:7789;
       meta-disk internal;
   }
   on pollux {
       device /dev/drbd1;
       disk /dev/sdb1;
       address 192.168.10.102:7789;
       meta-disk internal;
   }
   disk {
       on-io-error detach; # Detach the disk on low-level I/O errors.
   }
   net {
       max-buffers 2048; # Data blocks kept in memory before writing to disk.
       ko-count 4; # Maximum attempts before disconnecting.
   }
   syncer {
       rate 10M; # Recommended synchronization rate for 100 Mbit/s networks.
       al-extents 257;
   }
   startup {
       wfc-timeout 0; # The drbd init script will wait indefinitely for the resources.
       degr-wfc-timeout 120; # 2 minutes.
   }
}
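
Before going on, you can ask drbdadm to parse the configuration and print it back, which catches most syntax mistakes:

drbdadm dump mysql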

1.4.2 Setup DRBD nodes

You need to have a completely empty disk on /dev/sdb (even without partitioning).

Create a partition /dev/sdb1 (Linux type):

fdisk /dev/sdb
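
Inside fdisk, a typical interactive sequence to create a single primary partition spanning the whole disk is sketched below (the default partition type, 83, is already Linux):

n        (new partition)
p        (primary)
1        (partition number 1)
<Enter>  (accept the default first cylinder)
<Enter>  (accept the default last cylinder)
w        (write the partition table and exit)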

Delete all previous information on it:

dd if=/dev/zero of=/dev/sdb1 bs=1M count=128

(Do it on both nodes)

Then create the internal DRBD structures on the disk with the following commands, on both nodes:

drbdadm create-md mysql
drbdadm up mysql

(Again, run them on both nodes)
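
At this point /proc/drbd on both nodes should show the resource connected, with both sides Secondary and Inconsistent, since no synchronization source has been chosen yet. Something like:

cat /proc/drbd
 1: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent C r---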

1.4.3 Initial disk (Primary node)

The last command of the DRBD setup, to be run only on the primary node, initializes the resource and sets it as primary:

drbdadm -- --overwrite-data-of-peer primary mysql

After issuing this command, the initial full synchronization will commence. You will be able to monitor its progress via /proc/drbd. It may take some time depending on the size of the device.

By now, your DRBD device is fully operational, even before the initial synchronization has completed (albeit with slightly reduced performance). You may now create a filesystem on the device, use it as a raw block device, mount it, and perform any other operation you would with an accessible block device.

castor:/etc# cat /proc/drbd 
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by [email protected], 2008-11-12 16:40:33

 1: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
    ns:44032 nr:0 dw:0 dr:44032 al:0 bm:2 lo:0 pe:0 ua:0 ap:0
	[>....................] sync'ed:  2.2% (2052316/2096348)K
	finish: 0:03:04 speed: 11,008 (11,008) K/sec
	resync: used:0/61 hits:2749 misses:3 starving:0 dirty:0 changed:3
	act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

1.4.4 Create the partition on primary node

Do this ONLY on the primary node; it will be replicated to the other node automatically. Always operate on the DRBD block device, never on the underlying physical device.

castor:~# mkfs.ext3 /dev/drbd1

Use it like a standard partition from now on, and mount it on the primary NODE as follows:

castor# mkdir /drbd_mysql
castor# mount /dev/drbd1 /drbd_mysql/

You cannot mount it on the secondary node. To do that, you first need to demote the primary to secondary, and then promote the secondary to primary:

In primary (castor):

castor# drbdadm secondary mysql 

In secondary (pollux):

pollux# drbdadm primary mysql

1.4.5 Getting information about system status

Executed from the current master node (castor):

castor:/# drbdadm state mysql
Primary/Secondary
castor:/# drbdadm dstate mysql
UpToDate/UpToDate

And from pollux (backup, replicating disk):

pollux:~# drbdadm state mysql
Secondary/Primary
pollux:~# drbdadm dstate mysql
UpToDate/UpToDate

1.4.6 Setting up MySQL on the DRBD disk

We suppose you have all the MySQL information in the following locations (they may differ depending on the Linux distribution):

/etc/mysql/my.cnf
/var/lib/mysql/

First, stop MySQL on both the primary and the secondary node.
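
On the systems used in this document that would be, on each node:

/etc/init.d/mysql stop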

In the primary node:

Move all the data to the mounted partition (the stale copies on the secondary node will be deleted in a later step):

mv /etc/mysql/my.cnf /drbd_mysql/
mv /var/lib/mysql /drbd_mysql/mysql
mv /etc/mysql/debian.cnf /drbd_mysql/

Link the new locations back to the original ones:

ln -s /drbd_mysql/mysql/ /var/lib/mysql
ln -s /drbd_mysql/my.cnf /etc/mysql/my.cnf
ln -s /drbd_mysql/debian.cnf /etc/mysql/debian.cnf

Restart mysql.

In the secondary node:

rm -Rf /etc/mysql/my.cnf
rm -Rf /var/lib/mysql
rm -Rf /etc/mysql/debian.cnf
ln -s /drbd_mysql/mysql/ /var/lib/mysql
ln -s /drbd_mysql/my.cnf /etc/mysql/my.cnf
ln -s /drbd_mysql/debian.cnf /etc/mysql/debian.cnf

1.4.7 Create the Pandora FMS database

We assume you have the default SQL files to create the Pandora FMS database at /tmp:

mysql -u root -p
mysql> create database pandora;
mysql> use pandora;
mysql> source /tmp/pandoradb.sql;
mysql> source /tmp/pandoradb_data.sql;

Set permissions:

mysql> grant all privileges on pandora.* to [email protected] identified by 'pandora';
mysql> flush privileges;
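
To check that the grant works, you can try a remote connection from the Pandora FMS server host (a sketch; it assumes the grant above covers that host's address, and it uses the master node's IP since the virtual IP is not configured yet):

mysql -u pandora -ppandora -h 192.168.10.101 pandora -e 'show tables'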

1.4.8 Manual split brain recovery

DRBD detects split brain at the time connectivity becomes available again and the peer nodes exchange the initial DRBD protocol handshake. If DRBD detects that both nodes are (or were at some point, while disconnected) in the primary role, it immediately tears down the replication connection. The tell-tale sign of this is a message like the following appearing in the system log:

Split-Brain detected, dropping connection!

After split brain has been detected, one node will always have the resource in a StandAlone connection state. The other might either also be in the StandAlone state (if both nodes detected the split brain simultaneously), or in WFConnection (if the peer tore down the connection before the other node had a chance to detect split brain).

In this case, our secondary node (castor) is alone:

castor:~# cat /proc/drbd 
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by [email protected], 2008-11-12  16:40:33

  1: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
     ns:0 nr:0 dw:0 dr:0 al:0 bm:7 lo:0 pe:0 ua:0 ap:0
	 resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
   	 act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

At this point, unless you configured DRBD to automatically recover from split brain, you must manually intervene by selecting one node whose modifications will be discarded (this node is referred to as the split brain victim). This intervention is made with the following commands:

drbdadm secondary mysql
drbdadm -- --discard-my-data connect mysql

On the other node (the split brain survivor), if its connection state is also StandAlone, you would enter:

drbdadm connect mysql

See the status:

 pollux:/# cat /proc/drbd 
 version: 8.0.14 (api:86/proto:86)
 GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by [email protected], 2008-11-12 16:40:33
 
  1: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r---
     ns:34204 nr:0 dw:190916 dr:46649 al:12 bm:24 lo:0 pe:4 ua:20 ap:0
	 [============>.......] sync'ed: 66.7% (23268/57348)K
 	 finish: 0:00:02 speed: 11,360 (11,360) K/sec
	 resync: used:1/61 hits:2149 misses:4 starving:0 dirty:0 changed:4
   	 act_log: used:0/257 hits:118 misses:12 starving:0 dirty:0 changed:12

1.4.9 Manual switchover

In the current primary

1. Stop mysql

/etc/init.d/mysql stop

2. Unmount the partition

umount /dev/drbd1

3. Degrade to secondary

drbdadm secondary mysql

In the current secondary

4. Promote to primary

drbdadm primary mysql

5. Mount partition

mount /dev/drbd1 /drbd_mysql

6. Start MySQL

/etc/init.d/mysql start
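
For convenience, the same six steps can be collected into two snippets, built only from the commands above; run each half on the corresponding node:

# On the node giving up the primary role:
/etc/init.d/mysql stop
umount /dev/drbd1
drbdadm secondary mysql

# On the node taking over:
drbdadm primary mysql
mount /dev/drbd1 /drbd_mysql
/etc/init.d/mysql start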

1.5 Setup Heartbeat

1.5.1 Configuring heartbeat

We suppose you have installed the Heartbeat packages and the DRBD utils, which include a Heartbeat resource script at /etc/ha.d/resource.d/drbddisk.

First, you need to enable ip_forwarding.

On Debian systems, edit /etc/sysctl.conf and modify the following line:

net.ipv4.ip_forward = 1
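
You can also apply it at runtime, without rebooting:

sysctl -w net.ipv4.ip_forward=1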

On SUSE systems, just use YaST and activate forwarding on the interface used for Heartbeat (in this documentation, eth1).

Set up the IP addresses in /etc/hosts on both hosts:

192.168.10.101  castor
192.168.10.102  pollux

1.5.2 Main Heartbeat file: /etc/ha.d/ha.cf

Edit /etc/ha.d/ha.cf file as follows in both nodes:

# Sample file for /etc/ha.d/ha.cf
# (c) Artica ST 2010

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
bcast eth1
auto_failback on
# auto_failback on: the cluster goes back to the master node when it comes back up.
# auto_failback off: the old master stays as secondary when it comes back up after a failure.
ping 192.168.10.1  # Gateway of our network, which must answer pings.
apiauth ipfail gid=haclient uid=hacluster # or whichever group/user applies
node castor
node pollux

1.5.3 HA resources file

Edit /etc/ha.d/haresources in both hosts:

castor drbddisk Filesystem::/dev/drbd1::/drbd_mysql::ext3 mysql 192.168.10.100

This defines the default "master" node. In this line you define the default node name, the resource script used to start/stop the DRBD disk (drbddisk), the filesystem and mount point, the service to start (mysql) and the virtual IP address (192.168.10.100).

1.5.4 Setting up authentication

Edit /etc/ha.d/authkeys in both hosts:

auth 2
2 sha1 09c16b57cf08c966768a17417d524cb681a05549

The number "2" means you have two nodes, and the hash is a sha1 HASH.

Restrict the permissions of /etc/ha.d/authkeys:

chmod 600 /etc/ha.d/authkeys

Deactivate the automatic MySQL daemon startup; from now on, it will be managed by Heartbeat.

rm /etc/rc2.d/S??mysql
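
On Debian the same can be done with update-rc.d, which removes the links in all runlevels:

update-rc.d -f mysql remove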

1.5.5 First start of heartbeat

First of all, make sure DRBD is up and running fine, MySQL is working and the database has been created.

Start Heartbeat on both systems, but FIRST on the primary node:

In castor:

/etc/init.d/heartbeat start

In pollux:

/etc/init.d/heartbeat start

The logs in /var/log/ha-log should be enough to tell whether everything is OK. The master node (castor) should now own the virtual IP address. Change the Pandora FMS configuration files of the console and server to use the virtual IP, and restart the Pandora FMS server.
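
To verify which node currently owns the virtual IP, list the addresses on each node and look for it:

ip addr | grep 192.168.10.100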

You need a Pandora FMS server watchdog to detect when the connection is down, or you can use the restart options in pandora_server.conf:

restart 1
restart_delay 60

1.6 Testing the HA: Total failure test

1. Start a web browser and open a session. Put the server view in autorefresh mode with a 5 second interval.

2. Shut down the primary node:

Press the power-off button.

-or-

Execute 'halt' in a root console.

3. Run tail -f /var/log/ha-log on the secondary node to watch how the switchover works.

4. Switchover can take 3-5 seconds.
