Pandora: Documentation en: MySQL Replica

From Pandora FMS Wiki
Jump to: navigation, search

Go back to Pandora FMS documentation index

1 MySQL Binary Replication model for HA

1.1 Introduction

This setup is proposed to have a full HA enviroment in Pandora FMS, based on an active/passive model. Standard MySQL (not MySQL cluster), allow to have a single MASTER (allowing INSERT/UPDATE operations) and several SLAVES, allowing only read operations. This is used in several enviroments to have a distributed database model, in Pandora all operations read/write are done against the same DB server, so this model cannot be used, anyway, replication is used also to have a "copy" of your primary database, so in a failure event, you can "raise" the slave to be the master database and use it.

We use UCARP application to provide the Virtual IP (VIP) mechanism to have a realtime H/A. In the simplest model, with two UCARP daemons running, if the master fails, the secondary will take the VIP and proceed with normal operation. An slave will resume the MySQL operations on the Pandora FMS Server / Console, and user will not notice anything.

After the failover, you will need to restore (manually, because it's a very delicated process), the master system and transfer all data from slave to the master again.

1.2 Comparison versus other MySQL HA models

There are many ways to implement MySQL HA, we have explored three:

  • MySQL Cluster: Very complex and with a performance penalty, is the unique way to have a real active/active (cluster) enviroment. Described in depth in our documentation.
  • MySQL Binary Replica / ucarp: Simple at fist, fast and very standard, but with several scripts and complexity to get back the master in the system. This documentation.
  • DRBD / heartbeat : Simple, fast and based on system block devices. Also described in our documentation. It's the official way to implement HA in Pandora FMS.

In our opinion, the best way to implement the HA is to have the simplest possible setup, because when something fails, any extra complexity will led to confusion and data loss if procedures are not extremely well tested and written. Most times, operators only follow procedures and cannot react to things outside the procedures, and HA could be very difficult to have exact procedures in most cases.

1.3 Initial enviroment

This is a brief overview our test scenario:

192.168.10.101 (castor) -> Master

192.168.10.102 (pollux) -> Slave

192.168.10.100 virtual-ip

192.168.10.1 pandora -> mysql app

1.3.1 Setting up the Mysql Server

1.3.1.1 Master node (Castor)

Edit my.cnf file (debian systems):

[mysqld]
bind-address=0.0.0.0
log_bin=/var/log/mysql/mysql-bin.log
server-id=1
innodb_flush_log_at_trx_commit=1
sync_binlog=1
binlog_do_db=pandora
binlog_ignore_db=mysql

1.3.1.2 Slave node (Pollux)

Edit my.cnf file:

[mysqld]
bind-address=0.0.0.0
server-id=2
innodb_flush_log_at_trx_commit=1
sync_binlog=1

1.3.1.3 Creating a User for Replication

Each slave must connect to the master using a MySQL user name and password, so there must be a user account on the master that the slave can use to connect. Any account can be used for this operation, providing it has been granted the REPLICATION SLAVE privilege.

mysql> CREATE USER 'replica'@'192.168.10.102' IDENTIFIED BY 'slayer72';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replica'@'192.168.10.102';
mysql> FLUSH PRIVILEGES;

1.3.1.4 Install your pandora DB

Create a new one from installation .sql files or dump your current one in the master node (Castor)

Login in the master server:

mysql> create database pandora;
mysql> use pandora;
mysql> source /tmp/pandoradb.sql;
mysql> source /tmp/pandoradb_data.sql;

1.3.1.5 Setting Up Replication with Existing Data

Now we want to replicate the initial state of the loaded database in the MASTER node (castor). This is the "start" point to replicate all information to the slave, and assumes you have your database "FROZEN" in the time you make the "photo", after doing the photo a "coordinates" are given and writen in the SQL dump, if master database continues writting data, doesn't matter, replication will continue to replicate all changes from the initial coordinates. Think about this as a lineal path, and you "freeze" a start point for the slave to start to replicate the information. Follow these steps:

1. Start a session on the master by connecting to it with the command-line client, and flush all tables and block write statements by executing the FLUSH TABLES WITH READ LOCK statement:

mysql> FLUSH TABLES WITH READ LOCK;

2. Database writes are now blocked. Use the SHOW MASTER STATUS statement to determine the current binary log file name and position:

mysql > SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 98       | pandora      | mysql            |
+------------------+----------+--------------+------------------+

The File column shows the name of the log file and Position shows the position within the file. In this example, the binary log file is mysql-bin.000003 and the position is 98. You need them later when you are setting up the slave. They represent the replication coordinates at which the slave should begin processing new updates from the master.

3. Open a shell and do a mysqldump command:

$ mysqldump -u root -pnone pandora -B --master-data > /tmp/dbdump.sql

This dump is "special" and contains the coordinates for the slave server (--master-data), and also (-B) create the database and uses in on the created .SQL dump.

4. Unlock your Mysql primary server:

mysql> unlock tables;

5. Copy the SQL file to the SLAVE server (ftp. ssh...)

6. Connect to mysql console, and stop your SLAVE server;

mysql> SLAVE STOP;

7. Drop your current pandora database in the SLAVE server (if exists)

mysql> drop database pandora;

8. Enter the following SQL sentence to prepare credencials to stablish communication with master:

mysql> CHANGE MASTER TO MASTER_HOST='192.168.10.101', MASTER_USER='replica', MASTER_PASSWORD='slayer72';

Take note that is pointing to the current MASTER server (192.168.10.101).

9. Import the dump sql taken from the current Master server:

mysql> SOURCE /tmp/dbdump.sql;

10. Start SLAVE

mysql> SLAVE START;

11. Watch status of synchonization

mysql> SHOW SLAVE STATUS;

12. You should see "Waiting for master to send events" to confirm everything is OK.

1.4 Setting up the SQL server to serve Pandora server

In both servers:

mysql> grant all privileges on pandora.* to pandora@192.168.10.1 identified by 'pandora';
mysql> flush privileges;

1.4.1 Start Pandora Server

Everything should go fine.

Check if everything is correct:

In slave server and master server take a look on running processes with following SQL command:

mysql> show processlist;

Should show something like:


+----+-------------+-----------+------+---------+------+----------------------------------------------------------
| Id | User        | Host      | db   | Command | Time | State                                                                 | Info             |
+----+-------------+-----------+------+---------+------+----------------------------------------------------------
| 32 | root        | localhost | NULL | Sleep   |   72 |                                                                       | NULL             | 
| 36 | system user |           | NULL | Connect |  906 | Waiting for master to send event                                      | NULL             | 
| 37 | system user |           | NULL | Connect |    4 | Has read all relay log; waiting for the slave I/O thread to update it | NULL             | 
| 39 | root        | localhost | NULL | Query   |    0 | NULL                                                                  | show processlist | 
+----+-------------+-----------+------+---------+------+----------------------------------------------------------

1.5 Switchover

That means: do the slave to become the master. In the event MASTER server is down, or for any reason the VIP points to the SLAVE server, you must be sure that the SLAVE server executes following SQL commands:

mysql> STOP SLAVE; 
mysql> RESET MASTER;

Your Slave server is now working as MASTER. SLAVE doesnt use the replication log from the MASTER and the MASTER is now "out of sync", that means if your Pandora FMS points to the old-master server, will have old information. This is one of the most problematic points and most problems comes from here.

The first "Switchover", that means, when the official MASTER goes down, and the official SLAVE becomes the NEW master, is not a problem, it's fully automatic since systems do the queries against the SLAVE / New master server. The problem is the "second" switchover, that means, when you want to have the old-master to become the official master again.

In this step you need to re-done the full process to sync all the HA model, that means.

1. Stop all pandoras.

2. Dump the database from the old-slave (Pollux) to a clean SQL:

$ mysqldump -B -u root -pnone pandora > /tmp/pandoradump.sql

3. Copy the sql dump to the official master (Castor)

4. Restore the SQL and drop all old information

mysql> drop database pandora;
mysql> source /tmp/pandoradump.sql;

5. In this point both databases are equal, so just obtain the coordinates to set slave back "to replicate" and degrade to SLAVE. Get the coordinates from the official MASTER:

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 | 234234   | pandora      | mysql            |
+------------------+----------+--------------+------------------+

(File and Position are the coordinates)

6. Use this SQL in the SLAVE:

mysql> SLAVE STOP;
myqsl> CHANGE MASTER TO MASTER_HOST='192.168.10.101', MASTER_USER='replica', MASTER_PASSWORD='slayer72', MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=234234;
mysql> SLAVE START;

7. Everything should be ok, so you can now restart your VIP processes to asssign the VIP to the official master (Castor) and let Pollux again the slave role.


Info.png

There is another way to implement failover which supposes MASTER/SLAVE role is not fixed, but that means this "relative" role should be implemented in the VIP model, using UCARP that means to change the priority in vhid. Another way to solve this problem is to use Heartbeat VIP mechanism (See our docs about DRBD)

 


1.6 Setting up the load balancing mechanism

We are using UCARP, which uses CARP protocol (http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protoco). More information on: http://ucarp.org/

Get the package and install it. Setup is very easy, you need to have a ucarp process running on each mysql server.

1.6.1 Castor / Master

ucarp --interface=eth1 --srcip=192.168.10.101 --vhid=1 --pass=pandora  --addr=192.168.10.100 --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh &

1.6.2 Pollux / Slave

ucarp --interface=eth1 --srcip=192.168.10.102 --vhid=2 --pass=pandora  --addr=192.168.10.100 --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh &

1.6.2.1 Contents of scripts

[/etc/vip-up.sh]

#!/bin/bash
/sbin/ifconfig "$1":254 "$2" netmask 255.255.255.0

[/etc/vip-down.sh]

#!/bin/bash
/sbin/ifconfig "$1":254 down

1.6.2.2 Some proposed scripts

[/etc/mysql-create-full-replica.sh]

#!/bin/bash
echo "FLUSH TABLES WITH READ LOCK;" | mysql -u root -pnone -D pandora
mysqldump -u root -pnone pandora -B --master-data > /tmp/dbdump.sql
echo "UNLOCK TABLES;" | mysql -u root -pnone -D pandora

[/etc/mysql-restore-replica.sh]

scp root@192.168.10.101:/tmp/dbdump.sql .
echo "SLAVE STOP; drop database pandora; SOURCE /tmp/dbdump.sql;" | mysql -u root -pnone -D pandora

[/etc/mysql-become-slave.sh]

echo "CHANGE MASTER TO MASTER_HOST='192.168.10.101', MASTER_USER='replica', MASTER_PASSWORD='slayer72'; SLAVE  START;" | mysql -u root -pnone

[/etc/mysql-become-master.sh]

echo "STOP SLAVE; RESET MASTER;" | mysql -u root -pnone

Go back to Pandora FMS documentation index