MySQL (Master-Master) cluster with High Availability – Part 2 of 2

In my previous post I described how to create a MySQL Replication (Master-Master) cluster. In this post we will set up a Cluster-IP address for that cluster. At the end of this post we will have a MySQL cluster with High Availability.

Install & Configure Corosync and Pacemaker on server MYSQL01

First we need to add the repository to install Corosync and Pacemaker.

user@MYSQL01:~$ sudo bash -c 'cat > /etc/apt/sources.list.d/jessie-backports.list << "EOF"
deb http://http.debian.net/debian jessie-backports main
EOF'

Update the package lists and install Corosync and Pacemaker.

user@MYSQL01:~$ sudo apt-get update && sudo apt-get install -t jessie-backports pacemaker crmsh -y
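
If you want to make sure the packages were pulled from jessie-backports rather than the main repository, you can check the candidate versions (an optional sanity check):

user@MYSQL01:~$ apt-cache policy pacemaker crmsh corosync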

Stop the Pacemaker service.

user@MYSQL01:~$ sudo service pacemaker stop

Create a copy of the default Corosync config-file

user@MYSQL01:~$ sudo cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf_orig

Delete all the content of the Corosync config-file

user@MYSQL01:~$ sudo bash -c 'cat /dev/null > /etc/corosync/corosync.conf'

Open the corosync.conf file for editing. I use nano.

user@MYSQL01:~$ sudo nano /etc/corosync/corosync.conf

Replace the contents of corosync.conf with this configuration.
Note: This config-file may differ for your own environment. Replace the values that are specific to your configuration.

totem {
    version: 2
    cluster_name: "MySQL Repl (M-M)"
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 5000
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: active

    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.40.0.0
        mcastport: 5405
    }

}

nodelist {
  node {
    ring0_addr: 10.40.0.18
    name: MYSQL01
    nodeid: 1
  }
  node {
    ring0_addr: 10.40.0.19
    name: MYSQL02
    nodeid: 2
  }
}

amf {
    mode: disabled
}

service {
     ver:       0
     name:      pacemaker
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
  wait_for_all: 1
  last_man_standing: 1
  auto_tie_breaker: 0
}

Restart Corosync to load the new configuration

user@MYSQL01:~$ sudo service corosync restart
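
To verify that Corosync came back up with the new configuration, you can check the ring status and the current membership. At this point only MYSQL01 has been configured, so expect to see just one member until MYSQL02 joins:

user@MYSQL01:~$ sudo corosync-cfgtool -s
user@MYSQL01:~$ sudo corosync-cmapctl | grep members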

Start Pacemaker

user@MYSQL01:~$ sudo service pacemaker start

Install & Configure Corosync and Pacemaker on server MYSQL02

Repeat the exact same steps that we followed on MYSQL01.
Start once again by adding the repository

user@MYSQL02:~$ sudo bash -c 'cat > /etc/apt/sources.list.d/jessie-backports.list << "EOF"
deb http://http.debian.net/debian jessie-backports main
EOF'

Update the package lists and install Corosync and Pacemaker.

user@MYSQL02:~$ sudo apt-get update && sudo apt-get install -t jessie-backports pacemaker crmsh -y

Stop the Pacemaker service.

user@MYSQL02:~$ sudo service pacemaker stop

Create a copy of the default Corosync config-file

user@MYSQL02:~$ sudo cp /etc/corosync/corosync.conf /etc/corosync/corosync.conf_orig

Delete all the content of the Corosync config-file

user@MYSQL02:~$ sudo bash -c 'cat /dev/null > /etc/corosync/corosync.conf'

Open the corosync.conf file for editing. I use nano.

user@MYSQL02:~$ sudo nano /etc/corosync/corosync.conf

Replace the contents of corosync.conf with this configuration.
Note: This config-file may differ for your own environment. Replace the values that are specific to your configuration.

totem {
    version: 2
    cluster_name: "MySQL Repl (M-M)"
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 5000
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: active

    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.40.0.0
        mcastport: 5405
    }

}

nodelist {
  node {
    ring0_addr: 10.40.0.18
    name: MYSQL01
    nodeid: 1
  }
  node {
    ring0_addr: 10.40.0.19
    name: MYSQL02
    nodeid: 2
  }
}

amf {
    mode: disabled
}

service {
     ver:       0
     name:      pacemaker
}

aisexec {
        user:   root
        group:  root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

quorum {
  provider: corosync_votequorum
  two_node: 1
  wait_for_all: 1
  last_man_standing: 1
  auto_tie_breaker: 0
}

Restart Corosync to load the new configuration

user@MYSQL02:~$ sudo service corosync restart

Start Pacemaker

user@MYSQL02:~$ sudo service pacemaker start

Now that Corosync and Pacemaker are installed and configured on both servers, let’s check whether our cluster works.

user@MYSQL02:~$ sudo crm status

The output should look something like this (if not, wait for 30 seconds, then run the command again):

Stack: corosync
Current DC: MYSQL01 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Tue Apr 25 09:44:27 2017 Last change: Tue Apr 25 09:40:24 2017 by hacluster via crmd on MYSQL01

2 nodes and 0 resources configured

Online: [ MYSQL01 MYSQL02 ]

Full list of resources:
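
Optionally, you can also check the quorum state from either node. With the two_node option we set in corosync.conf, the output should report both votes and show the partition as quorate:

user@MYSQL01:~$ sudo corosync-quorumtool -s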

Configure Cluster Properties

Now we’re ready to configure the properties of Pacemaker.
Note: All Pacemaker (crm) commands can be run from either node, as Pacemaker automatically synchronizes all cluster-related changes across the member nodes.

First we want to disable STONITH (Shoot The Other Node In The Head).
Note: STONITH is the ability to fence (forcibly remove) faulty nodes that no longer meet the cluster requirements. We disable it here because no fencing devices are configured.

user@MYSQL01:~$ sudo crm configure property stonith-enabled=false

Second we want to ignore the quorum policy.
Note: This setting is only appropriate for 2-node clusters, where the loss of a single node would otherwise mean losing quorum and stopping all resources.

user@MYSQL01:~$ sudo crm configure property no-quorum-policy=ignore

Third and last we want to configure resource stickiness.
Note: Stickiness is useful when a node goes down and comes back up; it makes sure that the resources running on the other node are kept there instead of bouncing back, which prevents flapping.

user@MYSQL01:~$ sudo crm configure property default-resource-stickiness=100
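
To confirm that the three properties were applied, you can filter them out of the live configuration:

user@MYSQL01:~$ sudo crm configure show | grep -E 'stonith-enabled|no-quorum-policy|default-resource-stickiness'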

Create Cluster Resources

Now that we have configured the properties we’re ready to create resources.

Open a Pacemaker shell

user@MYSQL01:~$ sudo crm

Enter configuration mode

crm(live)# configure

We start by creating a ping resource that monitors connectivity to the gateway. If a node loses its network connection, we want to fail over all resources to the other node.

crm(live)configure# primitive Ping-Gateway ocf:pacemaker:ping \
    params dampen="20s" multiplier="1000" \
    host_list="10.40.0.1" \
    op monitor interval="5s" timeout="60s" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="60s"
clone clone_Ping-Gateway Ping-Gateway

Let’s create a Cluster-IP address for our MySQL cluster

crm(live)configure# primitive Cluster-IP ocf:heartbeat:IPaddr2 \
    params ip="10.40.0.20" cidr_netmask="24" \
    meta migration-threshold=2 \
    op monitor interval=20 \
    timeout=60 \
    on-fail=restart

Create a resource group for our MySQL cluster and add the Cluster-IP address

crm(live)configure# group MySQL-Cluster Cluster-IP

Create a location rule which will fail resources over to the other node when connectivity is lost.
Note: You can also target a single resource instead of a group, but I find a group much easier because you can add/delete resources from it. Otherwise you would need to delete the location rule first and re-create it.

crm(live)configure# location PingGroup MySQL-Cluster rule -inf: not_defined pingd or pingd lte 0
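
Note: The ocf:pacemaker:ping resource agent publishes a node attribute (named pingd by default) containing a connectivity score for each node; the rule above pushes the group away from any node where that attribute is missing or 0. After committing the configuration you can inspect the attribute per node, for example with:

user@MYSQL01:~$ sudo crm_mon -A1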

The configuration is now complete.
Let’s check whether the config is correct.

crm(live)configure# show

We should see the following output:
Note: This output may differ from your own environment.

node 1: MYSQL01
node 2: MYSQL02
primitive Ping-Gateway ocf:pacemaker:ping \
        params dampen=20s multiplier=1000 host_list=10.40.0.1 \
        op monitor interval=5s timeout=60s \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s
primitive Cluster-IP IPaddr2 \
        params ip=10.40.0.20 cidr_netmask=24 \
        meta migration-threshold=2 \
        op monitor interval=20 timeout=60 on-fail=restart
group MySQL-Cluster Cluster-IP
clone clone_Ping-Gateway Ping-Gateway
location PingGroup MySQL-Cluster \
        rule -inf: not_defined pingd or pingd lte 0
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        default-resource-stickiness=100

If the config looks correct, you can commit the changes.

crm(live)configure# commit

Exit the Pacemaker shell and check if the Cluster-IP is up-and-running.

crm(live)configure# exit
bye
user@MYSQL01:~$ sudo crm status

We should see the following output:
Note: This output may differ from your own environment.

Stack: corosync
Current DC: MYSQL01 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Tue Apr 25 15:21:27 2017          Last change: Tue Apr 25 15:03:14 2017 by root via cibadmin on MYSQL01

2 nodes and 3 resources configured

Online: [ MYSQL01 MYSQL02 ]

Full list of resources:

 Clone Set: clone_Ping-Gateway [Ping-Gateway]
     Started: [ MYSQL01 MYSQL02 ]
 Resource Group: MySQL-Cluster
     Cluster-IP (ocf::heartbeat:IPaddr2):       Started MYSQL01

As you can see, the Cluster-IP is started on MYSQL01.
Note: It can take a little while before the Cluster-IP is started.
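
If you want to double-check at the OS level, the Cluster-IP should now be visible as a secondary address on MYSQL01 and should answer to ping from the other node:

user@MYSQL01:~$ ip -4 addr show | grep 10.40.0.20
user@MYSQL02:~$ ping -c 3 10.40.0.20

If MySQL is listening on all addresses (bind-address = 0.0.0.0), you can also connect through the Cluster-IP. The username below is only a placeholder; use whichever account you created in part 1 of this series:

user@MYSQL02:~$ mysql -h 10.40.0.20 -u youruser -p -e "SELECT @@hostname;"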

We only have one final test left, and that is to simulate a node failure.

On MYSQL02 we can monitor our Pacemaker cluster with the command below.
Note: This command shows the status of our cluster in real time.

user@MYSQL02:~$ sudo crm_mon

On MYSQL01 we’re going to unplug the network cable. Let’s check what happens in the cluster.

Before 

Stack: corosync
Current DC: MYSQL01 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Tue Apr 25 17:35:04 2017          Last change: Tue Apr 25 17:25:02 2017 by root via crm_resource on MYSQL01

2 nodes and 3 resources configured

Online: [ MYSQL01 MYSQL02 ]

 Clone Set: clone_Ping-Gateway [Ping-Gateway]
     Started: [ MYSQL01 MYSQL02 ]
 Resource Group: MySQL-Cluster
     Cluster-IP (ocf::heartbeat:IPaddr2):       Started MYSQL01

After

Stack: corosync
Current DC: MYSQL02 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Tue Apr 25 17:36:07 2017          Last change: Tue Apr 25 17:25:02 2017 by root via crm_resource on MYSQL01

2 nodes and 3 resources configured

Online: [ MYSQL02 ]
OFFLINE: [ MYSQL01 ]

 Clone Set: clone_Ping-Gateway [Ping-Gateway]
     Started: [ MYSQL02 ]
 Resource Group: MySQL-Cluster
     Cluster-IP (ocf::heartbeat:IPaddr2):       Started MYSQL02

As you can see Pacemaker has detected a failure on my MYSQL01 server and has migrated the Cluster-IP to MYSQL02.
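
Once the network cable is plugged back in, MYSQL01 will rejoin the cluster. Thanks to the resource stickiness we configured earlier, the Cluster-IP stays on MYSQL02 instead of flapping back. If you prefer to move it back to MYSQL01 manually, you can do so with crmsh (and afterwards clear the constraint that the migrate command creates):

user@MYSQL02:~$ sudo crm resource migrate MySQL-Cluster MYSQL01
user@MYSQL02:~$ sudo crm resource unmigrate MySQL-Cluster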

Congratulations! You have created a working MySQL replication (Master-Master) cluster with a Cluster-IP for High Availability.
