千家信息网

mysql5.6基于GTID模式之高可用架构搭建-MHA(mha0.56)

发表于:2024-11-14 作者:千家信息网编辑
千家信息网最后更新 2024年11月14日,一、环境部署:mysql1:192.168.110.131 作为mastermysql2:192.168.110.132 作为slavemysql3:192.168.110.130 作为slave,同
千家信息网最后更新 2024年11月14日mysql5.6基于GTID模式之高可用架构搭建-MHA(mha0.56)

一、环境部署:

mysql1:192.168.110.131 作为master

mysql2:192.168.110.132 作为slave

mysql3:192.168.110.130 作为slave,同时作为MHA的管理机

虚拟IP:192.168.110.100

二、mysql主从环境搭建和MHA安装

1、mysql主从搭建自行搭建(基于GTID复制,打开log_bin,复制规则默认,复制所有库表),这里不再说明。

2、安装MHA节点软件:

rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm

3、安装MHA管理节点:

rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y

4、配置MHA管理目录和配置:

mkdir -p /etc/masterha #工作目录

mkdir -p /var/log/masterha/app1/ #日志目录

cat /etc/masterha/app1.cnf #配置文件

[server default]manager_log=/var/log/masterha/app1/manager.logmanager_workdir=/var/log/masterha/app1master_ip_failover_script="/usr/bin/master_ip_failover"master_ip_online_change_script="/usr/bin/master_ip_online_change"ping_interval=1remote_workdir=/tmprepl_password=replrepl_user=replsecondary_check_script="/usr/bin/masterha_secondary_check -s 192.168.110.132 -s 192.168.110.130 --user=root --master_host=mysql1 --master_ip=192.168.110.131 --master_port=3306"ssh_user=rootuser=mhapassword=mha[server1]hostname=mysql1master_binlog_dir="/data/mysql/mysql3306/binlog"port=3306[server2]hostname=mysql2master_binlog_dir="/data/mysql/mysql3306/binlog"port=3306candidate_master=1check_repl_delay=0[server3]hostname=ansiblemaster_binlog_dir="/data/mysql/mysql3306/binlog"port=3306no_master=1[binlog1]hostname=mysql2master_binlog_dir=/tmp/binlogignore_fail=1no_master=1


5、配置MHA管理数据库账号mha:

mysql>GRANT ALL PRIVILEGES ON *.* TO 'mha'@'192.168.110.%' IDENTIFIED BY 'mha';

6、配置主库的binlog实时备份,脚本部署在mysql2上(非必须,如无,请去掉app1.cnf里面的[binlog1]配置信息):


cat binlog.sh

#!/bin/shBACKUP_BIN="/usr/bin/mysqlbinlog"LOCAL_BACKUP_DIR="/tmp/binlog"BACKUP_LOG="/tmp/binlog/binary.log"REMOTE_HOST="192.168.110.131"REMOTE_PORT="3306"REMOTE_USER="root"REMOTE_PASS="root"FIRST_BINLOG="mysql-bin.000001"#time to wait before reconnecting after failureSLEEP_SECONDS=10##create local_backup_dir if necessarymkdir -p ${LOCAL_BACKUP_DIR}cd ${LOCAL_BACKUP_DIR}## 运行while循环,连接断开后等待指定时间,重新连接while :do  if [ `ls -A "${LOCAL_BACKUP_DIR}" |wc -l` -eq 0 ];then     LAST_FILE=${FIRST_BINLOG}  else     LAST_FILE=`ls -l ${LOCAL_BACKUP_DIR} | grep -v backuplog | grep mysql-bin |tail -n 1 |awk '{print $9}'`  fi  ${BACKUP_BIN} --raw --read-from-remote-server --stop-never --host=${REMOTE_HOST} --port=${REMOTE_PORT} --user=${REMOTE_USER} --password=${REMOTE_PASS} ${LAST_FILE}   echo "`date +"%Y/%m/%d %H:%M:%S"` mysqlbinlog停止,返回代码:$?" | tee -a ${BACKUP_LOG}  echo "${SLEEP_SECONDS}秒后再次连接并继续备份" | tee -a ${BACKUP_LOG}    sleep ${SLEEP_SECONDS}done

7、修改/usr/bin/master_ip_failover和master_ip_online_change脚本,这里采用脚本切换VIP的方式实现VIP漂移,需要在脚本上加上对VIP漂移的处理逻辑。

/usr/bin/master_ip_failover脚本如下:

#!/usr/bin/env perl  use strict;use warnings FATAL => 'all';  use Getopt::Long;  my (    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port);  my $gateway = '192.168.110.2';my $vip = '192.168.110.100';  # Virtual IPmy $key = "2";my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip;/sbin/arping -I eth0 -c 3 -s $vip $gateway >/dev/null 2>&1";my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";$ssh_user = "root";  GetOptions(    'command=s'          => \$command,    'ssh_user=s'         => \$ssh_user,    'orig_master_host=s' => \$orig_master_host,    'orig_master_ip=s'   => \$orig_master_ip,    'orig_master_port=i' => \$orig_master_port,    'new_master_host=s'  => \$new_master_host,    'new_master_ip=s'    => \$new_master_ip,    'new_master_port=i'  => \$new_master_port,);  exit &main();  sub main {      print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";      if ( $command eq "stop" || $command eq "stopssh" ) {          # $orig_master_host, $orig_master_ip, $orig_master_port are passed.        # If you manage master ip address at global catalog database,        # invalidate orig_master_ip here.        my $exit_code = 1;          #eval {        #    print "Disabling the VIP on old master: $orig_master_host \n";        #    &stop_vip();        #    $exit_code = 0;        #};            eval {                print "Disabling the VIP on old master: $orig_master_host \n";                #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;                #if ( $ping le "90.0%" && $ping gt "0.0%" ){                #$exit_code = 0;                #}                #else {                  &stop_vip();                  # updating global catalog, etc                $exit_code = 0;                  #}        };            if ($@) {            warn "Got Error: $@\n";            exit $exit_code;        }        exit $exit_code;    }    elsif ( $command eq "start" ) {          # all arguments are passed.        # If you manage master ip address at global catalog database,        # activate new_master_ip here.        # You can also grant write access (create user, set read_only=0, etc) here.        my $exit_code = 10;        eval {            print "Enabling the VIP - $vip on the new master - $new_master_host \n";            &start_vip();            $exit_code = 0;        };        if ($@) {            warn $@;            exit $exit_code;        }        exit $exit_code;    }    elsif ( $command eq "status" ) {        print "Checking the Status of the script.. OK \n";        `ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;        exit 0;    }    else {        &usage();        exit 1;    }}  # A simple system call that enable the VIP on the new mastersub start_vip() {    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;}  # A simple system call that disable the VIP on the old_mastersub stop_vip() {    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;}  sub usage {    print    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";}  # the end.

/usr/bin/master_ip_online_change脚本如下:

#!/usr/bin/env perl#  Copyright (C) 2011 DeNA Co.,Ltd.##  This program is free software; you can redistribute it and/or modify#  it under the terms of the GNU General Public License as published by#  the Free Software Foundation; either version 2 of the License, or#  (at your option) any later version.##  This program is distributed in the hope that it will be useful,#  but WITHOUT ANY WARRANTY; without even the implied warranty of#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the#  GNU General Public License for more details.##  You should have received a copy of the GNU General Public License#   along with this program; if not, write to the Free Software#  Foundation, Inc.,#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA## Note: This is a sample script and is not complete. Modify the script based on your environment.use strict;use warnings FATAL => 'all';use Getopt::Long;use MHA::DBHelper;use MHA::NodeUtil;use Time::HiRes qw( sleep gettimeofday tv_interval );use Data::Dumper;my $_tstart;my $_running_interval = 0.1;my (  $command,              $orig_master_is_new_slave, $orig_master_host,  $orig_master_ip,       $orig_master_port,         $orig_master_user,  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  $new_master_ip,        $new_master_port,          $new_master_user,  $new_master_password,  $new_master_ssh_user,);my $vip = '192.168.110.100';  # Virtual IPmy $key = "2";my $gateway = '192.168.110.2';my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip;/sbin/arping -I eth0 -c 3 -s $vip $gateway >/dev/null 2>&1";my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";my $sshuser = "root"; GetOptions(  'command=s'                => \$command,  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  'orig_master_host=s'       => \$orig_master_host,  'orig_master_ip=s'         => \$orig_master_ip,  'orig_master_port=i'       => \$orig_master_port,  'orig_master_user=s'       => \$orig_master_user,  'orig_master_password=s'   => \$orig_master_password,  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  'new_master_host=s'        => \$new_master_host,  'new_master_ip=s'          => \$new_master_ip,  'new_master_port=i'        => \$new_master_port,  'new_master_user=s'        => \$new_master_user,  'new_master_password=s'    => \$new_master_password,  'new_master_ssh_user=s'    => \$new_master_ssh_user,);exit &main();sub current_time_us {  my ( $sec, $microsec ) = gettimeofday();  my $curdate = localtime($sec);  return $curdate . " " . sprintf( "d", $microsec );}sub sleep_until {  my $elapsed = tv_interval($_tstart);  if ( $_running_interval > $elapsed ) {    sleep( $_running_interval - $elapsed );  }}sub get_threads_util {  my $dbh                    = shift;  my $my_connection_id       = shift;  my $running_time_threshold = shift;  my $type                   = shift;  $running_time_threshold = 0 unless ($running_time_threshold);  $type                   = 0 unless ($type);  my @threads;  my $sth = $dbh->prepare("SHOW PROCESSLIST");  $sth->execute();  while ( my $ref = $sth->fetchrow_hashref() ) {    my $id         = $ref->{Id};    my $user       = $ref->{User};    my $host       = $ref->{Host};    my $command    = $ref->{Command};    my $state      = $ref->{State};    my $query_time = $ref->{Time};    my $info       = $ref->{Info};    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);    next if ( $my_connection_id == $id );    next if ( defined($query_time) && $query_time < $running_time_threshold );    next if ( defined($command)    && $command eq "Binlog Dump" );    next if ( defined($user)       && $user eq "system user" );    next      if ( defined($command)      && $command eq "Sleep"      && defined($query_time)      && $query_time >= 1 );    if ( $type >= 1 ) {      next if ( defined($command) && $command eq "Sleep" );      next if ( defined($command) && $command eq "Connect" );    }    if ( $type >= 2 ) {      next if ( defined($info) && $info =~ m/^select/i );      next if ( defined($info) && $info =~ m/^show/i );    }    push @threads, $ref;  }  return @threads;}sub main {  if ( $command eq "stop" ) {    ## Gracefully killing connections on the current master    # 1. Set read_only= 1 on the new master    # 2. DROP USER so that no app user can establish new connections    # 3. Set read_only= 1 on the current master    # 4. Kill current queries    # * Any database access failure will result in script die.    my $exit_code = 1;    eval {      ## Setting read_only=1 on the new master (to avoid accident)      my $new_master_handler = new MHA::DBHelper();      # args: hostname, port, user, password, raise_error(die_on_error)_or_not      $new_master_handler->connect( $new_master_ip, $new_master_port,        $new_master_user, $new_master_password, 1 );      print current_time_us() . " Set read_only on the new master.. ";      $new_master_handler->enable_read_only();      if ( $new_master_handler->is_read_only() ) {        print "ok.\n";      }      else {        die "Failed!\n";      }      $new_master_handler->disconnect();      # Connecting to the orig master, die if any database error happens      my $orig_master_handler = new MHA::DBHelper();      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,        $orig_master_user, $orig_master_password, 1 );      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand      $orig_master_handler->disable_log_bin_local();      #print current_time_us() . " Drpping app user on the orig master..\n";      #FIXME_xxx_drop_app_user($orig_master_handler);      ## Waiting for N * 100 milliseconds so that current connections can exit      my $time_until_read_only = 15;      $_tstart = [gettimeofday];      my @threads = get_threads_util( $orig_master_handler->{dbh},        $orig_master_handler->{connection_id} );      while ( $time_until_read_only > 0 && $#threads >= 0 ) {        if ( $time_until_read_only % 5 == 0 ) {          printf"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",            current_time_us(), $#threads + 1, $time_until_read_only * 100;          if ( $#threads < 5 ) {            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"              foreach (@threads);          }        }        sleep_until();        $_tstart = [gettimeofday];        $time_until_read_only--;        @threads = get_threads_util( $orig_master_handler->{dbh},          $orig_master_handler->{connection_id} );      }      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write      print current_time_us() . " Set read_only=1 on the orig master.. ";      $orig_master_handler->enable_read_only();      if ( $orig_master_handler->is_read_only() ) {        print "ok.\n";      }      else {        die "Failed!\n";      }      ## Waiting for M * 100 milliseconds so that current update queries can complete      my $time_until_kill_threads = 5;      @threads = get_threads_util( $orig_master_handler->{dbh},        $orig_master_handler->{connection_id} );      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {        if ( $time_until_kill_threads % 5 == 0 ) {          printf"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;          if ( $#threads < 5 ) {            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"              foreach (@threads);          }        }        sleep_until();        $_tstart = [gettimeofday];        $time_until_kill_threads--;        @threads = get_threads_util( $orig_master_handler->{dbh},          $orig_master_handler->{connection_id} );      }      ## Terminating all threads      print current_time_us() . " Killing all application threads..\n";      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );      print current_time_us() . " done.\n";      $orig_master_handler->enable_log_bin_local();      $orig_master_handler->disconnect();      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK      print "Disabling the VIP on old master: $orig_master_host \n";      &stop_vip();      $exit_code = 0;    };    if ($@) {      warn "Got Error: $@\n";      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "start" ) {    ## Activating master ip on the new master    # 1. Create app user with write privileges    # 2. Moving backup script if needed    # 3. Register new master's ip to the catalog database# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.# If exit code is 0 or 10, MHA does not abort    my $exit_code = 10;    eval {      my $new_master_handler = new MHA::DBHelper();      # args: hostname, port, user, password, raise_error_or_not      $new_master_handler->connect( $new_master_ip, $new_master_port,        $new_master_user, $new_master_password, 1 );      ## Set read_only=0 on the new master      $new_master_handler->disable_log_bin_local();      print current_time_us() . " Set read_only=0 on the new master.\n";      $new_master_handler->disable_read_only();      ## Creating an app user on the new master      #print current_time_us() . " Creating app user on the new master..\n";      #FIXME_xxx_create_app_user($new_master_handler);      $new_master_handler->enable_log_bin_local();      $new_master_handler->disconnect();            print "Enabling the VIP - $vip on the new master - $new_master_host \n";      &start_vip();       ## Update master ip on the catalog database, etc      $exit_code = 0;    };    if ($@) {      warn "Got Error: $@\n";      exit $exit_code;    }    exit $exit_code;  }  elsif ( $command eq "status" ) {    # do nothing    exit 0;  }  else {    &usage();    exit 1;  }}# A simple system call that enable the VIP on the new mastersub start_vip() {    `ssh $sshuser\@$new_master_host \" $ssh_start_vip \"`;}# A simple system call that disable the VIP on the old_mastersub stop_vip() {    `ssh $sshuser\@$orig_master_host \" $ssh_stop_vip \"`;}sub usage {  print"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";  die;}
注:FIXME_xxx_drop_app_user和FIXME_xxx_create_app_user请注释掉,运行该脚本会报错,暂时注销掉,不影响后续的进程。

8、配置SSH免密登陆:(以ansible为例)

ssh-keygen -t rsassh-copy-id -i ~/.ssh/id_rsa.pub root@mysql1ssh-copy-id -i ~/.ssh/id_rsa.pub root@mysql2ssh-copy-id -i ~/.ssh/id_rsa.pub root@ansible

基本上MHA manager,MNA node,以及二次检测的节点,都需要互相信任

注:这里配置免密的时候采用主机名,那在app1.cnf里的hostname对应也要配置成主机名。如果是IP,那就配置成IP。


9、配置sysbench压测工具,模拟生产对数据库进行压测。

安装sysbench请自行百度,这里只提供压测脚本。

mysql>grant all privileges on sbtest1.* on root@'192.168.110.%' identified by 'root';


vim sysbench.sh

#!/bin/shfunction prepare(){sysbench --test=oltp --mysql-host=192.168.110.100 --mysql-port=3306 --mysql-user=root --mysql-password=root --mysql-db=sbtest1 --oltp-num-tables=10 --oltp-table-size=500000 --report-interval=10  --max-requests=0 --oltp-test-mode=nontrx --oltp-nontrx-mode=select --oltp-read-only=off --max-time=30 --num-threads=10 prepare}function run(){sysbench --test=oltp --mysql-host=192.168.110.100 --mysql-port=3306 --mysql-user=root --mysql-password=root --mysql-db=sbtest1 --oltp-num-tables=10 --oltp-table-size=500000 --report-interval=10  --max-requests=0 --oltp-test-mode=nontrx --oltp-nontrx-mode=select --oltp-read-only=off --max-time=30 --num-threads=10 run}function cleanup(){sysbench --test=oltp --mysql-host=192.168.110.100 --mysql-port=3306 --mysql-user=root --mysql-password=root --mysql-db=sbtest1 --oltp-num-tables=10 --oltp-table-size=500000 --report-interval=10  --max-requests=0 --oltp-test-mode=nontrx --oltp-nontrx-mode=select --oltp-read-only=off --max-time=30 --num-threads=10 cleanup}case "$1" inprepare)prepare;;run)run;;cleanup)cleanup;;*)echo "Usage: $0 prepare|run|cleanup"esac


脚本对VIP所在的mysql即主库进行压测。


10、检测SSH状态:

[root@ansible app1]# masterha_check_ssh --conf=/etc/masterha/app1.cnf Wed Mar 15 13:10:05 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 13:10:05 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 13:10:05 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Wed Mar 15 13:10:05 2017 - [info] Starting SSH connection tests..Wed Mar 15 13:10:06 2017 - [debug] Wed Mar 15 13:10:05 2017 - [debug]  Connecting via SSH from root@mysql1(192.168.110.131:22) to root@mysql2(192.168.110.132:22)..Wed Mar 15 13:10:05 2017 - [debug]   ok.Wed Mar 15 13:10:05 2017 - [debug]  Connecting via SSH from root@mysql1(192.168.110.131:22) to root@ansible(192.168.110.130:22)..Wed Mar 15 13:10:06 2017 - [debug]   ok.Wed Mar 15 13:10:06 2017 - [debug] Wed Mar 15 13:10:05 2017 - [debug]  Connecting via SSH from root@mysql2(192.168.110.132:22) to root@mysql1(192.168.110.131:22)..Wed Mar 15 13:10:06 2017 - [debug]   ok.Wed Mar 15 13:10:06 2017 - [debug]  Connecting via SSH from root@mysql2(192.168.110.132:22) to root@ansible(192.168.110.130:22)..Wed Mar 15 13:10:06 2017 - [debug]   ok.Wed Mar 15 13:10:07 2017 - [debug] Wed Mar 15 13:10:06 2017 - [debug]  Connecting via SSH from root@ansible(192.168.110.130:22) to root@mysql1(192.168.110.131:22)..Wed Mar 15 13:10:06 2017 - [debug]   ok.Wed Mar 15 13:10:06 2017 - [debug]  Connecting via SSH from root@ansible(192.168.110.130:22) to root@mysql2(192.168.110.132:22)..Wed Mar 15 13:10:07 2017 - [debug]   ok.Wed Mar 15 13:10:07 2017 - [info] All SSH connection tests passed successfully.

11、检测复制状态:

[root@ansible app1]# masterha_check_repl --conf=/etc/masterha/app1.cnf

Tue Mar 14 17:29:38 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Tue Mar 14 17:29:38 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Tue Mar 14 17:29:38 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Tue Mar 14 17:29:38 2017 - [info] MHA::MasterMonitor version 0.56.Tue Mar 14 17:29:38 2017 - [info] GTID failover mode = 0Tue Mar 14 17:29:38 2017 - [info] Dead Servers:Tue Mar 14 17:29:38 2017 - [info] Alive Servers:Tue Mar 14 17:29:38 2017 - [info]   mysql1(192.168.110.131:3306)Tue Mar 14 17:29:38 2017 - [info]   mysql2(192.168.110.132:3306)Tue Mar 14 17:29:38 2017 - [info]   ansible(192.168.110.130:3306)Tue Mar 14 17:29:38 2017 - [info] Alive Slaves:Tue Mar 14 17:29:38 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 17:29:38 2017 - [info]     GTID ONTue Mar 14 17:29:38 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 17:29:38 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 17:29:38 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 17:29:38 2017 - [info]     GTID ONTue Mar 14 17:29:38 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 17:29:38 2017 - [info] Current Alive Master: mysql1(192.168.110.131:3306)Tue Mar 14 17:29:38 2017 - [info] Checking slave configurations..Tue Mar 14 17:29:38 2017 - [info]  read_only=1 is not set on slave mysql2(192.168.110.132:3306).Tue Mar 14 17:29:38 2017 - [info]  read_only=1 is not set on slave ansible(192.168.110.130:3306).Tue Mar 14 17:29:38 2017 - [info] Checking replication filtering settings..Tue Mar 14 17:29:38 2017 - [info]  binlog_do_db= , binlog_ignore_db= Tue Mar 14 17:29:38 2017 - [info]  Replication filtering check ok.Tue Mar 14 17:29:38 2017 - [info] GTID (with auto-pos) is not supportedTue Mar 14 17:29:38 2017 - [info] Starting SSH connection tests..Tue Mar 14 17:29:40 2017 - [info] All SSH connection tests passed successfully.Tue Mar 14 17:29:40 2017 - [info] Checking MHA Node version..Tue Mar 14 17:29:40 2017 - [info]  Version check ok.Tue Mar 14 17:29:40 2017 - [info] Checking SSH publickey authentication settings on the current master..Tue Mar 14 17:29:40 2017 - [info] HealthCheck: SSH to mysql1 is reachable.Tue Mar 14 17:29:40 2017 - [info] Checking recovery script configurations on mysql1(192.168.110.131:3306)..Tue Mar 14 17:29:40 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/mysql3306/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001 Tue Mar 14 17:29:40 2017 - [info]   Connecting to root@192.168.110.131(mysql1:22)..   Creating /tmp if not exists..    ok.  Checking output directory is accessible or not..   ok.  Binlog found at /data/mysql/mysql3306/binlog, up to mysql-bin.000001Tue Mar 14 17:29:41 2017 - [info] Binlog setting check done.Tue Mar 14 17:29:41 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..Tue Mar 14 17:29:41 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=mysql2 --slave_ip=192.168.110.132 --slave_port=3306 --workdir=/tmp --target_version=5.6.34-log --manager_version=0.56 --relay_dir=/data/mysql/mysql3306/relaylog --current_relay_log=slave-relay-bin.000002  --slave_pass=xxxTue Mar 14 17:29:41 2017 - [info]   Connecting to root@192.168.110.132(mysql2:22)..   Checking slave recovery environment settings..    Relay log found at /data/mysql/mysql3306/relaylog, up to slave-relay-bin.000002    Temporary relay log file is /data/mysql/mysql3306/relaylog/slave-relay-bin.000002    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done.    Testing mysqlbinlog output.. done.    Cleaning up test file(s).. done.Tue Mar 14 17:29:42 2017 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=ansible --slave_ip=192.168.110.130 --slave_port=3306 --workdir=/tmp --target_version=5.6.34-log --manager_version=0.56 --relay_dir=/data/mysql/mysql3306/relaylog --current_relay_log=slave-relay-bin.000002  --slave_pass=xxxTue Mar 14 17:29:42 2017 - [info]   Connecting to root@192.168.110.130(ansible:22)..   Checking slave recovery environment settings..    Relay log found at /data/mysql/mysql3306/relaylog, up to slave-relay-bin.000002    Temporary relay log file is /data/mysql/mysql3306/relaylog/slave-relay-bin.000002    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure. done.    Testing mysqlbinlog output.. done.    Cleaning up test file(s).. done.Tue Mar 14 17:29:42 2017 - [info] Slaves settings check done.Tue Mar 14 17:29:42 2017 - [info] mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)Tue Mar 14 17:29:42 2017 - [info] Checking replication health on mysql2..Tue Mar 14 17:29:42 2017 - [info]  ok.Tue Mar 14 17:29:42 2017 - [info] Checking replication health on ansible..Tue Mar 14 17:29:42 2017 - [info]  ok.Tue Mar 14 17:29:42 2017 - [info] Checking master_ip_failover_script status:Tue Mar 14 17:29:42 2017 - [info]   /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Checking the Status of the script.. OK Tue Mar 14 17:29:45 2017 - [info]  OK.Tue Mar 14 17:29:45 2017 - [warning] shutdown_script is not defined.Tue Mar 14 17:29:45 2017 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.


13、启动、关闭和检测manager状态:

vim masterha_manager.sh

#!/bin/shfunction start(){nohup /usr/bin/masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 & }function stop(){/usr/bin/masterha_stop --conf=/etc/masterha/app1.cnf}function status(){/usr/bin/masterha_check_status --conf=/etc/masterha/app1.cnf}case "$1" instart)start;;stop)stop;;status)status;;*)echo "Usage: $0 start|stop"esac


14、初始化VIP:

首次启用MHA,需要提前手工配置VIP在mysql主库上。

shell>/sbin/ifconfig eth0:2 192.168.110.100


三、场景测试:

1、自动failover:


  • 启动manager进程:sh masterha_manager.sh start

  • 启动压测脚本:sh sysbench.sh prepare

  • 中断mysql2的slave线程,以造成mysql2有大量延迟:mysql>stop slave;

  • 压测一段时间

  • 启动mysql2的slave线程,检查slave状态。并可以强行杀掉主库进程(pkill -9 mysql),即使sysbench.sh还未跑完的时候

  • 观察此时MHA manager进程状态和日志信息。

  • 检验VIP是否切换,检查主库是否切换,从库是否从新主库那里复制。

  • 把旧主库拉起来,检查3台mysql数据库的GTID号是否一致,是否有丢数据的情况。


测试检验出来的情况:

1、即使在配置文件中配置mysql2为candidate_master=1和check_repl_delay=0,但是mysql2延迟量太大的话,MHA还是会选择拥有最新数据的slave作为master。

2、3台数据库的GTID号一致,没有丢数据。MHA基于GTID的架构也可以保证数据的高一致性。

3、MHA0.56基于GTID模式对MySQL一主一从的架构的支持好像有问题,如果从库没有延迟的话,可以正常failover,如果从库有延迟的话,MHA的manager进程会一直卡在Waiting all logs to be applied..这个进度下,也没有报错。(一开始测试是一主一从架构,后来改为一主两从)


日志如下:

[root@ansible app1]# cat manager.log

Tue Mar 14 16:25:27 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Tue Mar 14 16:25:27 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Tue Mar 14 16:25:27 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Tue Mar 14 16:26:28 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Tue Mar 14 16:26:28 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Tue Mar 14 16:26:28 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..ive Servers:Tue Mar 14 16:25:28 2017 - [info]   mysql1(192.168.110.131:3306)Tue Mar 14 16:25:28 2017 - [info]   mysql2(192.168.110.132:3306)Tue Mar 14 16:25:28 2017 - [info]   ansible(192.168.110.130:3306)Tue Mar 14 16:25:28 2017 - [info] Alive Slaves:Tue Mar 14 16:25:28 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:25:28 2017 - [info]     GTID ONTue Mar 14 16:25:28 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:25:28 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:25:28 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:25:28 2017 - [info]     GTID ONTue Mar 14 16:25:28 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:25:28 2017 - [info] Current Alive Master: mysql1(192.168.110.131:3306)Tue Mar 14 16:25:28 2017 - [info] Checking slave configurations..Tue Mar 14 16:25:28 2017 - [info]  read_only=1 is not set on slave mysql2(192.168.110.132:3306).Tue Mar 14 16:25:28 2017 - [info]  read_only=1 is not set on slave ansible(192.168.110.130:3306).Tue Mar 14 16:25:28 2017 - [info] Checking replication filtering settings..Tue Mar 14 16:25:28 2017 - [info]  binlog_do_db= , binlog_ignore_db= Tue Mar 14 16:25:28 2017 - [info]  Replication filtering check ok.Tue Mar 14 16:25:28 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.Tue Mar 14 16:25:29 2017 - [info] HealthCheck: SSH to mysql2 is reachable.Tue Mar 14 16:25:29 2017 - [info] Binlog server mysql2 is reachable.Tue Mar 14 16:25:29 2017 - [info] Checking recovery script configurations on mysql2(192.168.110.132:3306)..Tue Mar 14 16:25:29 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/tmp/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001 Tue Mar 14 16:25:29 2017 - [info]   Connecting to root@192.168.110.132(mysql2:22)..   Creating /tmp if not exists..    ok.  Checking output directory is accessible or not..   ok.  Binlog found at /tmp/binlog, up to mysql-bin.000015Tue Mar 14 16:25:29 2017 - [info] Binlog setting check done.Tue Mar 14 16:25:29 2017 - [info] Checking SSH publickey authentication settings on the current master..Tue Mar 14 16:25:34 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to mysql1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.Tue Mar 14 16:25:34 2017 - [info] mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)Tue Mar 14 16:25:34 2017 - [info] Checking master_ip_failover_script status:Tue Mar 14 16:25:34 2017 - [info]   /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Checking the Status of the script.. OK Tue Mar 14 16:25:49 2017 - [info]  OK.Tue Mar 14 16:25:49 2017 - [warning] shutdown_script is not defined.Tue Mar 14 16:25:49 2017 - [info] Set master ping interval 1 seconds.Tue Mar 14 16:25:49 2017 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s 192.168.110.132 -s 192.168.110.130 --user=root --master_host=mysql1 --master_ip=192.168.110.131 --master_port=3306Tue Mar 14 16:25:49 2017 - [info] Starting ping health check on mysql1(192.168.110.131:3306)..Tue Mar 14 16:25:49 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..Tue Mar 14 16:26:25 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)Tue Mar 14 16:26:25 2017 - [info] Executing SSH check script: exit 0Tue Mar 14 16:26:25 2017 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 192.168.110.132 -s 192.168.110.130 --user=root --master_host=mysql1 --master_ip=192.168.110.131 --master_port=3306  --user=root  --master_host=mysql1  --master_ip=192.168.110.131  --master_port=3306 --master_user=mha --master_password=mha --ping_type=SELECTMonitoring server 192.168.110.132 is reachable, Master is not reachable from 192.168.110.132. OK.Tue Mar 14 16:26:26 2017 - [info] HealthCheck: SSH to mysql1 is reachable.Monitoring server 192.168.110.130 is reachable, Master is not reachable from 192.168.110.130. OK.Tue Mar 14 16:26:26 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.Tue Mar 14 16:26:26 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Tue Mar 14 16:26:26 2017 - [warning] Connection failed 2 time(s)..Tue Mar 14 16:26:27 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Tue Mar 14 16:26:27 2017 - [warning] Connection failed 3 time(s)..Tue Mar 14 16:26:28 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Tue Mar 14 16:26:28 2017 - [warning] Connection failed 4 time(s)..Tue Mar 14 16:26:28 2017 - [warning] Master is not reachable from health checker!Tue Mar 14 16:26:28 2017 - [warning] Master mysql1(192.168.110.131:3306) is not reachable!Tue Mar 14 16:26:28 2017 - [warning] SSH is reachable.Tue Mar 14 16:26:28 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..Tue Mar 14 16:26:28 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Tue Mar 14 16:26:28 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Tue Mar 14 16:26:28 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Tue Mar 14 16:26:28 2017 - [info] GTID failover mode = 1Tue Mar 14 16:26:28 2017 - [info] Dead Servers:Tue Mar 14 16:26:28 2017 - [info]   mysql1(192.168.110.131:3306)Tue Mar 14 16:26:28 2017 - [info] Alive Servers:Tue Mar 14 16:26:28 2017 - [info]   mysql2(192.168.110.132:3306)Tue Mar 14 16:26:28 2017 - [info]   ansible(192.168.110.130:3306)Tue Mar 14 16:26:28 2017 - [info] Alive Slaves:Tue Mar 14 16:26:28 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:28 2017 - [info]     GTID ONTue Mar 14 16:26:28 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:28 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:26:28 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:28 2017 - [info]     GTID ONTue Mar 14 16:26:28 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:28 2017 - [info] Checking slave configurations..Tue Mar 14 16:26:28 2017 - [info]  read_only=1 is not set on slave mysql2(192.168.110.132:3306).Tue Mar 14 16:26:28 2017 - [info]  read_only=1 is not set on slave ansible(192.168.110.130:3306).Tue Mar 14 16:26:28 2017 - [info] Checking replication filtering settings..Tue Mar 14 16:26:28 2017 - [info]  Replication filtering check ok.Tue Mar 14 16:26:28 2017 - [info] Master is down!Tue Mar 14 16:26:28 2017 - [info] Terminating monitoring script.Tue Mar 14 16:26:28 2017 - [info] Got exit code 20 (Master dead).Tue Mar 14 16:26:28 2017 - [info] MHA::MasterFailover version 0.56.Tue Mar 14 16:26:28 2017 - [info] Starting master failover.Tue Mar 14 16:26:28 2017 - [info] Tue Mar 14 16:26:28 2017 - [info] * Phase 1: Configuration Check Phase..Tue Mar 14 16:26:28 2017 - [info] Tue Mar 14 16:26:28 2017 - [info] HealthCheck: SSH to mysql2 is reachable.Tue Mar 14 16:26:28 2017 - [info] Binlog server mysql2 is reachable.Tue Mar 14 16:26:29 2017 - [info] GTID failover mode = 1Tue Mar 14 16:26:29 2017 - [info] Dead Servers:Tue Mar 14 16:26:29 2017 - [info]   mysql1(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info] Checking master reachability via MySQL(double check)...Tue Mar 14 16:26:29 2017 - [info]  ok.Tue Mar 14 16:26:29 2017 - [info] Alive Servers:Tue Mar 14 16:26:29 2017 - [info]   mysql2(192.168.110.132:3306)Tue Mar 14 16:26:29 2017 - [info]   ansible(192.168.110.130:3306)Tue Mar 14 16:26:29 2017 - [info] Alive Slaves:Tue Mar 14 16:26:29 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:26:29 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info] Starting GTID based failover.Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] ** Phase 1: Configuration Check Phase completed.Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] Forcing shutdown so that applications never connect to the current master..Tue Mar 14 16:26:29 2017 - [info] Executing master IP deactivation script:Tue Mar 14 16:26:29 2017 - [info]   /usr/bin/master_ip_failover --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --command=stopssh --ssh_user=root  IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Disabling the VIP on old master: mysql1 Tue Mar 14 16:26:29 2017 - [info]  done.Tue Mar 14 16:26:29 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Tue Mar 14 16:26:29 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] * Phase 3: Master Recovery Phase..Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:133309374Tue Mar 14 16:26:29 2017 - [info] Retrieved Gtid Set: c897448e-07c8-11e7-8af3-000c29918f29:1-244Tue Mar 14 16:26:29 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Tue Mar 14 16:26:29 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:31201438Tue Mar 14 16:26:29 2017 - [info] Retrieved Gtid Set: c897448e-07c8-11e7-8af3-000c29918f29:1-81Tue Mar 14 16:26:29 2017 - [info] Oldest slaves:Tue Mar 14 16:26:29 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] * Phase 3.3: Determining New Master Phase..Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [warning]  Slave mysql2(192.168.110.132:3306) SQL Thread delays too much. Latest log file:mysql-bin.000001:133309374, Current log file:mysql-bin.000001:31152313. This server is not selected as a new master because recovery will take long time.Tue Mar 14 16:26:29 2017 - [info] Searching new master from slaves..Tue Mar 14 16:26:29 2017 - [info]  Candidate masters from the configuration file:Tue Mar 14 16:26:29 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:26:29 2017 - [info]  Non-candidate masters:Tue Mar 14 16:26:29 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledTue Mar 14 16:26:29 2017 - [info]     GTID ONTue Mar 14 16:26:29 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Tue Mar 14 16:26:29 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Tue Mar 14 16:26:29 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..Tue Mar 14 16:26:29 2017 - [info]   Not found.Tue Mar 14 16:26:29 2017 - [info]  Searching from all candidate_master slaves..Tue Mar 14 16:26:29 2017 - [info]   Not found.Tue Mar 14 16:26:29 2017 - [info]  Searching from all slaves which have received the latest relay log events..Tue Mar 14 16:26:29 2017 - [info] New master is ansible(192.168.110.130:3306)Tue Mar 14 16:26:29 2017 - [info] Starting master failover..Tue Mar 14 16:26:29 2017 - [info] From:mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)To:ansible(192.168.110.130:3306) (new master) +--mysql2(192.168.110.132:3306)Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] * Phase 3.3: New Master Recovery Phase..Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info]  Waiting all logs to be applied.. Tue Mar 14 16:26:29 2017 - [info]   done.Tue Mar 14 16:26:29 2017 - [info] -- Saving binlog from host mysql2 started, pid: 39734Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] Log messages from mysql2 ...Tue Mar 14 16:26:29 2017 - [info] Tue Mar 14 16:26:29 2017 - [info] Fetching binary logs from binlog server mysql2..Tue Mar 14 16:26:29 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000001  --start_pos=133309374 --output_file=/tmp/saved_binlog_binlog1_20170314162628.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.6.34-log  --binlog_dir=/tmp/binlog   Creating /tmp if not exists..    ok. Concat binary/relay logs from mysql-bin.000001 pos 133309374 to mysql-bin.000015 EOF into /tmp/saved_binlog_binlog1_20170314162628.binlog ..Target file /tmp/binlog/mysql-bin.000001 size=92195648, this is smaller than exec pos 133309374. Skipping.Event not exists.Tue Mar 14 16:26:29 2017 - [info] Additional events were not found from the binlog server. No need to save.Tue Mar 14 16:26:29 2017 - [info] End of log messages from mysql2.Tue Mar 14 16:26:29 2017 - [info] No binlog events found from mysql2. SkippingTue Mar 14 16:26:29 2017 - [info] Getting new master's binlog name and position..Tue Mar 14 16:26:29 2017 - [info]  mysql-bin.000001:133306806Tue Mar 14 16:26:29 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ansible or 192.168.110.130', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Tue Mar 14 16:26:29 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 133306806, c897448e-07c8-11e7-8af3-000c29918f29:1-244Tue Mar 14 16:26:29 2017 - [info] Executing master IP activate script:Tue Mar 14 16:26:29 2017 - [info]   /usr/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --new_master_host=ansible --new_master_ip=192.168.110.130 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'  Unknown option: new_master_userUnknown option: new_master_passwordIN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Enabling the VIP - 192.168.110.100 on the new master - ansible Tue Mar 14 16:26:32 2017 - [info]  OK.Tue Mar 14 16:26:33 2017 - [info] ** Finished master recovery successfully.Tue Mar 14 16:26:33 2017 - [info] * Phase 3: Master Recovery Phase completed.Tue Mar 14 16:26:33 2017 - [info] Tue Mar 14 16:26:33 2017 - [info] * Phase 4: Slaves Recovery Phase..Tue Mar 14 16:26:33 2017 - [info] Tue Mar 14 16:26:33 2017 - [info] Tue Mar 14 16:26:33 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Tue Mar 14 16:26:33 2017 - [info] Tue Mar 14 16:26:33 2017 - [info] -- Slave recovery on host mysql2(192.168.110.132:3306) started, pid: 39751. Check tmp log /var/log/masterha/app1/mysql2_3306_20170314162628.log if it takes time..Tue Mar 14 16:27:14 2017 - [info] Tue Mar 14 16:27:14 2017 - [info] Log messages from mysql2 ...Tue Mar 14 16:27:14 2017 - [info] Tue Mar 14 16:26:33 2017 - [info]  Resetting slave mysql2(192.168.110.132:3306) and starting replication from the new master ansible(192.168.110.130:3306)..Tue Mar 14 16:26:33 2017 - [info]  Executed CHANGE MASTER.Tue Mar 14 16:26:33 2017 - [info]  Slave started.Tue Mar 14 16:27:14 2017 - [info]  gtid_wait(c897448e-07c8-11e7-8af3-000c29918f29:1-244) completed on mysql2(192.168.110.132:3306). Executed 168 events.Tue Mar 14 16:27:14 2017 - [info] End of log messages from mysql2.Tue Mar 14 16:27:14 2017 - [info] -- Slave on host mysql2(192.168.110.132:3306) started.Tue Mar 14 16:27:14 2017 - [info] All new slave servers recovered successfully.Tue Mar 14 16:27:14 2017 - [info] Tue Mar 14 16:27:14 2017 - [info] * Phase 5: New master cleanup phase..Tue Mar 14 16:27:14 2017 - [info] Tue Mar 14 16:27:14 2017 - [info] Resetting slave info on the new master..Tue Mar 14 16:27:15 2017 - [info]  ansible: Resetting slave info succeeded.Tue Mar 14 16:27:15 2017 - [info] Master failover to ansible(192.168.110.130:3306) completed successfully.Tue Mar 14 16:27:15 2017 - [info] ----- Failover Report -----app1: MySQL Master failover mysql1(192.168.110.131:3306) to ansible(192.168.110.130:3306) succeededMaster mysql1(192.168.110.131:3306) is down!Check MHA Manager logs at ansible:/var/log/masterha/app1/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on mysql1(192.168.110.131:3306)Selected ansible(192.168.110.130:3306) as a new master.ansible(192.168.110.130:3306): OK: Applying all logs succeeded.ansible(192.168.110.130:3306): OK: Activated master IP address.mysql2(192.168.110.132:3306): OK: Slave started, replicating from ansible(192.168.110.130:3306)ansible(192.168.110.130:3306): Resetting slave info succeeded.Master failover to ansible(192.168.110.130:3306) completed successfully.


如果app.cnf配置成:mysql2是candidate_master,而ansible是no_master。但是mysql2延迟量太大,ansible拥有最新数据却不能成为master的情况下,那么MHA的manager会报错退出,自动failover失败。

[server1]hostname=mysql1master_binlog_dir="/data/mysql/mysql3306/binlog"port=3306[server2]hostname=mysql2master_binlog_dir="/data/mysql/mysql3306/binlog"port=3306candidate_master=1check_repl_delay=0[server3]hostname=ansiblemaster_binlog_dir="/data/mysql/mysql3306/binlog"port=3306no_master=1

日志如下:

Wed Mar 15 16:21:58 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 16:21:58 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 16:21:58 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Wed Mar 15 16:41:35 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 16:41:35 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 16:41:35 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..ive Servers:Wed Mar 15 16:22:01 2017 - [info]   mysql1(192.168.110.131:3306)Wed Mar 15 16:22:01 2017 - [info]   mysql2(192.168.110.132:3306)Wed Mar 15 16:22:01 2017 - [info]   ansible(192.168.110.130:3306)Wed Mar 15 16:22:01 2017 - [info] Alive Slaves:Wed Mar 15 16:22:01 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:22:01 2017 - [info]     GTID ONWed Mar 15 16:22:01 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:22:01 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:22:01 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:22:01 2017 - [info]     GTID ONWed Mar 15 16:22:01 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 16:22:01 2017 - [info]     Not candidate for the new Master (no_master is set)Wed Mar 15 16:22:01 2017 - [info] Current Alive Master: mysql1(192.168.110.131:3306)Wed Mar 15 16:22:01 2017 - [info] Checking slave configurations..Wed Mar 15 16:22:01 2017 - [info]  read_only=1 is not set on slave ansible(192.168.110.130:3306).Wed Mar 15 16:22:01 2017 - [info] Checking replication filtering settings..Wed Mar 15 16:22:01 2017 - [info]  binlog_do_db= , binlog_ignore_db= Wed Mar 15 16:22:01 2017 - [info]  Replication filtering check ok.Wed Mar 15 16:22:01 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.Wed Mar 15 16:22:02 2017 - [info] HealthCheck: SSH to mysql2 is reachable.Wed Mar 15 16:22:02 2017 - [info] Binlog server mysql2 is reachable.Wed Mar 15 16:22:02 2017 - [info] Checking recovery script configurations on mysql2(192.168.110.132:3306)..Wed Mar 15 16:22:02 2017 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/tmp/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000003 Wed Mar 15 16:22:02 2017 - [info]   Connecting to root@192.168.110.132(mysql2:22)..   Creating /tmp if not exists..    ok.  Checking output directory is accessible or not..   ok.  Binlog found at /tmp/binlog, up to mysql-bin.000015Wed Mar 15 16:22:02 2017 - [info] Binlog setting check done.Wed Mar 15 16:22:02 2017 - [info] Checking SSH publickey authentication settings on the current master..Wed Mar 15 16:22:07 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to mysql1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.Wed Mar 15 16:22:07 2017 - [info] mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)Wed Mar 15 16:22:07 2017 - [info] Checking master_ip_failover_script status:Wed Mar 15 16:22:07 2017 - [info]   /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Checking the Status of the script.. OK Wed Mar 15 16:22:24 2017 - [info]  OK.Wed Mar 15 16:22:24 2017 - [warning] shutdown_script is not defined.Wed Mar 15 16:22:24 2017 - [info] Set master ping interval 1 seconds.Wed Mar 15 16:22:24 2017 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s 192.168.110.132 -s 192.168.110.130 --user=root --master_host=mysql1 --master_ip=192.168.110.131 --master_port=3306Wed Mar 15 16:22:24 2017 - [info] Starting ping health check on mysql1(192.168.110.131:3306)..Wed Mar 15 16:22:24 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..Wed Mar 15 16:41:32 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)Wed Mar 15 16:41:32 2017 - [info] Executing SSH check script: exit 0Wed Mar 15 16:41:32 2017 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 192.168.110.132 -s 192.168.110.130 --user=root --master_host=mysql1 --master_ip=192.168.110.131 --master_port=3306  --user=root  --master_host=mysql1  --master_ip=192.168.110.131  --master_port=3306 --master_user=mha --master_password=mha --ping_type=SELECTMonitoring server 192.168.110.132 is reachable, Master is not reachable from 192.168.110.132. OK.Wed Mar 15 16:41:33 2017 - [info] HealthCheck: SSH to mysql1 is reachable.Monitoring server 192.168.110.130 is reachable, Master is not reachable from 192.168.110.130. OK.Wed Mar 15 16:41:33 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.Wed Mar 15 16:41:33 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Wed Mar 15 16:41:33 2017 - [warning] Connection failed 2 time(s)..Wed Mar 15 16:41:34 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Wed Mar 15 16:41:34 2017 - [warning] Connection failed 3 time(s)..Wed Mar 15 16:41:35 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)Wed Mar 15 16:41:35 2017 - [warning] Connection failed 4 time(s)..Wed Mar 15 16:41:35 2017 - [warning] Master is not reachable from health checker!Wed Mar 15 16:41:35 2017 - [warning] Master mysql1(192.168.110.131:3306) is not reachable!Wed Mar 15 16:41:35 2017 - [warning] SSH is reachable.Wed Mar 15 16:41:35 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..Wed Mar 15 16:41:35 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 16:41:35 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 16:41:35 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Wed Mar 15 16:41:35 2017 - [info] GTID failover mode = 1Wed Mar 15 16:41:35 2017 - [info] Dead Servers:Wed Mar 15 16:41:35 2017 - [info]   mysql1(192.168.110.131:3306)Wed Mar 15 16:41:35 2017 - [info] Alive Servers:Wed Mar 15 16:41:35 2017 - [info]   mysql2(192.168.110.132:3306)Wed Mar 15 16:41:35 2017 - [info]   ansible(192.168.110.130:3306)Wed Mar 15 16:41:35 2017 - [info] Alive Slaves:Wed Mar 15 16:41:35 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:35 2017 - [info]     GTID ONWed Mar 15 16:41:35 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:41:35 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:41:35 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:35 2017 - [info]     GTID ONWed Mar 15 16:41:35 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 16:41:35 2017 - [info]     Not candidate for the new Master (no_master is set)Wed Mar 15 16:41:35 2017 - [info] Checking slave configurations..Wed Mar 15 16:41:35 2017 - [info]  read_only=1 is not set on slave ansible(192.168.110.130:3306).Wed Mar 15 16:41:35 2017 - [info] Checking replication filtering settings..Wed Mar 15 16:41:35 2017 - [info]  Replication filtering check ok.Wed Mar 15 16:41:35 2017 - [info] Master is down!Wed Mar 15 16:41:35 2017 - [info] Terminating monitoring script.Wed Mar 15 16:41:35 2017 - [info] Got exit code 20 (Master dead).Wed Mar 15 16:41:35 2017 - [info] MHA::MasterFailover version 0.56.Wed Mar 15 16:41:35 2017 - [info] Starting master failover.Wed Mar 15 16:41:35 2017 - [info] Wed Mar 15 16:41:35 2017 - [info] * Phase 1: Configuration Check Phase..Wed Mar 15 16:41:35 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] HealthCheck: SSH to mysql2 is reachable.Wed Mar 15 16:41:36 2017 - [info] Binlog server mysql2 is reachable.Wed Mar 15 16:41:36 2017 - [info] GTID failover mode = 1Wed Mar 15 16:41:36 2017 - [info] Dead Servers:Wed Mar 15 16:41:36 2017 - [info]   mysql1(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info] Checking master reachability via MySQL(double check)...Wed Mar 15 16:41:36 2017 - [info]  ok.Wed Mar 15 16:41:36 2017 - [info] Alive Servers:Wed Mar 15 16:41:36 2017 - [info]   mysql2(192.168.110.132:3306)Wed Mar 15 16:41:36 2017 - [info]   ansible(192.168.110.130:3306)Wed Mar 15 16:41:36 2017 - [info] Alive Slaves:Wed Mar 15 16:41:36 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:41:36 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Not candidate for the new Master (no_master is set)Wed Mar 15 16:41:36 2017 - [info] Starting GTID based failover.Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] ** Phase 1: Configuration Check Phase completed.Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] Forcing shutdown so that applications never connect to the current master..Wed Mar 15 16:41:36 2017 - [info] Executing master IP deactivation script:Wed Mar 15 16:41:36 2017 - [info]   /usr/bin/master_ip_failover --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --command=stopssh --ssh_user=root  IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Disabling the VIP on old master: mysql1 Wed Mar 15 16:41:36 2017 - [info]  done.Wed Mar 15 16:41:36 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Wed Mar 15 16:41:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] * Phase 3: Master Recovery Phase..Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000003:311461358Wed Mar 15 16:41:36 2017 - [info] Retrieved Gtid Set: c897448e-07c8-11e7-8af3-000c29918f29:575-1104Wed Mar 15 16:41:36 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Wed Mar 15 16:41:36 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Not candidate for the new Master (no_master is set)Wed Mar 15 16:41:36 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000003:94335516Wed Mar 15 16:41:36 2017 - [info] Retrieved Gtid Set: c897448e-07c8-11e7-8af3-000c29918f29:575-756Wed Mar 15 16:41:36 2017 - [info] Oldest slaves:Wed Mar 15 16:41:36 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [info] * Phase 3.3: Determining New Master Phase..Wed Mar 15 16:41:36 2017 - [info] Wed Mar 15 16:41:36 2017 - [warning]  Slave mysql2(192.168.110.132:3306) SQL Thread delays too much. Latest log file:mysql-bin.000003:311461358, Current log file:mysql-bin.000003:84722518. This server is not selected as a new master because recovery will take long time.Wed Mar 15 16:41:36 2017 - [info] Searching new master from slaves..Wed Mar 15 16:41:36 2017 - [info]  Candidate masters from the configuration file:Wed Mar 15 16:41:36 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:41:36 2017 - [info]  Non-candidate masters:Wed Mar 15 16:41:36 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 16:41:36 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 16:41:36 2017 - [info]     GTID ONWed Mar 15 16:41:36 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 16:41:36 2017 - [info]     Not candidate for the new Master (no_master is set)Wed Mar 15 16:41:36 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..Wed Mar 15 16:41:36 2017 - [info]   Not found.Wed Mar 15 16:41:36 2017 - [info]  Searching from all candidate_master slaves..Wed Mar 15 16:41:36 2017 - [info]   Not found.Wed Mar 15 16:41:36 2017 - [info]  Searching from all slaves which have received the latest relay log events..Wed Mar 15 16:41:36 2017 - [info]   Not found.Wed Mar 15 16:41:36 2017 - [info]  Searching from all slaves..Wed Mar 15 16:41:36 2017 - [info]   Not found.Wed Mar 15 16:41:36 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln970] None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.Wed Mar 15 16:41:36 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_manager line 65Wed Mar 15 16:41:36 2017 - [info] ----- Failover Report -----app1: MySQL Master failover mysql1(192.168.110.131:3306)Master mysql1(192.168.110.131:3306) is down!Check MHA Manager logs at ansible:/var/log/masterha/app1/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on mysql1(192.168.110.131:3306)None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.Got Error so couldn't continue failover from here.


2、手工failover:

脚本:

vim masterha_switch.sh

#!/bin/shconf="/etc/masterha/app1.cnf"dead_master="mysql1"new_master="mysql2"port="3306"function manual_failover(){##人工切换需要停掉manager进程/usr/bin/masterha_stop --conf=/etc/masterha/app1.cnf /usr/bin/masterha_master_switch --conf=${conf} --master_state=dead  --dead_master_host=${dead_master} --dead_master_port=${port} --new_master_host=${new_master} --new_master_port=${port} --ignore_last_failover }function manual_online_change(){##人工切换需要停掉manager进程/usr/bin/masterha_stop --conf=/etc/masterha/app1.cnf/usr/bin/masterha_master_switch --conf=${conf} --master_state=alive --new_master_host=${new_master} --new_master_port=${port} --orig_master_is_new_slave --running_updates_limit=10000 --interactive=1 < /dev/null > /var/log/masterha/app1/manual_online_change.log 2>&1 &}case "$1" inmanual_failover)manual_failover;;manual_online_change)manual_online_change;;*)echo "Usage: $0 manual_failover|manual_online_change"esac

前提条件:

1、mysql1(即主库发生故障无法连接)异常,这里直接将mysql1的数据库服务shutdown即可。

2、必须关闭MHA的manager进程。


日志如下:

Wed Mar 15 12:36:23 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 12:36:23 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 12:36:23 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Wed Mar 15 12:36:23 2017 - [info] MHA::MasterFailover version 0.56.Wed Mar 15 12:36:23 2017 - [info] Starting master failover.Wed Mar 15 12:36:23 2017 - [info] Wed Mar 15 12:36:23 2017 - [info] * Phase 1: Configuration Check Phase..Wed Mar 15 12:36:23 2017 - [info] Wed Mar 15 12:36:23 2017 - [info] HealthCheck: SSH to mysql2 is reachable.Wed Mar 15 12:36:23 2017 - [info] Binlog server mysql2 is reachable.Wed Mar 15 12:36:23 2017 - [info] GTID failover mode = 1Wed Mar 15 12:36:23 2017 - [info] Dead Servers:Wed Mar 15 12:36:23 2017 - [info]   mysql1(192.168.110.131:3306)Wed Mar 15 12:36:23 2017 - [info] Checking master reachability via MySQL(double check)...Wed Mar 15 12:36:23 2017 - [info]  ok.Wed Mar 15 12:36:23 2017 - [info] Alive Servers:Wed Mar 15 12:36:23 2017 - [info]   mysql2(192.168.110.132:3306)Wed Mar 15 12:36:23 2017 - [info]   ansible(192.168.110.130:3306)Wed Mar 15 12:36:23 2017 - [info] Alive Slaves:Wed Mar 15 12:36:23 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:23 2017 - [info]     GTID ONWed Mar 15 12:36:23 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 12:36:23 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 12:36:23 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:23 2017 - [info]     GTID ONWed Mar 15 12:36:23 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Master mysql1(192.168.110.131:3306) is dead. Proceed? (yes/NO): yesWed Mar 15 12:36:29 2017 - [info] Starting GTID based failover.Wed Mar 15 12:36:29 2017 - [info] Wed Mar 15 12:36:29 2017 - [info] ** Phase 1: Configuration Check Phase completed.Wed Mar 15 12:36:29 2017 - [info] Wed Mar 15 12:36:29 2017 - [info] * Phase 2: Dead Master Shutdown Phase..Wed Mar 15 12:36:29 2017 - [info] Wed Mar 15 12:36:29 2017 - [info] HealthCheck: SSH to mysql1 is reachable.Wed Mar 15 12:36:30 2017 - [info] Forcing shutdown so that applications never connect to the current master..Wed Mar 15 12:36:30 2017 - [info] Executing master IP deactivation script:Wed Mar 15 12:36:30 2017 - [info]   /usr/bin/master_ip_failover --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --command=stopssh --ssh_user=root  IN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Disabling the VIP on old master: mysql1 Wed Mar 15 12:36:30 2017 - [info]  done.Wed Mar 15 12:36:30 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.Wed Mar 15 12:36:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.Wed Mar 15 12:36:30 2017 - [info] Wed Mar 15 12:36:30 2017 - [info] * Phase 3: Master Recovery Phase..Wed Mar 15 12:36:30 2017 - [info] Wed Mar 15 12:36:30 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..Wed Mar 15 12:36:30 2017 - [info] Wed Mar 15 12:36:30 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:191Wed Mar 15 12:36:30 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):Wed Mar 15 12:36:30 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:30 2017 - [info]     GTID ONWed Mar 15 12:36:30 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 12:36:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 12:36:30 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:30 2017 - [info]     GTID ONWed Mar 15 12:36:30 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 12:36:30 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:191Wed Mar 15 12:36:30 2017 - [info] Oldest slaves:Wed Mar 15 12:36:30 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:30 2017 - [info]     GTID ONWed Mar 15 12:36:30 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 12:36:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 12:36:30 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 12:36:30 2017 - [info]     GTID ONWed Mar 15 12:36:30 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)Wed Mar 15 12:36:30 2017 - [info] Wed Mar 15 12:36:30 2017 - [info] * Phase 3.3: Determining New Master Phase..Wed Mar 15 12:36:30 2017 - [info] Wed Mar 15 12:36:30 2017 - [info] mysql2 can be new master.Wed Mar 15 12:36:30 2017 - [info] New master is mysql2(192.168.110.132:3306)Wed Mar 15 12:36:30 2017 - [info] Starting master failover..Wed Mar 15 12:36:30 2017 - [info] From:mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)To:mysql2(192.168.110.132:3306) (new master) +--ansible(192.168.110.130:3306)Starting master switch from mysql1(192.168.110.131:3306) to mysql2(192.168.110.132:3306)? (yes/NO): yesWed Mar 15 12:36:35 2017 - [info] New master decided manually is mysql2(192.168.110.132:3306)Wed Mar 15 12:36:35 2017 - [info] Wed Mar 15 12:36:35 2017 - [info] * Phase 3.3: New Master Recovery Phase..Wed Mar 15 12:36:35 2017 - [info] Wed Mar 15 12:36:35 2017 - [info]  Waiting all logs to be applied.. Wed Mar 15 12:36:35 2017 - [info]   done.Wed Mar 15 12:36:35 2017 - [info] -- Saving binlog from host mysql2 started, pid: 43271Wed Mar 15 12:36:36 2017 - [info] Wed Mar 15 12:36:36 2017 - [info] Log messages from mysql2 ...Wed Mar 15 12:36:36 2017 - [info] Wed Mar 15 12:36:35 2017 - [info] Fetching binary logs from binlog server mysql2..Wed Mar 15 12:36:35 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000002  --start_pos=191 --output_file=/tmp/saved_binlog_binlog1_20170315123623.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.6.34-log  --binlog_dir=/tmp/binlog   Creating /tmp if not exists..    ok. Concat binary/relay logs from mysql-bin.000002 pos 191 to mysql-bin.000015 EOF into /tmp/saved_binlog_binlog1_20170315123623.binlog ..No additional binlog events found.Event not exists.Wed Mar 15 12:36:36 2017 - [info] Additional events were not found from the binlog server. No need to save.Wed Mar 15 12:36:36 2017 - [info] End of log messages from mysql2.Wed Mar 15 12:36:36 2017 - [info] No binlog events found from mysql2. SkippingWed Mar 15 12:36:36 2017 - [info] Getting new master's binlog name and position..Wed Mar 15 12:36:36 2017 - [info]  mysql-bin.000001:320182589Wed Mar 15 12:36:36 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='mysql2 or 192.168.110.132', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Wed Mar 15 12:36:36 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 320182589, c897448e-07c8-11e7-8af3-000c29918f29:1-574Wed Mar 15 12:36:36 2017 - [info] Executing master IP activate script:Wed Mar 15 12:36:36 2017 - [info]   /usr/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --new_master_host=mysql2 --new_master_ip=192.168.110.132 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha'  Unknown option: new_master_userUnknown option: new_master_passwordIN SCRIPT TEST====/sbin/ifconfig eth0:2 down==/sbin/ifconfig eth0:2 192.168.110.100;/sbin/arping -I eth0 -c 3 -s 192.168.110.100 192.168.110.2 >/dev/null 2>&1===Enabling the VIP - 192.168.110.100 on the new master - mysql2 Wed Mar 15 12:36:39 2017 - [info]  OK.Wed Mar 15 12:36:39 2017 - [info] Setting read_only=0 on mysql2(192.168.110.132:3306)..Wed Mar 15 12:36:39 2017 - [info]  ok.Wed Mar 15 12:36:39 2017 - [info] ** Finished master recovery successfully.Wed Mar 15 12:36:39 2017 - [info] * Phase 3: Master Recovery Phase completed.Wed Mar 15 12:36:39 2017 - [info] Wed Mar 15 12:36:39 2017 - [info] * Phase 4: Slaves Recovery Phase..Wed Mar 15 12:36:39 2017 - [info] Wed Mar 15 12:36:39 2017 - [info] Wed Mar 15 12:36:39 2017 - [info] * Phase 4.1: Starting Slaves in parallel..Wed Mar 15 12:36:39 2017 - [info] Wed Mar 15 12:36:39 2017 - [info] -- Slave recovery on host ansible(192.168.110.130:3306) started, pid: 43277. Check tmp log /var/log/masterha/app1/ansible_3306_20170315123623.log if it takes time..Wed Mar 15 12:36:40 2017 - [info] Wed Mar 15 12:36:40 2017 - [info] Log messages from ansible ...Wed Mar 15 12:36:40 2017 - [info] Wed Mar 15 12:36:39 2017 - [info]  Resetting slave ansible(192.168.110.130:3306) and starting replication from the new master mysql2(192.168.110.132:3306)..Wed Mar 15 12:36:39 2017 - [info]  Executed CHANGE MASTER.Wed Mar 15 12:36:40 2017 - [info]  Slave started.Wed Mar 15 12:36:40 2017 - [info]  gtid_wait(c897448e-07c8-11e7-8af3-000c29918f29:1-574) completed on ansible(192.168.110.130:3306). Executed 0 events.Wed Mar 15 12:36:40 2017 - [info] End of log messages from ansible.Wed Mar 15 12:36:40 2017 - [info] -- Slave on host ansible(192.168.110.130:3306) started.Wed Mar 15 12:36:40 2017 - [info] All new slave servers recovered successfully.Wed Mar 15 12:36:40 2017 - [info] Wed Mar 15 12:36:40 2017 - [info] * Phase 5: New master cleanup phase..Wed Mar 15 12:36:40 2017 - [info] Wed Mar 15 12:36:40 2017 - [info] Resetting slave info on the new master..Wed Mar 15 12:36:40 2017 - [info]  mysql2: Resetting slave info succeeded.Wed Mar 15 12:36:40 2017 - [info] Master failover to mysql2(192.168.110.132:3306) completed successfully.Wed Mar 15 12:36:40 2017 - [info] ----- Failover Report -----app1: MySQL Master failover mysql1(192.168.110.131:3306) to mysql2(192.168.110.132:3306) succeededMaster mysql1(192.168.110.131:3306) is down!Check MHA Manager logs at ansible for details.Started manual(interactive) failover.Invalidated master IP address on mysql1(192.168.110.131:3306)Selected mysql2(192.168.110.132:3306) as a new master.mysql2(192.168.110.132:3306): OK: Applying all logs succeeded.mysql2(192.168.110.132:3306): OK: Activated master IP address.ansible(192.168.110.130:3306): OK: Slave started, replicating from mysql2(192.168.110.132:3306)mysql2(192.168.110.132:3306): Resetting slave info succeeded.Master failover to mysql2(192.168.110.132:3306) completed successfully

3、手工online_change:

前提条件:

1、为了保证数据的高一致性,所有slave的IO线程、SQL线程必须都在运行。

2、slave的Seconds_Behind_Master也必须小于running_updates_limit,如果不指定该参数值,默认为1s。这样的话,slave但凡有延迟的话,也无法online_change成功。

3、master端,通过show processlist输出,没有一个更新花费的时间大于running_updates_limit秒。

4、停止MHA的manager进程


日志如下:

Wed Mar 15 13:00:36 2017 - [info] MHA::MasterRotate version 0.56.Wed Mar 15 13:00:36 2017 - [info] Starting online master switch..Wed Mar 15 13:00:36 2017 - [info] Wed Mar 15 13:00:36 2017 - [info] * Phase 1: Configuration Check Phase..Wed Mar 15 13:00:36 2017 - [info] Wed Mar 15 13:00:36 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.Wed Mar 15 13:00:36 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..Wed Mar 15 13:00:36 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..Wed Mar 15 13:00:36 2017 - [info] GTID failover mode = 1Wed Mar 15 13:00:36 2017 - [info] Current Alive Master: mysql1(192.168.110.131:3306)Wed Mar 15 13:00:36 2017 - [info] Alive Slaves:Wed Mar 15 13:00:36 2017 - [info]   mysql2(192.168.110.132:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 13:00:36 2017 - [info]     GTID ONWed Mar 15 13:00:36 2017 - [info]     Replicating from mysql1(192.168.110.131:3306)Wed Mar 15 13:00:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)Wed Mar 15 13:00:36 2017 - [info]   ansible(192.168.110.130:3306)  Version=5.6.34-log (oldest major version between slaves) log-bin:enabledWed Mar 15 13:00:36 2017 - [info]     GTID ONWed Mar 15 13:00:36 2017 - [info]     Replicating from 192.168.110.131(192.168.110.131:3306)It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on mysql1(192.168.110.131:3306)? (YES/no): yesWed Mar 15 13:00:38 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..Wed Mar 15 13:00:38 2017 - [info]  ok.Wed Mar 15 13:00:38 2017 - [info] Checking MHA is not monitoring or doing failover..Wed Mar 15 13:00:38 2017 - [info] Checking replication health on mysql2..Wed Mar 15 13:00:38 2017 - [info]  ok.Wed Mar 15 13:00:38 2017 - [info] Checking replication health on ansible..Wed Mar 15 13:00:38 2017 - [info]  ok.Wed Mar 15 13:00:38 2017 - [info] mysql2 can be new master.Wed Mar 15 13:00:38 2017 - [info] From:mysql1(192.168.110.131:3306) (current master) +--mysql2(192.168.110.132:3306) +--ansible(192.168.110.130:3306)To:mysql2(192.168.110.132:3306) (new master) +--ansible(192.168.110.130:3306) +--mysql1(192.168.110.131:3306)Starting master switch from mysql1(192.168.110.131:3306) to mysql2(192.168.110.132:3306)? (yes/NO): yesWed Mar 15 13:00:39 2017 - [info] Checking whether mysql2(192.168.110.132:3306) is ok for the new master..Wed Mar 15 13:00:39 2017 - [info]  ok.Wed Mar 15 13:00:39 2017 - [info] mysql1(192.168.110.131:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.Wed Mar 15 13:00:39 2017 - [info] mysql1(192.168.110.131:3306): Resetting slave pointing to the dummy host.Wed Mar 15 13:00:39 2017 - [info] ** Phase 1: Configuration Check Phase completed.Wed Mar 15 13:00:39 2017 - [info] Wed Mar 15 13:00:39 2017 - [info] * Phase 2: Rejecting updates Phase..Wed Mar 15 13:00:39 2017 - [info] Wed Mar 15 13:00:39 2017 - [info] Executing master ip online change script to disable write on the current master:Wed Mar 15 13:00:39 2017 - [info]   /usr/bin/master_ip_online_change --command=stop --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=mysql2 --new_master_ip=192.168.110.132 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveWed Mar 15 13:00:39 2017 797307 Set read_only on the new master.. ok.Wed Mar 15 13:00:39 2017 800510 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds){'Time' => '57','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '148','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.130:43084'}{'Time' => '56','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '149','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.132:47611'}Wed Mar 15 13:00:40 2017 305276 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds){'Time' => '58','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '148','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.130:43084'}{'Time' => '57','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '149','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.132:47611'}Wed Mar 15 13:00:40 2017 809931 Waiting all running 2 threads are disconnected.. (max 500 milliseconds){'Time' => '58','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '148','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.130:43084'}{'Time' => '57','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '149','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.132:47611'}Wed Mar 15 13:00:41 2017 314315 Set read_only=1 on the orig master.. ok.Wed Mar 15 13:00:41 2017 317389 Waiting all running 2 queries are disconnected.. (max 500 milliseconds){'Time' => '59','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '148','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.130:43084'}{'Time' => '58','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '149','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Host' => '192.168.110.132:47611'}Wed Mar 15 13:00:41 2017 819707 Killing all application threads..Wed Mar 15 13:00:41 2017 822709 done.Disabling the VIP on old master: mysql1 Wed Mar 15 13:00:42 2017 - [info]  ok.Wed Mar 15 13:00:42 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):Wed Mar 15 13:00:42 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..Wed Mar 15 13:00:42 2017 - [info]  ok.Wed Mar 15 13:00:42 2017 - [info] Orig master binlog:pos is mysql-bin.000003:191.Wed Mar 15 13:00:42 2017 - [info]  Waiting to execute all relay logs on mysql2(192.168.110.132:3306)..Wed Mar 15 13:00:42 2017 - [info]  master_pos_wait(mysql-bin.000003:191) completed on mysql2(192.168.110.132:3306). Executed 0 events.Wed Mar 15 13:00:42 2017 - [info]   done.Wed Mar 15 13:00:42 2017 - [info] Getting new master's binlog name and position..Wed Mar 15 13:00:42 2017 - [info]  mysql-bin.000001:320182589Wed Mar 15 13:00:42 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='mysql2 or 192.168.110.132', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';Wed Mar 15 13:00:42 2017 - [info] Executing master ip online change script to allow write on the new master:Wed Mar 15 13:00:42 2017 - [info]   /usr/bin/master_ip_online_change --command=start --orig_master_host=mysql1 --orig_master_ip=192.168.110.131 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='mha' --new_master_host=mysql2 --new_master_ip=192.168.110.132 --new_master_port=3306 --new_master_user='mha' --new_master_password='mha' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveWed Mar 15 13:00:42 2017 202426 Set read_only=0 on the new master.Enabling the VIP - 192.168.110.100 on the new master - mysql2 Wed Mar 15 13:00:45 2017 - [info]  ok.Wed Mar 15 13:00:45 2017 - [info] Wed Mar 15 13:00:45 2017 - [info] * Switching slaves in parallel..Wed Mar 15 13:00:45 2017 - [info] Wed Mar 15 13:00:45 2017 - [info] -- Slave switch on host ansible(192.168.110.130:3306) started, pid: 43378Wed Mar 15 13:00:45 2017 - [info] Wed Mar 15 13:00:46 2017 - [info] Log messages from ansible ...Wed Mar 15 13:00:46 2017 - [info] Wed Mar 15 13:00:45 2017 - [info]  Waiting to execute all relay logs on ansible(192.168.110.130:3306)..Wed Mar 15 13:00:45 2017 - [info]  master_pos_wait(mysql-bin.000003:191) completed on ansible(192.168.110.130:3306). Executed 0 events.Wed Mar 15 13:00:45 2017 - [info]   done.Wed Mar 15 13:00:45 2017 - [info]  Resetting slave ansible(192.168.110.130:3306) and starting replication from the new master mysql2(192.168.110.132:3306)..Wed Mar 15 13:00:45 2017 - [info]  Executed CHANGE MASTER.Wed Mar 15 13:00:46 2017 - [info]  Slave started.Wed Mar 15 13:00:46 2017 - [info] End of log messages from ansible ...Wed Mar 15 13:00:46 2017 - [info] Wed Mar 15 13:00:46 2017 - [info] -- Slave switch on host ansible(192.168.110.130:3306) succeeded.Wed Mar 15 13:00:46 2017 - [info] Unlocking all tables on the orig master:Wed Mar 15 13:00:46 2017 - [info] Executing UNLOCK TABLES..Wed Mar 15 13:00:46 2017 - [info]  ok.Wed Mar 15 13:00:46 2017 - [info] Starting orig master as a new slave..Wed Mar 15 13:00:46 2017 - [info]  Resetting slave mysql1(192.168.110.131:3306) and starting replication from the new master mysql2(192.168.110.132:3306)..Wed Mar 15 13:00:46 2017 - [info]  Executed CHANGE MASTER.Wed Mar 15 13:00:47 2017 - [info]  Slave started.Wed Mar 15 13:00:47 2017 - [info] All new slave servers switched successfully.Wed Mar 15 13:00:47 2017 - [info] Wed Mar 15 13:00:47 2017 - [info] * Phase 5: New master cleanup phase..Wed Mar 15 13:00:47 2017 - [info] Wed Mar 15 13:00:47 2017 - [info]  mysql2: Resetting slave info succeeded.Wed Mar 15 13:00:47 2017 - [info] Switching master to mysql2(192.168.110.132:3306) completed successfully.


注:这里的手工online_change采用--interactive=1的参数,使用交互的方式进行online切换,如果嫌麻烦,也可以设置该参数为0,采用非交互式方式进行online切换。



配置 数据 脚本 进程 切换 日志 延迟 数据库 状态 一致 手工 线程 检测 管理 架构 参数 情况 方式 时间 目录 数据库的安全要保护哪些东西 数据库安全各自的含义是什么 生产安全数据库录入 数据库的安全性及管理 数据库安全策略包含哪些 海淀数据库安全审计系统 建立农村房屋安全信息数据库 易用的数据库客户端支持安全管理 连接数据库失败ssl安全错误 数据库的锁怎样保障安全 上网代理服务器添加端口 win7的代理服务器在哪里 数据库中is 1是什么意思 网络安全的手抄报简单手法 智能化软件开发属于技术开发吗 成都软件开发利弊 oracle数据库的减集 网络安全追责的主要方式 数据库怎么设置自动增长代码 美国网络安全问题 我的世界地球ol服务器怎么进入 数据库能干什么 北京软件开发人员外派 镇江软件开发培训哪个机构好 软件开发企业交什么税 长宁区会计网络技术欢迎咨询 利用网络技术侦查手段 外汇管理局网络安全技术体系 vps跟云服务器 麻将游戏软件开发中心 收银软件服务器要用什么嫁接 我的世界粘土服务器 网络安全威 深圳最好的外贸数据库网址 长春工业大学网络安全专业 运城财经学校计算机网络技术宿舍 图书管理系统数据库运行维护 网络安全西安院士 梦幻西游最早开哪个服务器 天玥数据库安全审计系统配置
0