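High-Availability PXC
1. Setting up Percona XtraDB Cluster
Environment:
Node 1: A: 192.168.91.18
Node 2: B: 192.168.91.20
Node 3: C: 192.168.91.21
Replication is implemented at the InnoDB engine layer.
A, B, and C must each have a different server_id.
On A, B, and C:
Download the software: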
wget http://www.percona.com/downloads/Percona-XtraDB-Cluster-56/Percona-XtraDB-Cluster-5.6.21-25.8/binary/tarball/Percona-XtraDB-Cluster-5.6.21-rel70.1-25.8.938.Linux.x86_64.tar.gz
Install the dependencies:
yum install -y socat
yum install -y perl-DBD-MySQL.x86_64 perl-IO-Socket-SSL.noarch socat.x86_64 nc
(nc is a powerful networking tool)
yum install -y http://www.percona.com/downloads/percona-release/redhat/0.1-3/percona-release-0.1-3.noarch.rpm
# Install the xtrabackup backup tool:
yum list |grep percona-xtrabackup
yum install -y percona-xtrabackup.x86_64
#rpm -qa |grep percona
percona-release-0.1-3.noarch
percona-xtrabackup-2.3.7-2.el6.x86_64
On A, B, and C:
Unpack the PXC tarball:
tar xf Percona-XtraDB-Cluster-5.6.21-rel70.1-25.8.938.Linux.x86_64.tar.gz
Create a symlink:
ln -s /home/tools/Percona-XtraDB-Cluster-5.6.21-rel70.1-25.8.938.Linux.x86_64 /usr/local/mysql
Create the mysql user and group:
groupadd mysql
useradd -g mysql -s /sbin/nologin -d /usr/local/mysql mysql
Install the init script:
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
Create the base directories MySQL needs:
mkdir -p /data/mysql3306/{data,logs,tmp}
chown -R mysql:mysql /data/mysql3306
Config file on A:
vim /etc/my.cnf
#pxc
default_storage_engine=Innodb
#innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
wsrep_cluster_name=pxc_cluster # cluster name
wsrep_cluster_address=gcomm://192.168.91.18,192.168.91.20,192.168.91.21
wsrep_node_address=192.168.91.18
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
#wsrep_provider_options="gcache.size = 1G;debug = yes"
wsrep_provider_options="gcache.size = 1G;"
#wsrep_sst_method=rsync
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:147258
Config file on B:
#pxc
default_storage_engine=Innodb
#innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
wsrep_cluster_name=pxc_cluster
wsrep_cluster_address=gcomm://192.168.91.18,192.168.91.20,192.168.91.21
wsrep_node_address=192.168.91.20
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
#wsrep_provider_options="gcache.size = 1G;debug = yes"
wsrep_provider_options="gcache.size = 1G;"
#wsrep_sst_method=rsync
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:147258
Config file on C:
#pxc
default_storage_engine=Innodb
#innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
wsrep_cluster_name=pxc_cluster
wsrep_cluster_address=gcomm://192.168.91.18,192.168.91.20,192.168.91.21
wsrep_node_address=192.168.91.21
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
#wsrep_provider_options="gcache.size = 1G;debug = yes"
wsrep_provider_options="gcache.size = 1G;"
#wsrep_sst_method=rsync
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:147258
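The three config fragments above differ only in wsrep_node_address. As a minimal sketch, the fragment could be generated from one place so the nodes cannot drift apart. The function name pxc_config and the variable CLUSTER_NODES are hypothetical; every value is copied from the fragments above.

```shell
#!/bin/sh
# Hypothetical helper: print the PXC fragment of my.cnf for one node.
# Only wsrep_node_address differs between A, B and C.
CLUSTER_NODES="192.168.91.18,192.168.91.20,192.168.91.21"

pxc_config() {
    node_ip=$1
    cat <<EOF
#pxc
default_storage_engine=Innodb
innodb_autoinc_lock_mode=2
wsrep_cluster_name=pxc_cluster
wsrep_cluster_address=gcomm://${CLUSTER_NODES}
wsrep_node_address=${node_ip}
wsrep_provider=/usr/local/mysql/lib/libgalera_smm.so
wsrep_provider_options="gcache.size = 1G;"
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:147258
EOF
}

# Example: print the fragment for node B.
pxc_config 192.168.91.20
```

The output still has to be merged into /etc/my.cnf by hand; the sketch only prints it.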
On A, B, and C:
Initialize the data directory:
[root@Darren1 mysql]# ./scripts/mysql_install_db
A:
Start the first node:
/etc/init.d/mysqld bootstrap-pxc
Bootstrapping PXC (Percona XtraDB Cluster)Starting MySQL (Percona XtraDB Cluster)......... SUCCESS!
>mysql
delete from mysql.user where user!='root' or host!='localhost';
truncate mysql.db;
drop database test;
grant all on *.* to sst@localhost identified by '147258'; # create the sst user for xtrabackup; the password must match wsrep_sst_auth in my.cnf
flush privileges;
On B and C:
Start node 2 and node 3:
/etc/init.d/iptables stop
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
[root@Darren2 data]# /etc/init.d/mysqld start
Starting MySQL (Percona XtraDB Cluster).........State transfer in progress, setting sleep higher
... SUCCESS!
[root@Darren3 data]# /etc/init.d/mysqld start
ERROR! MySQL (Percona XtraDB Cluster) is not running, but lock file (/var/lock/subsys/mysql) exists
Starting MySQL (Percona XtraDB Cluster)..................State transfer in progress, setting sleep higher
... SUCCESS!
Test:
A:
root@localhost [testdb]> create database testdb;
root@localhost [testdb]>create table t1(c1 int auto_increment not null,c2 timestamp,primary key(c1));
root@localhost [testdb]>insert into t1 select 1,now();
root@localhost [testdb]>select * from testdb.t1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 2017-03-06 12:29:56 |
+----+---------------------+
B:
root@localhost [testdb]>select * from testdb.t1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 2017-03-06 12:29:56 |
+----+---------------------+
C:
root@localhost [testdb]>select * from testdb.t1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 2017-03-06 12:29:56 |
+----+---------------------+
Shutdown:
Stop a node: /etc/init.d/mysqld stop
Restarting after all nodes have been shut down:
First node to start: /etc/init.d/mysqld bootstrap-pxc
Other nodes: /etc/init.d/mysqld start
SST and IST
State Snapshot Transfer (SST): full transfer
Occurs when a new node joins, or when a node in the cluster has been down (failed or shut off) for too long.
wsrep_sst_method = xtrabackup-v2
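This parameter accepts three values:
(1) xtrabackup-v2
Transfers data with xtrabackup. A backup user must be created beforehand, and its user name and password specified in the parameter: wsrep_sst_auth=sst:147258
(2) rsync: the fastest transfer method; wsrep_sst_auth does not need to be set. The donor node is read-only while data is copied (FLUSH TABLES WITH READ LOCK).
(3) mysqldump: not recommended, since it does not cope with large data sets. The donor node is read-only while data is copied (FLUSH TABLES WITH READ LOCK).
Incremental State Transfer (IST): incremental transfer
Occurs when nodes differ only by recent changes: just the incremental part is copied to the other nodes. It is governed by a cache called gcache: if the increment is larger than gcache, a full transfer is chosen; only when the increment is less than or equal to gcache is an incremental transfer chosen.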
wsrep_provider_options="gcache.size = 1G"
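The size rule above can be sketched as a small decision function. This is a simplified model of the rule of thumb in these notes, not Galera's actual logic (Galera checks whether the write-sets the joiner is missing are still held in the donor's gcache). The function name choose_transfer and the use of MB as the unit are assumptions.

```shell
#!/bin/sh
# Hypothetical model of the rule above:
#   missed changes <= gcache  ->  IST (incremental)
#   missed changes >  gcache  ->  SST (full)
choose_transfer() {
    missed_mb=$1    # size of the changes the node missed, in MB
    gcache_mb=$2    # configured gcache.size, in MB
    if [ "$missed_mb" -le "$gcache_mb" ]; then
        echo IST
    else
        echo SST
    fi
}

choose_transfer 300 1024     # fits in a 1G gcache
choose_transfer 4800 1024    # does not fit in a 1G gcache
```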
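How do you stop one node of a PXC cluster?
Only when wsrep_local_state_comment is Synced, which means the data on the three nodes is in sync, should you stop the service on one of them; restart nodes one at a time (rolling restart).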
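A minimal sketch of that check before a rolling restart. The function name is_synced is hypothetical; it assumes the status rows arrive on stdin as tab-separated name/value pairs, which is how the mysql client prints them in batch mode.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the status rows on stdin report
# wsrep_local_state_comment = Synced.
is_synced() {
    awk '$1 == "wsrep_local_state_comment" { state = $2 }
         END { if (state == "Synced") exit 0; exit 1 }'
}

# Example with a canned status row instead of a live server:
printf 'wsrep_local_state_comment\tSynced\n' | is_synced && echo "safe to stop this node"

# Against a live node it might be used like this (not run here):
# mysql -N -B -e "show global status like 'wsrep_local_state_comment'" | is_synced \
#     && /etc/init.d/mysqld stop
```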
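How long can a node stay offline?
For example, to allow a node to be offline for 2 hours, work out how much binlog is generated in 2 hours and set gcache.size to at least that.
Take a fairly busy order system that generates 200 MB of binlog every 5 minutes: that is 2.4 GB per hour and 4.8 GB in two hours, so set wsrep_provider_options="gcache.size = 6G". gcache is really allocated (Galera preallocates it as a memory-mapped file), so it should not be set too large either, or the node may run into the OOM killer.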
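The arithmetic above, as a small sketch. The function name gcache_needed_mb is hypothetical, and it assumes the binlog rate measured over a short sample stays constant for the whole offline window.

```shell
#!/bin/sh
# Hypothetical helper: estimate how much gcache (in MB) is needed so a
# node can be offline for offline_hours and still rejoin via IST.
gcache_needed_mb() {
    binlog_mb=$1       # MB of binlog written during the sample
    sample_min=$2      # length of the sample, in minutes
    offline_hours=$3   # how long a node should be able to stay away
    echo $(( binlog_mb * offline_hours * 60 / sample_min ))
}

# 200 MB every 5 minutes, 2 hours offline
gcache_needed_mb 200 5 2
```

This prints 4800; rounding up to 6G, as in the example above, leaves some headroom for bursts.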
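Rejoining the cluster after a failure:
(1) If the data set is not very large, re-initialize the node and do one SST;
(2) If the data set is large, transfer it with rsync;
PXC characteristics and caveats:
(1) Each PXC node is automatically configured with an auto-increment offset and step, as in a dual-master setup, to prevent primary key conflicts;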
node1:
auto_increment_offset=1
auto_increment_increment=3
node2:
auto_increment_offset=2
auto_increment_increment=3
node3:
auto_increment_offset=3
auto_increment_increment=3
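To see why these settings avoid collisions, here is a small sketch that lists the first few ids each node would hand out (offset + k * increment). The function name node_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first `count` auto-increment values for
# a node with the given auto_increment_offset and auto_increment_increment.
node_ids() {
    offset=$1
    increment=$2
    count=$3
    i=0
    ids=""
    while [ "$i" -lt "$count" ]; do
        ids="$ids $(( offset + i * increment ))"
        i=$(( i + 1 ))
    done
    echo $ids    # unquoted on purpose: drops the leading space
}

node_ids 1 3 4   # node1: 1 4 7 10
node_ids 2 3 4   # node2: 2 5 8 11
node_ids 3 3 4   # node3: 3 6 9 12
```

The three sequences never overlap, so concurrent inserts on different nodes cannot generate the same key.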
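(2) A PXC cluster uses optimistic concurrency control, so transaction conflicts may surface at the commit stage. When several nodes modify the same row, only one of them succeeds; the transaction on the losing node is aborted and a deadlock error code is returned:
For example: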
A:
root@localhost [testdb]>begin;
root@localhost [testdb]>update t1 set c2=now() where c1=3;
B:
root@localhost [testdb]>begin;
root@localhost [testdb]>update t1 set c2=now() where c1=3;
root@localhost [testdb]>commit;
A:
The commit fails with a deadlock error:
root@localhost [testdb]>commit;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
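Because the error message itself says "try restarting transaction", clients usually retry. A minimal sketch of such a retry wrapper follows; the function name retry_on_deadlock is hypothetical, and it assumes the wrapped command exits non-zero and prints "ERROR 1213" when it loses the conflict, as the mysql client does.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and rerun it while its output
# contains MySQL error 1213, up to max_attempts times in total.
retry_on_deadlock() {
    max_attempts=$1
    shift
    attempt=1
    while :; do
        if output=$("$@" 2>&1); then
            printf '%s\n' "$output"
            return 0
        fi
        case $output in
            *"ERROR 1213"*)
                # Lost the certification conflict: retry if attempts remain.
                if [ "$attempt" -ge "$max_attempts" ]; then
                    printf '%s\n' "$output" >&2
                    return 1
                fi
                attempt=$(( attempt + 1 ))
                ;;
            *)
                # Any other failure is not retried.
                printf '%s\n' "$output" >&2
                return 1
                ;;
        esac
    done
}

# Possible use against the table above (not run here):
# retry_on_deadlock 3 mysql testdb -e "begin; update t1 set c2=now() where c1=3; commit;"
```

The whole transaction has to be rerun, not just the commit, since the aborted transaction's changes are rolled back.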
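(3) PXC only supports the InnoDB engine. Most tables in the mysql schema are MyISAM, so how do they get transferred? Although PXC does not replicate MyISAM tables, it does replicate DCL statements such as CREATE USER, DROP USER, GRANT, and REVOKE. MyISAM support can be turned on with the wsrep_replicate_myisam parameter. So when data in a PXC cluster turns out to be inconsistent, first check whether a MyISAM table is involved;
For example: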
node1:
root@localhost [testdb]>show create table t2\G
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `c1` int(11) NOT NULL AUTO_INCREMENT,
  `c2` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`c1`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
root@localhost [testdb]>select * from t2;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  2 | 2017-03-08 11:41:31 |
+----+---------------------+
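The row is not visible on node2 or node3, because it was never replicated to them.
(4) Every table in PXC must have a primary key. Without one, the data in the data pages may differ from node to node, and SELECT ... LIMIT may return different result sets on different nodes;
(5) Table-level locks (LOCK TABLE) are not supported. Every DDL operation takes an instance-wide lock, so use the pt-osc tool (pt-online-schema-change) instead.
For example:
Example 1: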
node1:
root@localhost [testdb]>lock table t1 read;
root@localhost [testdb]>insert into t1 select 69,now();
ERROR 1099 (HY000): Table 't1' was locked with a READ lock and can't be updated
node2: node 2 can still insert, which shows the read lock has no effect there
root@localhost [testdb]>insert into t1 select 69,now();
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
Example 2:
node1:
root@localhost [testdb]>lock table t1 write;
root@localhost [testdb]>insert into t1 select 1,now();
Query OK, 1 row affected (0.03 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@localhost [testdb]>select * from t1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 2017-03-08 14:59:46 |
+----+---------------------+
node2: node 2 is not affected by the write lock and can still read and write:
root@localhost [testdb]>insert into t1 select 2,now();
Query OK, 1 row affected (0.05 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@localhost [testdb]>select * from t1;
+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | 2017-03-08 14:59:46 |
|  2 | 2017-03-08 14:59:57 |
+----+---------------------+
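(6) XA transactions are not supported;
(7) Query logs must be written to a file, not to a table, i.e. set log_output=file;
(8) The performance/throughput of the whole cluster is determined by its slowest node (the "shortest stave of the barrel" effect);
Master-slave replication, ignoring lag: 60,000 inserts per second,
Master-slave replication, taking lag into account: 30,000 inserts per second,
PXC: 10,000 inserts per second
(9) The number of nodes should satisfy 3 <= num <= 8;
(10) Split brain: at least three nodes are needed, one of which acts as an arbiter, to prevent split brain;
Split-brain demonstration:
Force-kill the mysql processes: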
node2:
[root@Darren1 mysql3306]# kill -9 10014
node3:
[root@Darren3 ~]# kill -9 10115
node1:
root@localhost [(none)]>use testdb;
ERROR 1047 (08S01): Unknown command
Values before the split brain:
show global status like '%wsrep%';
wsrep_local_state_comment | Synced
wsrep_cluster_status | Primary
wsrep_ready | ON
Values after the split brain:
wsrep_local_state_comment | Initialized
wsrep_cluster_status | non-Primary
wsrep_ready | OFF
Restarting node2 or node3 fails with an error:
[root@Darren1 data]# /etc/init.d/mysqld start
ERROR! MySQL (Percona XtraDB Cluster) is not running, but PID file exists
Fix: restart node1, then restart node2 and node3
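The switch from Primary to non-Primary in the demonstration follows from a majority rule: the surviving node can no longer see more than half of the cluster. A minimal sketch of that rule, assuming all nodes carry equal weight and the others disappeared without a clean shutdown (as with kill -9). The function name has_quorum is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed if `alive` nodes out of `total` form a
# strict majority (more than half).
has_quorum() {
    alive=$1
    total=$2
    [ $(( alive * 2 )) -gt "$total" ]
}

has_quorum 2 3 && echo "2 of 3 reachable: Primary"
has_quorum 1 3 || echo "1 of 3 reachable: non-Primary"
```

With only two nodes, losing either one leaves 1 of 2, which is not a majority; that is why at least three nodes are recommended.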