pt-online-schema-change的bug
mysql5.6和mysql5.7对online DDL做了大幅度功能增强,但是仍然存在主库执行DDL,从库存在大幅延迟的情况,故目前生产环境还是通过pt-online-schema-change工具来实现online DDL。但是pt-online-schema-change的使用是否就没有限制呢?
先看看官方文档对pt-online-schema-change的工作原理的描述:
pt-online-schema-change works by creating an empty copy of the table to alter, modifying it as desired, and thencopying rows from the original table into the new table. When the copy is complete, it moves away the original tableand replaces it with the new one. By default, it also drops the original table. The data copy process is performed in small chunks of data, which are varied to attempt to make them execute ina specific amount of time (see --chunk-time). This process is very similar to how other tools, such as pt-tablechecksum,work. Any modifications to data in the original tables during the copy will be reflected in the new table,because the tool creates triggers on the original table to update the corresponding rows in the new table. The use oftriggers means that the tool will not work if any triggers are already defined on the table. When the tool finishes copying data into the new table, it uses an atomic RENAME TABLE operation
接下来通过实验的方式看看pt-online-schema-change是如何工作的,记得打开mysql的general log。通过查看general日志验证pt-online-schema-change的工作机理。
shell>pt-online-schema-change -u linzj -h 192.168.110.131 -p linzj --alter='add column vid3 int' --execute D=sbtest,t=sbtest
1 创建一个和你要执行 alter 操作的表一样的空表结构:
11 Query CREATE TABLE `sbtest`.`_sbtest_new` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', `vid` int(11) DEFAULT NULL, `vid2` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `k` (`k`)) ENGINE=InnoDB AUTO_INCREMENT=4294967295 DEFAULT CHARSET=utf8
2、执行表结构修改
170407 15:45:46 11 Query ALTER TABLE `sbtest`.`_sbtest_new` add column vid3 int
3、在原表上创建触发器,如果表中已经定义了触发器这个工具就不能工作了。
11 Query CREATE TRIGGER `pt_osc_sbtest_sbtest_del` AFTER DELETE ON `sbtest`.`sbtest` FOR EACH ROW DELETE IGNORE FROM `sbtest`.`_sbtest_new` WHERE `sbtest`.`_sbtest_new`.`id` <=> OLD.`id` 11 Query CREATE TRIGGER `pt_osc_sbtest_sbtest_upd` AFTER UPDATE ON `sbtest`.`sbtest` FOR EACH ROW REPLACE INTO `sbtest`.`_sbtest_new` (`id`, `k`, `c`, `pad`, `vid`, `vid2`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`, NEW.`vid`, NEW.`vid2`) 11 Query CREATE TRIGGER `pt_osc_sbtest_sbtest_ins` AFTER INSERT ON `sbtest`.`sbtest` FOR EACH ROW REPLACE INTO `sbtest`.`_sbtest_new` (`id`, `k`, `c`, `pad`, `vid`, `vid2`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`, NEW.`vid`, NEW.`vid2`)
4、按主键or唯一索引进行排序,分成若干chunk进行数据copy
11 Query EXPLAIN SELECT * FROM `sbtest`.`sbtest` WHERE 1=1 11 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) ORDER BY `id` LIMIT 1 /*first lower boundary*/ 11 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX (`PRIMARY`) WHERE `id` IS NOT NULL ORDER BY `id` LIMIT 1 /*key_len*/ 11 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ * FROM `sbtest`.`sbtest` FORCE INDEX (`PRIMARY`) WHERE `id` >= '1' /*key_len*/ 11 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/ 11 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/ 11 Query SHOW WARNINGS 11 Query SHOW GLOBAL STATUS LIKE 'Threads_running' 11 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 19329, 2 /*next chunk boundary*/ 11 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 19329, 2 /*next chunk boundary*/ 11 Query EXPLAIN SELECT `id`, `k`, `c`, `pad`, `vid`, `vid2` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '20330')) LOCK IN SHARE MODE /*explain pt-online-schema-change 17219 copy nibble*/ 11 Query INSERT LOW_PRIORITY IGNORE INTO `sbtest`.`_sbtest_new` (`id`, `k`, `c`, `pad`, `vid`, `vid2`) SELECT `id`, `k`, `c`, `pad`, `vid`, `vid2` FROM `sbtest`.`sbtest` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '20330')) LOCK IN SHARE MODE /*pt-online-schema-change 17219 copy nibble*/
5、rename表,默认删除旧表
11 Query RENAME TABLE `sbtest`.`sbtest` TO `sbtest`.`_sbtest_old`, `sbtest`.`_sbtest_new` TO `sbtest`.`sbtest` 11 Query DROP TABLE IF EXISTS `sbtest`.`_sbtest_old`
那这样的话,如果我们在使用pt-online-schema-change工具在线online DDL某个表的时候,同时对该表的主键or唯一索引字段进行DML,是否会存在异常呢?
实验场景如下:
第一个窗口:
shell>pt-online-schema-change -u linzj -h 192.168.110.131 -p linzj --alter='add column vid3 int' --execute D=sbtest,t=sbtestFound 2 slaves: mysql2 ansibleWill check slave lag on: mysql2 ansibleOperation, tries, wait: copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1Altering `sbtest`.`sbtest`...Creating new table...Created new table sbtest._sbtest_new OK.Waiting forever for new table `sbtest`.`_sbtest_new` to replicate to mysql2...Altering new table...Altered `sbtest`.`_sbtest_new` OK.2017-04-07T14:52:50 Creating triggers...2017-04-07T14:52:50 Created triggers OK.2017-04-07T14:52:50 Copying approximately 986400 rows...Copying `sbtest`.`sbtest`: 86% 00:04 remain2017-04-07T14:53:27 Copied rows OK.2017-04-07T14:53:27 Swapping tables...2017-04-07T14:53:27 Swapped original and new tables OK.2017-04-07T14:53:27 Dropping old table...2017-04-07T14:53:27 Dropped old table `sbtest`.`_sbtest_old` OK.2017-04-07T14:53:27 Dropping triggers...2017-04-07T14:53:27 Dropped triggers OK.Successfully altered `sbtest`.`sbtest`.
第二个窗口:
root@localhost:mysql3306.sock 15:44: [sbtest]>select count(*) from sbtest;+----------+| count(*) |+----------+| 1000000 |+----------+1 row in set (0.17 sec)
root@localhost:mysql3306.sock 15:44: [sbtest]>update sbtest set id=9999999 where id =110; Query OK, 1 row affected (1.33 sec)Rows matched: 1 Changed: 1 Warnings: 0
root@localhost:mysql3306.sock 15:45: [sbtest]>update sbtest set id=9999998 where id =111;Query OK, 1 row affected (0.84 sec)Rows matched: 1 Changed: 1 Warnings: 0
root@localhost:mysql3306.sock 15:46: [sbtest]>update sbtest set id=9999997 where id =112;Query OK, 1 row affected (0.75 sec)Rows matched: 1 Changed: 1 Warnings: 0
root@localhost:mysql3306.sock 15:46: [sbtest]>select count(*) from sbtest;+----------+| count(*) |+----------+| 1000003 |+----------+1 row in set (0.70 sec)
root@localhost:mysql3306.sock 15:46: [sbtest]>select * from sbtest order by id desc limit 5;+---------+---+---+----------------------------------------------------+------+------+------+| id | k | c | pad | vid | vid2 | vid3 |+---------+---+---+----------------------------------------------------+------+------+------+| 9999999 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 9999998 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 9999997 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 1000000 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 999999 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL |+---------+---+---+----------------------------------------------------+------+------+------+5 rows in set (0.00 sec)
root@localhost:mysql3306.sock 15:46: [sbtest]>select * from sbtest where id in (110,111,112);+-----+---+---+----------------------------------------------------+------+------+------+| id | k | c | pad | vid | vid2 | vid3 |+-----+---+---+----------------------------------------------------+------+------+------+| 110 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 111 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL || 112 | 0 | | qqqqqqqqqqwwwwwwwwwweeeeeeeeeerrrrrrrrrrtttttttttt | NULL | NULL | NULL |+-----+---+---+----------------------------------------------------+------+------+------+3 rows in set (0.02 sec)
同时对表的主键or唯一索引进行修改的话,这时候就会出现新表的数据比旧表数据多的情况发现。这应该算是pt-online-schema-change工具的一个bug,为何会出现这种情况,请仔细观察下pt-online-schema-change工具在原表创建的3个触发器的定义就可以很容易发现了。
建议大家,在使用pt-online-schema-change的时候,暂停对表主键or唯一索引列的数据更新。
pt_online_schema_change典型的用法:
1)添加一列,并不真正执行
pt-online-schema-change -alter "add column c1 int" D=mydb,t=mytable -dry-run
2)更新存储引擎为InnoDB,不删除原表
pt-online-schema-change -alter "ENGINE=InnoDB" -no-drop-old-table -print -statistics -execute D=mydb,t=mytable -execute
3)复制环境下,忽略日志筛选和Slave复制延迟,删除表字段
pt-online-schema-change -no-check-replication-filters -recursion-method=none -alter "drop company_type,drop channel_code" h=192.168.10.14,P=3370,u=user1,p=pass1,D=db1,t=table1 -print -statistics -execute
4)更新被子表引用到的父表
pt-online-schema-change -alter "add newcol int" h=192.168.10.14,P=3370,u=user1,p=pass1,D=db1,t=table1 -alter-foreign-keys-method auto -print -statistics -execute
5)在我们的双主复制环境中,设定了忽略mysql库的复制,不是很在乎复制的延迟,有时有外键影响,希望尽量保留原表数据,必要时自行删除。
pt-online-schema-change -no-check-replication-filters -recursion-method=none -alter "drop newcol" h=192.168.10.14,P=3370,u=user1,p=pass1,D=db1,t=table1 -alter-foreign-keys-method auto -no-drop-old-table -print -statistics -execute