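Replication failure caused by the master and slave sharing the same server_id
Yesterday, while performing a MySQL switchover, I ran into a strange phenomenon. The topology before and after the switchover was as follows:
After everything had been switched over, I restarted mysql on 04 for an unrelated reason. Running show slave status\G afterwards showed this error: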
Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
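Only after checking the server_id on 03 and 04 did I discover that the two were identical. A master and its slave must not share a server_id, as we all know, so why did show slave status\G report no error before the restart? I had even checked these slave fields specifically: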
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 38593
Relay_Master_Log_File: mysql-bin.000001
Exec_Master_Log_Pos: 38593
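These values indicate that the slave had fully caught up with the master: the position read by the I/O thread equals the position executed by the SQL thread.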
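The "caught up" check described above can be written as a short Python sketch. It assumes the vertical \G output has been captured as text; the helper names parse_slave_status and positions_match are made up for illustration. Note that it compares positions only and says nothing about whether any rows were actually applied.

```python
def parse_slave_status(text):
    # Parse "Key: value" lines from SHOW SLAVE STATUS output in vertical (\G) form.
    status = {}
    for line in text.splitlines():
        # Split at the first colon only, so values containing colons stay intact.
        key, sep, value = line.strip().partition(":")
        if sep:  # lines without a colon (e.g. the row banner) are skipped
            status[key.strip()] = value.strip()
    return status


def positions_match(status):
    # True when the I/O thread's read position equals the SQL thread's
    # executed position within the same master binlog file.
    return (
        status["Master_Log_File"] == status["Relay_Master_Log_File"]
        and int(status["Read_Master_Log_Pos"]) == int(status["Exec_Master_Log_Pos"])
    )


# The four fields quoted in this post.
SAMPLE = """\
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 38593
Relay_Master_Log_File: mysql-bin.000001
Exec_Master_Log_Pos: 38593
"""

if __name__ == "__main__":
    print(positions_match(parse_slave_status(SAMPLE)))
```

A True result here only means the two positions agree, which is exactly the situation that turned out to be misleading in the test that follows.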
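Question: with identical server_ids, why did the slave report no error earlier, and why did it appear to keep applying the master's binlog?
Without further ado, here is the test:
In a master/slave setup the server_ids were initially different, everything worked normally, and data replicated as expected. I then changed the slave's server_id to match the master's: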
master:
mysql> select @@server_id;
+-------------+
| @@server_id |
+-------------+
|      583306 |
+-------------+
1 row in set (0.00 sec)

slave:
mysql> select @@server_id;
+-------------+
| @@server_id |
+-------------+
|      593306 |
+-------------+
1 row in set (0.00 sec)

mysql> set global server_id=583306;
mysql> select @@server_id;
+-------------+
| @@server_id |
+-------------+
|      583306 |
+-------------+
1 row in set (0.00 sec)
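At this point show slave status\G still reported no error.
I then inserted a row on the master and watched whether it reached the slave. show slave status\G showed that the positions had advanced, yet a select on the table did not return the newly inserted row, so the data had not been replicated.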
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 38857
Relay_Master_Log_File: mysql-bin.000001
Exec_Master_Log_Pos: 38857
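show relaylog events confirms this: the insert was never written to the relay log, which is why the data did not arrive. The likely explanation is that the I/O thread discards events stamped with its own server_id (unless --replicate-same-server-id is set) while still advancing the read position. In other words, once server_id has been changed, the output of show slave status\G is not that reliable: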
mysql> SHOW RELAYLOG EVENTS in 'sht-sgmhadoopdn-02-relay-bin.000005';
+-------------------------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------+
| Log_name                            | Pos  | Event_type     | Server_id | End_log_pos | Info                                                                |
+-------------------------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------+
| sht-sgmhadoopdn-02-relay-bin.000005 |    4 | Format_desc    |    593306 |         123 | Server ver: 5.7.21-log, Binlog ver: 4                               |
| sht-sgmhadoopdn-02-relay-bin.000005 |  123 | Previous_gtids |    593306 |         194 | 8b94d944-34c8-11e8-9e15-0050568211bd:1-144                          |
| sht-sgmhadoopdn-02-relay-bin.000005 |  194 | Rotate         |    583306 |           0 | mysql-bin.000001;pos=4                                              |
| sht-sgmhadoopdn-02-relay-bin.000005 |  241 | Format_desc    |    583306 |         123 | Server ver: 5.7.21-log, Binlog ver: 4                               |
| sht-sgmhadoopdn-02-relay-bin.000005 |  360 | Rotate         |         0 |         407 | mysql-bin.000001;pos=154                                            |
| sht-sgmhadoopdn-02-relay-bin.000005 |  407 | Rotate         |         0 |         454 | mysql-bin.000001;pos=37623                                          |
| sht-sgmhadoopdn-02-relay-bin.000005 |  454 | Gtid           |    583306 |       37688 | SET @@SESSION.GTID_NEXT= '8b94d944-34c8-11e8-9e15-0050568211bd:145' |
| sht-sgmhadoopdn-02-relay-bin.000005 |  519 | Query          |    583306 |       37807 | use `testdb`; DROP TABLE `t1` /* generated by server */             |
| sht-sgmhadoopdn-02-relay-bin.000005 |  638 | Gtid           |    583306 |       37872 | SET @@SESSION.GTID_NEXT= '8b94d944-34c8-11e8-9e15-0050568211bd:146' |
| sht-sgmhadoopdn-02-relay-bin.000005 |  703 | Query          |    583306 |       37964 | use `testdb`; create table t1 like t2                               |
| sht-sgmhadoopdn-02-relay-bin.000005 |  795 | Gtid           |    583306 |       38029 | SET @@SESSION.GTID_NEXT= '8b94d944-34c8-11e8-9e15-0050568211bd:147' |
| sht-sgmhadoopdn-02-relay-bin.000005 |  860 | Query          |    583306 |       38148 | use `testdb`; DROP TABLE `t1` /* generated by server */             |
| sht-sgmhadoopdn-02-relay-bin.000005 |  979 | Gtid           |    583306 |       38213 | SET @@SESSION.GTID_NEXT= '8b94d944-34c8-11e8-9e15-0050568211bd:148' |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1044 | Query          |    583306 |       38329 | use `testdb`; create table t1(c1 int,c2 varchar(20))                |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1160 | Gtid           |    583306 |       38394 | SET @@SESSION.GTID_NEXT= '8b94d944-34c8-11e8-9e15-0050568211bd:149' |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1225 | Query          |    583306 |       38468 | BEGIN                                                               |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1299 | Table_map      |    583306 |       38518 | table_id: 228 (testdb.t1)                                           |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1349 | Write_rows     |    583306 |       38562 | table_id: 228 flags: STMT_END_F                                     |
| sht-sgmhadoopdn-02-relay-bin.000005 | 1393 | Xid            |    583306 |       38593 | COMMIT /* xid=3689 */                                               |
+-------------------------------------+------+----------------+-----------+-------------+---------------------------------------------------------------------+
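To make the observed behaviour concrete, here is a toy Python model of it. This is not MySQL's actual code, only a sketch under the assumption that the I/O thread drops events carrying its own server_id while still moving its read position forward. The Event class and queue_events function are invented for this illustration, and the single event ending at position 38857 stands in for the whole insert transaction.

```python
from dataclasses import dataclass


@dataclass
class Event:
    server_id: int    # ID of the server that originally wrote the event
    end_log_pos: int  # position in the master binlog right after the event
    info: str


def queue_events(events, own_server_id, replicate_same_server_id=False):
    # Toy I/O thread: returns (relay_log, read_master_log_pos).
    relay_log = []
    read_pos = 0
    for ev in events:
        # The read position advances for every event received from the master...
        read_pos = ev.end_log_pos
        # ...but an event stamped with our own ID is assumed to be discarded.
        if ev.server_id == own_server_id and not replicate_same_server_id:
            continue
        relay_log.append(ev)
    return relay_log, read_pos


if __name__ == "__main__":
    # Illustrative stand-in for the insert made on the master (server_id 583306).
    incoming = [Event(583306, 38857, "insert into testdb.t1")]
    for own_id in (593306, 583306):
        relay, pos = queue_events(incoming, own_server_id=own_id)
        print(own_id, len(relay), pos)
```

With the slave's original ID the event lands in the relay log; with the ID set to the master's, the relay log stays empty while the position still reaches 38857, which mirrors what show slave status\G displayed in the test.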
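Finally, after I ran stop slave; followed by start slave;, the equal-server_id error quoted at the beginning of this post appeared. This suggests that the two IDs are compared when the I/O thread connects to the master, not while it is already running, which is why nothing was reported until the thread was restarted.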
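Summary:
(1) Make sure the master and slave have different server_ids;
(2) After changing server_id, run stop slave / start slave; better still, restart the MySQL server;
(3) Do not rely on show slave status\G alone to decide whether replication is healthy; also check that the data has actually been replicated.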
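Points (1) and (3) lend themselves to a small automated check. The Python sketch below is one possible shape for it, not a finished tool. It assumes the inputs were collected beforehand, for example with select @@server_id on each host and CHECKSUM TABLE on each side while writes are paused; the function names, host labels, and checksum values are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_server_ids(server_ids):
    # server_ids maps a host label to its @@server_id.
    # Returns {server_id: [hosts]} for every ID used by more than one host.
    hosts_by_id = defaultdict(list)
    for host, sid in server_ids.items():
        hosts_by_id[sid].append(host)
    return {sid: sorted(hosts) for sid, hosts in hosts_by_id.items() if len(hosts) > 1}


def tables_out_of_sync(master_checksums, slave_checksums):
    # Each argument maps a table name to its checksum on that server.
    # A table missing on one side counts as out of sync.
    tables = set(master_checksums) | set(slave_checksums)
    return sorted(
        t for t in tables if master_checksums.get(t) != slave_checksums.get(t)
    )


if __name__ == "__main__":
    # Hypothetical inputs: both hosts report the ID used in the test above.
    ids = {"master": 583306, "slave": 583306}
    print(find_duplicate_server_ids(ids))

    # Hypothetical checksums: the slave is missing the row inserted on the master.
    master = {"testdb.t1": 111, "testdb.t2": 222}
    slave = {"testdb.t1": 999, "testdb.t2": 222}
    print(tables_out_of_sync(master, slave))
```

Comparing checksums only makes sense when both sides are at the same point in the binlog, so in practice the comparison would be run during a quiet period or with a dedicated consistency tool.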