千家信息网

PostgreSQL DBA(32) - HA#1(pg_rewind切换)

发表于:2025-01-23 作者:千家信息网编辑
千家信息网最后更新 2025年01月23日,基于streaming replication搭建的PostgreSQL HA环境,如出现网络访问/硬件故障等原因导致Standby节点升级为Master节点,但旧Master节点数据库并未损坏,在排
千家信息网最后更新 2025年01月23日PostgreSQL DBA(32) - HA#1(pg_rewind切换)

基于streaming replication搭建的PostgreSQL HA环境,如出现网络访问/硬件故障等原因导致Standby节点升级为Master节点,但旧Master节点数据库并未损坏,在排除故障后旧Master节点是否不需要通过备份方式重新搭建成为Standby节点呢?答案是肯定的,PG提供了pg_rewind这个工具实现.

原理
在PostgreSQL HA环境中,Standby节点升级为Master节点后,时间线会切换为新的时间线,比如从1变为2.而旧Master节点的时间线仍然为原来的时间线,比如仍为1,那么使用pg_rewind工具,旧Master节点如何从新Master节点读取相关的数据成为新的Standby节点?
简单来说,有以下几步:
1.确定新Master和旧Master数据一致性的Checkpoint位置.在该位置上,新Master和旧Master数据完全一致.这可以通过读取新旧Master节点时间线历史文件可以获得,该文件位于$PGDATA/pg_wal/目录下,文件名称为XX.history
2.旧Master节点根据上一步获取的Checkpoint读取本机日志文件WAL Record,获取在此Checkpoint之后出现变化的Block,并以链表的方式存储Block编号等信息
3.根据第2步获取的Block信息从新Master节点拷贝相应的Block,替换旧Master节点相应的Block
4.拷贝新Master节点上除数据文件外的所有其他文件,包括配置文件等(如果拷贝数据文件,与备份方式搭建区别不大)
5.旧Master启动数据库,应用从Checkpoint开始后的WAL Record.

示例
考虑两节点的HA环境,旧Master节点IP为192.168.26.25,Standby节点(新Master)IP为192.168.26.26,模拟主库宕机,备库升级为主库,然后通过pg_rewind把旧Master节点切换为Standby节点.
切换前,26.25节点相关信息

testdb=# select * from pg_stat_replication;-[ RECORD 1 ]----+------------------------------pid              | 1537usesysid         | 90113usename          | replicatorapplication_name | standby_26client_addr      | 192.168.26.26client_hostname  | client_port      | 53164backend_start    | 2019-03-27 15:19:09.254987+08backend_xmin     | 654state            | streamingsent_lsn         | 0/6B000060write_lsn        | 0/6B000060flush_lsn        | 0/6B000060replay_lsn       | 0/6B000060write_lag        | 00:00:00.156457flush_lag        | 00:00:00.158792replay_lag       | 00:00:00.158815sync_priority    | 0sync_state       | asynctestdb=# testdb=# select pg_is_in_recovery();-[ RECORD 1 ]-----+--pg_is_in_recovery | f

插入测试数据

testdb=# create table t_oldmaster(id int);CREATE TABLEtestdb=# insert into t_oldmaster select generate_series(1,1000000);INSERT 0 1000000testdb=#

切换前26.26节点信息,数据已复制,角色为Standby

testdb=# select count(*) from t_oldmaster;  count  --------- 1000000(1 row)testdb=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t(1 row)

模拟Master节点宕机

[xdb@localhost testdb]$ ps -ef|grep postgresxdb       1353     1  0 14:46 pts/1    00:00:00 /appdb/xdb/pg11.2/bin/postgresxdb       1354  1353  0 14:46 ?        00:00:00 postgres: logger   xdb       1356  1353  0 14:46 ?        00:00:00 postgres: checkpointer   xdb       1357  1353  0 14:46 ?        00:00:00 postgres: background writer   xdb       1358  1353  0 14:46 ?        00:00:00 postgres: walwriter   xdb       1359  1353  0 14:46 ?        00:00:00 postgres: autovacuum launcher  xdb       1360  1353  0 14:46 ?        00:00:00 postgres: archiver   last was 00000010000000000000006Dxdb       1361  1353  0 14:46 ?        00:00:00 postgres: stats collector   xdb       1362  1353  0 14:46 ?        00:00:00 postgres: logical replication launcher  xdb       1418  1353  0 15:11 ?        00:00:02 postgres: xdb testdb [local] idlexdb       1537  1353  0 15:19 ?        00:00:00 postgres: walsender replicator 192.168.26.26(53164) streaming 0/6ED4FDA8xdb       1555  1317  0 15:25 pts/1    00:00:00 grep --color=auto postgres[xdb@localhost testdb]$ kill -9 1353[xdb@localhost testdb]$ ps -ef|grep postgresxdb       1557  1317  0 15:25 pts/1    00:00:00 grep --color=auto postgres[xdb@localhost testdb]$

Standby节点执行切换
编辑recovery.conf文件,添加trigger_file和recovery_target_timeline

[xdb@localhost testdb]$ cat recovery.conf standby_mode = 'on'primary_conninfo = 'host=192.168.26.25 port=5432 user=replicator password=replicator application_name=standby_26'restore_command = 'cp /data/archivelog/%f %p'archive_cleanup_command = 'pg_archivecleanup /data/archivelog %r'trigger_file = '/tmp/trigger_file'recovery_target_timeline = 'latest'#生成trigger文件[xdb@localhost testdb]$ touch /tmp/trigger_file

重启Standby节点

[xdb@localhost testdb]$ pg_ctl restartpg_ctl: PID file "/data/pgsql/testdb/postmaster.pid" does not existIs server running?trying to start server anywaywaiting for server to start....2019-03-27 15:33:28.218 CST [1901] LOG:  listening on IPv4 address "0.0.0.0", port 54322019-03-27 15:33:28.219 CST [1901] LOG:  listening on IPv6 address "::", port 54322019-03-27 15:33:28.223 CST [1901] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"2019-03-27 15:33:28.253 CST [1901] LOG:  redirecting log output to logging collector process2019-03-27 15:33:28.253 CST [1901] HINT:  Future log output will appear in directory "pg_log".. doneserver started[xdb@localhost testdb]$

日志输出

2019-03-27 15:33:28.259 CST,,,1903,,5c9b2748.76f,2,,2019-03-27 15:33:28 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""2019-03-27 15:33:28.264 CST,,,1903,,5c9b2748.76f,3,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"redo starts at 0/6B0164C0",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1903,,5c9b2748.76f,4,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"consistent recovery state reached at 0/6ED4FDA8",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1903,,5c9b2748.76f,5,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"invalid record length at 0/6ED4FDA8: wanted 24, got 0",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1903,,5c9b2748.76f,6,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"trigger file found: /tmp/trigger_file",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1903,,5c9b2748.76f,7,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"redo done at 0/6ED4FD70",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1903,,5c9b2748.76f,8,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2019-03-27 15:21:57.519456+08",,,,,,,,,""2019-03-27 15:33:29.416 CST,,,1901,,5c9b2748.76d,2,,2019-03-27 15:33:28 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""2019-03-27 15:33:29.421 CST,,,1903,,5c9b2748.76f,9,,2019-03-27 15:33:28 CST,1/0,0,LOG,00000,"selected new timeline ID: 17",,,,,,,,,""

Standby节点已升级为新Master节点
recovery.conf 文件重命名为 recovery.done
/tmp/trigger_file文件被删除
pg_is_in_recovery返回false

[xdb@localhost testdb]$ cat recovery.done standby_mode = 'on'primary_conninfo = 'host=192.168.26.25 port=5432 user=replicator password=replicator application_name=standby_26'restore_command = 'cp /data/archivelog/%f %p'archive_cleanup_command = 'pg_archivecleanup /data/archivelog %r'trigger_file = '/tmp/trigger_file'recovery_target_timeline = 'latest'[root@localhost testdb]# ls -l /tmp/trigger_filels: cannot access /tmp/trigger_file: No such file or directory[xdb@localhost testdb]$ psql -d testdbpsql (11.2)Type "help" for help.testdb=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f(1 row)

在pg_wal目录下生成了时间线history文件

[xdb@localhost testdb]$ ls ./pg_wal/00000010000000000000006E.partial  00000011000000000000006F  00000011000000000000007200000010.history                  000000110000000000000070  00000011.history00000011000000000000006E          000000110000000000000071  archive_status[xdb@localhost testdb]$ cat ./pg_wal/00000011.history 7    0/27000000    no recovery target specified8    0/2A0012E8    no recovery target specified12    0/32000000    no recovery target specified14    0/39A63BD0    no recovery target specified15    0/40A63B08    no recovery target specified16    0/6ED4FDA8    no recovery target specified[xdb@localhost testdb]$ cat ./pg_wal/00000010.history 7    0/27000000    no recovery target specified8    0/2A0012E8    no recovery target specified12    0/32000000    no recovery target specified14    0/39A63BD0    no recovery target specified15    0/40A63B08    no recovery target specified

生成测试数据

testdb=# create table t_new_master(id int);CREATE TABLEtestdb=# insert into t_new_master select generate_series(1,1000000);INSERT 0 1000000testdb=#

旧Master节点降级为Standby节点
重新启动旧master节点,插入测试数据,用于模拟时间线分叉后的数据变动

[xdb@localhost testdb]$ pg_ctl startpg_ctl: another server might be running; trying to start server anywaywaiting for server to start....2019-03-27 15:39:07.989 CST [1561] LOG:  listening on IPv4 address "0.0.0.0", port 54322019-03-27 15:39:07.989 CST [1561] LOG:  listening on IPv6 address "::", port 54322019-03-27 15:39:07.991 CST [1561] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"2019-03-27 15:39:08.015 CST [1561] LOG:  redirecting log output to logging collector process2019-03-27 15:39:08.015 CST [1561] HINT:  Future log output will appear in directory "pg_log".. doneserver started[xdb@localhost testdb]$ psql -d testdbpsql (11.2)Type "help" for help.testdb=# testdb=# create table t_fork(id int);CREATE TABLEtestdb=# insert into t_fork select generate_series(1,100000);INSERT 0 100000testdb=# testdb=# \q[xdb@localhost testdb]$

关闭旧主库,执行pg_rewind

[xdb@localhost testdb]$ pg_ctl stopwaiting for server to shut down.... doneserver stopped[xdb@localhost testdb]$ pg_rewind --target-pgdata=$PGDATA --source-server="host=192.168.26.26 port=5432 dbname=testdb" --progress --debug -nconnected to serverfetched file "global/pg_control", length 8192target server needs to use either data checksums or "wal_log_hints = on"Failure, exiting

主从设置postgres.conf中的wal_log_hints参数为on,重启数据库

[xdb@localhost testdb]$ pg_ctl startwaiting for server to start....2019-03-27 15:54:56.038 CST [1603] LOG:  listening on IPv4 address "0.0.0.0", port 54322019-03-27 15:54:56.038 CST [1603] LOG:  listening on IPv6 address "::", port 54322019-03-27 15:54:56.042 CST [1603] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"2019-03-27 15:54:56.079 CST [1603] LOG:  redirecting log output to logging collector process2019-03-27 15:54:56.079 CST [1603] HINT:  Future log output will appear in directory "pg_log". doneserver started[xdb@localhost testdb]$ pg_ctl stopwaiting for server to shut down.... doneserver stopped

再次执行pg_rewind

[xdb@localhost testdb]$ pg_rewind --target-pgdata=$PGDATA --source-server="host=192.168.26.26 port=5432 dbname=testdb" --progress --debug -nconnected to serverfetched file "global/pg_control", length 8192fetched file "pg_wal/00000011.history", length 261Source timeline history:Target timeline history:7: 0/0 - 0/270000008: 0/27000000 - 0/2A0012E812: 0/2A0012E8 - 0/3200000014: 0/32000000 - 0/39A63BD015: 0/39A63BD0 - 0/40A63B0816: 0/40A63B08 - 0/0servers diverged at WAL location 0/6ED4FDA8 on timeline 16could not open file "/data/pgsql/testdb/pg_wal/00000010000000000000006E": No such file or directorycould not find previous WAL record at 0/6ED4FDA8Failure, exiting[xdb@localhost testdb]$

拷贝本机和新Master节点的归档wal文件

[xdb@localhost testdb]$ scp -P 22 root@192.168.26.26:/data/archivelog/* $PGDATA/pg_wal/The authenticity of host '192.168.26.26 (192.168.26.26)' can't be established.ECDSA key fingerprint is SHA256:wT+B26a2wJvmsOz8QH2UK6QUqodwBYvDu/WMrmqZ5fU.ECDSA key fingerprint is MD5:79:63:4d:55:83:d9:d6:78:21:d1:b6:a1:e8:74:0f:b0.Are you sure you want to continue connecting (yes/no)? yesWarning: Permanently added '192.168.26.26' (ECDSA) to the list of known hosts.root@192.168.26.26's password: 00000010000000000000006E.partial                                                           100%   16MB  44.1MB/s   00:00    00000011000000000000006E                                                                   100%   16MB  71.5MB/s   00:00    00000011000000000000006F                                                                   100%   16MB  47.0MB/s   00:00    000000110000000000000070                                                                   100%   16MB  26.7MB/s   00:00    000000110000000000000071                                                                   100%   16MB  26.7MB/s   00:00    000000110000000000000072                                                                   100%   16MB  31.6MB/s   00:00    000000110000000000000073                                                                   100%   16MB  37.1MB/s   00:00    00000011.history                                                                           100%  261   323.1KB/s   00:00    [xdb@localhost testdb]$ ls ./pg_wal/00000010000000000000006A.00000028.backup  000000100000000000000074  00000010.history          00000011000000000000007300000010000000000000006E.partial          000000100000000000000075  00000011000000000000006E  00000011.history000000100000000000000070                  000000100000000000000076  00000011000000000000006F  archive_status000000100000000000000071                  000000100000000000000077  000000110000000000000070  RECOVERYHISTORY000000100000000000000072                  000000100000000000000078  000000110000000000000071000000100000000000000073                  000000100000000000000079  000000110000000000000072[xdb@localhost testdb]$ [xdb@localhost testdb]$ cp /data/archivelog/20190327/00000010000000000000006* ./pg_wal

重新执行pg_rewind

[xdb@localhost testdb]$ pg_rewind --target-pgdata=$PGDATA --source-server="host=192.168.26.26 port=5432 dbname=testdb" --progress -nconnected to serverservers diverged at WAL location 0/6ED4FDA8 on timeline 16rewinding from last common checkpoint at 0/6B0164F8 on timeline 16reading source file listreading target file listreading WAL in targetneed to copy 296 MB (total source directory size is 437 MB)303659/303659 kB (100%) copiedcreating backup label and updating control filesyncing target data directoryDone!

修改配置文件recovery.conf

[xdb@localhost testdb]$ scp -P 22 root@192.168.26.26:$PGDATA/recovery.done $PGDATAroot@192.168.26.26's password: recovery.done                                                                              100%  317   312.7KB/s   00:00    [xdb@localhost testdb]$ mv recovery.done recovery.conf[xdb@localhost testdb]$ vim recovery.conf [xdb@localhost testdb]$ cat recovery.conf standby_mode = 'on'primary_conninfo = 'host=192.168.26.26 port=5432 user=replicator password=replicator application_name=standby_25'restore_command = 'cp /data/archivelog/%f %p'archive_cleanup_command = 'pg_archivecleanup /data/archivelog %r'[xdb@localhost testdb]$

重启原主库

新Master节点
日志提示有误,搭建出现问题

2019-03-27 16:22:19.800 CST,"replicator","",2360,"192.168.26.25:32974",5c9b32bb.938,1,"idle",2019-03-27 16:22:19 CST,4/0,0,ERROR,XX000,"requested starting point 0/71000000 on timeline 16 is not in this server's history","This server's history forked from timeline 16 at 0/6ED4FDA8.",,,,,,,,"standby_25"2019-03-27 16:22:24.813 CST,"replicator","",2361,"192.168.26.25:32976",5c9b32c0.939,1,"idle",2019-03-27 16:22:24 CST,4/0,0,ERROR,XX000,"requested starting point 0/71000000 on timeline 16 is not in this server's history","This server's history forked from timeline 16 at 0/6ED4FDA8.",,,,,,,,"standby_25"

作为失败的案例,供参考.

参考资料
神奇的pg_rewind

0