千家信息网

Oracle 19C Data Guard基础运维-05Failovers (GAP)

发表于:2024-11-26 作者:千家信息网编辑
千家信息网最后更新 2024年11月26日,Oracle 19C Data Guard 基础运维 -0 5Failovers (GAP)原主库原备库Failovers新主库独立库192.168.31.90192.168.31.100192.16
千家信息网最后更新 2024年11月26日Oracle 19C Data Guard基础运维-05Failovers (GAP)

Oracle 19C Data Guard 基础运维 -0 5Failovers (GAP)

原主库

原备库

Failovers

新主库

独立库

192.168.31.90

192.168.31.100

192.168.31.100

192.168.31.90

cjcdb

chendb

chendb

cjcdb

Failover

https://docs.oracle.com/en/database/oracle/oracle-database/19/sbydb/data-guard-concepts-and-administration.pdf

Figure 9-4 Failover to a Standby Database

Performing a Failover to a Physical Standby Database

关于archive gap 的问题?

上一篇博客《04 Failovers疑问?》写了关于 archive gap的疑问,在实验中,我提前将备库关机,主库端插入大量数据产生 3个归档文件,并手动将最后 3个归档文件重命名,目的是不让备库获取到这三个归档文件,在启动备库,试图模拟出备库 archive gap场景,但是在备库端 v$archive_gap中显示空的,备库没有检测出 archive gap的存在吗?

实际上是本人对archive gap概念存在一些误解,比如主库有 1100个归档,我认为只要有任何归档文件在备库端获取失败都会出现 archive gap,都会记录到 v$archive_gap,通过上一篇实验发现这种理论显然是不对的,我强制将主库 98,99,100三个归档文件重命名,备库端并没有出现 archive gap,即在 v$archive_gap中不会有数据。

那么究竟什么场景才会出现archive gap?真实的场景是,备库在接收主库归档文件时有部分没有接收成功,但后续的归档文件又接收成功了,比如主库 1100个归档文件,出于某种原因,备库没有接收到 97,98两个归档,但是后面的 99,100归档又能正常接收,这时就会产生 archive gap,在 v$archive_gap会查到 97,98归档信息。 (感谢墨天轮平台"你好我是李白"的答疑解惑 )

实验过程如下:

场景二:archive gap下的failover

主库模拟故障,模拟归档gap

先停掉备库: 不接收主库产生的 redo 或归档数据

SQL> shutdown immediate

主库:生成测试数据,生成redo 和归档数据

---session 1

SQL>

declare

begin

for i in 1 .. 1000 000 loop

insert into test1 values (i);

commit;

end loop;

end;

插入数据期间,生成了3 个归档文件

[oracle@cjcos01 arch]$ pwd

/arch

......

cjcpdb_arch_1_74_1030641846.arc

cjcpdb_arch_1_75_1030641846.arc

cjcpdb_arch_1_76_1030641846.arc

主库重命名新产生的前两个归档文件,模拟归档gap

[oracle@cjcos01 arch]$ mv cjcpdb_arch_1_74_1030641846.arc cjcpdb_arch_1_74_1030641846.arc.bak

[oracle@cjcos01 arch]$ mv cjcpdb_arch_1_75_1030641846.arc cjcpdb_arch_1_75_1030641846.arc.bak

再次插入部分数据

SQL>

declare

begin

for i in 1 .. 1000 0 loop

insert into test1 values (i);

commit;

end loop;

end;

启动备库:

SQL> startup

-- 备库启动时,查看对应主库日志,提示找不到 74,75 两个归档文件,无法将 74,75 发送到备库端。

2020-04-19T18:37:53.170879+08:00

Errors in file /u01/app/oracle/diag/rdbms/cjcdb/cjcdb/trace/cjcdb_tt00_2349.trc:

ORA-00308: cannot open archived log '/arch/cjcpdb_arch_1_75_1030641846.arc'

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file or directory

Additional information: 7

2020-04-19T18:37:53.171203+08:00

Errors in file /u01/app/oracle/diag/rdbms/cjcdb/cjcdb/trace/cjcdb_tt00_2349.trc:

ORA-00308: cannot open archived log '/arch/cjcpdb_arch_1_74_1030641846.arc'

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file or directory

Additional information: 7

备库:查看archive log ,实际应该是 74 75 ,不清楚为什么会显示 73

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#

---------- ------------- --------------

1 73 75

备库:没有接收到74,75 两个归档文件

主库重命名system01.dbf 模拟数据库故障

[oracle@cjcos01 arch]$ cd /u01/app/oracle/oradata/CJCDB/

[oracle@cjcos01 CJCDB]$ mv system01.dbf system01.dbf.bak

SQL> alter system checkpoint;

SQL> shutdown abort

主库启动失败

SQL> startup

ORACLE instance started.

Total System Global Area 1375728192 bytes

Fixed Size 9134656 bytes

Variable Size 1107296256 bytes

Database Buffers 251658240 bytes

Redo Buffers 7639040 bytes

Database mounted.

ORA-01157: cannot identify/lock data file 1 - see DBWR trace file

ORA-01110: data file 1: '/u01/app/oracle/oradata/CJCDB/system01.dbf'

SQL> select open_mode from v$database;

OPEN_MODE

--------------------

MOUNTED

备库:

1. 检查 dg 恢复模式 ( 最大性能模式 )

SQL> select database_role,protection_level,protection_mode from v$database;

DATABASE_ROLE PROTECTION_LEVEL PROTECTION_MODE

---------------- -------------------- --------------------

PHYSICAL STANDBY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE

2 检查 archive_gap ,实际应该是74 75 ,不清楚为什么会显示 73

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#

---------- ------------- --------------

1 73 75

主库:

SQL> select name from v$archived_log where thread#=1 and sequence# between 73 and 75;

NAME

--------------------------------------------------------------------------------

/arch/cjcpdb_arch_1_73_1030641846.arc

/arch/cjcpdb_arch_1_74_1030641846.arc

/arch/cjcpdb_arch_1_75_1030641846.arc

73 归档文件拷贝到备库端

[oracle@cjcos01 arch]$ scp cjcpdb_arch_1_73_1030641846.arc cjcos02:/arch

主库在mount 状态下执行 flush redo 操作

SQL> ALTER SYSTEM FLUSH REDO TO chendb;

ALTER SYSTEM FLUSH REDO TO chendb

*

ERROR at line 1:

ORA-16416: No viable switchover targets available

备库:手动注册73 号归档,也显示归档已经注册了

SQL> alter database register logfile '/arch/cjcpdb_arch_1_73_1030641846.arc';

alter database register logfile '/arch/cjcpdb_arch_1_73_1030641846.arc'

*

ERROR at line 1:

ORA-16089: archive log has already been registered

但是archive gap 还是显示有 73

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#

---------- ------------- --------------

1 73 75

主库:将74 号归档文件名改回来

[oracle@cjcos01 arch]$ mv cjcpdb_arch_1_74_1030641846.arc.bak cjcpdb_arch_1_74_1030641846.arc

再次执行flush redo

SQL> ALTER SYSTEM FLUSH REDO TO chendb;

ALTER SYSTEM FLUSH REDO TO chendb

*

ERROR at line 1:

ORA-16416: No viable switchover targets available

查看主库日志,主库已经将74 归档发生备库端了,开始尝试读取 75 号归档文件。

如果flush redo 命令没生效,也可以将归档文件拷到备库端,手动执行注册

SQL> ALTER DATABASE REGISTER PHYSICAL LOGFILE 'filespec1';

备库:只有1 75 号归档找不到了

SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;

THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#

---------- ------------- --------------

1 75 75

备库:取消应用进程

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

备库:由于存在archive gap ,是不允许常规的 failover

SQL> ALTER DATABASE FAILOVER TO chendb;

ALTER DATABASE FAILOVER TO chendb

*

ERROR at line 1:

ORA-00283: recovery session canceled due to errors

ORA-16171: RECOVER...FINISH not allowed due to gap for thr 1, seq 75-75

备库:加force 也不生效

SQL> ALTER DATABASE FAILOVER TO chendb force;

ALTER DATABASE FAILOVER TO chendb force

*

ERROR at line 1:

ORA-00283: recovery session canceled due to errors

ORA-16171: RECOVER...FINISH not allowed due to gap for thr 1, seq 75-75

强制failover: 在存在 archive gap 情况下,强制执行 failover ,会丢失数据,正式环境谨慎使用!!!

Perform a data loss failover.

If an error condition cannot be resolved, a failover can still be performed (with some data loss) by issuing the following SQL statement on the target standby database:

SQL> ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;

打开数据库

SQL> ALTER DATABASE OPEN;

查看数据

SQL> select count(*) from test1;

COUNT(*)

----------

252780

test1 表丢失了 1000 000+1000- 252780 =748220 条数据。

欢迎关注我的微信公众号"IT小Chen",共同学习,共同成长!!!

0