千家信息网

数据库中如何利用systemstate dump trace文件寻找dg terminated原因

发表于:2025-01-21 作者:千家信息网编辑
千家信息网最后更新 2025年01月21日,这篇文章主要介绍了数据库中如何利用systemstate dump trace文件寻找dg terminated原因,具有一定借鉴价值,感兴趣的朋友可以参考下,希望大家阅读完这篇文章之后大有收获,下面
千家信息网最后更新 2025年01月21日数据库中如何利用systemstate dump trace文件寻找dg terminated原因

这篇文章主要介绍了数据库中如何利用systemstate dump trace文件寻找dg terminated原因,具有一定借鉴价值,感兴趣的朋友可以参考下,希望大家阅读完这篇文章之后大有收获,下面让小编带着大家一起了解一下。

收到告警,一台dataguard宕机了。软件版本为11.2.0.4.登录上来首先查看alert文件.

Fri Jan 25 11:00:39 2019

System State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc

Fri Jan 25 11:00:54 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc:

Fri Jan 25 11:00:58 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:01:19 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:01:39 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:01:59 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:02:19 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:02:40 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:03:00 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:03:21 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:03:41 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:04:01 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:04:21 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:04:41 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:05:01 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:05:21 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:05:27 2019

RFS[3]: Selected log 9 for thread 1 sequence 85314 dbid -1036715693 branch 899829907

Fri Jan 25 11:05:27 2019

Media Recovery Waiting for thread 1 sequence 85314 (in transit)

Recovery of Online Redo Log: Thread 1 Group 9 Seq 85314 Reading mem 0

Mem# 0: /san/data/oradata/EIN1/standby_redo03.log

Fri Jan 25 11:05:27 2019

Archived Log entry 165577 added for thread 1 sequence 85313 ID 0xc2352453 dest 1:

Fri Jan 25 11:05:41 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:06:01 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:06:21 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:06:41 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25 11:06:45 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_lgwr_56177.trc:

ORA-04020: deadlock detected while trying to lock object SYS.ERAC

LGWR (ospid: 56177): terminating the instance due to error 4020

Fri Jan 25 11:06:46 2019

System state dump requested by (instance=1, osid=56177 (LGWR)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_diag_56159_20190125110646.trc

可以看到持续大量的ORA-03170死锁告警,11:06:46分的时候实例被终止。在贴出的alert日志开头生成了一个trace文件,首先分析一下这个文件。

Vi 编辑/u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc文件,搜索关键字final,定位到下面这段:

SO: 0xa1cccae30, type: 4, owner: 0xa0c7c8da8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

proc=0xa0c7c8da8, name=session, file=ksu.h LINE:12729, pg=0

(session) sid: 2228 ser: 9 trans: (nil), creator: 0xa0c7c8da8

flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-

flags2: (0x40009) -/-/INC

DID: , short-term DID:

txn branch: (nil)

edition#: 100 oct: 3, prv: 0, sql: 0xa33cf5e08, psql: 0xa3e7cdec0, user: 627/CONVERT_R

ksuxds FALSE at location: 0

service name: SYS$USERS

client details:

O/S info: user: EM-H6VQXD2$, term: unknown, ospid: 1234

machine: EM-H6VQXD2 program: JDBC Thin Client

application name: JDBC Thin Client, hash value=2546894660

Current Wait Stack:

0: waiting for 'library cache lock'

handle address=0xa3ff9eb18, lock address=0x50de690a0, 100*mode+namespace=0x1004a0002

wait_id=945878 seq_num=28394 snap_id=1

wait times: snap=2 min 53 sec, exc=2 min 53 sec, total=2 min 53 sec

wait times: max=15 min 0 sec, heur=2 min 53 sec

wait counts: calls=60 os=60

in_wait=1 iflags=0x15a2

There is at least one session blocking this session.

Dumping 1 direct blocker(s):

inst: 1, sid: 970, ser: 1

Dumping final blocker:

inst: 1, sid: 2059, ser: 26985

Wait State:

fixed_waits=0 flags=0x22 boundary=(nil)/-1

Session Wait History:

elapsed time of 0.000134 sec since current wait

0: waited for 'SQL*Net message from client'

driver id=0x54435000, #bytes=0x1, =0x0

wait_id=945877 seq_num=28393 snap_id=1

wait times: snap=3.470941 sec, exc=3.470941 sec, total=3.470941 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000001 sec of elapsed time

1: waited for 'SQL*Net message to client'

driver id=0x54435000, #bytes=0x1, =0x0

wait_id=945876 seq_num=28392 snap_id=1

wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec

wait times: max=infinite

wait counts: calls=0 os=0

occurred after 0.000048 sec of elapsed time

可以看到最源头的blocker是sid: 2059, ser: 26985这个session。

这里的SO是0xa1cccae30, type: 4说明是会话信息, owner: 0xa0c7c8da8是这个SO(0xa1cccae30)的父节点。

搜索这个位置所在的PROCESS进程,发现正是LGWR进程

PROCESS 17: LGWR

然后直接搜索sid: 2059 ser: 26985

SO: 0xa18befcd0, type: 4, owner: 0xa18786ed8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

proc=0xa18786ed8, name=session, file=ksu.h LINE:12729, pg=0

(session) sid: 2059 ser: 26985 trans: 0xa1b361290, creator: 0xa18786ed8

flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-

flags2: (0x40009) -/-/INC

DID: , short-term DID:

txn branch: 0xa0f20f998

edition#: 100 oct: 3, prv: 0, sql: 0xa345bb8d0, psql: 0xa2fe3c0a8, user: 510/READER

ksuxds FALSE at location: 0

service name: SYS$USERS

client details:

O/S info: user: Administrator, term: ADMIN-PC, ospid: 8960:8832

machine: WORKGROUP\ADMIN-PC program: plsqldev.exe

application name: PL/SQL Developer, hash value=1190136663

action name: SQL 窗口 - ISIN香港净值.sql, hash value=506963986

Current Wait Stack:

0: waiting for 'SQL*Net message from dblink'

driver id=0x28444553, #bytes=0x1, =0x0

wait_id=474 seq_num=475 snap_id=1

wait times: snap=4 min 45 sec, exc=4 min 45 sec, total=4 min 45 sec

wait times: max=infinite, heur=4 min 45 sec

wait counts: calls=0 os=0

in_wait=1 iflags=0x5a0

There are 28 sessions blocked by this session.

可以看到是reader用户从一台Windows的PL/SQL developer发起的一段sql文本导致的。而且这段sql使用了db_link。已经持续了4分45秒,接近5分钟的时间。该会话阻塞了28个会话。

目前为止完整的阻塞情况可能并没有全部得到,但是已经得到了LGWR进程被一个PL/SQL developer会话所阻塞的结论,在sql文本执行了接近五分钟的时候,实例被LGWR终止。

这里为什么是五分钟呢?这是因为之前发生过的其他故障,那时修改了一个隐含参数_adg_parselock_timeout

SQL> col name for a30

SQL> col value for a20

SQL> col description for a70

SQL> set line 150

SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description

2 from x$ksppi a,x$ksppcv b

3 where a.inst_id = USERENV ('Instance')

4 and b.inst_id = USERENV ('Instance')

5 and a.indx = b.indx

6 and upper(a.ksppinm) LIKE upper('%adg_parselock_timeout%')

7 order by name;

NAME VALUE DESCRIPTION

------------------------------ -------------------- ----------------------------------------------------------------------

_adg_parselock_timeout 30000 timeout for parselock get on ADG in centiseconds

_adg_parselock_timeout_sleep 100 sleep duration after a parselock timeout on ADG in milliseconds

SQL> col name for a30;

SQL> col value for a10;

SQL> select * from

2 (select

3 x.ksppinm name,

4 y.ksppstvl value,

5 y.ksppstdf isdefault,

6 decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE') ismod,

7 decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE') isadj

8 from

9 sys.x$ksppi x,

10 sys.x$ksppcv y

11 where

12 x.inst_id = userenv('Instance') and

13 y.inst_id = userenv('Instance') and

14 x.indx = y.indx

15 order by

16 translate(x.ksppinm, ' _', ' ')) T where T.name like '%adg_parselock_timeout%';

NAME VALUE ISDEFAULT ISMOD ISADJ

------------------------------ ---------- --------------------------- ------------------------------ ---------------

_adg_parselock_timeout 30000 FALSE FALSE FALSE

_adg_parselock_timeout_sleep 100 TRUE FALSE FALSE

_adg_parselock_timeout该值设置为了30000厘秒,约等于300秒,也就是五分钟。

搜索MOS后确认为BUG

Bug 18515268 ORA-4020 in ADG Standby Database causing instance crash by LGWR

Solution

Apply interim patch 18515268, if available for your platform and Oracle version.

Fixed releases: 11.2.0.4.BP20 12.1.0.2 12.2.0.1

感谢你能够认真阅读完这篇文章,希望小编分享的"数据库中如何利用systemstate dump trace文件寻找dg terminated原因"这篇文章对大家有帮助,同时也希望大家多多支持,关注行业资讯频道,更多相关知识等着你来学习!

0