ora-00445/linux alsr
一号节点:可以看出是8:51:36秒失去和二号节点的通信
2017-10-24 08:49:49.963 [CLSECHO(5883)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false2017-10-24 08:49:59.005 [CLSECHO(7786)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false2017-10-24 08:51:36.138 [OCSSD(271335)]CRS-1612: Network communication with node 12crac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 14.050 seconds2017-10-24 08:51:43.139 [OCSSD(271335)]CRS-1611: Network communication with node 12crac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 7.050 seconds2017-10-24 08:51:48.140 [OCSSD(271335)]CRS-1610: Network communication with node 12crac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.050 seconds
二号节点:可以看出是在8:51:07无法检测到资源policejx的状态
2017-10-24 08:49:49.896 [CLSECHO(78374)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false2017-10-24 08:49:58.902 [CLSECHO(79205)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false2017-10-24 08:51:07.952 [ORAAGENT(281402)]CRS-5011: Check of resource "policejx" failed: details at "(:CLSN00007:)" in "/u01/app/grid/diag/crs/12crac2/crs/trace/crsd_oraagent_oracle.trc"2017-10-24 08:51:10.064 [ORAAGENT(83682)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 836822017-10-24 09:10:21.937 [OHASD(202039)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 202039
查看crsd_oraagent_oracle.trc:在8:51:08秒检查到失到到本地实例的连接
2017-10-24 08:51:07.955381 : AGENT:3816797952: {0:13:2033} Agfw calling user exitCB, will exit on return2017-10-24 08:51:07.955390 : AGENT:3816797952: {0:13:2033} returned from user exitCB, exiting2017-10-24 08:51:07.955455 : AGFW:3816797952: {0:13:2033} Agent is exiting with exit code: 12017-10-24 08:51:08.006860 : USRTHRD:3812595456: {0:13:2} ORA-01092: ORACLE instance terminated. Disconnection forcedProcess ID: 0Session ID: 4739 Serial number: 3830
查看:ohasd_oraagent_grid.trc在8:51:09判定为实例宕掉;
2017-10-24 08:50:48.592645 :CLSDYNAM:1666320128: [ora.gipcd]{0:0:2} [check] ClsdmClient::sendMessage clsdmc_respget return: status=0, ecode=02017-10-24 08:51:09.396880 : USRTHRD:2702620416: Usrco UsrcoEventForwarder::postMyEvent posting event "INSTANCE VERSION=1.0 service=policejx database=policejx instance=policejx_1 host=12crac2 status=down reason=FAILURE timestamp=2017-10-24 08:51:07 timezone=+08:00 db_domain= "2017-10-24 08:51:09.407933 : USRTHRD:2702620416: clsnUsrco: path=/u01/app/12.1.0/grid/racg/usrco/
查看:crsd_oraagent_oracle.trc 发现为连不上实例
2017-10-24 08:51:07.952430 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] DbAgent:checkCbk shutdown reset s_PDBStatusMap2017-10-24 08:51:07.952493 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] InstAgent::checkState db/asm 2clsagfw_res_status 5 poolState 22017-10-24 08:51:07.952606 : USRTHRD:3797776128: {0:13:2} Gimh::destructor gimh_dest_query_ctx rc=02017-10-24 08:51:07.952802 : USRTHRD:3797776128: {0:13:2} Gimh::destructor gimh_dest_inst_ctx rc=02017-10-24 08:51:07.952827 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] ConnectionPool::stopConnection2017-10-24 08:51:07.952850 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] ConnectionPool::removeConnection connection count 0
查看alert_poliejx1.log:在8:41:09了现M000进程的dump,8:43:27出现进程的异常dump,8:43分MMON进程异常
Auto-tuning: Starting background process GTXiTue Oct 24 08:41:09 2017Dumping diagnostic data in directory=[cdmp_20171024084109], requested by (instance=2, osid=1165285 (M000)), summary=[incident=643254].Tue Oct 24 08:43:27 2017Tue Oct 24 08:43:27 2017System State dumped to trace file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_784542.trcSystem State dumped to trace file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_783891.trcTue Oct 24 08:43:27 2017Warning: VKTM detected a time drift.Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.Tue Oct 24 08:43:35 2017LMON (ospid: 646174) waits for event 'latch: enqueue hash chains' for 248 secs.LMON (ospid: 646174) waits for latch 'enqueue hash chains' for 248 secs.Tue Oct 24 08:43:44 2017Errors in file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_mmon_646328.trc (incident=577523) (PDBNAME=CDB$ROOT):ORA-00445: background process "m005" did not start after 120 secondsIncident details in: /u01/app/oracle/diag/rdbms/policejx/policejx_1/incident/incdir_577523/policejx_1_mmon_646328_i577523.trcDumping diagnostic data in directory=[cdmp_20171024084803], requested by (instance=1, osid=646328 (MMON)), summary=[incident=577523].Tue Oct 24 08:48:35 2017Dumping diagnostic data in directory=[cdmp_20171024084313], requested by (instance=2, osid=1165285 (M000)), summary=[incident=643255].Tue Oct 24 08:49:47 2017Errors in file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_775205.trc (incident=586003) (PDBNAME=POLICE):ORA-00445: 后台进程 "PP9S" 在 120 秒之后仍没有启动Incident details in: /u01/app/oracle/diag/rdbms/policejx/policejx_1/incident/incdir_586003/policejx_1_ora_775205_i586003.trcTue Oct 24 08:49:52 2017Dumping diagnostic data in directory=[cdmp_20171024084952], requested by (instance=1, osid=775205), summary=[incident=586003].Tue Oct 24 08:51:06 2017Dumping diagnostic data in directory=[cdmp_20171024085106], requested by (instance=2, osid=1175981 (M004)), summary=[incident=649702].Tue Oct 24 08:51:07 2017DRM FREEZE TIMEOUT: kjfzpdrmfrz: ospid 775215 not frozen. Process waiting on 'SQL*Net message from client', 62 secs since wait started. Parallel DRM freeze timeout (70 secs) exceeded, terminating the instance. See /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_rmv6_646880.trc.USER (ospid: 646880): terminating the instance due to error 481Tue Oct 24 08:51:07 2017DRM FREEZE TIMEOUT: kjfzpdrmfrz: ospid 770029 not frozen. Process waiting on 'gc buffer busy acquire', 76 secs since wait started. Parallel DRM freeze timeout (70 secs) exceeded, terminating the instance. See /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_rmv1_646886.trc.Tue Oct 24 08:51:07 2017opiodr aborting process unknown ospid (281446) as a result of ORA-1092Tue Oct 24 08:51:08 2017
查看policejx_1_mmon_646328.trc
*** 2017-10-24 08:39:46.952loadavg : 191.78 118.51 71.78System user time: 0.17 sys time: 0.49 context switch: 88107Memory (Avail / Total) = 658.31M / 2066865.29MSwap (Avail / Total) = 127618.42M / 131072.00Mskgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 646150' | /bin/grep -v grep timed out after 15.000 secondsskgpgcmdout: read() for cmd /bin/cat /proc/646150/task/646150/status timed out after 0.000 secondsShort stack dump: current sql:Current Wait Stack: 0: waiting for 'os thread creation' pname=0x4d303035, is_process=0x1, =0x0 wait_id=815816 seq_num=29397 snap_id=1 wait times: snap=1 min 21 sec, exc=1 min 21 sec, total=1 min 21 sec wait times: max=infinite, heur=1 min 21 sec wait counts: calls=0 os=0 in_wait=1 iflags=0x5a0 SO: 0xcd66cbfdd0, type: 4, owner: 0xccc59956a8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0xccc59956a8, name=session, file=ksu.h LINE:13957, pg=0 conuid=1 (session) sid: 9559 ser: 50701 trans: (nil), creator: 0xccc59956a8 flags: (0x8000051) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40409) -/-/INC DID: 0002-0051-0000000C, short-term DID: 0002-0051-0000000D txn branch: (nil) con_id/con_uid/con_name: 1/1/CDB$ROOT con_logonuid: 1 con_logonid: 1 edition#: 133 user#/name: 0/SYS oct: 0, prv: 0, sql: (nil), psql: 0xc1befc1bc0 stats: 0xcaffe51108, PX stats: 0xcebc230 ksuxds FALSE at location: 0 service name: SYS$BACKGROUND Current Wait Stack: Not in wait; last wait ended 2.865306 sec ago Wait State: fixed_waits=0 flags=0x21 boundary=(nil)/-1 Session Wait History: elapsed time of 2.865336 sec since last wait 0: waited for 'latch: enqueue hash chains' address=0xcd689799d0, number=0x24, tries=0x0 wait_id=1009016 seq_num=27384 snap_id=1 wait times: snap=17.802151 sec, exc=17.802151 sec, total=17.802151 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 1 min 52 sec of elapsed time 1: waited for 'oracle thread bootstrap' pname=0x305531, =0x0, =0x0 wait_id=1009010 seq_num=27383 snap_id=5 wait times: snap=0.000000 sec, exc=1 min 16 sec, total=3 min 8 sec wait times: max=2 min 0 sec wait counts: calls=92 os=92 occurred after 0.000000 sec of elapsed time 2: waited for 'latch free' address=0x60013540, number=0x75, tries=0x0 wait_id=1009015 seq_num=27382 snap_id=1 wait times: snap=36.770088 sec, exc=36.770088 sec, total=36.770088 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 3: waited for 'oracle thread bootstrap' pname=0x305531, =0x0, =0x0 wait_id=1009010 seq_num=27381 snap_id=4 wait times: snap=0.000147 sec, exc=1 min 16 sec, total=2 min 31 sec wait times: max=2 min 0 sec wait counts: calls=92 os=92 occurred after 0.000000 sec of elapsed time 4: waited for 'process diagnostic dump' =0x0, =0x0, =0x0 wait_id=1009014 seq_num=27380 snap_id=1 wait times: snap=31.578425 sec, exc=31.578425 sec, total=31.578425 sec wait times: max=30.000000 sec wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 5: waited for 'oracle thread bootstrap' pname=0x305531, =0x0, =0x0 wait_id=1009010 seq_num=27379 snap_id=3 wait times: snap=0.110586 sec, exc=1 min 16 sec, total=2 min 0 sec wait times: max=2 min 0 sec wait counts: calls=92 os=92 occurred after 0.000000 sec of elapsed time 6: waited for 'process diagnostic dump' =0x0, =0x0, =0x0 wait_id=1009013 seq_num=27378 snap_id=1 wait times: snap=22.622172 sec, exc=22.622172 sec, total=22.622172 sec wait times: max=30.000000 sec wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 7: waited for 'oracle thread bootstrap' pname=0x305531, =0x0, =0x0 wait_id=1009010 seq_num=27377 snap_id=2 wait times: snap=16.169919 sec, exc=1 min 16 sec, total=1 min 37 sec wait times: max=2 min 0 sec wait counts: calls=91 os=91 occurred after 0.000000 sec of elapsed time 8: waited for 'process diagnostic dump' =0x0, =0x0, =0x0 wait_id=1009011 seq_num=27376 snap_id=2 wait times: snap=0.009738 sec, exc=20.768290 sec, total=20.834894 sec wait times: max=30.000000 sec wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time
查看AWR:
Snap Id | Snap Time | Sessions | Cursors/Session | Instances | CDB | |
---|---|---|---|---|---|---|
Begin Snap: | 1835 | 24-Oct-17 08:00:19 | 3611 | 19.2 | 2 | YES |
End Snap: | 1836 | 24-Oct-17 09:00:56 | 4026 | 10.8 | 1 | YES |
Elapsed: | 60.62 (mins) | |||||
DB Time: | 12,323.11 (mins) |
CPU负载在80%左右;
根据AWR提供的load
Operating System Statistics - Detail
Snap Time | Load | %busy | %user | %sys | %idle | %iowait |
---|---|---|---|---|---|---|
24-Oct 10:00:20 | 26.19 | |||||
24-Oct 11:00:02 | 66.72 | 12.96 | 11.71 | 1.18 | 87.04 | 0.23 |
24-Oct 12:00:17 | 454.69 | 88.22 | 86.37 | 1.72 | 11.78 | 0.06 |
24-Oct 13:00:29 | 384.12 | 97.53 | 95.75 | 1.61 | 2.47 | 0.01 |
24-Oct 14:00:14 | 33.43 | 74.91 | 73.03 | 1.72 | 25.09 | 0.09 |
24-Oct 15:00:01 | 153.89 | 33.61 | 32.07 | 1.44 | 66.39 | 0.15 |
24-Oct 16:00:19 | 128.44 | 62.83 | 61.10 | 1.62 | 37.17 | 0.11 |
24-Oct 17:00:20 | 313.98 | 84.20 | 82.17 | 1.88 | 15.80 | 0.09 |
24-Oct 18:00:14 | 158.36 | 80.30 | 78.88 | 1.29 | 19.70 | 0.10 |
机器负载相当高;
当时了现问题,节点2的物理内存已使用完,开始用虚拟内存,同时CPU的负载也是居高不下;由于oracle无法产生的新的进程,并在此时进行了DRM操作,导致节点2重启;建议优化办法:
1 关闭linux aslr随机特性
ORA-00445: Background Process "xxxx" Did Not Start After 120 Seconds (文档 ID 1345364.1)
add/modify this parameter in /etc/sysctl.conf
kernel.randomize_va_space=0
kernel.exec-shield=0
2 关闭transparent hugepages (DOCID 1557478.1)
RHEL 6.X
vi /etc/rc.local增加:
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then echo never > /sys/kernel/mm/transparent_hugepage/enabled fi if test -f /sys/kernel/mm/transparent_hugepage/defrag; then echo never > /sys/kernel/mm/transparent_hugepage/defrag fi
检查:
grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0} ' | awk -F "/" '{print $0; system("ps -fp " $3)} '
3 启用hugepage linux 6.X默认的内存页面管理为4K,建议启用2MB的大页管理(文档ID361468.1)
相关参数:
vi /etc/security/limits.conf
/etc/security/limits.conf file. Set the value (in KB) slightly smaller than installed RAM. e.g. If you have 64GB RAM installed, you may set:* soft memlock 60397977* hard memlock 60397977
检测:ulimit -l
启用:根据SGA/页面大小=vm.nr_hugepages
设置vi /etc/sysctl.conf,增加vm.nr_hugepages=XXXXXX
参考计算脚本:hugepage_settting.sh
相关参考:http://www.oracle.com/technetwork/cn/articles/servers-storage-dev/hugepages-2099009-zhs.html
4 关闭DRM
_gc_policy_time=0
_gc_undo_policy=FALSE
这2个参数是静态参数,必须要重启实例才能生效。
或者设置成超长时间:
_gc_policy_limit=250
_gc_policy_minimum=10485760
DRM的步骤:
1. Oracle停止所有在需要进行remastering的buffer上的操作。注意:DRM是渐进的,也就是说以windows 为单位,每次对一部分的buffer 进行remastering 操作。2. Lmon 通知所有实例,准备进行remastering3. 在旧的master实例清除对应buffer的master信息4. 将master信息传递给新的master实例5. 在新的master实例构建资源的最新状态6. 结束,并释放所有之前所有步骤占用的资源。
_gc_affinity_time :单位为分钟,控制DRM统计实例访问buffer次数的时间间隔,默认为是10分钟。
_gc_affinity_ratio:控制进行remastering所需要达到的最小比例(阀值),默认为50。也就是说,如果某个实例在10分钟(_gc_policy_time)之内,访问某个数据库对象的次数大于其他所有实例50倍时(注意:是50倍,而不是50次),对该数据库对象的buffer进行remastering。
注意:10g和11g的不同,11g里改为_gc_affinity_limit改名为_gc_policy_limit;_gc_affinity_time改名为_gc_policy_time;_gc_affinity_minimun改名为_gc_policy_minimum
诊断DRM:
"gcs drm freeze in enter server mode" 等待事件:
Script to Collect DRM Information (drmdiag.sql) (文档 ID 1492990.1)
-- NAME: DRMDIAG.SQL-- -------------------------------------------------------------------------- AUTHOR: Michael Polaski - Oracle Support Services-- -------------------------------------------------------------------------- PURPOSE:-- This script is intended to provide a user friendly guide to troubleshoot-- drm (dynamic resource remastering) waits. The script will create a file-- called drmdiag_.out in your local directory.set echo offset feedback offcolumn timecol new_value timestampcolumn spool_extension new_value suffixselect to_char(sysdate,'Mondd_hh34mi') timecol,'.out' spool_extension from sys.dual;column output new_value dbnameselect value || '_' outputfrom v$parameter where name = 'db_name';spool drmdiag_&&dbname&×tamp&&suffixset trim onset trims onset lines 140set pages 100set verify offset feedback onPROMPT DRMDIAG DATA FOR &&dbname&×tampPROMPT Important paramenters:PROMPTPROMPT _gc_policy_minimum (default is 1500). Increasing this would cause DRMs to happen less frequently.PROMPT Use the "OBJECT_POLICY_STATISTICS" section later in this report to see how active various objects are.PROMPTPROMPT _gc_policy_time (default to 10 (minutes)). Amount of time to evaluate policy stats. Use thePROMPT "OBJECT_POLICY_STATISTICS" section later in this report to see how active various objects are for thePROMPT _gc_policy_time. Usually not necessary to change this parameter.PROMPTPROMPT _gc_read_mostly_locking (default is TRUE). Setting this to FALSE would disable read mostly related DRMs.PROMPTPROMPT gcs_server_processes (default is derived from CPU count/4). May need to increase this above thePROMPT default to add LMS processes to complte the work during a DRM but the default is usually adequate.PROMPTPROMPT _gc_element_percent (default is 110). May need to apply the fix for bug 14791477 and increase this toPROMPT 140 if running out of lock elements. Usually not necessary to change this parameter.PROMPTPROMPT GC Related parameters set in this instance:show parameter gcPROMPTPROMPT CPU count on this instance:show parameter cpu_countPROMPTPROMPT SGA INFO FOR &&dbname&×tampPROMPTPROMPT Larger buffer caches (above 100 gig) may increase the cost of DRMs significantly.set lines 120set pages 100column component format a40 trucolumn current_size format 99999999999999999column min_size format 99999999999999999column max_size format 99999999999999999column user_specified_size format 99999999999999999select component, current_size, min_size, max_size, user_specified_sizefrom v$sga_dynamic_componentswhere current_size > 0;PROMPTPROMPT ASH THRESHOLD...PROMPTPROMPT This will be the threshold in milliseconds for total drm freezePROMPT times. This will be used for the next queries to look for the worstPROMPT 'drm freeze' minutes. Any minutes that have an average log filePROMPT sync time greater than the threshold will be analyzed further.column threshold_in_ms new_value threshold format 999999999.999select decode(min(threshold_in_ms),null,0,min(threshold_in_ms)) threshold_in_msfrom (select inst_id, to_char(sample_time,'Mondd_hh34mi') minute,sum(time_waited)/1000 threshold_in_msfrom gv$active_session_historywhere event like '%drm freeze%'group by inst_id,to_char(sample_time,'Mondd_hh34mi')order by 3 desc)where rownum <= 10;PROMPTPROMPT ASH WORST MINUTES FOR DRM FREEZE WAITS:PROMPTPROMPT APPROACH: These are the minutes where the avg drm freeze timePROMPT was the highest (in milliseconds).column event format a30 trucolumn program format a35 trucolumn total_wait_time format 999999999999.999column avg_time_waited format 999999999999.999select to_char(sample_time,'Mondd_hh34mi') minute, inst_id, event,sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,avg(time_waited)/1000 AVG_TIME_WAITEDfrom gv$active_session_historywhere event like '%drm freeze%'group by to_char(sample_time,'Mondd_hh34mi'), inst_id, eventhaving sum(time_waited)/1000 > &&thresholdorder by 1,2;PROMPTPROMPT ASH DRM BACKGROUND PROCESS WAITS DURING WORST MINUTES:PROMPTPROMPT APPROACH: What are LMS and RMV doing when 'drm freeze' waitsPROMPT are happening? LMD and LMON info may also be relevantcolumn inst format 999column minute format a12 trucolumn event format a50 trucolumn program format a55 wraselect to_char(sample_time,'Mondd_hh34mi') minute, inst_id inst,sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,avg(time_waited)/1000 AVG_TIME_WAITED,program, eventfrom gv$active_session_historywhere to_char(sample_time,'Mondd_hh34mi') in (select to_char(sample_time,'Mondd_hh34mi')from gv$active_session_historywhere event like '%drm freeze%'group by to_char(sample_time,'Mondd_hh34mi'), inst_idhaving sum(time_waited)/1000 > &&threshold and sum(time_waited)/1000 > 0.5)and (program like '%LMS%' or program like '%RMV%' or program like '%LMD%' orprogram like '%LMON%' or event like '%drm freeze%')group by to_char(sample_time,'Mondd_hh34mi'), inst_id, program, eventorder by 1,2,3,5 desc, 4;PROMPTPROMPT POLICY HISTORY INFO:PROMPT See if you can correlate policy history events with minutes of highPROMPT wait time.select * from gv$policy_historyorder by event_date;PROMPTPROMPT DYNAMIC_REMASTER_STATSPROMPT This shows where time is spent during DRM operations.set heading offset lines 60select 'Instance: '||inst_id inst, 'Remaster Ops: '||remaster_ops rops,'Remaster Time: '||remaster_time rtime, 'Remastered Objects: '||remastered_objects robjs,'Quiesce Time: '||quiesce_time qtime, 'Freeze Time: '||freeze_time ftime,'Cleanup Time: '||cleanup_time ctime, 'Replay Time: '||replay_time rptime,'Fixwrite Time: '||fixwrite_time fwtime, 'Sync Time: '||sync_time stime,'Resources Cleaned: '||resources_cleaned rclean,'Replayed Locks Sent: '||replayed_locks_sent rlockss,'Replayed Locks Received: '||replayed_locks_received rlocksr,'Current Objects: '||current_objectsfrom gv$dynamic_remaster_statsorder by 1;set lines 120set heading onPROMPTPROMPT OBJECT_POLICY_STATISTICS:PROMPT The sum of the last 3 columns (sopens,xopens,xfers) decides whether the objectPROMPT will be considered for DRM (_gc_policy_minimum). The duration of the statsPROMPT are controlled by _gc_policy_time (default is 10 minutes).select object,node,sopens,xopens,xfers from x$object_policy_statistics;PROMPTPROMPT ACTIVE OBJECTS (OBJECT_POLICY_STATISTICS)PROMPT These are the objects that are above the default _gc_policy_minimum (1500).select object, node, sopens+xopens+xfers activityfrom x$object_policy_statisticswhere sopens+xopens+xfers > 1500order by 3 desc;PROMPT LWM FOR LE FREELISTPROMPT This number should never get near zero, if it does consider the fix for bug 14791477PROMPT and/or increasing _gc_element_percent.select sum(lwm) from x$kclfx;PROMPTPROMPT GCSPFMASTER INFO WITH OBJECT NAMEScolumn objname format a120 truselect o.name || ' - '|| o.subname objname, o.type#, h.*from v$gcspfmaster_info h, obj$ o where h.data_object_id=o.dataobj#order by data_object_id;PROMPTPROMPT ASH DETAILS FOR WORST MINUTES:PROMPTPROMPT APPROACH: If you cannot determine the problem from the dataPROMPT above, you may need to look at the details of what each sessionPROMPT is doing during each 'bad' snap. Most likely you will want toPROMPT note the times of the high drm freeze waits, look at whatPROMPT LMS, RMV, LMD0, LMON is doing at those times, and go from there...set lines 140column program format a45 wracolumn sample_time format a25 trucolumn event format a30 trucolumn time_waited format 999999.999column p1 format a40 trucolumn p2 format a40 trucolumn p3 format a40 truselect sample_time, inst_id inst, session_id, program, event, time_waited/1000 TIME_WAITED,p1text||': '||p1 p1,p2text||': '||p2 p2,p3text||': '||p3 p3from gv$active_session_historywhere to_char(sample_time,'Mondd_hh34mi') in (selectto_char(sample_time,'Mondd_hh34mi')from gv$active_session_historywhere event like '%drm freeze%'group by to_char(sample_time,'Mondd_hh34mi'), inst_idhaving sum(time_waited)/1000 > &&threshold)and time_waited > 0.5order by 1,2,3,4,5;spool offPROMPTPROMPT OUTPUT FILE IS: drmdiag_&&dbname&×tamp&&suffixPROMPT
相关文档: 390483.1
5 后续建议部署细粒度的监控工具如OSWATCHER/NMON之类的监控,因为出现问题时OS的进程数已经达到9900多个,建议监控DB的连接数和OS的进程数;