Evmd进程不断重启原因深究
客户现场环境,全国性7*24小时在线系统,HP-ux B11.31 for Oracle 10.2.0.4 RAC双节点集群。
客户于6月7日做了主机层面硬件变更,数据库未做任何参数更改。启动起来后发现evmd进程一直在不断重启。evmd:事件管理守护进程。此进程还负责启动racgevt进程以管理FAN服务器端调用。各log也都有相应体现;
ocssd进程log:
[ CSSD]2016-06-07 04:02:28.853 >USER: Copyright 2016, Oracle version 10.2.0.4.0
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_CSSD))
[ CSSD]2016-06-07 04:02:28.853 >USER: CSS daemon log for node essdb3, number 1, in cluster crs
[ CSSD]2016-06-07 04:02:28.868 [1] >TRACE: clssscmain: local-only set to false
[ CSSD]2016-06-07 04:02:28.876 [1] >TRACE: clssnmReadNodeInfo: added node 1 (essdb3) to cluster
[ CSSD]2016-06-07 04:02:28.881 [1] >TRACE: clssnmReadNodeInfo: added node 2 (essdb4) to cluster
[ CSSD]2016-06-07 04:02:28.887 [5] >TRACE: clssnm_skgxninit: initialized skgxn version (2/0/Hewlett-Packard SKGXN 2.0
)
[ CSSD]2016-06-07 04:02:38.904 [1] >TRACE: clssnmNMInitialize: misscount set to (600)
[ CSSD]2016-06-07 04:02:38.905 [1] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 300000 ms, reconfig start (misscount) 600000 ms
[ CSSD]2016-06-07 04:02:38.908 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/essdb2vg1/rdb2vg1_512_votk)
[ CSSD]2016-06-07 04:02:38.909 [6] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/essdb2vg1/rdb2vg1_512_votk)
[ CSSD]2016-06-07 04:02:38.910 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1//dev/essdb2vg2/rdb2vg2_512_votk)
[ CSSD]2016-06-07 04:02:38.911 [7] >TRACE: clssnmvDPT: spawned for disk 1 (/dev/essdb2vg2/rdb2vg2_512_votk)
[ CSSD]2016-06-07 04:02:38.915 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2//dev/essdb2vg3/rdb2vg3_512_votk)
[ CSSD]2016-06-07 04:02:38.915 [8] >TRACE: clssnmvDPT: spawned for disk 2 (/dev/essdb2vg3/rdb2vg3_512_votk)
[ CSSD]2016-06-07 04:02:40.926 [6] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/essdb2vg1/rdb2vg1_512_votk)
[ CSSD]2016-06-07 04:02:40.927 [9] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/essdb2vg1/rdb2vg1_512_votk) initial sleep interval (1000)ms
[ CSSD]2016-06-07 04:02:40.927 [6] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1433998) Disk lastSeqNo(6)
[ CSSD]2016-06-07 04:02:40.928 [7] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1//dev/essdb2vg2/rdb2vg2_512_votk)
[ CSSD]2016-06-07 04:02:40.929 [10] >TRACE: clssnmvKillBlockThread: spawned for disk 1 (/dev/essdb2vg2/rdb2vg2_512_votk) initial sleep interval (1000)ms
[ CSSD]2016-06-07 04:02:40.930 [7] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1434001) Disk lastSeqNo(6)
[ CSSD]2016-06-07 04:02:40.932 [8] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2//dev/essdb2vg3/rdb2vg3_512_votk)
[ CSSD]2016-06-07 04:02:40.933 [11] >TRACE: clssnmvKillBlockThread: spawned for disk 2 (/dev/essdb2vg3/rdb2vg3_512_votk) initial sleep interval (1000)ms
[ CSSD]2016-06-07 04:02:40.934 [8] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(6) LATS(1434005) Disk lastSeqNo(6)
[ CSSD]2016-06-07 04:02:40.943 [1] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2016-06-07 04:02:40.976 [13] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49895))
[ CSSD]2016-06-07 04:02:40.977 [13] >TRACE: clssnmClusterListener: Probing node essdb4 (2), probcon(6000000000251ed0)
[ CSSD]2016-06-07 04:02:40.978 [13] >TRACE: clssnmConnComplete: MSGSRC 2, type 6, node 2, flags 0x0001, con 6000000000251ed0, probe 6000000000251ed0
[ CSSD]2016-06-07 04:02:40.978 [13] >TRACE: clssnmConnComplete: node 2, essdb4, con(6000000000251ed0), probcon(6000000000251ed0), ninfcon(0000000000000000), node unique 1465243343, prev unique 0, msg unique 1465243343 node state 0
[ CSSD]2016-06-07 04:02:40.978 [13] >TRACE: clssnmConnComplete: connected to node 2 (con 6000000000251ed0), ninfcon (6000000000251ed0), state (0), flag (9229)
[ CSSD]2016-06-07 04:02:40.982 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2016-06-07 04:02:40.982 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_essdb3_crs))
[ CSSD]2016-06-07 04:02:40.985 [17] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=27)(HOST=192.168.1.3)(PORT=49894))
[ CSSD]2016-06-07 04:02:43.568 [13] >TRACE: clssnmHandleSync: diskTimeout set to (597000)ms
[ CSSD]2016-06-07 04:02:43.568 [20] >TRACE: clssnmRcfgMgrThread: initial lastleader(2) unique(1465243343)
[ CSSD]2016-06-07 04:02:43.568 [13] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[essdb4] seq[1] sync[1]
[ CSSD]2016-06-07 04:02:43.569 [13] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(1)
[ CSSD]2016-06-07 04:02:43.570 [13] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2016-06-07 04:02:43.570 [13] >TRACE: clssnmUpdateNodeState: node 1, state (1/3) unique (1465243348/1465243348) prevConuni(0) birth (0/1) (old/new)
[ CSSD]2016-06-07 04:02:43.570 [13] >TRACE: clssnmUpdateNodeState: node 2, state (1/3) unique (1465243343/1465243343) prevConuni(0) birth (0/1) (old/new)
[ CSSD]2016-06-07 04:02:43.570 [13] >USER: clssnmHandleUpdate: SYNC(1) from node(2) completed
[ CSSD]2016-06-07 04:02:43.570 [13] >USER: clssnmHandleUpdate: NODE 1 (essdb3) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2016-06-07 04:02:43.570 [13] >USER: clssnmHandleUpdate: NODE 2 (essdb4) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2016-06-07 04:02:43.570 [13] >TRACE: clssnmHandleUpdate: diskTimeout set to (597000)ms
[ CSSD]2016-06-07 04:02:43.620 [1] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2016-06-07 04:02:43.622 [21] >TRACE: clssgmReconfigThread: started for reconfig (1)
[ CSSD]2016-06-07 04:02:43.622 [21] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2016-06-07 04:02:43.622 [21] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 1
[ CSSD]2016-06-07 04:02:43.624 [17] >TRACE: clssgmInitialRecv: (6000000000ad6ed0) accepted a new connection from node 2 born at 1 active (2, 2), vers (10,3,1,2)
[ CSSD]2016-06-07 04:02:43.624 [17] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2016-06-07 04:02:43.624 [21] >TRACE: clssgmEstablishMasterNode: MASTER for 1 is node(1) birth(1)
[ CSSD]2016-06-07 04:02:43.624 [21] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2016-06-07 04:02:43.624 [21] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
[ CSSD]2016-06-07 04:02:43.624 [21] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 2 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 1
[ CSSD]2016-06-07 04:02:43.631 [21] >TRACE: clssgmReconfigThread: completed for reconfig(1), with status(1)
[ CSSD]2016-06-07 04:02:43.751 [14] >TRACE: clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[ CSSD]2016-06-07 04:08:21.574 [17] >WARNING: clssgmAssignMemberNo(): grock(CRSDMAIN) memberNo(2) already assigned
evmd日志:
2016-06-07 04:02:51.086: [ EVMD][1] EVMD Starting
2016-06-07 04:02:51.086: [ EVMD][1]
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2007, Oracle. All rights reserved
2016-06-07 04:02:51.086: [ EVMD][1] Initializing OCR
2016-06-07 04:02:51.108: [ EVMD][1] Active Version from OCR:10.2.0.4.0
2016-06-07 04:02:51.108: [ EVMD][1] Active Version and Software Version are same
2016-06-07 04:02:51.108: [ EVMD][1] Initializing Diagnostics Settings
2016-06-07 04:02:51.112: [ EVMD][1] ENV Logging level for Module: allcomp 0
2016-06-07 04:02:51.115: [ EVMD][1] ENV Logging level for Module: default 0
2016-06-07 04:02:51.117: [ EVMD][1] ENV Logging level for Module: COMMCRS 0
2016-06-07 04:02:51.119: [ EVMD][1] ENV Logging level for Module: COMMNS 0
2016-06-07 04:02:51.122: [ EVMD][1] ENV Logging level for Module: EVMD 0
2016-06-07 04:02:51.125: [ EVMD][1] ENV Logging level for Module: EVMDMAIN 0
2016-06-07 04:02:51.127: [ EVMD][1] ENV Logging level for Module: EVMCOMM 0
2016-06-07 04:02:51.130: [ EVMD][1] ENV Logging level for Module: EVMEVT 0
2016-06-07 04:02:51.132: [ EVMD][1] ENV Logging level for Module: EVMAPP 0
2016-06-07 04:02:51.133: [ EVMD][1] ENV Logging level for Module: EVMAGENT 0
2016-06-07 04:02:51.135: [ EVMD][1] ENV Logging level for Module: CRSOCR 0
2016-06-07 04:02:51.136: [ EVMD][1] ENV Logging level for Module: CLUCLS 0
2016-06-07 04:02:51.138: [ EVMD][1] ENV Logging level for Module: OCRRAW 0
2016-06-07 04:02:51.141: [ EVMD][1] ENV Logging level for Module: OCROSD 0
2016-06-07 04:02:51.142: [ EVMD][1] ENV Logging level for Module: OCRAPI 0
2016-06-07 04:02:51.144: [ EVMD][1] ENV Logging level for Module: OCRUTL 0
2016-06-07 04:02:51.146: [ EVMD][1] ENV Logging level for Module: OCRMSG 0
2016-06-07 04:02:51.148: [ EVMD][1] ENV Logging level for Module: OCRCLI 0
2016-06-07 04:02:51.149: [ EVMD][1] ENV Logging level for Module: CSSCLNT 0
2016-06-07 04:02:51.150: [ EVMD][1] Creating pidfile /oracle/product/10.2/crs/evm/init/essdb3.pid
[ clsdmt][4]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_EVMD))
2016-06-07 04:02:51.162: [ EVMD][1] Authorization database built successfully.
2016-06-07 04:02:53.370: [ EVMEVT][1][ENTER] EVM Listening on: 160045173
2016-06-07 04:02:53.373: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))
2016-06-07 04:02:53.393: [ EVMAPP][1] EVMD Started
2016-06-07 04:02:53.393: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))
2016-06-07 04:02:53.396: [ EVMEVT][12] Default port busy, listening at (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.3)(PORT=49905))
2016-06-07 04:02:53.397: [ OCRCLI][12]proac_set_value: Response message returned with failure keyname = [SYSTEM.evm.cevmport.essdb3], retcode = 5
2016-06-07 04:02:53.397: [ EVMEVT][12][PANIC] EVMD exiting. Couldn't update OCR key rc = 5
2016-06-07 04:02:53.397: [ EVMD][12][PANIC] EVMD exiting.
2016-06-07 04:02:53.398: [ EVMD][12] Done.
2016-06-07 04:02:54.500: [ EVMD][1] EVMD Starting
2016-06-07 04:02:54.501: [ EVMD][1]
Oracle Database 10g CRS Release 10.2.0.4.0 Production Copyright 1996, 2007, Oracle. All rights reserved
2016-06-07 04:02:54.501: [ EVMD][1] Initializing OCR
2016-06-07 04:02:54.522: [ EVMD][1] Active Version from OCR:10.2.0.4.0
2016-06-07 04:02:54.522: [ EVMD][1] Active Version and Software Version are same
2016-06-07 04:02:54.523: [ EVMD][1] Initializing Diagnostics Settings
2016-06-07 04:02:54.526: [ EVMD][1] ENV Logging level for Module: allcomp 0
2016-06-07 04:02:54.528: [ EVMD][1] ENV Logging level for Module: default 0
2016-06-07 04:02:54.530: [ EVMD][1] ENV Logging level for Module: COMMCRS 0
2016-06-07 04:02:54.531: [ EVMD][1] ENV Logging level for Module: COMMNS 0
2016-06-07 04:02:54.533: [ EVMD][1] ENV Logging level for Module: EVMD 0
2016-06-07 04:02:54.534: [ EVMD][1] ENV Logging level for Module: EVMDMAIN 0
2016-06-07 04:02:54.536: [ EVMD][1] ENV Logging level for Module: EVMCOMM 0
2016-06-07 04:02:54.539: [ EVMD][1] ENV Logging level for Module: EVMEVT 0
2016-06-07 04:02:54.541: [ EVMD][1] ENV Logging level for Module: EVMAPP 0
2016-06-07 04:02:54.543: [ EVMD][1] ENV Logging level for Module: EVMAGENT 0
2016-06-07 04:02:54.544: [ EVMD][1] ENV Logging level for Module: CRSOCR 0
2016-06-07 04:02:54.546: [ EVMD][1] ENV Logging level for Module: CLUCLS 0
2016-06-07 04:02:54.548: [ EVMD][1] ENV Logging level for Module: OCRRAW 0
2016-06-07 04:02:54.549: [ EVMD][1] ENV Logging level for Module: OCROSD 0
2016-06-07 04:02:54.551: [ EVMD][1] ENV Logging level for Module: OCRAPI 0
2016-06-07 04:02:54.552: [ EVMD][1] ENV Logging level for Module: OCRUTL 0
2016-06-07 04:02:54.554: [ EVMD][1] ENV Logging level for Module: OCRMSG 0
2016-06-07 04:02:54.556: [ EVMD][1] ENV Logging level for Module: OCRCLI 0
2016-06-07 04:02:54.558: [ EVMD][1] ENV Logging level for Module: CSSCLNT 0
2016-06-07 04:02:54.559: [ EVMD][1] Creating pidfile /oracle/product/10.2/crs/evm/init/essdb3.pid
[ clsdmt][4]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=essdb3DBG_EVMD))
2016-06-07 04:02:54.574: [ EVMD][1] Authorization database built successfully.
2016-06-07 04:02:56.731: [ EVMEVT][1][ENTER] EVM Listening on: 160045173
2016-06-07 04:02:56.734: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))
2016-06-07 04:02:56.759: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))
2016-06-07 04:02:56.759: [ EVMAPP][1] EVMD Started
2016-06-07 04:02:56.765: [ EVMEVT][12] Default port busy, listening at (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.3)(PORT=49907))
2016-06-07 04:02:56.767: [ OCRCLI][12]proac_set_value: Response message returned with failure keyname = [SYSTEM.evm.cevmport.essdb3], retcode = 5
2016-06-07 04:02:56.767: [ EVMEVT][12][PANIC] EVMD exiting. Couldn't update OCR key rc = 5
2016-06-07 04:02:56.767: [ EVMD][12][PANIC] EVMD exiting.
2016-06-07 04:02:56.768: [ EVMD][12] Done.
alert日志:
2016-06-07 04:02:44.208
[evmd(9218)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:02:47.748
[evmd(9682)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:02:51.151
[evmd(9788)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:02:54.559
[evmd(9894)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:02:57.988
[evmd(10005)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:03:01.329
[evmd(10111)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:03:04.720
[evmd(10222)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:03:08.117
[evmd(10339)]CRS-1401:EVMD started on node essdb3.
2016-06-07 04:03:11.435
[evmd(10447)]CRS-1401:EVMD started on node essdb3.
主机的CPU资源也使用的特别紧张,基本上是evmlogger进程占用的top 5
怀疑是OCR方面的问题,试着看看权限
essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 524044
Used space (kbytes) : 4372
Available space (kbytes) : 519672
ID : 1612984325
Device/File Name : /dev/essdb2vg1/rdb2vg1_512_ocr
Device/File integrity check succeeded
Device/File Name : /dev/essdb2vg2/rdb2vg2_512_ocr
Device/File integrity check succeeded
Cluster registry integrity check succeeded
essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ll /dev/essdb2vg1/rdb2vg1_512_ocr
crw-r----- 1 root oinstall 128 0x0200bc 2011年5月18日 /dev/essdb2vg1/rdb2vg1_512_ocr
essdb3_oracle[/oraclelog/admin/essdb2/bdump]$ll /dev/essdb2vg2/rdb2vg2_512_ocr
crw-r----- 1 root oinstall 128 0x0300bc 2011年5月18日 /dev/essdb2vg2/rdb2vg2_512_ocr
通过分析essdb3节点的evmd日志发现evmd进程一直无法启动反复出现如下evmd进程默认端口被占用错误信息,如下:
2016-06-07 04:02:56.734: [ COMMCRS][6]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49897))
2016-06-07 04:02:56.759: [ COMMCRS][9]clsclisten: Address already in use: (ADDRESS=(PROTOCOL=tcp)(HOST=essdb3-priv)(PORT=49898))
日志信息Oracle Bug 8200527描述一致,BUG 8200527 信息,如下:
MOS文档 EVMD Coredumps When Starting Up (Doc ID 1091803.1) 信息,有下面的建议:
检查占用端口49897,49898的进程,并将集群重启。evmd恢复正常,不在重启;