GoldenGate configuration: analyzing a pump process that appears hung after being added

Published: 2024-09-24  Author: 千家信息网 editorial staff


1. Create the directories:


GGSCI (jq-prod-oracle-wms3-120-24) 3> CREATE SUBDIRS

Creating subdirectories under current directory /u01/app/goldengate

Parameter file /u01/app/goldengate/dirprm: created.
Report file /u01/app/goldengate/dirrpt: created.
Checkpoint file /u01/app/goldengate/dirchk: created.
Process status files /u01/app/goldengate/dirpcs: created.
SQL script files /u01/app/goldengate/dirsql: created.
Database definitions files /u01/app/goldengate/dirdef: created.
Extract data files /u01/app/goldengate/dirdat: created.
Temporary files /u01/app/goldengate/dirtmp: created.
Credential store files /u01/app/goldengate/dircrd: created.
Masterkey wallet files /u01/app/goldengate/dirwlt: created.
Dump files /u01/app/goldengate/dirdmp: created.


2. Edit the mgr parameters:

edit param mgr

port 7809
autostart er *
autorestart er *
PURGEOLDEXTRACTS /s01/app/goldengate/dirdat/sz*, USECHECKPOINTS, MINKEEPDAYS 3

Start the manager:

GGSCI (chuanqiu) 9> start mgr
Manager started.


GGSCI (chuanqiu) 10> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING


3. Add the pump process:

GGSCI (scm02db01.baozunops.com) 2> edit params p_wmsjq

extract p_wmsjq
rmthost 192.168.101, mgrport 7809, compress
passthru
numfiles 5000
rmttrail ./dirdat/sz
--dynamicresolution
ddl
table wms.T_USER ;
table wms.T_BRAND ;
table wms.T_CHANNEL ;
table wms.T_CUSTOMER ;


ADD EXTRACT p_wmsjq, EXTTRAILSOURCE ./dirdat/ea, BEGIN now

add rmttrail ./dirdat/sz extract p_wmsjq

Start the process:

start p_wmsjq


Check the status of the newly added process:

stats P_WMSJQ

Sending STATS request to EXTRACT P_WMSJQ ...

2018-09-19 16:34:33 ERROR OGG-15149 EXTRACT P_WMSJQ is initializing, please try the command later.


GGSCI (chunqiu) 40> info P_WMSJQ

EXTRACT P_WMSJQ Initialized 2018-09-19 16:27 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:12:20 ago)
Process ID 11252
Log Read Checkpoint File ./dirdat/ea000000000
2018-09-19 16:27:03.000000


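Why does this happen?

The primary extract process has already been running for a long time and the log volume is very large, and many of the earlier trail files have already been deleted. P_WMSJQ therefore keeps scanning the trail files in the background, looking for the position that corresponds to the BEGIN now timestamp. Until it gets there, info keeps reporting "Initialized" with the checkpoint at ea000000000, and stats returns OGG-15149. The following log entries confirm this:

Follow the log with tail -f ggserr.log: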

2018-09-19T16:41:03.194+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001813 at 2018-09-19 16:41:03.194239 due to EOF. with current RBA 499,998,509.
2018-09-19T16:41:07.781+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001814 at 2018-09-19 16:41:07.781704 due to EOF. with current RBA 499,999,232.
2018-09-19T16:41:12.339+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001815 at 2018-09-19 16:41:12.339296 due to EOF. with current RBA 499,998,210.
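The scan speed can be read off these entries. The following is a minimal Python sketch, not part of the original procedure, that pulls the timestamp and trail sequence number out of OGG-02232 messages and averages the time per trail file. The regular expression and the function names are assumptions made for illustration, and the pattern assumes the 9-digit sequence numbers shown in this log.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, then the 9-digit sequence number at the
# end of the trail file name (for example ea000001813).
SWITCH_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\S*\s+INFO\s+OGG-02232\b"
    r".*?Switching to next trail file \S*?(?P<seq>\d{9})\s"
)

def parse_switches(lines):
    """Return (timestamp, sequence) pairs for every trail-switch message."""
    switches = []
    for line in lines:
        m = SWITCH_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            switches.append((ts, int(m.group("seq"))))
    return switches

def seconds_per_file(switches):
    """Average seconds spent per trail file between the first and last switch."""
    (t0, s0), (t1, s1) = switches[0], switches[-1]
    if s1 == s0:
        raise ValueError("need switches to at least two different trail files")
    return (t1 - t0).total_seconds() / (s1 - s0)

# The three entries quoted above.
SAMPLE = [
    "2018-09-19T16:41:03.194+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, "
    "p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001813 "
    "at 2018-09-19 16:41:03.194239 due to EOF. with current RBA 499,998,509.",
    "2018-09-19T16:41:07.781+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, "
    "p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001814 "
    "at 2018-09-19 16:41:07.781704 due to EOF. with current RBA 499,999,232.",
    "2018-09-19T16:41:12.339+0800 INFO OGG-02232 Oracle GoldenGate Capture for Oracle, "
    "p_wmsjq.prm: Switching to next trail file /u01/goldengate/dirdat/ea000001815 "
    "at 2018-09-19 16:41:12.339296 due to EOF. with current RBA 499,998,210.",
]

rate = seconds_per_file(parse_switches(SAMPLE))
print(f"{rate:.2f} s per trail file")
```

For the three entries above this works out to roughly 4.6 seconds per trail file. Three samples are a very small basis, so the figure is only indicative.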

So how long will it take to finish? We can tell from the newest trail file written by the primary extract process.

Check the sequence numbers of the trail files:

ls -l ea000002*

-rw-r----- 1 oracle oinstall 477M Sep 19 15:58 ea000002146
-rw-r----- 1 oracle oinstall 477M Sep 19 16:03 ea000002147
-rw-r----- 1 oracle oinstall 477M Sep 19 16:07 ea000002148
-rw-r----- 1 oracle oinstall 477M Sep 19 16:11 ea000002149
-rw-r----- 1 oracle oinstall 477M Sep 19 16:25 ea000002150
-rw-r----- 1 oracle oinstall 477M Sep 19 16:40 ea000002151
-rw-r----- 1 oracle oinstall 477M Sep 19 16:52 ea000002152
-rw-r----- 1 oracle oinstall 6.0M Sep 19 16:52 ea000002153
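Reading the highest sequence number by eye works for a short listing. As a small Python sketch, the same lookup can be done over a list of file names. The helper name is hypothetical, and the pattern assumes a two-character trail prefix followed by nine digits, as in the listing above.

```python
import re

# Two-character trail prefix followed by a 9-digit sequence number.
TRAIL_RE = re.compile(r"^(?P<prefix>[A-Za-z0-9]{2})(?P<seq>\d{9})$")

def newest_sequence(names, prefix="ea"):
    """Return the highest sequence number among trail files with this prefix."""
    seqs = []
    for name in names:
        m = TRAIL_RE.match(name)
        if m and m.group("prefix") == prefix:
            seqs.append(int(m.group("seq")))
    return max(seqs) if seqs else None

# File names taken from the listing above.
names = [
    "ea000002146", "ea000002147", "ea000002148", "ea000002149",
    "ea000002150", "ea000002151", "ea000002152", "ea000002153",
]
print(newest_sequence(names))
```

In practice the names could come from `os.listdir()` on the dirdat directory. Files with a different prefix, such as the sz remote trail, are ignored.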


Alternatively, check another pump process that reads the same trail:

GGSCI (chunqiu) 41> info PUMP_lbs

EXTRACT PUMP_WMS Last Started 2018-09-19 06:55 Status RUNNING
Checkpoint Lag 00:00:03 (updated 00:00:03 ago)
Process ID 92773
Log Read Checkpoint File /u01/goldengate/dirdat/ea000002152
2018-09-19 16:42:10.000000 RBA 85805949

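Comparing this process with the log shows that P_WMSJQ still needs some time to get from ea000001815 to ea000002151, but it is getting steadily closer.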
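The remaining wait can be estimated from the gap in sequence numbers. The sketch below is a rough Python calculation under stated assumptions, not something the original procedure ran. It assumes the scan keeps the pace seen in the log (about 4.57 seconds per trail file, derived from the three OGG-02232 timestamps), takes ea000001815 as the current position and ea000002152 (the file the other pump is reading) as the target. The function name and the optional `produce_seconds_per_file` parameter are hypothetical.

```python
def estimate_catch_up_seconds(current_seq, newest_seq, scan_seconds_per_file,
                              produce_seconds_per_file=None):
    """Estimate how long a pump needs to reach the newest trail file.

    If produce_seconds_per_file is given, the target is treated as moving:
    the primary extract creates one new trail file every that many seconds.
    """
    gap = newest_seq - current_seq
    if gap <= 0:
        return 0.0
    if produce_seconds_per_file is None:
        # Static target: every remaining file costs the same scan time.
        return gap * scan_seconds_per_file
    if scan_seconds_per_file >= produce_seconds_per_file:
        raise ValueError("scan is not faster than trail production; it never catches up")
    scan_rate = 1.0 / scan_seconds_per_file        # files scanned per second
    produce_rate = 1.0 / produce_seconds_per_file  # files created per second
    return gap / (scan_rate - produce_rate)

# Numbers taken from the log and the info output above.
eta = estimate_catch_up_seconds(1815, 2152, 4.5725)
print(f"about {eta / 60:.1f} minutes")
```

With these inputs the estimate is roughly 26 minutes from the time of the log entries. It is only a rough figure: the scan rate may change, and the primary extract keeps writing new trail files in the meantime.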

After a long wait, check again:

GGSCI (scm02db01.baozunops.com) 54> info P_WMSJQ

EXTRACT P_WMSJQ Initialized 2018-09-19 16:27 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:42:57 ago)
Process ID 11252
Log Read Checkpoint File ./dirdat/ea000000000
2018-09-19 16:27:03.000000

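About 42 minutes have passed since the last checkpoint update, which shows how much trail data there is to get through.


Final confirmation: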

GGSCI (scm02db01.baozunops.com) 79> info P_WMSJQ

EXTRACT P_WMSJQ Last Started 2018-09-19 17:32 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:07 ago)
Process ID 71325
Log Read Checkpoint File /s01/goldengate/dirdat/ea000002156
2018-09-19 17:32:04.000000 RBA 37332782


GGSCI (chunqiu) 80> stats P_WMSJQ

Sending STATS request to EXTRACT P_WMSJQ ...

Start of Statistics at 2018-09-19 17:33:07.

DDL replication statistics (for all trails):

*** Total statistics since extract started ***
Operations 0.00
Mapped operations 0.00
Unmapped operations 0.00
Other operations 0.00
Excluded operations 0.00

Output to ./dirdat/sz:

Extracting from wms.T_USER to WMS.T_USER:

*** Total statistics since 2018-09-19 17:32:22 ***
Total inserts 7041.00
Total updates 108651.00
Total deletes 178000.00
Total discards 0.00
Total operations 293692.00


Check that the trail files have arrived on the target side:

-rw-r----- 1 oracle oinstall 499999847 Sep 19 17:31 sz000000000
-rw-r----- 1 oracle oinstall 499999640 Sep 19 17:31 sz000000001
-rw-r----- 1 oracle oinstall 499999561 Sep 19 17:43 sz000000002
-rw-r----- 1 oracle oinstall 50584137 Sep 19 17:47 sz000000003


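Note that this does not happen in a new environment or when the log volume is very small.


So in day-to-day work, especially in production, there is no need to panic when a GoldenGate problem comes up. Once the underlying mechanism is understood, the problem is usually simple to resolve.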


2018-09-19, Wednesday
