
/limits.conf: Running out of processes because of an Oracle bug

Published: 2024-11-19  Author: 千家信息网 editor

While checking SMIDB today, I found many errors in the CRS alert log, specifically:

2015-08-19 17:12:21.745:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:09.986:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"
2015-08-19 17:13:21.758:
[/oracle/app/11.2.0/grid_1/bin/oraagent.bin(6227)]CRS-5013:Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"

Digging further into the log showed:

2015-08-19 17:14:09.993: [ora.LISTENER.lsnr][1342174976]{1:63186:26462} [check] clsn_agent::check: Exception SclsProcessSpawnException
2015-08-19 17:14:21.744: [ora.asm][1342174976]{0:21:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2015-08-19 17:14:21.759: [ora.asm][1342174976]{0:21:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl.
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:)Utils:execCmd scls_process_spawn() failed 1
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] (:CLSN00008:) category: -2, operation: fork, loc: spawnproc28, OS error: 11, other: forked failed [-1]
2015-08-19 17:14:21.761: [ora.LISTENER_SCAN1.lsnr][1339545344]{0:21:2} [check] clsnUtils::error Exception type=2 string=CRS-5013: Agent "/oracle/app/11.2.0/grid_1/bin/oraagent.bin" failed to start process "/oracle/app/11.2.0/grid_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log"


The ONS log:

[grid@smidb11 logs]$ tail ons.out
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
pthread_create() Resource temporarily unavailable
[2015-05-07T03:09:22+08:00] [ons] [TRACE:2] [] [internal] ONS worker process stopped (0)
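Both logs point at the same kind of failure: the oraagent log shows fork failing with OS error 11, and ons.out shows pthread_create() failing with "Resource temporarily unavailable". A quick way to see how often this happens is to count those two signatures. The sketch below is a minimal example under that assumption: count_eagain_signatures is a hypothetical helper, and its patterns only cover the two message formats shown above.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): count the two EAGAIN signatures
# seen above in one or more log files passed as arguments.
#   "OS error: 11"  -> fork failure reported by oraagent
#   "pthread_create() Resource temporarily unavailable" -> ONS thread failure
# Example (path taken from the CRS messages above):
#   count_eagain_signatures /oracle/app/11.2.0/grid_1/log/smidb11/agent/crsd/oraagent_grid/oraagent_grid.log
count_eagain_signatures() {
    # -o prints every match on its own line, so fused log lines still count
    # once per occurrence; ([^0-9]|$) keeps "OS error: 110" from matching.
    grep -Eo 'OS error: 11([^0-9]|$)|pthread_create\(\) Resource temporarily unavailable' "$@" \
        | wc -l | tr -d ' '
}
```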


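These errors mean that new processes could not be started because system resources had run out. OS error 11 is EAGAIN on Linux ("Resource temporarily unavailable"), which fork() and pthread_create() return when, among other causes, the per-user process limit (RLIMIT_NPROC) has been reached. So I checked the ulimit settings: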

[grid@smidb11 logs]$ ulimit -u
10240

limits.conf:

# End of file
grid soft nproc 10240
grid hard nofile 65536
oracle soft nproc 10240
oracle hard nofile 65536

The limits.conf configuration has a few problems: hard nproc and soft nofile are not configured. These will be fixed before the restart next Monday.
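Gaps like these can be spotted mechanically. Below is a minimal sketch, assuming the usual four-column limits.conf layout (domain, type, item, value) in which the type "-" sets both the soft and the hard value; missing_limit_pairs is a hypothetical helper, not part of Oracle or PAM. Run against the four entries above, it reports hard nproc and soft nofile as missing for both grid and oracle.

```shell
#!/bin/sh
# Hypothetical helper: list <domain> <item> pairs in a limits.conf-style file
# that set only a soft value or only a hard value.
# Example: missing_limit_pairs /etc/security/limits.conf
missing_limit_pairs() {
    awk '
        # Skip comment lines and lines without all four fields.
        /^[[:space:]]*#/ || NF < 4 { next }
        {
            # Fields: $1 domain, $2 type (soft, hard or -), $3 item, $4 value.
            key = $1 " " $3
            seen[key] = 1
            if ($2 == "soft" || $2 == "-") soft[key] = 1
            if ($2 == "hard" || $2 == "-") hard[key] = 1
        }
        END {
            for (k in seen) {
                if (!(k in soft)) print k ": missing soft"
                if (!(k in hard)) print k ": missing hard"
            }
        }
    ' "$1" | LC_ALL=C sort
}
```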

[grid@smidb11 pam.d]$ cat login
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth       include      system-auth
account    required     pam_nologin.so
account    include      system-auth
password   include      system-auth
# pam_selinux.so close should be the first session rule
session    required     pam_selinux.so close
session    required     pam_loginuid.so
session    optional     pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session    required     pam_selinux.so open
session    required     pam_namespace.so
session    optional     pam_keyinit.so force revoke
session    include      system-auth
-session   optional     pam_ck_connector.so
[grid@smidb11 pam.d]$


The /etc/pam.d/login file does not load the resource-limits module (pam_limits), so the following line should be added:

session required /lib64/security/pam_limits.so
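Whether a PAM service file loads pam_limits.so directly can be checked with a single grep. This is a rough sketch: has_pam_limits is a hypothetical helper, and it does not follow "include" lines, so a pam_limits.so entry that arrives through an included file such as system-auth would not be detected.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given PAM service file has an
# uncommented "session" line that loads pam_limits.so.
# Included files (e.g. system-auth) are NOT followed.
# Example: has_pam_limits /etc/pam.d/login && echo "pam_limits is loaded"
has_pam_limits() {
    # "-session" (leading dash) is also accepted; commented lines start with "#"
    # and therefore cannot match the anchored pattern.
    grep -Eq '^[[:space:]]*-?session[[:space:]].*pam_limits\.so' "$1"
}
```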

After searching online, I found a document on Oracle MOS that matches our situation exactly:

The processes and resources started by CRS (Grid Infrastructure) do not inherit the ulimit setting for "max user processes" from /etc/security/limits.conf setting (Doc ID 1594606.1)

Verification showed that although ulimit -u for our grid user was already set to 10240, the processes actually running were still limited to 1024.
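An interactive "ulimit -u" only reports the limit of the login shell itself; the limit a CRS-started process actually runs with is recorded in that process's limits file under /proc. A minimal sketch, assuming the Linux /proc/<pid>/limits format in which the "Max processes" row carries the soft and hard values in its third and fourth columns; max_processes is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max processes" values from a
# /proc/<pid>/limits file (or any file in the same format).
# Example, for every process owned by grid:
#   for pid in $(pgrep -u grid); do echo "$pid $(max_processes /proc/$pid/limits)"; done
max_processes() {
    # The row looks like: "Max processes   1024   1024   processes",
    # so the soft limit is field 3 and the hard limit is field 4.
    awk '/^Max processes/ { print "soft=" $3, "hard=" $4 }' "$1"
}
```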

This is Oracle Bug 17301761. Our database version is 11.2.0.4, which falls within the range affected by this bug.

There are two solutions:

1. Apply the patch.

2. Work around the problem with the method given in the MOS note, as follows:

The ohasd script needs to be modified to set the ulimit explicitly for all grid and database resources that are started by the Grid Infrastructure (GI).

1) go to GI_HOME/bin

2) make a backup of ohasd script file

3) in the ohasd script file, locate the following code:

Linux)
# MEMLOCK limit is for Bug 9136459
ulimit -l unlimited
if [ "$?" != "0" ]
then
$CLSECHO -phas -f crs -l -m 6021 "l" "unlimited"
fi
ulimit -c unlimited
if [ "$?" != "0" ]
then
$CLSECHO -phas -f crs -l -m 6021 "c" "unlimited"
fi
ulimit -n 65536

In the above code, insert the following line just before the line with "ulimit -n 65536":

ulimit -u 16384

4) Recycle CRS manually so that ohasd will use the new ulimit setting.
After the database is started, please issue "ps -ef | grep pmon" and get its pid.
Then, issue "cat /proc//limits | grep process" and check whether Max processes is set to 16384.
Setting the number of processes to 16384 should be enough for most servers, since having 16384 processes normally means the server is very heavily loaded. Using a smaller number such as 4096 or 8192 should also suffice for most users.
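Steps 2) and 3) of the procedure above can be scripted. The sketch below is one way to do it under stated assumptions: insert_nproc_ulimit is a hypothetical helper, the anchor is matched as an awk regular expression, and only the first matching line gets the new ulimit -u line in front of it. It keeps a .bak copy, as the MOS note asks, and does nothing if the script already contains a ulimit -u line. Try it on a copy on a test server first, as the note advises.

```shell
#!/bin/sh
# Hypothetical helper: back up a script, then insert "ulimit -u VALUE"
# just before the first line matching ANCHOR (an awk regex).
# Example: insert_nproc_ulimit "$GI_HOME/bin/ohasd" 'ulimit -n 65536' 16384
insert_nproc_ulimit() {
    f=$1        # script to modify
    anchor=$2   # regex for the line that the new line goes in front of
    value=$3    # process limit, e.g. 16384
    # Safe to re-run: skip if the script already sets "ulimit -u".
    if grep -Eq '^[[:space:]]*ulimit -u ' "$f"; then
        return 0
    fi
    cp -p "$f" "$f.bak" || return 1
    # Rewrite the original file from the backup. Redirecting into the
    # existing file keeps its owner and permissions (ohasd must stay executable).
    awk -v anchor="$anchor" -v line="ulimit -u $value" '
        !done && $0 ~ anchor { print line; done = 1 }
        { print }
    ' "$f.bak" > "$f"
    # Return non-zero if the anchor was not found and nothing was inserted.
    grep -Eq '^[[:space:]]*ulimit -u ' "$f"
}
```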
In addition to the above, the ohasd template needs to be modified to ensure that the new ulimit setting persists even after a patch is applied.
1) go to GI_HOME/crs/sbs

2) make a backup of crswrap.sh.sbs

3) in crswrap.sh.sbs, insert the following line just before the line "# MEMLOCK limit is for Bug 9136459":

ulimit -u 16384
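After editing both ohasd and crswrap.sh.sbs, it is worth confirming that the new line really sits before the intended anchor in each file. A minimal sketch; ulimit_u_before is a hypothetical helper, and the anchors in the example are the two lines quoted in the MOS steps.

```shell
#!/bin/sh
# Hypothetical check: succeed if FILE has a "ulimit -u" line that appears
# before the first line matching ANCHOR (an awk regex).
# Examples:
#   ulimit_u_before "$GI_HOME/bin/ohasd" 'ulimit -n 65536'
#   ulimit_u_before "$GI_HOME/crs/sbs/crswrap.sh.sbs" 'MEMLOCK limit is for Bug 9136459'
ulimit_u_before() {
    awk -v anchor="$2" '
        /^[ \t]*ulimit -u / { found = 1 }
        # Stop at the first anchor line; END still runs and sets the status.
        $0 ~ anchor { hit = 1; exit }
        END { exit !(hit && found) }
    ' "$1"
}
```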
Finally, although the above setting has been used successfully to increase the number of processes, please test it on a test server first before setting the ulimit in production.



Reference: http://blog.csdn.net/weiwangsisoftstone/article/details/42460585

