Troubleshooting an 11g production database where crsctl start has fails to start

Published: 2025-01-31 Author: 千家信息网 editor

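What follows is a simulated reproduction, because the customer does not allow logs to be taken off the core database.
The troubleshooting process and reasoning are nevertheless almost identical to the real case.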

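I. Fault description
An 11g RAC primary with a single-instance ADG (Active Data Guard) standby; the HAS service on the standby side will not start.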

II. Symptoms

I checked all of the clusterware logs and none of them contained any output.

[root@roidb2 bin]# pwd
/u01/app/11.2.0/grid/bin
[root@roidb2 bin]# ./crsctl start has
[root@roidb2 bin]# --no output: neither an error nor a success message
[root@roidb2 bin]# 

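What now? I had never run into this before. The customer said a relocation had been carried out on Friday, so was it a disk problem, or a permissions problem? I checked along those lines and found nothing. Stepping back to reorganize my thinking, I decided to try strace; it might turn up something unexpected.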

[root@roidb2 bin]# strace ./crsctl start has
execve("./crsctl", ["./crsctl", "start", "has"], [/* 28 vars */]) = -1 ENOEXEC (Exec format error)  --format error
dup(2)                                  = 3
fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff367e29000
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: Exec format error\n", 32strace: exec: Exec format error
) = 32
close(3)                                = 0
munmap(0x7ff367e29000, 4096)            = 0
exit_group(1)                           = ?
[root@roidb2 bin]# 

Why this error? Could something be wrong with the file itself? Let's keep digging.

[root@roidb2 bin]# ls -l crsctl
-rwxr-xr-x 1 root root 0 Dec 11 20:54 crsctl
[root@roidb2 bin]# file crsctl
crsctl: empty  --it is an empty file!
[root@roidb2 bin]# 
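
The zero-byte file accounts for both observations. strace calls execve directly, so it reports ENOEXEC. An interactive POSIX shell, when execve fails with ENOEXEC, falls back to interpreting the file as a shell script, and an empty script runs no commands and exits successfully. The following is a minimal sketch that reproduces the symptom with a throwaway file; the name empty_cmd and the temporary directory are assumptions for illustration and are not part of the original session.

```shell
#!/bin/sh
# Reproduce the "no output, no error" symptom with a throwaway file.
# empty_cmd is a hypothetical stand-in for the damaged crsctl.
demo_dir=$(mktemp -d)
: > "$demo_dir/empty_cmd"        # create a zero-byte file
chmod +x "$demo_dir/empty_cmd"   # same mode bits as a normal wrapper script

# execve fails with ENOEXEC; the shell then runs the file as an empty script.
"$demo_dir/empty_cmd" start has
status=$?
echo "exit status: $status"

rm -rf "$demo_dir"
```

On bash and other POSIX-conforming shells the command prints nothing of its own and the reported status should be 0, which is why the prompt simply came back in the session above.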

What now? We know this file is a shell script, so how about copying it over from another node?

III. Resolution

--copy the file over from the other node
$scp /u01/app/11.2.0/grid/bin/crsctl root@192.168.1.212:/u01/app/11.2.0/grid/bin/
root@192.168.1.212's password: 
crsctl                                                                                 100% 8574     8.4KB/s   00:00    
$
[root@roidb2 bin]# file crsctl
crsctl: POSIX shell script text executable
[root@roidb2 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@roidb2 bin]# --fixed
--a look at how Oracle writes its own scripts
[root@roidb2 bin]# cat crsctl
#!/bin/sh
#
# Copyright (c) 2001, 2013, Oracle and/or its affiliates. All rights reserved. 
# Notes:
#   - This script should only use clsecho.bin directly and not clsecho(which is
#     this same script).
#   - FIXME: crswrap should process hostname locally as well just like init.ohasd.
### Main ###
ORA_CRS_HOME=/u01/app/11.2.0/grid
MY_HOST=roidb1
ORACLE_USER=grid
ORACLE_HOME=/u01/app/11.2.0/grid
CRF_HOME=/u01/app/11.2.0/grid
export ORA_CRS_HOME ORACLE_HOME CRF_HOME
#limits
CRS_LIMIT_CORE=unlimited
CRS_LIMIT_MEMLOCK=unlimited
CRS_LIMIT_OPENFILE=65536
CRS_LIMIT_STACK=2048
#export the limit variables
export CRS_LIMIT_CORE CRS_LIMIT_MEMLOCK CRS_LIMIT_OPENFILE CRS_LIMIT_STACK
#listener
CRS_LSNR_STACK=10240
export CRS_LSNR_STACK
# Unset env var ORACLE_BASE before spawning any processes.
unset ORACLE_BASE
[ -z "$PERL" ] && PERL="/u01/app/11.2.0/grid/perl/bin/perl -I${ORA_CRS_HOME}/perl/lib"
LOGMSG="/bin/logger -puser.err"
CLSECHO="/u01/app/11.2.0/grid/bin/clsecho.bin"
PLATFORM=`/bin/uname`
case $PLATFORM in
Linux) 
       ORACLUSTER_LIB=/etc/ORCLcluster/lib
       LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:$ORACLUSTER_LIB
       export LD_LIBRARY_PATH
       # forcibly eliminate LD_ASSUME_KERNEL to ensure NPTL where available
       LD_ASSUME_KERNEL=
       export LD_ASSUME_KERNEL
       LOGGER="/usr/bin/logger"
       if [ ! -f "$LOGGER" ];then
        LOGGER="/bin/logger"
       fi
       LOGMSG="$LOGGER -puser.err"
       ;;
HP-UX) MACH_HARDWARE=`/bin/uname -m`
       if [ "$MACH_HARDWARE" = "ia64" ]; then
          SO_EXT=so
          NMAPIDIR_64=/opt/nmapi/nmapi2/lib/hpux64
          NMAPIDIR_32=/opt/nmapi/nmapi2/lib/hpux32
       else
          SO_EXT=sl
          NMAPIDIR_64=/opt/nmapi/nmapi2/lib/pa20_64
          NMAPIDIR_32=/opt/nmapi/nmapi2/lib
       fi
       case $0 in
           */lsnodes|lsnodes)
               if [ ! -f $NMAPIDIR_64/libnmapi2.so -a ! -f $NMAPIDIR_32/libnmapi2.so ]; then
                   /bin/echo "No vendor clusterware installed."
                   exit 1
               fi
               ;;
       esac
       LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:$NMAPIDIR_64:/usr/lib:$LD_LIBRARY_PATH
       SHLIB_PATH=/u01/app/11.2.0/grid/lib32:$NMAPIDIR_32:$SHLIB_PATH
       export LD_LIBRARY_PATH
       export SHLIB_PATH
       ;;
SunOS) ARCH_NAME=`/bin/uname -p`
       if [ "${ARCH_NAME}" = "sparc" ]; then
           LD_LIBRARY_PATH_64=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib/sparcv9:/usr/ucblib/sparcv9:$LD_LIBRARY_PATH_64
       else
           LD_LIBRARY_PATH_64=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH_64
       fi
       LD_LIBRARY_PATH=/u01/app/11.2.0/grid/lib:/opt/ORCLcluster/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH
       export LD_LIBRARY_PATH_64
       export LD_LIBRARY_PATH
       GREP='/usr/bin/grep'
       /usr/bin/coreadm | $GREP  'process core dumps' | $GREP  'enabled' > /dev/null
       STATUS1=$?
       /usr/bin/coreadm | $GREP  'global core dumps' | $GREP 'enabled' > /dev/null
       STATUS2=$?
       if [ "$STATUS1" != "0" ] && [ "$STATUS2" != "0" ];
       then
           /usr/bin/coreadm -e global > /dev/null 2>&1
       fi
       /usr/bin/coreadm | $GREP  'process setid' | $GREP 'enabled' > /dev/null
       STATUS1=$?
       /usr/bin/coreadm | $GREP  'global setid' | $GREP 'enabled' > /dev/null
       STATUS2=$?
       if [ "$STATUS1" != "0" ] && [ "$STATUS2" != "0" ];
       then
           /usr/bin/coreadm -e global-setid > /dev/null 2>&1
       fi
       # Solaris allows partitioning of resources by Projects.
       # On Solaris, start crsd/ohasd using the default Project of
       # the owner of the Grid Home. See bugs 9442360 / 5629487.
       PROJECT=`/usr/bin/projects -d $ORACLE_USER`
       # If no project is set use the default root project
       if [ "$PROJECT" = "" ]; then
           PROJECT="user.root"
       fi
       ;;
AIX)   ORACLUSTER_LIB=/opt/ORCLcluster/lib
       LIBPATH=/u01/app/11.2.0/grid/lib:$ORACLUSTER_LIB:/usr/lib
       LD_LIBRARY_PATH=$LIBPATH:$LD_LIBRARY_PATH
       AIXTHREAD_SCOPE=S
       export LIBPATH
       export LD_LIBRARY_PATH
       export AIXTHREAD_SCOPE
       ;;
*)     /bin/echo "ERROR: Unknown Operating System"
       exit -1
       ;;
esac
# enable GIPCHA consistently along with root scripts
case $PLATFORM in
  Linux)
    GIPCD_PASSTHROUGH=false
    export GIPCD_PASSTHROUGH
    ;;
  HP-UX)
    GIPCD_PASSTHROUGH=false
    export GIPCD_PASSTHROUGH
    ;;
  SunOS)
    GIPCD_PASSTHROUGH=false
    export GIPCD_PASSTHROUGH
    ;;
  AIX)
    GIPCD_PASSTHROUGH=false
    export GIPCD_PASSTHROUGH
    ;;
  OSF1)
    ;;
esac
case $0 in
*.bin) 
    ORASYM=/u01/app/11.2.0/grid/bin/`basename $0 .bin`
    ;;
*)     
    ORASYM=$0.bin
    ;;
esac
export ORASYM
case $ORASYM in
*ocrpatch*)
     if [ ! -x $ORASYM ]
     then
       /bin/echo "NOTE:"
       /bin/echo "The ocrpatch binary is not part of the software distribution;"
       /bin/echo "ocrpatch can only be obtained and used by Oracle Support."
       exit -1
     fi
     ;;
*ocssd*)
     if [ "$PLATFORM" = "AIX" ]
     then
       UID=`id -u`
       if [ $UID -eq 0 ]; # do not want to do su in SIHA
       then
         SU='/bin/su'
         $SU $ORACLE_USER -c "/bin/sh -c 'ulimit -c unlimited; $ORASYM $@'"
         exit 0
       fi
     fi
     ;;
*ohasd*)
    CRSWRAPEXECE="/u01/app/11.2.0/grid/bin/crswrapexece.pl"
    ENV_FILE="${ORA_CRS_HOME}/crs/install/s_crsconfig_${MY_HOST}_env.txt"
    export ENV_FILE
    if [ ! -f "$CRSWRAPEXECE" ]
    then
      $LOGMSG "$CRSWRAPEXECE script is not found"
      exit 1;
    fi
    # we attempt to set limits here and check if return code is 0
    # if not we generate an alert using clsecho
    # see init.ohasd.sbs for a full rationale
    #STACK_SIZE limit. The goal is to reduce thread usage across the grid
    #infrastructure bottom up from the ohasd wrapper (Bug 9154152).
    #Only the soft limit is set so that any process even unpriviledged can
    #reincrease it up to the administrator set hard limit
    ulimit -Ss 2048
    if [ "$?" != "0" ]
        then
        $CLSECHO -p has -f crs -l -m 6021 "Ss" "2048"
    fi
    case $PLATFORM in
    Linux) 
        # MEMLOCK limit is for Bug 9136459
        ulimit -l unlimited
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "l" "unlimited"
        fi
        ulimit -c unlimited
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "c" "unlimited"
        fi
        ulimit -n 65536
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "n" "65536"
        fi
        ;;
    *) 
        ulimit -c unlimited
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "c" "unlimited"
        fi
        ulimit -n 65536
        if [ "$?" != "0" ]
        then
            $CLSECHO -p has -f crs -l -m 6021 "n" "65536"
        fi
        ;;
    esac
    $LOGMSG "exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
    exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM "$@"
    # Reached here only if exec fails
    /bin/echo "Failed to execute \"exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
    $LOGMSG "Failed to execute \"exec $PERL /u01/app/11.2.0/grid/bin/crswrapexece.pl $ENV_FILE $ORASYM \"$@\""
    exit 1;
    ;;
*)
    if [ "$PLATFORM" = "AIX" ]
    then
      # Prevents the setting of RT_GRQ for non-ocssd and non-cssagent processes
      # RT_GRQ is turned on globally for all processes in the environment file
      # generated by s_crsconfig_lib.pm during install setup, for AIX platform.
      # This should prevent rdbms RT processes from inheriting this attribute
      # since crsd will not have RT_GRQ set.
      #
      # NOTE: cssdagent and monitor does not need a special case since they
      #       do not use this wrapper script. So the '*)' case here does not
      #       apply and they *will* inherit RT_GRQ attribute, as intended
      RT_GRQ=
      export RT_GRQ
    fi
    ;;
esac
# Solaris allows partitioning of resources by Projects.
# On Solaris, start crsd/ohasd using the default Project of
# the owner of the Grid Home. See bugs 9442360 / 5629487.
case $PLATFORM in
SunOS)
    case $ORASYM in
    *ohasd*|*crsd*)
         exec /usr/bin/newtask -p $PROJECT $ORASYM "$@"
         ;;
    *)
         exec $ORASYM "$@"
         ;;
    esac
    ;;
*)
    exec $ORASYM "$@"
    ;;
esac
[root@roidb2 bin]# 
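
If one file was truncated to zero bytes, others under the same Grid home may have been too, so a follow-up scan seems worthwhile. The sketch below is an assumption about how such a check could look and was not part of the original fix; find_empty_files is a hypothetical helper name, and the default path is simply the directory used in the session above.

```shell
#!/bin/sh
# find_empty_files [DIR]: print every zero-byte regular file under DIR.
# Hypothetical helper; DIR defaults to the Grid bin directory from this article.
find_empty_files() {
    dir="${1:-/u01/app/11.2.0/grid/bin}"
    # -size 0 matches files that occupy zero blocks, i.e. empty files.
    find "$dir" -type f -size 0 -print
}

# Example call (commented out so the sketch has no side effects):
# find_empty_files /u01/app/11.2.0/grid/bin
```

Some zero-byte files may be legitimate (lock or marker files, for example), so each hit should be compared against a healthy node before anything is overwritten.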

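Summary:
1. Always start and stop databases and hosts by following the proper procedure; never cut the power directly.
2. Before a relocation, take backups, and handle the hardware gently when moving it.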
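
One inexpensive addition to the pre-relocation backup is a checksum manifest of the Grid home, so that damaged files can be spotted right after the move. The following is a sketch under stated assumptions, not a procedure from the original case: write_manifest and changed_files are hypothetical helper names, cksum is used only because it is available on any POSIX system, and the manifest should be stored outside the directory being scanned.

```shell
#!/bin/sh
# write_manifest DIR OUT: record one cksum line per regular file under DIR.
write_manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort ) > "$2"
}

# changed_files DIR MANIFEST: print the manifest lines that no longer match,
# i.e. files whose checksum or size changed, or that disappeared.
changed_files() {
    current=$(mktemp) || return 1
    write_manifest "$1" "$current"
    # comm -23 keeps lines that appear only in the old manifest.
    comm -23 "$2" "$current"
    rm -f "$current"
}

# Example usage (paths are illustrative):
# write_manifest /u01/app/11.2.0/grid/bin /root/grid_bin.manifest   # before the move
# changed_files  /u01/app/11.2.0/grid/bin /root/grid_bin.manifest   # after the move
```

An empty result from changed_files means every file recorded before the move still has the same checksum and size; any line printed names a file to restore from backup or from another node.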
