
Sun Solaris XSCF Fault Diagnosis

Published: 2025-01-23 Author: 千家信息网 editorial staff

1. showhardconf

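The showhardconf command displays information about each FRU. It can show the following:

Current configuration and status

Number of installed FRUs

Domain information

IOBOX information

Name properties of PCI cards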


XSCF> showhardconf

SPARC Enterprise M4000;

+ Serial:BDF1115196; Operator_Panel_Switch:Locked;

+ Power_Supply_System:Single; SCF-ID:XSCF#0;

+ System_Power:On; System_Phase:Cabinet Power On;

Domain#0 Domain_Status:Running;

MBU_A Status:Normal; Ver:4301h; Serial:BD1114008E ;

+ FRU-Part-Number:CF00541-4359 01 /541-4359-01 ;

+ Memory_Size:64 GB;

+ Type:2;

CPUM#0-CHIP#0 Status:Normal; Ver:0601h; Serial:PP105300QG ;

+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;

+ Freq:2.660 GHz; Type:48;

+ Core:4; Strand:2;

CPUM#0-CHIP#1 Status:Normal; Ver:0601h; Serial:PP105300QG ;

+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;

+ Freq:2.660 GHz; Type:48;

+ Core:4; Strand:2;

CPUM#1-CHIP#0 Status:Normal; Ver:0601h; Serial:PP104903Y5 ;

+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;

+ Freq:2.660 GHz; Type:48;

+ Core:4; Strand:2;

CPUM#1-CHIP#1 Status:Normal; Ver:0601h; Serial:PP104903Y5 ;

+ FRU-Part-Number:CA06761-D205 C1 /371-4932-03 ;

+ Freq:2.660 GHz; Type:48;

+ Core:4; Strand:2;

MEMB#0 Status:Normal; Ver:0101h; Serial:BF1109220C ;

+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;

MEM#0A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f244b4c;

+ Type:2A; Size:2 GB;

MEM#0B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f83e611;

+ Type:2A; Size:2 GB;

MEM#1A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f53e611;

+ Type:2A; Size:2 GB;

MEM#1B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f444b4b;

+ Type:2A; Size:2 GB;

* MEM#2A Status:Degraded;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f63e609;

+ Type:2A; Size:2 GB;

MEM#2B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f83e5fa;

+ Type:2A; Size:2 GB;

MEM#3A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f444b4c;

+ Type:2A; Size:2 GB;

MEM#3B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-3f344b4c;

+ Type:2A; Size:2 GB;

MEMB#1 Status:Normal; Ver:0101h; Serial:BF1036E3DX ;

+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;

MEM#0A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5274b16d;

+ Type:2A; Size:2 GB;

MEM#0B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5214c262;

+ Type:2A; Size:2 GB;

MEM#1A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5234c261;

+ Type:2A; Size:2 GB;

MEM#1B Status:Normal;

+ Code:ce0000000000000001M3 93T5660QZA-CE6 4151-481382de;

+ Type:2A; Size:2 GB;

MEM#2A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5e649f87;

+ Type:2A; Size:2 GB;

MEM#2B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5264b175;

+ Type:2A; Size:2 GB;

MEM#3A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5274b170;

+ Type:2A; Size:2 GB;

MEM#3B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-5234c268;

+ Type:2A; Size:2 GB;

MEMB#2 Status:Normal; Ver:0101h; Serial:BF1051HK5T ;

+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;

MEM#0A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833ce5e;

+ Type:2A; Size:2 GB;

MEM#0B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813ce45;

+ Type:2A; Size:2 GB;

MEM#1A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4843ce5f;

+ Type:2A; Size:2 GB;

MEM#1B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833ce5c;

+ Type:2A; Size:2 GB;

MEM#2A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813ce5e;

+ Type:2A; Size:2 GB;

MEM#2B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4883341c;

+ Type:2A; Size:2 GB;

MEM#3A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48833439;

+ Type:2A; Size:2 GB;

MEM#3B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48733428;

+ Type:2A; Size:2 GB;

MEMB#3 Status:Normal; Ver:0101h; Serial:BF1040EUC8 ;

+ FRU-Part-Number:CF00541-0545 09 /541-0545-09 ;

MEM#0A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a1a3;

+ Type:2A; Size:2 GB;

MEM#0B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48731182;

+ Type:2A; Size:2 GB;

MEM#1A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a19c;

+ Type:2A; Size:2 GB;

MEM#1B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48631182;

+ Type:2A; Size:2 GB;

MEM#2A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4823a19a;

+ Type:2A; Size:2 GB;

MEM#2B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4833a19a;

+ Type:2A; Size:2 GB;

MEM#3A Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-48831186;

+ Type:2A; Size:2 GB;

MEM#3B Status:Normal;

+ Code:ad0000000000000001HYMP125P72CP4-Y5 4141-4813a1a2;

+ Type:2A; Size:2 GB;

DDC_A#0 Status:Normal;

DDC_A#1 Status:Normal;

DDC_B#0 Status:Normal;

IOU#0 Status:Normal; Ver:0101h; Serial:BF110617KB ;

+ FRU-Part-Number:CF00541-2240 05 /541-2240-05 ;

+ Type:1;

DDC_A#0 Status:Normal;

DDCR Status:Normal;

DDC_B#0 Status:Normal;

PCI#2 Name_Property:network; Card_Type:Other;

PCI#3 Name_Property:SUNW,qlc; Card_Type:Other;

PCI#4 Name_Property:SUNW,qlc; Card_Type:Other;

XSCFU Status:Normal,Active; Ver:0101h; Serial:BF11071FKN ;

+ FRU-Part-Number:CF00541-0481 05 /541-0481-05 ;

OPNL Status:Normal; Ver:0101h; Serial:NN11052TLU ;

+ FRU-Part-Number:CF00541-0850 06 /541-0850-06 ;

PSU#0 Status:Normal; Serial:0017527-1108023275;

+ FRU-Part-Number:CF00300-2311 0150 /300-2311-01-50;

+ Power_Status:On; AC:200 V;

PSU#1 Status:Normal; Serial:0017527-1012024046;

+ FRU-Part-Number:CF00300-2011 0250 /300-2011-02-50;

+ Power_Status:On; AC:200 V;

FAN_A#0 Status:Normal;

FAN_A#1 Status:Normal;

FANBP_B Status:Normal; Ver:0401h; Serial:NN110736WD ;

+ FRU-Part-Number:CF00541-3098 01 /541-3098-01 ;

FAN_B#0 Status:Normal;

FAN_B#1 Status:Normal;

XSCF>
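The listing above can be summarized programmatically. The following Python sketch is only an illustration, not part of XSCF: it assumes the output has been saved as plain text in the flattened form shown here, and the function name summarize_hardconf is hypothetical. It counts FRU lines by kind and sums the DIMM sizes.

```python
import re
from collections import Counter

# Abbreviated sample copied from the showhardconf output above.
SAMPLE = """\
MBU_A Status:Normal; Ver:4301h; Serial:BD1114008E ;
+ Memory_Size:64 GB;
CPUM#0-CHIP#0 Status:Normal; Ver:0601h; Serial:PP105300QG ;
MEMB#0 Status:Normal; Ver:0101h; Serial:BF1109220C ;
MEM#0A Status:Normal;
+ Type:2A; Size:2 GB;
* MEM#2A Status:Degraded;
+ Type:2A; Size:2 GB;
PSU#0 Status:Normal; Serial:0017527-1108023275;
"""

# A FRU line is "<name> Status:<value>;", optionally prefixed by "*".
FRU_LINE = re.compile(r"^\*?\s*(\S+)\s+Status:([^;]+);")
# DIMM sizes appear on continuation lines such as "+ Type:2A; Size:2 GB;".
# The \b keeps "Memory_Size:" on the MBU_A line from matching.
DIMM_SIZE = re.compile(r"\bSize:(\d+) GB;")


def summarize_hardconf(text):
    """Return (Counter of FRU kinds, total DIMM size in GB)."""
    kinds = Counter()
    dimm_gb = 0
    for raw in text.splitlines():
        line = raw.strip()
        fru = FRU_LINE.match(line)
        if fru:
            # "MEM#0A" -> "MEM", "CPUM#0-CHIP#0" -> "CPUM"
            kinds[fru.group(1).split("#", 1)[0]] += 1
            continue
        size = DIMM_SIZE.search(line)
        if size:
            dimm_gb += int(size.group(1))
    return kinds, dimm_gb


kinds, dimm_gb = summarize_hardconf(SAMPLE)
print(dict(kinds), dimm_gb)
```

Run against the full listing, the 32 DIMM entries of 2 GB each add up to the 64 GB reported in Memory_Size, which is a quick consistency check on the inventory.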

2. showlogs

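The showlogs command displays the contents of the specified log in timestamp order, starting with the oldest entry. It can display the following logs:

Error log

Power log

Event log

Temperature and humidity record

Monitoring message log

Console message log

Panic message log

IPL message log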

XSCF> showlogs error

Date: May 05 15:03:27 CST 2014 Code: 80002000-c6ff0000-0104340700000000

Status: Alarm Occurred: May 05 15:03:26.996 CST 2014

FRU: /FAN_A#0

Msg: Unit disappeared unexpectedly

Date: May 05 15:04:23 CST 2014 Code: 80002000-c6ff0000-0104080100000000

Status: Alarm Occurred: May 05 15:04:23.572 CST 2014

FRU: /FAN_A#0

Msg: Unit detected unexpectedly

Date: May 05 15:06:53 CST 2014 Code: 80002000-c6ff0000-0104340700000000

Status: Alarm Occurred: May 05 15:06:53.420 CST 2014

FRU: /FAN_A#0

Msg: Unit disappeared unexpectedly

Date: May 05 15:07:34 CST 2014 Code: 80002000-c6ff0000-0104080100000000

Status: Alarm Occurred: May 05 15:07:34.836 CST 2014

FRU: /FAN_A#0

Msg: Unit detected unexpectedly

Date: Feb 07 13:20:46 CST 2016 Code: 80002000-c3ff0000-0104320100000000

Status: Alarm Occurred: Feb 07 13:20:44.966 CST 2016

FRU: /PSU#1

Msg: PSU failed

Date: Jan 23 02:36:06 CST 2018 Code: 60000000-8a2a0000-10cc000000000000

Status: Warning Occurred: Jan 23 02:36:05.765 CST 2018

FRU: /MBU_A/MEMB#1/MEM#1B

Msg: DIMM permanent correctable error

Date: Sep 06 13:11:15 CST 2018 Code: 60000000-8a2a0000-10cc000000000000

Status: Warning Occurred: Sep 06 13:11:15.396 CST 2018

FRU: /MBU_A/MEMB#0/MEM#2A

Msg: DIMM permanent correctable error
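Each error log record above consists of four lines (Date/Code, Status/Occurred, FRU, Msg). As a rough sketch, assuming the log has been captured as text in this layout, the records can be grouped by FRU to see which parts fail repeatedly. The function names below are hypothetical.

```python
import re
from collections import defaultdict

# Three records copied from the showlogs error output above.
SAMPLE = """\
Date: May 05 15:03:27 CST 2014 Code: 80002000-c6ff0000-0104340700000000
Status: Alarm Occurred: May 05 15:03:26.996 CST 2014
FRU: /FAN_A#0
Msg: Unit disappeared unexpectedly
Date: Feb 07 13:20:46 CST 2016 Code: 80002000-c3ff0000-0104320100000000
Status: Alarm Occurred: Feb 07 13:20:44.966 CST 2016
FRU: /PSU#1
Msg: PSU failed
Date: Sep 06 13:11:15 CST 2018 Code: 60000000-8a2a0000-10cc000000000000
Status: Warning Occurred: Sep 06 13:11:15.396 CST 2018
FRU: /MBU_A/MEMB#0/MEM#2A
Msg: DIMM permanent correctable error
"""

DATE_RE = re.compile(r"Date:\s*(.+?)\s+Code:\s*(\S+)")
STATUS_RE = re.compile(r"Status:\s*(\S+)\s+Occurred:\s*(.+)")


def parse_error_log(text):
    """Turn the four-line records into a list of dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        date = DATE_RE.match(line)
        if date:
            # A "Date:" line starts a new record.
            current = {"date": date.group(1), "code": date.group(2)}
            records.append(current)
            continue
        if current is None:
            continue
        status = STATUS_RE.match(line)
        if status:
            current["status"] = status.group(1)
            current["occurred"] = status.group(2)
        elif line.startswith("FRU:"):
            current["fru"] = line[len("FRU:"):].strip()
        elif line.startswith("Msg:"):
            current["msg"] = line[len("Msg:"):].strip()
    return records


def group_by_fru(records):
    """Map each FRU path to the list of messages logged against it."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.get("fru", "?")].append(rec.get("msg", ""))
    return dict(grouped)


for fru, msgs in group_by_fru(parse_error_log(SAMPLE)).items():
    print(fru, msgs)
```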

3. showstatus

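The showstatus command displays information about degraded FRUs in the server. A degraded unit is marked with an asterisk (*), and one of the following statuses is shown: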

Normal

Faulted

Degraded

Deconfigured

Maintenance

XSCF> showstatus

MBU_A Status:Normal;

MEMB#0 Status:Normal;

* MEM#2A Status:Degraded;
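Because the asterisk is the marker for a degraded unit, a script only needs to look for lines that start with it. The sketch below assumes showstatus output saved as text; the helper name marked_units is hypothetical.

```python
import re

# The five status values listed above.
KNOWN_STATUSES = {"Normal", "Faulted", "Degraded", "Deconfigured", "Maintenance"}

# Sample copied from the showstatus output above.
SAMPLE = """\
MBU_A Status:Normal;
MEMB#0 Status:Normal;
* MEM#2A Status:Degraded;
"""

# Only lines that begin with "*" describe a degraded unit.
MARKED = re.compile(r"^\*\s*(\S+)\s+Status:([^;]+);")


def marked_units(text):
    """Return (unit, status) pairs for asterisk-marked lines."""
    found = []
    for raw in text.splitlines():
        m = MARKED.match(raw.strip())
        if m and m.group(2) in KNOWN_STATUSES:
            found.append((m.group(1), m.group(2)))
    return found


print(marked_units(SAMPLE))
```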

4. fmdump

bash-3.2# fmdump

TIME UUID SUNW-MSG-ID

Sep 06 13:04:37.2512 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S

bash-3.2# fmdump -V -u 168620e1-a275-e9ed-bbff-d8f9da784bc8

TIME UUID SUNW-MSG-ID

Sep 06 2018 13:04:37.251251000 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S

nvlist version: 0

version = 0x0

class = list.suspect

uuid = 168620e1-a275-e9ed-bbff-d8f9da784bc8

code = SUN4U-8000-2S

diag-time = 1536210277 204244

de = (embedded nvlist)

nvlist version: 0

version = 0x0

scheme = fmd

authority = (embedded nvlist)

nvlist version: 0

version = 0x0

product-id = SUNW,SPARC-Enterprise

chassis-id = BDF1115196

server-id = sunm4k_1

(end authority)

mod-name = cpumem-diagnosis

mod-version = 1.7

(end de)

fault-list-sz = 0x1

topo-uuid = 4ede8959-9768-eb1c-b6f5-f9f9af63c97c

fault-list = (array of embedded nvlists)

(start fault-list[0])

nvlist version: 0

version = 0x0

class = fault.memory.dimm

certainty = 0x5f

asru = (embedded nvlist)

nvlist version: 0

version = 0x0

scheme = mem

unum = /MBU_A/MEMB0/MEM2A

serial = 3F63E609:HYMP125P72CP4-Y5

authority = (embedded nvlist)

nvlist version: 0

product-id = SUNW,SPARC-Enterprise

server-id = sunm4k_1

(end authority)

(end asru)

fru = (embedded nvlist)

nvlist version: 0

version = 0x0

scheme = mem

unum = /MBU_A/MEMB0/MEM2A

serial = 3F63E609:HYMP125P72CP4-Y5

authority = (embedded nvlist)

nvlist version: 0

product-id = SUNW,SPARC-Enterprise

server-id = sunm4k_1

(end authority)

(end fru)

(end fault-list[0])

fault-status = 0x1

severity = Major

__ttl = 0x1

__tod = 0x5b90b565 0xef9c938

bash-3.2#

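With the -V option, at least three additional lines of output are shown:

The first line is a summary of the information previously shown in the console message, now including the timestamp, UUID, and message ID.

The second line states how certain the diagnosis is. In the example above, certainty = 0x5f (95 percent) that the fault lies in the DIMM shown. A diagnosis can involve more than one component, in which case one entry is shown per component; this example contains a single entry (fault-list-sz = 0x1).

The line beginning with "FRU" identifies the part that must be replaced to return the server to a fully operational state.

The line beginning with "rsrc" describes which component was taken out of service as a result of this fault.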
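Several fields in the fmdump -V output above are hexadecimal. As a small worked example, the following Python sketch decodes the certainty and __tod values copied from that listing. It assumes that "CST" in these logs means China Standard Time (UTC+8), which is an assumption rather than something the output states.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the fmdump -V listing above.
certainty = int("0x5f", 16)
tod_sec = int("0x5b90b565", 16)   # first word of __tod: seconds since the epoch
tod_nsec = int("0xef9c938", 16)   # second word of __tod: nanoseconds

# Assumption: the host's "CST" is China Standard Time, UTC+8.
CST = timezone(timedelta(hours=8))
when = datetime.fromtimestamp(tod_sec, tz=timezone.utc).astimezone(CST)

print(f"certainty: {certainty}%")                  # certainty: 95%
print(when.strftime("%Y-%m-%d %H:%M:%S"), tod_nsec)
# 2018-09-06 13:04:37 251251000
```

The decoded time matches the "Sep 06 2018 13:04:37.251251000" header line, and the first word of __tod equals the seconds part of diag-time (1536210277).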

bash-3.2# fmdump -e

TIME CLASS

Jan 23 2018 02:01:14 ereport.asic.mac.mi-ce

Jan 23 2018 02:01:14 ereport.asic.mac.ptrl-ce

Jan 23 2018 02:01:24 ereport.asic.mac.mi-ce

Jan 23 2018 02:01:35 ereport.asic.mac.mi-ce

Jan 23 2018 02:01:35 ereport.asic.mac.ptrl-ce

Jan 23 2018 02:01:46 ereport.asic.mac.mi-ce

Jan 23 2018 02:01:57 ereport.asic.mac.ptrl-ce

………
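A long fmdump -e listing is easier to read as a tally per error-report class. The sketch below counts only the lines visible above (the listing is truncated in the source) and assumes the class name is the last whitespace-separated field on each line.

```python
from collections import Counter

# The visible lines of the fmdump -e output above.
SAMPLE = """\
TIME CLASS
Jan 23 2018 02:01:14 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:14 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:24 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:35 ereport.asic.mac.ptrl-ce
Jan 23 2018 02:01:46 ereport.asic.mac.mi-ce
Jan 23 2018 02:01:57 ereport.asic.mac.ptrl-ce
"""


def count_ereports(text):
    """Count occurrences of each ereport class."""
    counts = Counter()
    for raw in text.splitlines():
        fields = raw.split()
        # Skip the header and any line that does not end in an ereport class.
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts


for cls, n in count_ereports(SAMPLE).most_common():
    print(n, cls)
```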

5. fmadm faulty/config

bash-3.2# fmadm faulty

--------------- ------------------------------------ -------------- ---------

TIME EVENT-ID MSG-ID SEVERITY

--------------- ------------------------------------ -------------- ---------

Sep 06 13:04:37 168620e1-a275-e9ed-bbff-d8f9da784bc8 SUN4U-8000-2S Major

Host : sunm4k_1

Platform : SUNW,SPARC-Enterprise Chassis_id : BDF1115196

Product_sn :

Fault class : fault.memory.dimm 95%

Affects : mem:///unum=/MBU_A/MEMB0/MEM2A

faulted but still in service

FRU : mem:///unum=/MBU_A/MEMB0/MEM2A 95%

faulty

Serial ID. : 3F63E609:HYMP125P72CP4-Y5

Description : The number of correctable errors associated with this memory

module has exceeded acceptable levels.

Response : Pages of memory associated with this memory module have been

removed from service, up to a limit which has now been reached.

Impact : Total system memory capacity has been reduced.

Action : Use 'fmadm faulty' to provide a more detailed view of this event.

Please refer to the associated reference document at

http://sun.com/msg/SUN4U-8000-2S for the latest service

procedures and policies regarding this diagnosis.
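The FRU reported by fmadm faulty (/MBU_A/MEMB0/MEM2A) is the same DIMM that XSCF reports as /MBU_A/MEMB#0/MEM#2A; the serial 3F63E609 also matches the end of that DIMM's Code field in showhardconf. The sketch below extracts the FRU and its certainty and compares the two spellings. Dropping "#" to convert between them is an assumption drawn only from the two forms seen in this article, and both function names are hypothetical.

```python
import re

# Relevant lines copied from the fmadm faulty output above.
SAMPLE = """\
Fault class : fault.memory.dimm 95%
Affects : mem:///unum=/MBU_A/MEMB0/MEM2A
FRU : mem:///unum=/MBU_A/MEMB0/MEM2A 95%
Serial ID. : 3F63E609:HYMP125P72CP4-Y5
"""

FRU_RE = re.compile(r"^FRU\s*:\s*mem:///unum=(\S+)\s+(\d+)%", re.MULTILINE)


def faulty_fru(text):
    """Return (unum, certainty percent) from the FRU line, or None."""
    m = FRU_RE.search(text)
    return (m.group(1), int(m.group(2))) if m else None


def same_unit(xscf_path, solaris_unum):
    """Assumption: the XSCF path equals the Solaris unum once '#' is dropped."""
    return xscf_path.replace("#", "") == solaris_unum


unum, pct = faulty_fru(SAMPLE)
print(unum, pct, same_unit("/MBU_A/MEMB#0/MEM#2A", unum))
```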

bash-3.2# fmadm config

MODULE VERSION STATUS DESCRIPTION

cpumem-diagnosis 1.7 active CPU/Memory Diagnosis

cpumem-retire 1.1 active CPU/Memory Retire Agent

disk-transport 1.0 active Disk Transport Agent

eft 1.16 active eft diagnosis engine

event-transport 2.0 active Event Transport Module

ext-event-transport 0.1 active External FM event transport

fabric-xlate 1.0 active Fabric Ereport Translater

fmd-self-diagnosis 1.0 active Fault Manager Self-Diagnosis

fps-transport 1.0 active Solaris FP-Scrubber

io-retire 1.0 active I/O Retire Agent

snmp-trapgen 1.0 active SNMP Trap Generation Agent

sysevent-transport 1.0 active SysEvent Transport Agent

syslog-msgs 1.0 active Syslog Messaging Agent

zfs-diagnosis 1.0 active ZFS Diagnosis Engine

zfs-retire 1.0 active ZFS Retire Agent

6. fmstat

XSCF> fmstat

module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz

eft 0 0 0.0 0.0 0 0 0 0 3.3M 0

event-transport 0 0 0.0 0.0 0 0 0 0 6.4K 0

faultevent-post 2 0 0.0 8.9 0 0 0 0 0 0

fmd-self-diagnosis 24 24 0.0 352.1 0 0 1 0 24b 0

iox_agent 0 0 0.0 0.0 0 0 0 0 0 0

reagent 0 0 0.0 0.0 0 0 0 0 0 0

sysevent-transport 0 0 0.0 8700.4 0 0 0 0 0 0

syslog-msgs 0 0 0.0 0.0 0 0 0 0 97b 0
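The fmstat table above can be read column by column using its header row. The following sketch, which assumes whitespace-separated output saved as text, lists the modules whose open column is non-zero. The function names are hypothetical.

```python
# Header and three rows copied from the fmstat output above.
SAMPLE = """\
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
eft 0 0 0.0 0.0 0 0 0 0 3.3M 0
fmd-self-diagnosis 24 24 0.0 352.1 0 0 1 0 24b 0
sysevent-transport 0 0 0.0 8700.4 0 0 0 0 0 0
"""


def parse_fmstat(text):
    """Return one dict per module, keyed by the header column names."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def modules_with_open_cases(rows):
    """Names of modules whose 'open' column is greater than zero."""
    return [row["module"] for row in rows if int(row["open"]) > 0]


print(modules_with_open_cases(parse_fmstat(SAMPLE)))
```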


