
X4-2/2L ILOM Services TOI


x86_re_sp_ww_grp@oracle.com
August 2013

X4-2/2L ILOM
Changes between X3 and X4
Versions in this release

Platform Overview
Fault Diagnosis
Managing FRUs
Policies

Troubleshooting

2012 Oracle Corporation Proprietary and Confidential

Changes on X4-2/2L

UEFI config enhancement


Update UEFI config version to v2.3.1 (Align with update in X4-2/L BIOS)

Adjust RAS rule for IVB-EP CPU

Support for two HAs (Home Agents) on the 12-core IVB-EP processor;
DIMM address decoding for IVB-EP platform;
IIO/PCIE diagnostic updates for IVB-EP platform;
PECI interface error check and report.

Some new faults for IVB-EP platform

New faults:
fault.cpu.intel.peci.interface-error
fault.memory.intel.dimm.init-failed
fault.memory.intel.dimm.nonecc-mixture
fault.memory.intel.dimm.ext-addr-unsupported

Support for some new PCIE cards (Identification, Cooling)

IB CX3
Aura 2.1 (F80)


X4-2/2L SW 1.0 includes


Oracle System Assistant (OSA) version 1.0.0
ILOM Versions
  3.1.2.30 (X4-2)
  3.1.2.32 (X4-2L)
HMP Version 2.2.7
  Contains IPMItool & IPMIflash
BIOS Version(s)
  25.01.06.00 (X4-2)
  26.01.06.00 (X4-2L)

Merrimack SW 1.0 is ILOM 3.1-based


Core ILOM 3.1 Documentation
http://www.oracle.com/pls/topic/lookup?ctx=ilom31

Core ILOM 3.1 Service TOI
https://stbeehive.oracle.com/content/dav/st/CoreILOM%20Development/Documents/Presentations/RPE-RE%20ILOM%203.1%20TOI/Oracle_ILOM3.1_TOI.v2.pdf

HMP Documentation
http://www.oracle.com/pls/topic/lookup?ctx=ohmp

X4-2/2L Documentation
X4-2 TBD
X4-2L TBD


Platform Overview


Legend

Platform Overview

x(,y)
*

X4-2/2L Components

Description

Max number of components


Not part of this platform
Populated on this platform

Component                         X4-2       X4-2L      Component NAC
Disk Backplane(s)                 x=1        x=2        /SYS/DBP[0-x]
Hard Disk(s)                      x=1,y=3    x=1,y=23   /SYS/DBP[0-x]/HDD[0-y]
Fan Modules (NOTE: no fan board)                        /SYS/FM[0-3]
CPU Connector Board                                     /SYS/MB/CONNBD
Processors                        x=1        x=1        /SYS/MB/P[0-x]
Memory DIMMs                      x=1        x=1        /SYS/MB/P[0-x]/D[0-7]
PCIe expansion slots              x=3        x=6        /SYS/MB/PCIE[0-x]
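
The NAC patterns above can be expanded mechanically into concrete target names. The following is a minimal Python sketch, not an ILOM tool: expand_nac is a hypothetical helper, and it assumes that x and y stand for the highest index of their bracketed range. It suits patterns such as /SYS/MB/P[0-x]/D[0-7]. Disk names may not follow a plain cross product (the sp_trace_view output in this deck shows /SYS/DBP1/HDD4), so confirm real targets with show before relying on generated names.

```python
import itertools
import re

# Matches one bracketed range such as [0-7] or [0-x].
RANGE_RE = re.compile(r"\[(\d+)-(\w+)\]")


def expand_nac(pattern, **bounds):
    """Expand a NAC pattern into a list of concrete names.

    Assumption: a letter placeholder (x, y) is the highest index,
    supplied by the caller as a keyword argument, e.g. x=1.
    """
    # With two groups, split() yields: text, low, high, text, low, high, ..., text
    parts = RANGE_RE.split(pattern)
    literals = parts[0::3]
    ranges = []
    for low, high in zip(parts[1::3], parts[2::3]):
        if high.isdigit():
            upper = int(high)
        elif high in bounds:
            upper = bounds[high]
        else:
            raise ValueError(f"no value given for placeholder '{high}'")
        ranges.append(range(int(low), upper + 1))

    names = []
    for combo in itertools.product(*ranges):
        name = literals[0]
        for index, literal in zip(combo, literals[1:]):
            name += f"{index}{literal}"
        names.append(name)
    return names


if __name__ == "__main__":
    for dimm in expand_nac("/SYS/MB/P[0-x]/D[0-7]", x=1):
        print(dimm)
```

Passing the placeholder values as keyword arguments keeps the X4-2 and X4-2L differences in the caller rather than in the helper.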


Fault Diagnosis


Fault Diagnosis
Finding faults from CLI
System Health: show /System/
  Under Properties:
    health = Service Required
    health_details = P1/D5 (CPU 1 DIMM 5) is faulty. Type 'show /System/Open_Problems' for details.
    open_problems_count = 1
Fault summary: show /SP/faultmgmt
Fault details: show /SP/faultmgmt -level all
Fault messages: show /SP/logs/event/list

Fault Diagnosis
Finding faults from BUI
System Information
  Status under Summary
    Overall Status: Service Required
    Service Required links to Open Problems
      Links to Reference Documents
        http://www.sun.com/msg/SPX86-8004-0D
Fault Logs
  Under ILOM Administration
    Logs
      Events tab
        Filter on Class: fault

Fault Diagnosis
Finding faults (continued)
SNMP traps, email notifications
  ILOM CLI: set up at /SP/alertmgmt/rules/# (# is a number)
  ILOM BUI: set up at ILOM Administration --> Notifications --> Alerts tab
Do NOT rely on IPMI SEL entries or the contents of /var/log/messages
  An error in the SEL or messages file does not necessarily mean there's a fault in the system.
  IPMI events and other system events are analyzed by the fault diagnosis system to see if there is indeed a fault in the system. If something is broken, a fault will be posted.

Fault Diagnosis
Examining faults from CLI
Example of Fault logged to ILOM Event Log
-> show /SP/logs/event/list/
Event
ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
90     Mon Jul 29 06:23:36 2013  Fault     Fault     critical
       Fault detected at time = Mon Jul 29 06:23:36 2013. The suspect component:
       /SYS/MB/P1/D5 has fault.memory.intel.dimm.memtest-disable with probabili
       ty=100. Refer to http://www.sun.com/msg/SPX86-8004-0D for details.
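
When event log entries are collected for escalation, the fault message can be split into its parts. The following is a minimal Python sketch under stated assumptions: parse_fault_event is a hypothetical helper, the regular expression is derived only from the single example message above, and the message is assumed to have been rejoined onto one line (the console wraps it mid-word). Other ILOM releases may word the message differently.

```python
import re

# Pattern derived from the one example message on this slide (an assumption,
# not a documented ILOM log format).
FAULT_RE = re.compile(
    r"The suspect component:\s*(?P<component>\S+)\s+has\s+"
    r"(?P<fault_class>fault\.\S+)\s+with\s+probability=(?P<probability>\d+)\."
    r"\s+Refer to\s+(?P<url>\S+)\s+for details"
)

# The example message, rejoined onto a single line.
MESSAGE = (
    "Fault detected at time = Mon Jul 29 06:23:36 2013. "
    "The suspect component: /SYS/MB/P1/D5 has "
    "fault.memory.intel.dimm.memtest-disable with probability=100. "
    "Refer to http://www.sun.com/msg/SPX86-8004-0D for details."
)


def parse_fault_event(message):
    """Return the fault fields as a dict, or None if the text does not match."""
    match = FAULT_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["probability"] = int(info["probability"])
    # The message ID is the last path segment of the reference URL.
    info["msgid"] = info["url"].rsplit("/", 1)[-1]
    return info


if __name__ == "__main__":
    print(parse_fault_event(MESSAGE))
```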


Fault Diagnosis
Examining faults from BUI


Fault Diagnosis
The faultmgmt Shell
start from CLI, then run fmadm faulty
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> help
Built-in commands:
echo - Display information to user.
Typical use: echo $?
help - Produces this help.
Use 'help <command>' for more information about an external command.
exit - Exit this shell.
External commands:
fmadm - Administers the fault management service
fmdump - Displays contents of the fault and ereport/error logs
fmstat - Displays statistics on fault management operations
etcd - ereport injector


Fault Diagnosis
The faultmgmt Shell: fmadm faulty
faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2013-07-29/06:23:36 223391c3-209c-4766-bc05-cf5042754fd9 SPX86-8004-0D  Major

Fault class : fault.memory.intel.dimm.memtest-disable

ASRU        : /SYS/MB/P1/D5
              faulted

FRU         : /SYS/MB/P1/D5
              (Part Number: 001-0003-01,HMT31GR7CFR4A-PB)
              (Serial Number: 00AD011213496B004B)
              100% faulty

Description : Memory DIMMs failed with memtest errors.

Response    : The service-required LEDs for the affected memory DIMMs and
              chassis will be illuminated.

Impact      : The host OS will be able to boot, but the affected memory
              DIMMs will be disabled.

Action      : Please refer to the associated reference document at
              http://www.sun.com/msg/SPX86-8004-0D for the latest service
              procedures and policies regarding this diagnosis.

Fault Diagnosis
Locating faulty components
Indicators
  Service Required LED will always be lit
  Component or subsystem-specific Service LEDs will be lit when applicable
Fault Remind Button

Fault Diagnosis
Fixing faults
If a FRU is faulted, replace it.
A fault needs to be manually repaired if the FRU does not have a FRUID PROM (i.e. ILOM will remember the fault).
  If the FRU has a FRUID PROM and it is moved to a new chassis, the fault will follow the FRU.
Faults associated with ambient air temperature, lack of AC input, or missing FRUs are repaired automatically when the problem is fixed.
The user always has the option of manually repairing a fault (CLI only).

Fault Diagnosis
Manually repairing faults
-> show /SYS/MB/P1
/SYS/MB/P1
Targets:
D0
D7
...
Properties:
type = Host Processor
ipmi_name = MB/P1
fru_name = Genuine Intel(R) CPU @ 2.40GHz
fru_version = 05
fru_part_number = 060D
fault_state = Faulted
clear_fault_action = (none)
-> set /SYS/MB/P1/ clear_fault_action=true
Are you sure you want to clear /SYS/MB/P1 (y/n)? y

Set 'clear_fault_action' to 'true'


Fault Diagnosis
More information on faults
See the IVB-EP portfolio:
https://zebulon.us.oracle.com/fcgi/portfolio.fcgi?GO=GUI::portfolios::details&location=2013/001.ivb_ep_platform

SP events
Environmental/chassis events
Memory ECC and CPU (MCA) events
MRC errors and warnings, boot progress events
IOH (QPI, QPIP, PCIE, ESI, Core, Therm, Misc) events

*https://zebulon.us.oracle.com/fcgi/portfolio.fcgi?GO=GUI::portfolios::details&location=2011/007.snb_ilom


Managing FRUs


FRU Removal/Replacement
Merrimack PSNC Quorum Containers
Some FRUs contain vital product data and have special handling requirements:
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PSx

NOTE: The PSU is automatically selected at boot-time. Default is PS0.

Showing container contents


[(flash)root@nsgsh-dhcp-93-96:~]# showpsnc
Primary: fruid:///SYS/DBP0
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0
Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+------------------
PPN                 NASHUAPLUS          NASHUAPLUS          NASHUAPLUS
PSN                 1150FML002          1150FML002          1150FML002
Product Name        SUN SERVER X4-2     SUN SERVER X4-2     SUN SERVER X4-2

FRU Removal/Replacement (cont'd)


Primary Container
Commands are via the sunservice account login

Set product data


[(flash)root@nsgsh-dhcp-93-96:~]# setpsnc
Reading fruid:///SYS/DBP0...
PPN ['NASHUAPLUS']:
PSN ['1150FML002']:
Product Name ['SUN SERVER X4-2']:
PPN            NASHUAPLUS
PSN            1150FML002
Product Name   SUN SERVER X4-2
Is the above correct? (y|n) [n]: y
Writing fruid:///SYS/DBP0...
You will need to reboot the SP for these changes to take full effect.


FRU Removal/Replacement (cont'd)


Primary Container
Commands are via the sunservice account login

Initializing Quorum containers


[(flash)root@ORACLESP-1137FM501B:~]# copypsnc
Number of arguments is incorrect.
Usage:
copypsnc [-n] <src> <dest>
where <src> is PRIMARY|BACKUP1|BACKUP2
<dest> is PRIMARY|BACKUP1|BACKUP2
-n: If src is a bilingual FRU, copy from new-style record.
PRIMARY: fruid:///SYS/DBP0
BACKUP1: fruid:///SYS/MB
BACKUP2: fruid:///SYS/PS1
Backup -> Primary
[(flash)root@ORACLESP-1137FM501B:~]# copypsnc backup1 primary
Primary -> Backup
[(flash)root@ORACLESP-1137FM501B:~]# copypsnc primary backup1
[(flash)root@ORACLESP-1137FM501B:~]# copypsnc primary backup2


Policies


Configuration of Policies
From the CLI
-> show /SP/policy
/SP/policy
Targets:
Properties:
ENHANCED_PCIE_COOLING_MODE = disabled
HOST_AUTO_POWER_ON = disabled
HOST_LAST_POWER_STATE = disabled
Commands:
cd
set
show

-> set /SP/policy/ ENHANCED_PCIE_COOLING_MODE=enabled

-> set /SP/policy/ HOST_LAST_POWER_STATE=enabled
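
When several systems need the same policy settings, captured show output can be compared against a desired state. The following is a minimal Python sketch, not part of ILOM: parse_show_properties and policy_changes are hypothetical helpers, and the parser assumes the show layout seen on this slide (a "Properties:" header followed by "name = value" lines). It only builds the set command strings; it does not talk to the SP.

```python
# Sample text copied from the slide above.
SAMPLE = """-> show /SP/policy
/SP/policy
Targets:
Properties:
ENHANCED_PCIE_COOLING_MODE = disabled
HOST_AUTO_POWER_ON = disabled
HOST_LAST_POWER_STATE = disabled
Commands:
cd
set
show
"""


def parse_show_properties(output):
    """Collect the 'name = value' pairs listed under 'Properties:'."""
    props = {}
    in_properties = False
    for raw in output.splitlines():
        line = raw.strip()
        if line == "Properties:":
            in_properties = True
        elif line.endswith(":") and " = " not in line:
            # Any other section header (Targets:, Commands:) ends the block.
            in_properties = False
        elif in_properties and " = " in line:
            name, value = line.split(" = ", 1)
            props[name.strip()] = value.strip()
    return props


def policy_changes(current, desired):
    """Return the set commands needed to move from current to desired."""
    commands = []
    for name, value in desired.items():
        if name not in current:
            raise KeyError(f"unknown policy: {name}")
        if current[name] != value:
            commands.append(f"set /SP/policy/ {name}={value}")
    return commands


if __name__ == "__main__":
    current = parse_show_properties(SAMPLE)
    for command in policy_changes(current, {"HOST_LAST_POWER_STATE": "enabled"}):
        print(command)
```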


Configuration of Policies
From the BUI


Troubleshooting


Troubleshooting
Taking a snapshot
For difficult issues it may be necessary to collect debug information to help troubleshoot a problem. ILOM can do this through the snapshot command.
From the sunservice prompt you can take a snapshot either to a file or across the network.
This is the typical command syntax:
[(flash)root@ORACLESP:~]# snapshot -Loih -u file:///dev/shm

Then scp the snapshot file off the SP from an external system.

Or you can capture the snapshot and copy it off the SP in one line:
[(flash)root@ORACLESP:~]# snapshot -Loih -u sftp://<username>:<password>@<ip or hostname>/<dir>

Troubleshooting
If the SP hangs
In the event the SP hangs, a hardware watchdog will reboot the SP automatically. It should not
be necessary to reset the SP manually anymore. If ILOM crashes repeatedly in a short period,
the SP will stop in the Preboot Menu with the amber Fault LED lit, and the green SP OK LED
extinguished.

Check the serial console to confirm the SP is at the Preboot> Menu


Capture SP console messages for escalation
Type "reset" in Preboot Menu to retry starting the SP
Cycle AC power if practical

To recover a dead SP
If ILOM will not start, we recommend looking at the serial console first.
The SP will report (right after "Primary Bootstrap") if the ILOM flash image is corrupted.
Use "Preboot> host enable-on" if necessary, then press the Power button to start the host.
Use "ipmiflash -I pci" from the host to flash the ILOM package file.
NOTE: In some cases you may need to hold the Locate button while applying AC power, to
access the Preboot menu.


Troubleshooting
Restricted Shell
[(restricted_shell) ORACLESP:~]$ help

ac_off_sp    ipv6_cfg_dad          showpsnc
cat          less                  sp_trace_view
echo         ls                    statistics
egrep        lsdir                 tail
fgrep        more                  top
grep         ncsi-check-links.sh   traceroute
head         ping                  uptime
hwdiag       reboot

ac_off_sp - Takes the SP to the point where most processes have been stopped. Remove AC power or restart via serial console.
ipv6_cfg_dad - Enables (or disables) IPv6 duplicate address detection
showpsnc - Displays PSNC quorum information

Troubleshooting
Restricted Shell
sp_trace_view - Logging facility ported from SPARC platforms
[(flash)root@nsgsh-dhcp-93-96:~]# sp_trace_view -m
Trace Buffer Format Version Number = '4'
Trace Buffer Revision Number = 'galaxy/sp-merrimackplus.ast2300::82439'
Name     ID     Level    CompMask Flags Offset:Size       HiTmStmp Lost_Cnt RD_Indx  WR_Indx
-------- ------ -------- -------- ----- ----------------- -------- -------- -------- --------
OTHER    0      FUNC     ffffffff 00001 000018e0:00049070 00000000 00000000 00000000 00000000
CAPI     1      FUNC     ffffffff 00001 0004a950:00049070 0004e2a0 00000000 00000000 00002021
LUAPI    2      WARN     ffffffff 00001 000939c0:000331e8 0004e2a0 00000000 00000000 00003685
PSNC     4      FUNC     ffffffff 00001 000c6ba8:000074d8 0004e2a0 00000000 00000000 00000215
POD      5      ERROR    ffffffff 00001 000ce080:00049070 0004e2a0 00000000 00000000 0000206a

[(flash)root@nsgsh-dhcp-93-96:~]# sp_trace_view -r pod
POD 2013-07-29 06:21:15.213033 829 pod_event.c:56     pod_event_wait_for_subscribe: wait 15s for event subscribe to finish
POD 2013-07-29 06:21:15.214983 889 pod_event.c:92     pod_event_subscribe: Start subscribing to events
POD 2013-07-29 06:21:15.231113 889 pod_event.c:121    pod_event_subscribe: Done subscribing to events
POD 2013-07-29 06:21:15.231176 889 pod_daemon.c:130   pod - EM Connected.
POD 2013-07-29 06:21:15.231305 889 pod_daemon.c:54    pod - processing state change from 0 to 3.
POD 2013-07-29 06:21:15.231333 889 pod_daemon.c:114   pod - finished processing state change.
POD 2013-07-29 06:21:17.036250 892 pod_server.c:1938  rpc_svc_run: start listening for rpc msgs prog id=0x30000001(805306369) ver=0x1
POD 2013-07-29 06:21:17.494142 892 pod_server.c:85    rpc_get_inventory: inv walk component /SYS/DBP0/HDD1 failed status=-7
POD 2013-07-29 06:21:17.513205 892 pod_server.c:85    rpc_get_inventory: inv walk component /SYS/DBP0/HDD2 failed status=-7
POD 2013-07-29 06:21:17.547640 892 pod_server.c:85    rpc_get_inventory: inv walk component /SYS/DBP0/HDD3 failed status=-7
POD 2013-07-29 06:21:17.559160 892 pod_server.c:85    rpc_get_inventory: inv walk component /SYS/DBP1/HDD4 failed status=-7

Troubleshooting
Restricted Shell (cont'd)

ncsi-check-links.sh - NCSI/Sideband diagnostic tool

ncsi-check-links.sh: Version 1.0 (Platform sp-merrimack)
  [-h]          Display usage.
  -n {2|4}      Display link status for (2 or 4) ports.
  -s a,b,c,d    Print FAILED if speed of port[0] doesn't match 'a', etc.
                Each value is {10,100,1000,DOWN,ANY,IGNORE}.
                IGNORE allows any errors; ANY matches any link status.
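
The -s comparison can be pictured with a short Python sketch. This is one possible reading of the usage text, not the script's actual logic: check_links is a hypothetical function, and the use of None to represent a port whose status could not be read is an assumption made here to separate ANY (any link status that was read) from IGNORE (anything, including errors).

```python
# Values accepted by -s according to the usage text.
VALID = {"10", "100", "1000", "DOWN", "ANY", "IGNORE"}


def check_links(expected, actual):
    """Return one 'OK' or 'FAILED' verdict per port.

    expected: values as given to -s, e.g. ["1000", "ANY", "DOWN", "IGNORE"]
    actual:   observed status per port ("10", "100", "1000" or "DOWN"),
              or None when the status could not be read (assumed error case)
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must cover the same ports")
    verdicts = []
    for want, got in zip(expected, actual):
        if want not in VALID:
            raise ValueError(f"unsupported value: {want}")
        if want == "IGNORE":
            ok = True                 # accept anything, including read errors
        elif want == "ANY":
            ok = got is not None      # any status, as long as one was read
        else:
            ok = got == want          # exact speed or DOWN
        verdicts.append("OK" if ok else "FAILED")
    return verdicts


if __name__ == "__main__":
    print(check_links(["1000", "ANY", "DOWN", "IGNORE"],
                      ["1000", "100", "1000", None]))
```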

hwdiag - SP diagnostics tool. Can show SP information such as FRUID, LED, Temperature, Voltage, CPLD, GPIO, I2C

Q&A


Appendix


FRUID Command Summary


All commands are via the sunservice account login

Update FRU contents


https://stbeehive.oracle.com/content/dav/st/FRUID/Public%20Documents/fruupdatemanpage_032009.txt

Clear FRU contents


https://stbeehive.oracle.com/content/dav/st/FRUID/Public%20Documents/clearfru-manpage.txt

Display FRU contents


https://stbeehive.oracle.com/content/dav/st/FRUID/Public%20Documents/prtfru-manpage.txt

Format DIMM SPDs


https://stbeehive.oracle.com/content/dav/st/FRUID/Public%20Documents/spdformatman3.txt

Extract binary FRU image


https://stbeehive.oracle.com/content/dav/st/FRUID/Public%20Documents/fruimage-manpage.txt


Updating Configuration


Config: Programmables (cont'd)


All commands are via the sunservice account login
UPDATING INDIVIDUAL PROGRAMMABLES
The merrimack_update.sh update <device_name> - syntax will update individual part(s)
manually.
NOTE: The "-" sign in the command tells the script to use the image contained in the ILOM
package.
Example:
[(flash)root@ORACLESP-223444555:~]# merrimack_update.sh update nashua

UPDATING WITH A FILE


merrimack_update.sh update <device_name> /dev/shm/foo
REVERTING AN UPDATE

merrimack_update.sh revert <device_name> -


Config: Programmables (cont'd)


All commands are via the sunservice account login
merrimack_update.sh - Merrimack has its own script for updating all programmables in the
system. As of this writing the current version is 0.1k. This section describes the various ways
it can be used.
CHECKING FOR UPDATES
The merrimack_update.sh probe syntax will show the versions of programmables in the
system and if they need an update, or are newer than programmables in the ILOM image.
Below is sample output of the command:
[(flash)root@ORACLESP-223444555:~]# merrimack_update.sh probe
System: Merrimack - Nashua - motherboard rev 1
probing Power CPLD/FPGA... Rev: 11 Build: 13 ILOM: 12 *
probing 4dbp#0 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#1 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#2 CPLD... [not found]
The following images need an update: nashua m4dbp0 m4dbp1
The following images are newer than ILOM:
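
When probe output is saved from many systems, it can be summarized offline. The following is a minimal Python sketch under stated assumptions: parse_probe is a hypothetical helper, the line format is taken only from the sample output above, and the trailing "*" is treated as a flag for a version mismatch because, in this sample, the flagged devices correspond to the images reported as needing an update. The script itself may define the marker differently.

```python
import re

# One "probing ..." line: either "[not found]" or Rev/Build/ILOM values
# with an optional trailing "*".
PROBE_RE = re.compile(
    r"probing (?P<device>.+?)\.\.\.\s+"
    r"(?:\[not found\]"
    r"|Rev: (?P<rev>\S+) Build: (?P<build>\S+) ILOM: (?P<ilom>\S+)(?P<flag>\s*\*)?)"
)

# Sample text copied from the slide above.
SAMPLE = """System: Merrimack - Nashua - motherboard rev 1
probing Power CPLD/FPGA... Rev: 11 Build: 13 ILOM: 12 *
probing 4dbp#0 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#1 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#2 CPLD... [not found]
"""


def parse_probe(output):
    """Return one dict per probed device."""
    results = []
    for line in output.splitlines():
        match = PROBE_RE.search(line)
        if match is None:
            continue  # header and summary lines are skipped
        results.append({
            "device": match.group("device"),
            "found": match.group("rev") is not None,
            "rev": match.group("rev"),
            "build": match.group("build"),
            "ilom": match.group("ilom"),
            "flagged": match.group("flag") is not None,
        })
    return results


if __name__ == "__main__":
    for entry in parse_probe(SAMPLE):
        print(entry)
```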


Config: Programmables (cont'd)


All commands are via the sunservice account login
SEMI-AUTOMATIC UPDATE
The merrimack_update.sh update all - syntax will update all the programmables in the
system. They are not automatically updated when ILOM is upgraded.
NOTE: The "-" sign in the command tells the script to use the image(s) contained in the ILOM
package.
Below is sample output of the command:
[(flash)root@ORACLESP-223444555:~]# merrimack_update.sh update all
System: Merrimack - Nashua - motherboard rev 1
probing Power CPLD/FPGA... Rev: 12 Build: 15 ILOM: 12
probing 4dbp#0 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#1 CPLD... Rev: 02 Build: 0f ILOM: 07 *
probing 4dbp#2 CPLD... [not found]
The following images need an update: m4dbp0 m4dbp1
The following images are newer than ILOM:
Performing update on: m4dbp0 m4dbp1

Config: Updating the PSU firmware


All commands are via the sunservice account login
IMPORTANT NOTE: 2 working PSUs are required for any PSU FW update. This is because
the PSU being updated will be shut down, which would interrupt power to the SP for single PSU
systems.
Users need to contact the A256 and/or A258 PSU teams to get information on new firmware
released for these PSUs
READING THE REVISION INFORMATION
psupdate -n <psu slot #> -r

FLASH UPDATE A PSU'S SECONDARY MCU


psupdate -n <psu slot #> -s <filename>
FLASH UPDATE A PSU'S PRIMARY MCU
psupdate -n <psu slot #> -p <filename>


Config: CPLD/FPGA Rosetta Stone


Actual filenames for the various CPLD and FPGA images in Merrimack
are listed here in the rare circumstance where an update is distributed
outside of an ILOM release.

Mn1uc.jbc - X4-2 Power CPLD
Mc2uc_sfl.jbc - X4-2L power FPGA
pulse_nconfig.jbc - Reset image for power CPLDs
M4dbp.jbc - X4-2/2L 4 disk backplane
M24dbp.jbc - X4-2/2L 24 disk backplane
Megd.jbc - X4-2L SAS expander CPLD

Monitoring State
X4-2/2L Sensors (cont.)
Platforms: Nashua, Concord, Sudbury

Sensor NAC                    Type             Description                 Values
/SYS/MB/P[0-x]/PRSNT          Discrete Sensor  Host Processor is present   01h-ENTITY_PRESENT, 02h-ENTITY_ABSENT
/SYS/MB/P[0-x]/D[0-7]/PRSNT   Discrete Sensor  Host CPU DIMM is present    01h-ENTITY_PRESENT, 02h-ENTITY_ABSENT
/SYS/PS[0-1]/PRSNT            Discrete Sensor  Power Supply is present     01h-ENTITY_PRESENT, 02h-ENTITY_ABSENT
/SYS/PS[0-1]/STATE            Discrete Sensor  Multi-state, Power Supply   Presence detected
                                               sensor type, per IPMI and   Failure detected
                                               AmberRoad.                  Predictive Failure
                                                                           Power Supply input lost
                                                                           Power Supply input lost or out-of-range
                                                                           Power Supply input out-of-range
                                                                           Configuration error

Monitoring State
X4-2/2L Sensors (cont.)
Platforms: Nashua, Concord, Sudbury

Sensor NAC           Type                Description           Units
/SYS/PS[0-1]/P_IN    Power Sensor        Input power draw      Watts
/SYS/PS[0-1]/P_OUT   Power Sensor        Output power          Watts
/SYS/PS[0-1]/V_IN    Voltage Sensor      Input voltage         Volts
/SYS/PS[0-1]/V_12V   Voltage Sensor      12V rail voltage      Volts
/SYS/PS[0-1]/V_3V3   Voltage Sensor      3.3V rail voltage     Volts
/SYS/PS[0-1]/T_OUT   Temperature Sensor  Ambient Temperature   Degrees C

Monitoring State
X4-2/2L Sensors (cont.)
Sensor NAC                IPMI Entity    Type                                          IPMI                  Description
/SYS/PS[0-1]/I_OUT_ERR    Power Supply   Digital Discrete Deasserted Asserted Sensor   PS[0-1]/I_OUT_ERR     PS Output Current Error; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/I_OUT_WARN   Power Supply   Digital Discrete Deasserted Asserted Sensor   PS[0-1]/I_OUT_WARN    PS Output Current Warning; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/P_IN         Power Supply   Threshold Sensor                              PS[0-1]/P_IN          PS Input Power (Watts)
/SYS/PS[0-1]/P_OUT        Power Supply   Threshold Sensor                              PS[0-1]/P_OUT         PS Output Power (Watts)
/SYS/PS[0-1]/PRSNT        Power Supply   Entity Presence Sensor (Discrete)             PS[0-1]/PRSNT         PS Present; 1h-ENTITY_ABSENT, 2h-ENTITY_PRESENT
/SYS/PS[0-1]/T_ERR        Power Supply   Digital Discrete Deasserted Asserted Sensor   PS[0-1]/T_ERR         PS Temp Error; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/T_OUT        Power Supply   Threshold Sensor                              PS[0-1]/T_OUT         PS Output Exhaust Temp (Degrees C)
/SYS/PS[0-1]/T_WARN       Power Supply   Digital Discrete Deasserted Asserted Sensor   PS[0-1]/T_WARN        PS Temp Warning; 1h-DEASSERTED, 2h-ASSERTED

Monitoring State
X4-2/2L Sensors (cont.)

Sensor NAC                IPMI Entity      Type                                          IPMI                 Description
/SYS/PS[0-1]/V_IN_ERR     Power Supply     Digital Discrete Deasserted Asserted Sensor   PS[0-1]/V_IN_ERR     PS Input Voltage Error; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/V_IN_WARN    Power Supply     Digital Discrete Deasserted Asserted Sensor   PS[0-1]/V_IN_WARN    PS Input Voltage Warning; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/V_OUT_ERR    Power Supply     Digital Discrete Deasserted Asserted Sensor   PS[0-1]/V_OUT_ERR    PS Output Voltage Error; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/V_OUT_OK     Power Supply     Digital Discrete Deasserted Asserted Sensor   PS[0-1]/V_OUT_OK     PS Output Voltage Ok; 1h-DEASSERTED, 2h-ASSERTED
/SYS/PS[0-1]/V_12V        Power Supply     Threshold Sensor                              PS[0-1]/V_12V        PS Main Power 12V (Volts)
/SYS/PS[0-1]/V_3V3        Power Supply     Threshold Sensor                              PS[0-1]/V_3V3        PS Standby Power 3.3V (Volts)
/SYS/T_AMB                System Chassis   Threshold Sensor                              /SYS/T_AMB           Planned to be /SYS/FB/T_AMB (Degrees C)
/SYS/VPS                  System Chassis   Threshold Sensor                              /SYS/VPS             Virtual Power Sensor (Watts)

Monitoring State
X4-2/2L Indicators (cont.)

Indicator NAC     IPMI Entity      Type        IPMI              Description
/SYS/PS_FAULT     System Chassis   Indicator   /SYS/PS_FAULT     System Power Supply Fault LED
/SYS/SERVICE      System Chassis   Indicator   /SYS/SERVICE      System Service LED
/SYS/SP/OK        System Board     Indicator   /SYS/SP/OK        SP Ok
/SYS/SP/SERVICE   System Board     Indicator   /SYS/SP/SERVICE   SP Service LED
/SYS/TEMP_FAULT   System Chassis   Indicator   /SYS/TEMP_FAULT   System Temperature Fault
/SYS/FAN_FAULT    System Chassis   Indicator   /SYS/FAN_FAULT    System Fan Fault

Platform Overview
X4-2/2L Indicators
Platforms: Nashua, Concord, Sudbury

Indicator NAC                    IPMI Entity        Type        IPMI                         Description
/SYS/DBP[0-x]/HDD[0-y]/OK2RM     Disk or Disk Bay   Indicator   DBP[0-x]/HDD[0-y]/OK2RM      Hard Disk OK to Remove LED
/SYS/DBP[0-x]/HDD[0-y]/SERVICE   Disk or Disk Bay   Indicator   DBP[0-x]/HDD[0-y]/SERVICE    Hard Disk Service LED
/SYS/MB/RHDD[0-1]/OK2RM          Disk or Disk Bay   Indicator   MB/RHDD[0-1]/OK2RM           Hard Disk OK to Remove LED
/SYS/MB/RHDD[0-1]/SERVICE        Disk or Disk Bay   Indicator   MB/RHDD[0-1]/SERVICE         Hard Disk Service LED
/SYS/MB/FM[0-3]/OK               Cooling Unit       Indicator   MB/FM[0-3]/OK                Fan OK LED
/SYS/MB/FM[0-3]/SERVICE          Cooling Unit       Indicator   MB/FM[0-3]/SERVICE           Fan Service LED
/SYS/LOCATE                      System Chassis     Indicator   /SYS/LOCATE                  System Locate Indicator LED
/SYS/MB/P[0-3]/SERVICE           Processor          Indicator   MB/P[0-3]/SERVICE            Processor Service LED
/SYS/MB/P[0-3]/D[0-7]/SERVICE    Memory Device      Indicator   P[0-3]//D[0-7]/SVC           DIMM Service Indicator

Fault Diagnosis
What is a fault?
Fault: an event that indicates something is out of specification

Types of Faults
  Components - such as a CPU or DIMM
  Configuration
    Invalid DIMM configuration in a memory channel
    Missing FRUs
  Environmental - such as ambient air temperature
  Infrastructure - no AC input to power supply

Fault Diagnosis
More information on faults: Message IDs and
Knowledge articles
Fault events have associated URLs such as http://www.sun.com/msg/SPX86-8000-AE
  SPX86-8000-AE is the message ID
  The URL leads to the knowledge article, which contains additional details regarding the fault
  - www.sun.com/msg is redirected to oracle.com

Fault Diagnosis
Persistence of faults
The fault database in ILOM persists across cold boots.
Fault records in a FRU's FRUID PROM (when available) are also persistent.
  NOTE: If you do not repair the fault on the FRU, and move the FRU to a new chassis, the fault will show up in ILOM on the new chassis.