Introduction
When a CX series storage processor fails to boot, a diagnosis must be made to see if some minor corrective action is
sufficient to rectify the situation, if hardware must be replaced, or if the storage processor “boot-drives” must be re-imaged.
This document describes the procedure by which a CX series array can be analyzed to make that decision. The actual procedure
for re-imaging an array is beyond the scope of this document.
Diagnosis
The primary goal of this diagnosis is to get the customer’s storage array running at 100% functionality as rapidly as possible.
A secondary goal is to retrieve necessary and sufficient information so that the root cause of the failure can be determined
off-site.
Certain types of failures will require re-imaging a pair of SP boot drives by following the documented procedure for using a CX
series Recovery Drive. Other types of failures will require an SP replacement, and others may require re-cabling or
reconfiguration of the array.
Note that SPA boots from a mirrored boot drive on disks 0 and 2, and SPB boots from a mirrored boot drive on disks
1 and 3. If a drive fails physically, the SP will boot from the survivor. If the image becomes corrupted, the corruption will
likely affect both boot drives for an SP, since they are a mirrored pair in the “Boot area”.
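The boot-drive layout described above can be written down as a small lookup. The following Python sketch is illustrative only: the names BOOT_DISKS, surviving_boot_disks and can_boot_after_physical_failure are hypothetical and are not part of any CLARiiON tool. It models only the physical-failure case, not image corruption, which is expected to affect both halves of a mirror.

```python
# Boot-drive layout as stated in this document:
# SPA mirrors across disks 0 and 2, SPB across disks 1 and 3.
BOOT_DISKS = {"SPA": (0, 2), "SPB": (1, 3)}


def surviving_boot_disks(sp, failed_disks):
    """Return the boot disks of `sp` that are not in `failed_disks`."""
    return [disk for disk in BOOT_DISKS[sp] if disk not in failed_disks]


def can_boot_after_physical_failure(sp, failed_disks):
    """An SP can still boot while at least one half of its mirror survives."""
    return len(surviving_boot_disks(sp, failed_disks)) > 0
```

For example, with disk 0 physically failed, SPA would still be expected to boot from disk 2, while SPB is unaffected.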
Main areas of Array function which could cause an array to not boot
• Storage Processor Power-on Self Test (POST) failure
• Storage Processor BIOS Test Failure
• Inaccessibility to the Boot Drives in the chassis where they reside
• Chassis with boot drives is not on back-end Bus 0
• Chassis with boot drives is not “enclosure 0”.
CLAR-PSP-093 Page 1 of 10
Determine the circumstances of the failure
Ask the customer the circumstances of the failure. What, if anything, was the customer doing when the SP failed? Determine
if one or both of the SPs fail to boot.
Boot LED flash rate: 4 Hz (4 times per second)
Phase: Boot Phase
If this rate starts, POST has completed and the boot phase has begun. If the Boot LED does not go out and stay out, the
SP has not successfully booted.
See the following sections of this document to diagnose the failure.
• It Is Ping-able
If you are able to ping the faulty SP after the boot process has completed, this is a sign that NT has booted and the
network interface drivers have loaded. It is unlikely that there are any hardware issues with the Storage Processor
at this time.
• While monitoring the dynamic ping, the SP alternates between answering the ping and timing out.
If the SP was pinged dynamically then you may see the SP become ping-able for a while during the boot process and
then become unping-able again. This may repeat several times until the SP finally remains ping-able. This indicates
that there has been a repeated panic/failure or reboot in one of the core software components or the peer SP is
resetting the SP being pinged. The SP may remain ping-able once the reboot count is exhausted after 4 unsuccessful
reboots. The SP may never become ping-able if the peer SP is constantly resetting it.
An indication that an SP boots NT from its FC boot drives and then panics is rapid disk activity on a pair of boot
drives (0/2 or 1/3) during the subsequent boot after the panic. If this panic/reboot occurs 4 times, the reboot
counter will be tripped and the SP will remain with Flare not running, though it may be running NT. The BOOT LED
on the SP Air Dam will flash at a rapid (4 Hz) rate, indicating that the Flare driver/application has not
successfully loaded and started. Call EMC/CLARiiON Support. It is also possible for an SP to reboot constantly.
This indicates that the SP reboot counter is being reset, preventing the “4 reboot counter” from being reached. Call
EMC/CLARiiON Support.
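The ping patterns described above can be told apart mechanically once the replies are recorded. The following Python sketch is a rough illustration under stated assumptions: the input is a chronological list of booleans (True for an answered ping, False for a time-out) collected by whatever ping tool is in use, and the function name and category labels are hypothetical. It does not replace the judgement of the engineer on site.

```python
def classify_ping_history(replies):
    """Classify a chronological list of ping results (True = answered)."""
    if not replies or not any(replies):
        # Never answered: per the text, the peer SP may be resetting this SP.
        return "never-pingable"
    # Count the transitions from answering to timing out.
    drops = sum(1 for prev, cur in zip(replies, replies[1:]) if prev and not cur)
    if drops == 0:
        # Answered and kept answering: NT and the network drivers loaded.
        return "stable"
    if replies[-1]:
        # Dropped at least once, now answering: repeated panic/reboot,
        # possibly ending when the reboot count was exhausted.
        return "flapped-then-stable"
    # Dropped and currently timing out: still cycling, or being reset.
    return "flapping"
```

A history such as [False, True, False, True, True] would be reported as "flapped-then-stable", which matches the repeated panic/reboot case described in the text.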
If the attempt to SymRemote into the faulty SP is successful, then involve the CLARiiON Technical Support group
to help use debug techniques to root-cause the failure and take appropriate action. Determine what drivers have
started, look at event logs, collect any core dumps that are available, etc.
If the attempt to SymRemote into the faulty SP fails, then we suspect a user space issue. There have been cases where
NT has started but a user space component of the core software has hung and SymRemote cannot access the SP.
There is really no choice but to re-image the affected pair of boot drives via the Boot disk recovery procedure.
Note that the reboot count protects an SP from failures in the majority of components of the core
software. If any core software component that is not required for boot fails repeatedly, then the reboot count
will eventually be exhausted and NT will boot without attempting to load the remainder of the core software. There
are several device drivers that are required to boot from Fibre and thus are not subject to this reboot count. A failure
of these drivers will make an SP un-bootable.
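The reboot-count behaviour described above can be sketched as a small simulation. This Python example is an assumption-laden illustration, not the actual Flare or NT logic: the limit of 4 comes from the text, while the function name, the state labels, and the idea of feeding in one boolean per boot attempt are hypothetical. It does not model a counter that is being reset, nor failures of drivers that are required to boot from Fibre.

```python
REBOOT_LIMIT = 4  # from the text: the counter trips after 4 unsuccessful reboots


def simulate_boot(attempt_results, limit=REBOOT_LIMIT):
    """Walk through boot attempts (True = core software loaded successfully).

    Returns a (state, failures_counted) tuple.
    """
    failures = 0
    for loaded_ok in attempt_results:
        if loaded_ok:
            return ("core-software-running", failures)
        failures += 1
        if failures >= limit:
            # Count exhausted: NT boots without the rest of the core software.
            return ("nt-only", failures)
    # Ran out of recorded attempts before either outcome was reached.
    return ("still-rebooting", failures)
```

For instance, four consecutive failed attempts would leave the simulated SP in the "nt-only" state, which corresponds to an SP that answers pings but is not running Flare.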
Extended POST
Failures of the Extended POST diagnostics will result in an error code being displayed in the console output. The error
codes generated by the Extended POST diagnostics will attempt to isolate the fault to a field replaceable unit (FRU).
These error codes are documented in the Chameleon II & X1 Power On Self Test (POST) Functional Specifications.
Replace the FRU specified by the error code and restart the SP.
There are also non-fatal warning messages that are displayed by the Extended POST diagnostics on the console output.
If there are no Extended POST diagnostic failures then the storage processor should have access to the disks that make up the
mirrored boot drive. An SP could still fail to boot due to an inability to read from the boot drives. This could be due to
• backend loop failure
• faulty physical drive
• incorrect cabling.
The following diagram shows a CX600 that cannot find the Boot chassis. This is due to a cabling error.
DDBS Failures
DDBS (Data Directory Boot Service) is a facility that is called by Extended POST to determine which half of the
mirrored boot drive the storage processor should boot from.
The DDBS console output in the table below shows that both halves of the mirrored boot drive are valid for boot. There
were no inconsistencies that would cause DDBS to disqualify either half of the mirror for booting NT. Extended POST
found the NT image and declared that disk 0 and 2 are both valid for booting this SP (SPA).
If DDBS finds any inconsistencies that cause it to disqualify a disk for boot then error messages are generated. It is
acceptable for one half of the mirror to be disqualified since a rebuild may be required.
The diagram below shows a CX600 chassis with non-bootable drives in slots 1 & 2 and no drives in slots 2 & 3 (same as
non-bootable drives).
If the boot drives are in the correct slots, there is likely a Utility partition for each SP. This would allow the drives to be
re-imaged. See CLAR-PSP-078.
If an SP fails to boot and significantly fewer reads have been performed than expected, then the boot process has hung at
some point prior to completion. The SP boot drives should be re-imaged.
No Console Output Failures
If there are no console output failures, you can ping the SP, and you can SymRemote onto the SP, but the SP appears to be
unmanaged by Navisphere and restarting the management server from ipaddress/setup does not resolve this,
call EMC/CLARiiON Support.
The following is the typical console port output that is observed when booting from Fibre. Comments are in a blue italicized
font. Additional details can be found in the C2 and X1 Power On Self Test (POST) Functional Specifications.
CX600
┌──────────────────────────────────────────────────────────────────────────┐
│                        PhoenixBIOS Setup Utility                         │
│                                                                          │
│ CPU Type        : Intel(R) XEON(TM)   System ROM   : E9DB - FFFF         │
│ CPU Speed       : 2000 MHz            BIOS Date    : 11/04/02            │
│                                                                          │
│ System Memory   : 640 KB              COM Ports    : 03F8 02F8 0300 0308 │
│ Extended Memory : 4119552 KB          LPT Ports    : 03BC                │
│ Shadow Ram      : 384 KB              Display Type : EGA \ VGA           │
│ Cache Ram       : 512 KB              PS/2 Mouse   : Not Installed       │
│                                                                          │
│ Hard Disk 0     : None                                                   │
│ Hard Disk 1     : None                                                   │
│ Hard
└──────────────────────────────────────────────────────────────────────────┘
AabcdBCDabEabcdFGHabIabcJabcKabLabMabcNabOabPabQabRabSabTabUabVabWabXYZAA
FCDMTL 0 [2.4.1] DVM Duplicate address id already in list: E4
Finding the first 4 drives in the chassis. This does not mean they are bootable, only found.
Autoflash POST?
Autoflash BIOS?
This is where the SP would update POST or BIOS if required. It would go back into a reboot at
this point if it needed to be updated.
Disk Set: 1 3
--------------------------end-----------------------
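When console output is captured to a file, a line such as "Disk Set: 1 3" can be matched against the boot-drive pairs given earlier in this document (SPA on disks 0 and 2, SPB on disks 1 and 3). The Python sketch below is illustrative only. It assumes that the "Disk Set:" line lists the disks accepted for boot, which this document does not state explicitly, and the function names and result strings are hypothetical.

```python
import re

# Boot-drive pairs as stated in this document.
BOOT_PAIRS = {"SPA": {0, 2}, "SPB": {1, 3}}


def parse_disk_set(line):
    """Return the set of disk numbers on a 'Disk Set:' line, or None."""
    match = re.match(r"\s*Disk Set:\s*([\d\s]*)$", line)
    if match is None:
        return None
    return {int(token) for token in match.group(1).split()}


def describe_disk_set(line):
    """Relate a 'Disk Set:' line to the SPA/SPB boot pairs (assumed meaning)."""
    disks = parse_disk_set(line)
    if disks is None:
        return "not a Disk Set line"
    for sp, pair in BOOT_PAIRS.items():
        if disks == pair:
            return f"{sp}: both halves of the mirror listed"
        if disks and disks < pair:
            return f"{sp}: only one half of the mirror listed"
    return "no recognised boot pair"
```

Under that assumption, the line shown above would correspond to both halves of the SPB mirror. A line listing only one disk of a pair would be consistent with the DDBS case where one half of the mirror has been disqualified and may need a rebuild.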