AUDIO Options
You can:
- Either listen to the audio broadcast on your computer
- Or join the teleconference (dial in)
Objectives

Agenda
[Diagram: LUN / System LUN → Cell Disk → Grid Disk 1 … Grid Disk nn → ASM disks]
Physical Disk
A physical disk is an actual device within the storage cell that constitutes a single disk drive spindle.
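As a quick way to see the physical disks from the storage cell, a minimal sketch (attribute names assumed from CellCLI; verify on your image):

CellCLI> list physicaldisk attributes name, diskType, status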
Cell Disk
- A cell disk is an Oracle Exadata Cell abstraction that is built on top of a LUN
- It is a higher level of abstraction for the data storage on a physical disk
- It can be further divided into grid disks, which are directly exposed to ASM/DB
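A minimal sketch of listing cell disks and the device partition they sit on (the devicePartition attribute is assumed to be available on your CellCLI release):

CellCLI> list celldisk attributes name, devicePartition, size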
Grid Disk
A grid disk is a potentially noncontiguous partition of the cell disk that is directly exposed to ASM, to be used for ASM disk group creation.
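For illustration, a hedged sketch of how grid disks are typically carved out of cell disks (the prefix and size here are hypothetical, not taken from this deck):

CellCLI> create griddisk all harddisk prefix=DATA, size=2208G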
Grid Disk
To list grid disks and query their name, size, and offset:

CellCLI> list griddisk where celldisk=CD_06_dmorlcel05 attributes name, size, offset
DATA_DMORL_CD_06_dmorlcel05     2208G           32M
RECO_DMORL_CD_06_dmorlcel05     552.109375G     2208.046875G
DBFS_DG_CD_06_dmorlcel05        33.796875G      2760.15625G
Grid disks with a lower offset are placed on the outermost (faster) tracks.
See: Examples of Space Allocation for Grid Disks on an Exadata Database Machine (Doc ID 1513068.1)
SYSTEM LUN
[Diagram: system disks sda and sdb — sda3 / sdb3 (size 528 GB) form the cell disk; sda5 to sda11 / sdb5 to sdb11 (size 29 GB) hold the OS, Exadata SW, etc.]
[Diagram: cell disks on data LUNs (e.g. sdb, sdc) are divided into DATA, RECO, and DBFS grid disks]
Model: LSI MR9261-8i (scsi)
Disk /dev/sdad: 3000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  123MB   123MB   ext3         primary  raid
 2      123MB   132MB   8225kB               primary
 3      132MB   2964GB  2964GB               primary
 4      2964GB  2964GB  32.8kB               primary
 5      2964GB  2975GB  10.7GB  ext3         primary  raid
 6      2975GB  2985GB  10.7GB  ext3         primary  raid
 7      2985GB  2989GB  3221MB  ext3         primary  raid
 8      2989GB  2992GB  3221MB  ext3         primary  raid
 9      2992GB  2994GB  2147MB  linux-swap   primary  raid
10      2994GB  2995GB  732MB                primary  raid
11      2995GB  3000GB  5369MB  ext3         primary  raid

Information: Don't forget to update /etc/fstab, if necessary.
Model: LSI MR9261-8i (scsi)
Disk /dev/sdb: 3000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  123MB   123MB   ext3         primary  raid
 2      123MB   132MB   8225kB  ext2         primary
 3      132MB   2964GB  2964GB               primary
 4      2964GB  2964GB  32.8kB               primary
 5      2964GB  2975GB  10.7GB  ext3         primary  raid
 6      2975GB  2985GB  10.7GB  ext3         primary  raid
 7      2985GB  2989GB  3221MB  ext3         primary  raid
 8      2989GB  2992GB  3221MB  ext3         primary  raid
 9      2992GB  2994GB  2147MB  linux-swap   primary  raid
10      2994GB  2995GB  732MB                primary  raid
11      2995GB  3000GB  5369MB  ext3         primary  raid

Information: Don't forget to update /etc/fstab, if necessary.
MD (Multipath Device)
- MD devices are used to mirror the two system LUNs
- This system area contains the OS image, swap, Exadata SW, logs, and other config files
- /dev/md5 & /dev/md6 - system partition (root partition)
- /dev/md7 & /dev/md8 - software (Exadata software installation)
- /dev/md4 - boot
- /dev/md11 - /var/log/oracle - to store cellos and crash files, etc.
At any point, 4 md partitions will be mounted.
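To see the md devices and which of them are currently mounted, a minimal sketch (standard Linux commands; device numbers follow the layout above):

# cat /proc/mdstat            # software RAID status of all md devices
# df -h | grep /dev/md        # md partitions currently mounted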
MD (Multipath Device)
[Diagram: md devices mapped to the ROOT, Exadata SW, Boot, and cellos/crash areas]
Agenda
- Disk Layout at Storage cell side
- Disk Layout at ASM/DB node side
- Exadata Auto Management
- Replacing the failed/failing disk
- Troubleshooting disk replacement
- 3-disk RAID 5 with 1 global hot spare on images 11.2.3.1.1 and earlier
- 4-disk RAID 5 on images 11.2.3.2.0 and later
X2-8 Linux only (dual-boot Solaris image partition has been reclaimed):
- 7-disk RAID 5 with 1 global hot spare on images 11.2.3.1.1 and earlier
- 8-disk RAID 5 on images 11.2.3.2.0 and later
X3-8 Linux only: 8-disk RAID 5
NOTE: Both options only apply if the hot spare isn't already claimed as part of a previous (11.2.3.2.0) update. Customers who require their hot spare 'back' after a previous upgrade (to 11.2.3.2.0) need to reimage to a release < 11.2.3.2.0, then upgrade to 11.2.3.2.1, and then follow the steps in this note, guided by Oracle Support.
# /opt/oracle.SupportTools/reclaimdisks.sh -check
Hotspare reclaimed

# /opt/oracle.SupportTools/reclaimdisks.sh -check
Disk /dev/sda: 896.9 GB, 896998047744 bytes
255 heads, 63 sectors/track, 109053 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start      End      Blocks   Id  System
/dev/sda1   *         1       65      522081   83  Linux
/dev/sda2            66    36351   291467295   8e  Linux LVM
/dev/sda3         36352   109054   583985248   8e  Linux LVM
[Diagram (repeated on two slides): DATA, RECO, and DBFS grid disks in disk slots 3, 6, 8, and 11, shown from the ASM/DB node side]
Agenda
- Disk Layout at Storage cell side
- Disk Layout at ASM/DB node side
- Exadata Auto Management
- Replacing the failed/failing disk
- Troubleshooting disk replacement
What is automated?
What requires human intervention?
What is automated?
What requires human intervention?
- Diskgroup mount is not automated. If a diskgroup gets dismounted due to, say, loss of all physical mirrors, the ASM admin has to mount the diskgroup manually when those disks become accessible again.
- Disks taken offline by users should be brought back online manually after maintenance.
- Disks dropped by users should be added back manually.
XDMG and XDWK are new background processes which perform the auto management. Both are restartable processes.
For more detail, kindly refer to: Auto disk management feature in Exadata (Doc ID 1484274.1)
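To confirm these background processes are present on the ASM instance, a minimal sketch (assuming v$process exposes PNAME on your release; otherwise check from the OS):

SQL> select pname from v$process where pname in ('XDMG', 'XDWK');
$ ps -ef | egrep -i 'xdmg|xdwk'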
Agenda
- Disk Layout at Storage cell side
- Disk Layout at ASM/DB node side
- Exadata Auto Management
- Replacing the failed/failing disk
- Troubleshooting disk replacement
Disk Failure (Dead disk)
- The disk controller detects that the disk is dead
- Exadata auto management force drops the grid disks on the dead disk from the ASM disk group
- Need to replace the disk ASAP
Disk/Media Problem (Predictive failure)
Poor Performance
- MS detects a disk in poor performance
- MS moves the LUN to warning and the physical disk to predictive failure
- The cell disk and its associated grid disks go to proactive failure
- Cellsrv sends a message to force drop the grid disks from the ASM side. If the grid disks cannot be force dropped (due to offline partners), the disks will be dropped normally (the data gets relocated off the slow disk eventually).
Disk Status

                   Dead disk (critical)   Predictive failure   Poor performance
Lun status         warning                warning              warning
Cell disk status   not present            proactive failure    proactive failure
Grid disk status   not present            proactive failure    proactive failure
ASM action         force drop
Automatic Removal of Underperforming Disk
From 11.2.3.2 onwards, an underperforming disk can be removed from the active configuration.

                Normal status   1st Phase                  2nd Phase                   3rd Phase
Physical Disk   normal          warning - confinedOnline   warning - confinedOffline
Lun             normal          warning - confinedOnline   warning - confinedOffline
Cell Disk       normal          normal - confinedOnline    normal - confinedOffline    predictive failure
Grid Disk       active          active - confinedOnline    active - confinedOffline
ASM Action      None            None                       Take grid disk offline      Online the disk if the disks are fine,
                                                            if possible                 or drop force the disk if it detects
                                                                                        poor performance
Things to check before replacing/pulling the disk out
- First, identify the disk that needs to be replaced at the cell side: system disk or data disk?
- Identify the corresponding LUN, cell disk, and grid disks (see the CellCLI sketch below)
- If it is a system disk, check the partner's status in the mdadm software RAID
- Check whether the grid disks are online at the ASM side or not
- Check whether any rebalance is going on
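A minimal CellCLI sketch for mapping a physical disk to its LUN, cell disk, and grid disks, and for checking whether ASM can tolerate taking them offline (asmModeStatus/asmDeactivationOutcome are assumed to exist on your CellCLI release):

CellCLI> list physicaldisk attributes name, status
CellCLI> list lun attributes name, cellDisk, status
CellCLI> list griddisk attributes name, celldisk, asmModeStatus, asmDeactivationOutcome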
Replacing the Disk (Hard failure or Predictive failure)
If you have configured the system for alert notification, cellsrv would have sent an alert related to the disk failure. Below is an example of the email received.
Drop Physical Disk for Replacement
From 11.2.3.3.0 and later, execute this command to prepare the disk for replacement.
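The command referred to above (not reproduced in the original slide text) is the CellCLI drop-for-replacement form; a sketch with a hypothetical disk name:

CellCLI> alter physicaldisk 20:5 drop for replacement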
Cont..
11.2.3.2.1 and earlier versions (manual method to verify)

1. Check the grid disks' status on the ASM side.
Use the following query to validate that the grid disks are DROPPED from ASM for a proactive failure. For a hard failure (dead disk), the mode_status should be OFFLINE and the mount_status should be CLOSED.

col name format a30
col path format a40
col group_number format 99
col mount_status format a10
set linesize 200
select path, name, group_number, mount_status, mode_status
from v$asm_disk
where path like '%CD_07_dmorlcel06%';

PATH
o/192.168.10.6/SYSTEMDG_CD_07_dmorlcel06    CLOSED  OFFLINE
o/192.168.10.6/DATA_CD_07_dmorlcel06        CLOSED  OFFLINE
o/192.168.10.6/RECO_CD_07_dmorlcel06        CLOSED  OFFLINE
2. If a grid disk that needs to be replaced still shows online, drop those grid disks manually:

alter diskgroup <dg name> drop disk <grid disk name> rebalance power 11;

Note: the recommended rebalance power limit is 1 to 32 only.

3. Wait for the rebalance to complete, only for a normal drop (Predictive failure or online disk replacement only):

SQL> select * from gv$asm_operation;

If it returns no rows, then there is no rebalance going on currently.

4. Verify the MS process is running on the cell node before replacing the disk:

CellCLI> list cell attributes cellsrvStatus, msStatus, rsStatus detail
         cellsrvStatus:       running
         msStatus:            running
         rsStatus:            running
5. If it is a system disk, check the other system partner's status.
Run the command below to verify the MD device volume status:

for x in 1 2 4 5 6 7 8 11; do mdadm -Q --detail /dev/md$x; done

Sample output from one of the md devices (output truncated to highlight the required information):

# mdadm -Q --detail /dev/md5
/dev/md5:
    State : clean
    Number   Major   Minor   RaidDevice   State
       0       8       5        0         active sync   /dev/sda5
       1       0       0        1         removed
       2       8      21        -         faulty spare  /dev/sdb5

The most important things to check are the partner disk state and whether State is "clean", "clean, degraded", or "active". If it is "clean", it is safe to hot remove the disk. "active" means it is actively syncing the disk mirrors, and you should wait until it is "clean" before hot removing the disk. If the disk stays in the "active" state, then follow the steps in MOS Note 1524329.1 to set it to removed before replacing.
Perform the following steps to replace the physical disk:
- Locate the failed disk (the system disks are the leftmost in the system)
- Validate that the failed disk has the amber LED turned on, or the blue OK-to-Remove LED on
- Verify that the green LED begins to flicker as the system recognizes the new drive
When you replace a physical disk, the disk must be acknowledged by the RAID controller before it can be used. This does not take a long time, and you can use the LIST PHYSICALDISK command to monitor the status until it returns to normal.
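For example (a sketch; the disk name is hypothetical):

CellCLI> list physicaldisk 20:5 attributes name, status
         20:5    normal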
Post Replacement check
1. Validate that the cell disk and grid disks were created.
Status should be normal for the cell disk and active for the grid disks. This can be checked using the cellcli command line:

cellcli -e list celldisk
cellcli -e list griddisk

2. Connect to the ASM instance and identify the status of the rebalance operation:

SQL> select * from gv$asm_operation;

An active rebalance operation can be identified by STATE=RUN. The group_number and inst_id columns provide the disk group number of the disk group being rebalanced and the instance number where the operation is running.
Post Replacement check
3. Use the following query to validate that all failgroups have the same number of disks and the correct status (MODE_STATUS = ONLINE and MOUNT_STATUS = CACHED):

o/192.168.10.6/SYSTEMDG_CD_07_dmorlcel06    1   CACHED   ONLINE
o/192.168.10.6/DATA_CD_07_dmorlcel06        2   CACHED   ONLINE
o/192.168.10.6/RECO_CD_07_dmorlcel06        3   CACHED   ONLINE

If the disk shows as CANDIDATE, then you need to add it manually using the command below:

alter diskgroup <DG NAME> add disk '<disk path>' rebalance power 11;
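A sketch of such a validation query against v$asm_disk (grouping columns assumed; adapt as needed):

SQL> select group_number, failgroup, mode_status, mount_status, count(*)
     from v$asm_disk
     group by group_number, failgroup, mode_status, mount_status
     order by group_number, failgroup;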
Post Replacement check
4. If it is a system disk, check that the replaced disk synced with the other system partner disk:

for x in 1 2 4 5 6 7 8 11; do mdadm -Q --detail /dev/md$x; done

Below is the correct output if the md devices are synchronized. The important data to validate are the State, which should be clean, and the information at the bottom, where it shows the partitions with the other disk are in sync.
MOS Article References:
- Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues (Doc ID 761868.1)
- How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure) (Doc ID 1386147.1)
- How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure) (Doc ID 1390836.1)
- Things to Check in ASM When Replacing an ONLINE disk from Exadata Storage Cell (Doc ID 1326611.1)
Agenda
- Disk Layout at Storage cell side
- Disk Layout at ASM/DB node side
- Exadata Auto Management
- Replacing the failed/failing disk
- Troubleshooting disk replacement
If it shows as "Unconfigured(bad)", "Spun down", or "FAILED", then the replaced disk is having an issue. Please contact Oracle Support.
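These statuses come from the LSI RAID controller; a sketch of how they are typically checked on a storage server (MegaCli path assumed):

# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep -i 'firmware state'
Firmware state: Online, Spun Up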
Summary
What we covered today:
- Exadata Storage and database node disk layout
- Overview of Exadata Auto management
- Handling disk failure
- Things to check before replacing a disk
- Troubleshooting tips for disk replacement issues
Learn More
Available References and Resources to Get Proactive:
- About Oracle Support Best Practices: www.oracle.com/goto/proactivesupport
- Get Proactive in My Oracle Support: https://support.oracle.com | Doc ID: 432.1
- My Oracle Support Blog: https://blogs.oracle.com/supportportal/
- Ask the Get Proactive Team: get-proactive_ww@oracle.com
- Or directly: https://communities.oracle.com
Q & A

THANK YOU
Copyright © 2014, Oracle and/or its affiliates. All rights reserved.