
html  videos 2 node rac Solaris .

Commands :

Crs status: (root not required)

./crs_stat -t --> all cluster resource information .
./crsctl check crs --> status of the background processes/daemons
ps -ef | grep d.bin --> information on the 3 background processes .
crs_stat -p / -ls / -v --> check crs profile
oifcfg getif --> get cluster interconnect information .
crs_profile -print

crsctl check daemon .

srvctl modify database -d db_name -y automatic/manual --> automatic/manual start/stop of the resource .
ps -ef | grep <instance_name> | grep pmon

$CRS_HOME/bin / –collect --> as root user .

srvctl status database -d wwprd

srvctl status nodeapps -n wwprod3
srvctl status asm -n node_name
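
A minimal sketch of how the tabular output of `crs_stat -t` could be screened for resources that should be up but are not. The column layout (Name, Type, Target, State, Host) and the resource names in `SAMPLE` are assumptions for illustration; `parse_crs_stat` and `needs_attention` are hypothetical helper names, not part of any Oracle tool.

```python
def parse_crs_stat(text):
    """Parse tabular `crs_stat -t` style output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the dashed separator and the header row.
        if len(parts) < 4 or parts[0] == "Name":
            continue
        rows.append({
            "name": parts[0],
            "type": parts[1],
            "target": parts[2],
            "state": parts[3],
            "host": parts[4] if len(parts) > 4 else "",
        })
    return rows


def needs_attention(rows):
    """Names of resources whose target is ONLINE but whose state is not."""
    return [r["name"] for r in rows
            if r["target"] == "ONLINE" and r["state"] != "ONLINE"]


# Illustrative sample only; names and exact layout are assumptions.
SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.green.vip  application    ONLINE    ONLINE    green
ora.orcl.db    application    ONLINE    UNKNOWN   green
ora.green.gsd  application    OFFLINE   OFFLINE
"""

if __name__ == "__main__":
    print(needs_attention(parse_crs_stat(SAMPLE)))
```

A resource whose target is OFFLINE is deliberately not reported, since it was stopped on purpose.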

Starting /Stopping crs :

Connect as the oracle user, then switch to the ROOT user
( root is required to stop/start crs ).
cd $CRS_HOME/bin
./crsctl stop crs
./crsctl start crs

crsctl disable crs --> disable crs from coming up after reboot

crsctl enable crs --> enable crs on next startup
These commands update the file /etc/oracle/scls_scr/node_name/root/crsstart, which contains
the string enable or disable .
srvctl start nodeapps -n green --> ( starts ons, listener , gsd, vip )
srvctl start asm -n green -i asm1
srvctl start instance -d orcl -i orcl1
srvctl start database -d dctmdb
srvctl start listener -n node_name

srvctl stop database -d dctmdb -o immediate

srvctl stop instance -d orcl -i orcl1
srvctl stop asm -n green -i asm1 -o immediate
srvctl stop nodeapps -n green

If database does not reflect in crsstat :

1) crs_stat -t --> unknown state
2) crs_stop -f ora.SHCL1_prmy.SHCL1N02.inst --> state should be offline before starting .
3) start the database using sqlplus
4) srvctl start asm -n node_name --> update status to online

Voting disk : ( contains node member information )

cd $CRS_HOME/bin
./crsctl query css votedisk --> display current votedisk

dd if=voting_disk_name of=backup_file_name bs=4096 --> online backup

dd if=backup_file_name of=voting_disk_name --> restore backup

crsctl add css votedisk path -force --> add voting disk

crsctl delete css votedisk path -force --> remove voting disk ( -force if crs is not
started )

we should use the -force option of add/delete voting disk only if the cluster is down .
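
As a rough sketch, the backup step can be scripted by reading the device paths out of the `crsctl query css votedisk` output and printing one `dd` command per voting disk (reading from the voting disk, writing to the backup file). The sample output layout, the `/tmp` target and the helper names are assumptions for illustration; the commands are only built as strings, never executed.

```python
import posixpath


def votedisk_paths(query_output):
    """Collect the device paths from `crsctl query css votedisk` style output."""
    paths = []
    for line in query_output.splitlines():
        parts = line.split()
        # Assumption: the path is the last field and starts with "/".
        if parts and parts[-1].startswith("/"):
            paths.append(parts[-1])
    return paths


def backup_commands(paths, backup_dir):
    """Build one dd command per voting disk (source disk -> backup file)."""
    cmds = []
    for i, path in enumerate(paths):
        target = posixpath.join(backup_dir, "votedisk%d.bak" % i)
        cmds.append("dd if=%s of=%s bs=4096" % (path, target))
    return cmds


# Illustrative sample; the real layout may differ between versions.
SAMPLE = """\
 0.     0    /dev/rhdisk3
 1.     0    /dev/rhdisk4
located 2 votedisk(s).
"""

if __name__ == "__main__":
    for cmd in backup_commands(votedisk_paths(SAMPLE), "/tmp"):
        print(cmd)
```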

ocrcheck --> online integrity check of current ocr . Displays current ocr file.
cluvfy comp ocr -n all --> to verify ocr
ocrconfig -showbackup --> list available physical backups .
ocrdump -backupfile <file_name> --> where file_name is the backup file .
ocrconfig -export /oracle/ocr_export.dmp1 -s online (as root user)
ocrconfig -backuploc /path/ --> change path of default backup place .
ocrconfig -replace ocrmirror {filename} --> to replace mirror file (can also be used to add
/ remove)
ocrconfig -replace ocr {filename} --> to replace ocr file

Manual ocr backup :

crsctl stop crs
ocrconfig -export /tmp/ocr_bak --> manually take logical backup of ocr .
crsctl start crs

restore ocr from logical backup:

crsctl stop crs --> on all nodes
ocrconfig -import <backup path>
crsctl start crs --> on all nodes

restore ocr from physical backup: ( as root user)

keep up only the node on which the restore will be done . that node needs to be
started in single user mode without crs daemons .
ocrconfig -showbackup
crsctl stop crs --> local node
ocrconfig -restore <backup path-optional>
crsctl start crs --> local node
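
The two restore procedures above differ only in the ocrconfig option: a logical backup (taken with -export) goes back with -import, a physical backup with -restore. A small sketch that makes the choice explicit; `ocr_restore_steps` is a hypothetical helper that only returns the command strings, and the backup path in the usage line is a placeholder.

```python
def ocr_restore_steps(backup_type, backup_path):
    """Return the ordered commands for restoring the OCR from a backup."""
    if backup_type == "logical":
        restore = "ocrconfig -import %s" % backup_path
    elif backup_type == "physical":
        restore = "ocrconfig -restore %s" % backup_path
    else:
        raise ValueError("backup_type must be 'logical' or 'physical'")
    # CRS is stopped before the OCR is overwritten and started afterwards.
    return ["crsctl stop crs", restore, "crsctl start crs"]


if __name__ == "__main__":
    for step in ocr_restore_steps("logical", "/tmp/ocr_bak"):
        print(step)
```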

OCR add disk image:

1.) crsctl stop crs
2.) Create a raw device for the OCR mirror, such as /dev/rhdisk6
3.) ocrconfig -export /tmp/ocr_bak
4.) Edit the /etc/oracle/ocr.loc file and add the line ocrmirrorconfig_loc (on all nodes)
$ cat ocr.loc
ocrconfig_loc=/dev/rhdisk2
ocrmirrorconfig_loc=/dev/rhdisk6
local_only=FALSE
5.) ocrconfig -import /tmp/ocr_bak
6.) ocrcheck
On windows the path is present in the registry : \hklm\software\oracle\ocr

cluvfy stage -pre dbinst -n all

- root access is needed to manage CRS ( start/stop/debugging )
- resources managed by the cluster are known as CRS resources . they include ons , gsd, vip,
listener, asm , db and instance.
- "crsctl stop crs" operates on the local node only
- the crs stack is created when we run the script at the end of crs installation .
- CRS is REQUIRED to be installed and running prior to
installing 10g RAC
- crs home must be different from oracle_home .
- shared files for voting disk and ocr must be available
before installation of CRS .
- The script at the end of the CRS installation
starts the CRS stack. ( crsd, cssd, evmd )
- If there is a network split (nodes lose communication with each
other), one or more nodes may reboot automatically to prevent
data corruption
- status of crs resources can be viewed by crs_stat -t
- If the CRS installation process fails, the voting disk must be re-initialized . dd
can be used :
dd if=/dev/zero of=/dev/rhdisk3 bs=8192 count=2560
Backup votedisk: dd if=/dev/rhdisk3 of=/tmp/votedisk.bak


CRSD: run as root user .
- Engine for HA operation
- Manages 'application resources'
- Starts, stops, and fails 'application resources' over
- Spawns separate 'actions' to start/stop/check application resources
- Maintains configuration profiles in the OCR (Oracle Cluster Registry)
- Stores current known state in the OCR.
- Runs as root
- Is restarted automatically on failure

OCSSD: (Cluster Synchronization Service) --> manages clusterware
configuration . Runs as oracle user
- OCSSD is part of RAC and Single Instance with ASM
- Provides access to node membership
- Provides group services
- Provides basic cluster locking
- Integrates with existing vendor clusterware, when present
- Can also run without integration to vendor clusterware
- Runs as Oracle.
- Failure exit causes machine reboot.
--- This is a feature to prevent data corruption in event of a split brain.

EVMD: ( runs as oracle user )

- Generates events when things happen
- Spawns a permanent child evmlogger
- Evmlogger, on demand, spawns children
- Scans callout directory and invokes callouts.
- Runs as Oracle.
- Restarted automatically on failure

- OCR records the cluster configuration : node membership and the CRS resources such as
database, ASM, instance, listener and VIP . it can be stored on a raw device or a
cluster file system ; the recommended size is 100MB
- if cluster installation fails we need to re-initialize the ocr disk using "dd if=/dev/zero
of=/dev/rhdisk2 bs=8192 count=12800" .
- crs automatically backs up the ocr file every four hours and maintains the last three versions of
backup files in $ORA_CRS_HOME/cdata/crs
- location of OCR files is recorded in /etc/oracle/ocr.loc ( ocrconfig_loc,
ocrmirrorconfig_loc )
- We must login as root user to add/remove/replace ocr file using ocrconfig .
- ocrcheck utility creates a log file in $ORA_CRS_HOME/log/hostname/client

- The Oracle Cluster Registry (OCR) records log information in the following location:

- OCRDUMP also creates a log file in CRS_Home/log/hostname/client. To change
the amount of logging, edit the file CRS_Home/srvm/admin/ocrlog.ini.
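
The two dd re-initialization commands in these notes use bs=8192 with count=2560 (voting disk) and count=12800 (OCR). A quick check of how much each one writes, as a sketch; `dd_bytes` is just an illustrative helper name.

```python
MIB = 1024 * 1024


def dd_bytes(bs, count):
    """Total bytes written by dd for a given block size and block count."""
    return bs * count


voting_disk_bytes = dd_bytes(8192, 2560)   # 20971520 bytes
ocr_bytes = dd_bytes(8192, 12800)          # 104857600 bytes

if __name__ == "__main__":
    print("voting disk: %d MiB" % (voting_disk_bytes // MIB))
    print("ocr: %d MiB" % (ocr_bytes // MIB))
```

The OCR figure works out to 100 MiB, which lines up with the 100MB size recommended for the OCR in the notes; the voting disk command zeroes 20 MiB.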

Voting disk:
On shared storage, used by CSS, contains the nodes that are currently available within the cluster .
If voting disks are lost and no backup is available then Oracle Clusterware must be reinstalled .
3 way multiplexing is ideal
We can tell whether a voting disk is corrupted from the crs alert log and the css daemon log .
Cluster Synchronization Service (CSS)
ocssd daemon, manages cluster configuration

Cluster Ready Services (CRS)

manages resources(listeners, VIPs, Global Service Daemon GSD, Oracle Notification
Service ONS)
crsd daemon backs up the OCR every four hours, configuration is stored in OCR
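
The notes state that the OCR is backed up every four hours and that only the last three versions are kept. That rotation behaves like a fixed-size queue, sketched below with a `deque`. The hour values stand in for backup files; this is a model of the retention rule as described here, not of the actual file naming used by crsd.

```python
from collections import deque


def simulate_backups(hours, interval=4, keep=3):
    """Return the backup times (in hours) still retained after `hours` hours."""
    kept = deque(maxlen=keep)  # oldest entry drops out automatically
    for t in range(0, hours + 1, interval):
        kept.append(t)
    return list(kept)


if __name__ == "__main__":
    # After one day only the three most recent four-hourly backups remain.
    print(simulate_backups(24))
```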

Event Manager (EVM)

evmd daemon, publish events

RAC specific background processes for the database instance :

LMSn (Global Cache Service process ) : coordinates block updates from one cache to
another . The number of lms server processes running is determined by
gcs_server_processes . Default value for gcs_server_processes is 2 and can be increased
to 32 .
LMON ( Global Enqueue Service monitor for shared locks ) : also used by css . (across
node )
LMDn ( Global Enqueue Service daemon ) : manages requests for global enqueues . (across node )
LCK0 : handles resources not requiring Cache Fusion like library cache etc. (same node)
DIAG : collects diagnostic info

GSD 9i is not compatible with 10g

Cache fusion :

Storage options :
- Raw devices : Do not support archive logs . Mainly used for I/O benefits . Soft links
are created for the raw devices and oracle accesses them through those links .
- ASM disk : Does not support oracle binaries , ocr or voting disk
- CFS (ocfs) : Not recommended , but supports oracle_home on windows from the start
and supports oracle_home on linux from ocfs2 .
- NFS mount points :

Adding new node :

- configure new hardware
- configure new operating system .
- adding node to cluster layer : by running from crs_home of one of the
existing nodes .
- adding node to rdbms layer : by running from oracle_home of one of the
existing nodes . These will open OUI .
- Run vipca on either node as root user : vipca -nodelist oldnode,newnode
- adding the instance : We need to run dbca from one of the existing nodes using rac
option and

Removing the node from cluster :

- remove instance using dbca from a node that will continue to stay
- srvctl remove instance -d db_name -i instance_name
- perform log switch from node to be removed
- alter system archive log all;
alter database drop logfile group 3 ;
- remove asm instance : srvctl remove asm -n node_name .
- stop nodeapps : srvctl stop nodeapps -n node_name
- run the below command from oracle_home/install as root user.
$ORACLE_HOME/install/ node_name
- run below script as oracle user to update node list
$Oracle_home/oui/bin/runinstaller -updatenodelist oracle_home/home/
cluster_nodes=survivingnode1 survivingnode2
- run below command from the node to be removed as root user. This will stop crs and
remove the ocr.loc file . it will also remove the init files from /etc/init.d
$crs_home/install/ remote nosharedvar
- run the below command as root user from one of the surviving nodes , from
crs_home/install . here we also need to pass the node number , which is given by
olsnodes -n
$crs_home/install/ node_name , 2
- $crs_home/oui/bin/runinstaller -updatenodelist oracle_home=/home/
cluster_nodes=nodename .

Redo management:
1) each instance has its own thread of redo groups .
2) each thread must have at least 2 groups of redo logs.
3) Commands :
- alter database add logfile thread 2 group 5 ;
- alter database add logfile thread 2 group 6 ;
- alter database enable thread 2;
4) For recovery we require archives from both threads.
5) Number of threads depends upon the MAXINSTANCES parameter defined while
creating the database .
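
The redo commands above follow a fixed pattern: add at least two groups to the new thread, then enable it. A minimal sketch that generates those statements for any thread; `add_thread_sql` is a hypothetical helper, and the statements mirror the notes in omitting file specifications (which assumes the database can pick file names itself, for example with managed files).

```python
def add_thread_sql(thread, first_group, groups=2):
    """Build the SQL to add redo groups to a thread and then enable it."""
    if groups < 2:
        # Each thread must have at least 2 redo log groups.
        raise ValueError("a redo thread needs at least 2 groups")
    stmts = [
        "alter database add logfile thread %d group %d;" % (thread, first_group + i)
        for i in range(groups)
    ]
    stmts.append("alter database enable thread %d;" % thread)
    return stmts


if __name__ == "__main__":
    for stmt in add_thread_sql(2, 5):
        print(stmt)
```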
Tracing :
1) SRVM tracing : We enable srvm tracing by giving "export
SRVM_TRACE=TRUE" . By doing so we enable tracing for srvctl, cvu , VIPCA
and gsdctl .
2) Debugging the resource as root user :
crsctl debug log module_name component:debugging_level
- module_name = crs , css , evm
- level = 1 to 5
- component : "crsctl lsmodules module_name"
3) Resource debugging :
- crsctl debug log res ""
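
A small sketch that assembles the `crsctl debug log` command line from the pieces listed above and rejects values outside the ranges the notes give (module crs/css/evm, level 1 to 5). `debug_command` is a hypothetical helper; the component name in the usage line is only an example and should be taken from the output of `crsctl lsmodules module_name`.

```python
MODULES = ("crs", "css", "evm")


def debug_command(module, component, level):
    """Build a `crsctl debug log` command string after basic validation."""
    if module not in MODULES:
        raise ValueError("module must be one of %s" % ", ".join(MODULES))
    if not 1 <= level <= 5:
        raise ValueError("level must be between 1 and 5")
    return "crsctl debug log %s %s:%d" % (module, component, level)


if __name__ == "__main__":
    # CRSRTI is used as an example component name (assumption).
    print(debug_command("crs", "CRSRTI", 2))
```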

Notes :
1) installation steps :
we should create an appropriate entry for the vip address in dns or /etc/hosts and then
execute vipca
- prepare the o/s level cluster .
- at least 1gb of ram and double that amount of swap .
- at least 400mb of free space in tmp directory .
- at each appropriate stage
- prepare raw devices for ocr, voting disk, datafiles
- luns should have a stripe size of 1mb for datafiles and 256 kb for redo and controlfile .
- oracle user on both nodes has the same uid and gid
- synchronize time on both the nodes .
- configure ssh / user equivalence in /etc/hosts.equiv for remote installation
- for linux we require certain packages which can be identified by cluvfy. Also we
require the hangcheck-timer module for linux .
- prepare disk for voting disk and ocr.
- each node should have 2 network adaptors . the public network adaptor must support
tcp/ip and the private network adaptor must support UDP.
- make entry in hosts file.
- install crs software
Add the second node .
Specify ocr file (2 copies for normal redundancy )
Specify voting disk (3 copies for normal redundancy)
install oracle inventory on each node .
file invokes the daemons . (on the first node it initializes ocr and voting disk)
when we run on last node , it runs vipca in silent mode.
- vipca configuration ( vip , ons and gsd are installed ) --> from crs home
- The VIP must be a DNS known IP address because we use the VIP for the
tnsnames connect . The VIP and private ip details are incorporated in /etc/hosts
- check crs status on all nodes , to confirm all daemons are started .
- apply crs version patch
- install oracle software .
- install on all nodes .
- upgrade patch + cpu patch
- listener configuration (node 1)
- configure asm (on one node with external redundancy )
- create oracle database . ( node 1 )
- listener configuration ( node 2)
- asm configure ( node 2 )
- add instance.
- post installation : backup voting disk, backup

Or we can choose 2 nodes while creating asm, listener and database above .
step-by-step installation of rac - metalink ( for 9i rac )

2) Voting disk is managed by crsctl + dd and ocr is managed by ( ocrconfig +
ocrcheck + ocrdump )
3) when we run on first node after successful installation of oracle
software it initiates vipca . vipca is prompted only on first node .
4) when migrating an existing single instance to rac, we need to first run the localconfig
script with the delete flag , which will stop the ocssd daemon which is currently
running from oracle_home and remove the ocssd entry from the inittab file .
5) main reason to install crs before asm is that ASM requires the css daemon .
6) we need to configure ssh user equivalence as part of network configuration,
before rac setup , for remote node access .
7) ocr and voting disk are kept on a shared device .
8) There are three main background processes of crs : cssd , crsd , evmd which are
automatically started on system reboot . we can manually start them using below
command . /etc/inittab

/etc/init.d/ start

9) it is not recommended to use a crossover cable for the private interconnect since it could
lead to errors on the surviving node due to media sensing behavior . instead it is
advisable to have a switch between the interconnect .
10) crsctl start crs performs below tasks
- The nodeapps (gsd, VIP, ons, listener) are brought online.
(srvctl start nodeapps -n green )
- The ASM instances are brought online.
(srvctl start asm -n green )
- The database instances are brought online.
(srvctl start instance -d orcl -i orcl1 )
- Any defined services are brought online.
11) crs automatically creates backup of ocr file in $CRS_HOME/cdata/crs/ every four
hours . it maintains 3 versions of ocr backup files .
12) oracle 9i srvconfig.loc is still available in 10g with name ocr.loc
13) ocr and voting disk are created during crs installation .
14) adding log file thread :
- alter database add logfile thread 3 group 5;
- alter database enable thread 3 ;
- alter system set thread=3 scope=spfile sid='RAC01';
- srvctl start instance -d racdb -i rac01
15) crs works on ocr and voting disk .
16) we can get the vip information from /etc/hosts or "oifcfg getif"
17) 'gc cr failure' and 'cr request retry' waits indicate an issue in the interconnect
between 2 nodes .
18) we can set srvm_trace environment to true before we execute srvctl command
19) we need to run catclust.sql to create rac specific dictionary tables if not present
20) to debug currently running crs daemon --> crsctl debug log crs
21) storage options available are ( ocfs , asm , shared raw , NFS , ocfs2 )
22) to add voting disk --> crsctl add css votedisk /path -force
23) To manage SCN in rac, oracle uses two mechanism .
24) change VIP in rac :
- stop all vip dependent cluster components on one node
- make necessary changes in /etc/hosts and DNS .
- change VIP using SRVCTL : srvctl modify nodeapps -n node1 -A
- restart all VIP dependent components .
- perform same steps on all nodes .
25) restore ocr :
- ocrconfig -showbackup
- crsctl stop crs ( on all cluster nodes )
- ocrconfig -restore <backup_pathname>
- crsctl start crs ( on all cluster nodes )

The ocrconfig restore option must be used for restoring a physical backup , and the
import option must be used to restore a logical backup

26) 10g rac new :

- crs
- VIP . (needs additional ip , can have client connection )
- manage service ( grid concept )

27) services in rac :

- created using srvctl and dbca
- performance can be monitored using dbca and EM .
- when creating service using dbca , it creates service , makes crs entry and makes
net service entry for the same.
- can be monitored by v$service_stat.
- tnsnames.ora contains entry for service .
- application VIP fails over if associated application fails over .

28) adding node to cluster :

- run in asm home from one of existing node .
- run racgons add_config from one of the existing node to add ons metadata to OCR
- run in crs home from one of the existing node .
- run netca in new node to add listener in new node
- run in rdbms home from one of the existing home.

29) If we are not using a voting disk, then after a network failure the nodes can neither
communicate with each other nor synchronize with the database . this situation is
called the cluster SPLIT-BRAIN problem.
30) For troubleshooting we can use hanganalyze , oswatcher and systemstate .
31) Crsd and evmd processes are started with the respawn option , so they will be restarted
if they fail . CSS, as part of CRS, is started with the fatal option . The
failure of the CSS daemon will lead to node restart to prevent data corruption.
32) On windows ocr and voting disk information is mentioned in
33) Main rac background processes are : LMON, DIAG , LMS , LMD , LCK
We can also enable tracing for these background processes .
34) We can use ocrdump to check whether an ocr backup is corrupted . if it
completes without error it indicates the backup is proper .
And we use ocrcheck + cluvfy to check integrity of ocr files .
35) Auto_start=1 in profile indicates resource must auto start on cluster startup .
Value of 2 indicates we need to manually start the resource .
36) Ocrcheck is used to check the integrity of the current OCR and ocrdump is used to check backup
files of ocr .
37) When all the instances fail, it is called a crash . In crash recovery redo is applied one
thread at a time as only one instance can dirty a block at a time . We need to
allocate channels for the nodes where the archives exist .
38) Instance recovery of failed instance is done by surviving instance lmon process .
39) Oracle automatically manages undo segments within a specific undo tablespace
that is assigned to an instance . only the instance assigned to an undo tablespace can
modify the contents of that tablespace . however each instance can read the undo
blocks created by any instance . also when performing transaction recovery , any
instance can update any undo tablespace as long as that undo tablespace is not
being used by another instance for undo generation or transaction recovery .
40) crs home should not be same as oracle base directory .
41) oracle home requires at least 1.5gb and crs home requires at least 120 mb
42) Both the voting disks and the OCR must reside on shared devices that you
configure before you install Oracle Clusterware and Oracle RAC.
43) In rac we need to use _fast_start_instance_recovery_target instead of
fast_start_mttr_target for crash recovery .
44) Ocr and voting disk should be at least 256mb each
45) For the private network, the end points of all designated interconnect interfaces
must be completely reachable on the network. There should be no node that is not
accessible by other nodes in the cluster using the private network.
To determine what interfaces are configured on a node running Red Hat Linux, use
the following command as the root user: # /sbin/ifconfig
46) You should configure the same private interface names for all nodes as well. If
eth1 is the private interface name for the first node, then eth1 should be the
private interface name for your second node.
47) Public interface names must be the same for all nodes. If the public interface on
one node uses the network adapter eth0, then you must configure eth0 as the
public interface on all nodes.
48) Each node needs at least two network interface cards, or network adapters. One
adapter is for the public network and the other adapter is for the private network
used by the interconnect. You should install additional network adapters on a
node if that node:
– Does not have at least two network adapters
– Has two network interface cards but is using network attached storage (NAS).
You should have a separate network adapter for NAS.
32) You must have at least three IP addresses available for each node:
1. An IP address with an associated host name (or network name) for the public interface.
2. A private IP address with a host name for each private interface.
3. One virtual IP address with an associated network name. Select a virtual IP
(VIP) address that meets the following requirements:
- The VIP address and associated network name are currently unused.
- The VIP is on the same subnet as your public interface.
49) post installation : backup , voting disk , configure user accounts .
Oracle recommends that you back up the script after you complete an
installation. If you install other products in the same Oracle home directory, OUI
updates the contents of the existing script during the installation. If you
require information contained in the original script, then you can recover it
from the backup copy.
50) Failed over rac vip does not accept new connections while failed over application
virtual ip accepts new connection . We can share rac VIP among nodes whereas
we cant share application vip between nodes. Application VIP fails over if
associated application fails over .
51) SRVCTL uses information from the OCR file.
52) To perform recovery and most of maintenance activity we need to start single
instance in non cluster mode .
53) In 10gr2 to enable archiving , you need not change cluster_database parameter to
accomplish this task; you merely have to make sure the instance you are working
from is the only instance with the database mounted.
54) Node affinity awareness : For rman backup if we allocate multiple channel to
connect all nodes , rman automatically decides which node has faster access to
datafiles and makes backup from that node . From 10g onwards we don’t have to
separately give channel to rman . When we define degree of parallelism it
automatically connects to all nodes making use of load balancing .
55) SAN can be attached to server using fiber channel or iSCSI. Using iSCSI we can
deploy SAN at LAN, WAN and MAN .
56) Having individual home on local drives facilitates Rolling upgrade patch .
57) During performance issue on rac we need to run catpar.sql + racdiag.sql
58) Undo tablespace of both nodes should be on shared location.
59) If instance does not start .
- check instance alert log
- check crs alert log
- crs_stat -t
- crs_stat -p service_name
- respective background process log in crs home

60) We can dynamically enable clusterware debugging using crsctl as root user.
For srvctl tracing we can set srvm_trace environment to true before we execute
command .
61) Voting disk information can be gained from alert log of crs_home .
62) To delete node , we need to run $CRS_HOME/install/ from node to
be deleted .
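
One of the network requirements listed in the notes above is that the VIP must be on the same subnet as the public interface. A minimal sketch of that check using the standard `ipaddress` module; the function name and the addresses in the usage lines are made up for illustration.

```python
import ipaddress


def vip_on_public_subnet(vip, public_ip, netmask):
    """True if the VIP falls inside the subnet of the public interface."""
    # strict=False lets us pass a host address together with its netmask.
    network = ipaddress.ip_network("%s/%s" % (public_ip, netmask), strict=False)
    return ipaddress.ip_address(vip) in network


if __name__ == "__main__":
    # Hypothetical addresses for a node with a /24 public network.
    print(vip_on_public_subnet("192.168.1.110", "192.168.1.10", "255.255.255.0"))
    print(vip_on_public_subnet("10.0.0.5", "192.168.1.10", "255.255.255.0"))
```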
The CSS misscount parameter represents the maximum time, in seconds, that a
heartbeat can be missed before entering into a cluster reconfiguration to evict the
node .
crsctl get css misscount
crsctl get css disktimeout
crsctl get css reboottime

changing misscount :
Keep only one node up and running, stop the others
Backup the content of your OCR
crsctl set css misscount 30 --> as root user
crsctl get css misscount --> should now return 30
Restart all other nodes
Troubleshoot node eviction :
1. Look at the cssd.log files on both nodes; usually we will get more information on the
second node if the first node is evicted. Also take a look at the crsd.log file.
2. The evicted node will have a core dump file generated and system reboot info.
3. Find out if there was a node reboot , and whether it was because of CRS or others, check system reboot
4. If you see "Polling" key words with reducing percentage values in the cssd.log file, that
says the eviction is probably due to Network. If you see "Diskpingout" or something
related to -DISK- then the eviction is because of Disk time out.
5. After finding whether it is a Network or Disk issue, start going in depth.
6. Now it is time to collect NMON/OSW/RDA reports to confirm / justify whether it was a
DISK issue or Network.
7. If we see more memory contention/paging in the reports then it is time to collect an
AWR report to see what loads/SQL were running during that period.
8. If network was the issue, then check if any NIC cards were down, or if link switching
has happened. And check that the private interconnect is working between both the nodes.
9. Sometimes eviction could also be due to an OS error where the system is in halt state for a
while, or memory over commitment, or CPU 100% used.
10. Check OS /system logfiles to get more information.
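
Step 4 of the checklist above is a keyword triage of cssd.log: "Polling" points at the network, "Diskpingout" or other disk-related messages point at a disk timeout. A rough sketch of that first pass; `classify_eviction` is a hypothetical helper and the sample lines are placeholders, not real cssd.log output.

```python
def classify_eviction(log_text):
    """First-pass guess at the eviction cause from cssd.log keywords."""
    network_hits = 0
    disk_hits = 0
    for line in log_text.splitlines():
        lowered = line.lower()
        if "polling" in lowered:
            network_hits += 1
        if "disk" in lowered:  # also matches "Diskpingout"
            disk_hits += 1
    if network_hits and disk_hits:
        return "both"
    if network_hits:
        return "network"
    if disk_hits:
        return "disk"
    return "unknown"


if __name__ == "__main__":
    # Placeholder text only; real log lines look different.
    print(classify_eviction("placeholder line mentioning Polling 50%"))
```

The result only tells which reports to pull next (network checks or disk checks); it does not replace reading the log.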

1) Difference between ocfs/asm etc
2) installation step in rac
3) symmetric and asymmetric rac
4) parallel recovery in 10g .
5) recovery_parallelism , db_recovery_file_dest
6) Flashback
7) Managing services in 10g rac .(add / remove / modify )
8) Role of sysaux in RAC .
9) Rolling forward patch .
10) Clustered asm and non clustered asm .
11) Inventory list lock .
12) Extended asm group .
13) Raw device mapping file .
15) fan in rac
16) adding / Removing new node
17) logs in rac .
18) what to do if rac instance is down .
19) rman auto locate .
20) AUTO_start attribute .
21) redundancy level (normal and external )
22) crs_stat -p service_name
23) olsnodes -n
24) srvctl config database -d db_name .
25) data guard fast start fail over .
26) crs_stat | grep -i node1
27) srvctl status nodeapps -n node2

Tools / Utilities used :


Views :
1) select * from GV$CACHE_TRANSFER ;
2) select * from v$ges_statistics;
3) select * from v$active_instances ;
4) select * from gv$session where sid = <sid> and inst_id = <inst_id>;
5) select * from gv$instance ;
6) select * from x$ksxpia --> To check interconnect which is used
8) select * from v$bgprocess where paddr<>'00';
9) select * from gv$resource_limit ;
10) select ss.inst_id,
substr(ss.username||'('||se.sid||')',1,15) user_process,
substr(ss.osuser,1,23) USER_NM,
substr(ss.machine,1,23) MACHINE_NM,
substr(ss.program,1,17) program,
se.value
from gv$session ss, gv$sesstat se, gv$statname sn
where se.statistic# = sn.statistic#
and se.inst_id = sn.inst_id
and se.inst_id = ss.inst_id
and sn.name like '%CPU used by this session%'
and se.sid = ss.sid
and ss.username is not null
and ss.status in ('ACTIVE')
and ss.username not in ('SYS','SYSTEM')
and se.value >= 10000
order by substr(sn.name,1,25), se.value desc;
11) select * from v$resource_limit where resource_name like 'g%s_%';
12) select * from v$sgastat where name like 'g_s%' or name like 'KCL%';

O/S commands :
ifconfig -l --> list network cards on each node .
lscfg -vl hdisk --> identify lun id assigned to disk
lsdev -Ccdisk --> get list of disks
ps -ef | grep ASM --> check asm daemons

Tuning RAC :
Tuning Recovery Parameters:
fast_start_mttr_target : controls recovery time
parallel_execution_message_size : increase from 2k to 4-8k .Provides better recovery
slave performance

Waits event Views:

- v$system_event : total waits for an event
- v$session_wait_class : waits for wait event class by session
- v$session_event : waits for an event by session
- v$active_session_history : activity for recent active sessions
- v$session_wait_history : last 10 wait events for each active session
- v$session_wait : events for which active sessions are waiting
- v$sqlstats : identify sql statements impacted by interconnect latencies .

Other tuning views :

- v$segment_statistics
- v$enqueue_statistics
- v$instance_cache_transfer

Rac wait events that require attention :

- 'gc cr failure' and 'cr request retry' : issue in interconnect
- gc cr block lost --> should never happen

Logs to monitor :
- CRS ALERT LOG : log/<hostname>/alert<nodename>.log
- CRS logs : log/<hostname>/crsd/
- CSS logs : log/<hostname>/cssd/
- EVM logs : log/<hostname>/evmd+evm/log
- OPMN logs : opmn/logs
- resource specific logs : log/<hostname>/racg/
- cluster communication logs : log
- ORACLE_home :
- resource specific logs : log/<hostname>/racg
- srvm logs : log/<hostname>/client
- alert and other trace files : bdump/cdump/udump
- AWR/ statspack/ ash/ addm of each node
- listener logs .
- ASM_logs :
- alert logs and other trace files : oracle_home/rdbms/log/ bdump/udump/cdump
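
The CRS-side log locations listed above all hang off log/<hostname> under the CRS home, so they can be assembled from two inputs. A small sketch; `crs_log_paths` is a hypothetical helper, the `/u01/crs` home in the usage line is a placeholder, and the node name is assumed to equal the host name unless given.

```python
import posixpath


def crs_log_paths(crs_home, hostname, nodename=None):
    """Build the CRS log locations for one node from the CRS home path."""
    nodename = nodename or hostname  # assumption: same name unless stated
    base = posixpath.join(crs_home, "log", hostname)
    return {
        "alert": posixpath.join(base, "alert%s.log" % nodename),
        "crsd": posixpath.join(base, "crsd"),
        "cssd": posixpath.join(base, "cssd"),
        "evmd": posixpath.join(base, "evmd"),
        "racg": posixpath.join(base, "racg"),
        "client": posixpath.join(base, "client"),
    }


if __name__ == "__main__":
    for name, path in sorted(crs_log_paths("/u01/crs", "green").items()):
        print("%-6s %s" % (name, path))
```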

Tools to diagnose :
- alert log / trace files
- debugging
- session tracing
- wait events using views
- hanganalyze / oswatcher (note 301137.1) / rda
- catpar.sql + racdiag.sql
- awrrpt.sql / awrrpti.sql

What to consider :
- index block contention
- high water mark consideration
- wait event (interconnect traffic )