General
Oracle Real Application Clusters (RAC) is a cluster system at the application level. It
uses a shared disk architecture that provides scalability for all kinds of applications.
Applications can use a RAC database without any modifications.
Since requests in a RAC cluster are spread evenly across the RAC instances, and
since all instances access the same shared storage, adding servers requires no
architecture changes. A failure of a single RAC node results only in a loss of
scalability, not in a loss of data, since a single database image is used.
Important Notes
There are a few important notes that are useful to know before installing Oracle9i
RAC:
(*) If you want to install Oracle9i RAC using FireWire drive(s), make sure to read
FireWire-Based Shared Storage for Linux first!
(*) At the time of this writing, there is a bug that prevents you from successfully
installing and running Oracle9i RAC using OCFS on FireWire drives on RH AS 2.1.
Note that FireWire-based shared storage for Oracle9i RAC is experimental!
See Setting Up Linux with FireWire-Based Shared Storage for Oracle9i RAC for more
information. At the time of this writing, the only option is to use raw devices for all
partitions on the FireWire drives. However, you might be lucky and get RAC working
using OCFS on FireWire drives on RH AS 2.1. And this article will show you how to
do it in case you are one of the lucky ones :)
(*) If you want to set up a FireWire shared drive using raw devices for all Oracle files,
then keep in mind that Linux uses the SCSI layer for FireWire drives. The SCSI layer
allows only 15 partitions per disk (/dev/sda16 is really /dev/sdb), and one of those
must be the extended partition, so you can create only 14 raw devices on a single
FireWire disk. This means that you might have to buy a second FireWire drive.
An Oracle9i RAC cluster requires a set of servers with shared disk access and
interconnect connectivity. Since each instance in a RAC system must have access to
the same database files, shared storage is required that can be accessed by all RAC
nodes concurrently.
The shared storage space can be used as raw devices or with a cluster file system.
This article will address raw devices and Oracle's Cluster File System (OCFS). Note
that Oracle9i RAC provides its own locking mechanisms and therefore does not
rely on other cluster software or on the operating system for handling locks.
Shared storage can become expensive. If you just want to check out the advanced
features of Oracle9i RAC without spending too much money, I recommend buying
external FireWire-based shared storage for Oracle9i RAC. Caveat: You can
download a patch from Oracle for FireWire-based shared storage for Oracle9i RAC,
but Oracle does not support the patch. It is intended for testing and demonstration
only! See Setting Up Linux with FireWire-Based Shared Storage for Oracle9i
RAC for more information.
Note that it is very important to get FireWire-based shared storage that allows
concurrent access by more than one server. Otherwise the disk(s) and partitions can
only be seen by one server at a time. Therefore, make sure the FireWire drive(s) have
a chipset that supports concurrent access by at least two servers. If you already have
a FireWire drive, you can check the maximum supported logins (concurrent
access) by following the steps outlined under Configuring FireWire-Based Shared
Storage.
For test purposes I used external 250 GB and 200 GB Maxtor hard drives, which
support a maximum of 3 concurrent logins. Here are the technical specifications for
these FireWire drives:
- Vendor: Maxtor
- Model: OneTouch
- Mfg. Part No. or KIT No.: A01A200 or A01A250
- Capacity: 200 GB or 250 GB
- Cache Buffer: 8 MB
- Spin Rate: 7200 RPM
- "Combo" Interface: IEEE 1394 and SBP-2 compliant (100 to 400 Mbits/sec)
plus USB 2.0 and USB 1.1 compatible
Here are links where these drives can be bought:
Maxtor 200GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive
Maxtor 250GB One Touch Personal Storage External USB 2.0/FireWire Hard Drive
The FireWire adapter I'm using is a StarTech 4 Port IEEE-1394 PCI Firewire Card.
Don't forget that you will also need a FireWire hub if you want to connect more than 2
RAC nodes to the FireWire drive(s).
The following steps need to be performed on all nodes of the RAC cluster unless
stated otherwise!
You cannot download Red Hat Linux Advanced Server; you can only download the
source code. If you want to get the binary CDs, you have to buy them
at http://www.redhat.com/software/rhel/.
You don't have to install all RPMs to run an Oracle9i RAC database on Red Hat
Advanced Server. It is sufficient to select the Installation Type "Advanced Server"
and to leave the Package Group "Software Development" unselected. There are only
a few other RPMs that are required for installing Oracle9i RAC. These other RPMs
are covered in this article.
Make sure that no firewall is selected during the installation.
General
Using the right Red Hat Enterprise Linux kernel is very important for an Oracle
database. Besides important fixes and improvements, newer Red Hat Enterprise Linux
kernel versions now ship with the hangcheck-timer.o module, which is a
requirement for a RAC system. Therefore it is important to follow the steps
outlined under Upgrading the Linux Kernel unless you are using FireWire shared
drives (see below).
You can download a patch from Oracle for FireWire-Based Shared Storage for
Oracle9i RAC, but Oracle does not support the patch. It is intended for testing and
demonstration only! See Setting Up Linux with FireWire-Based Shared Storage for
Oracle9i RAC for more information.
There are two experimental kernels for FireWire shared drives, one for UP machines
and one for SMP machines. To install the kernel for a single CPU machine, run the
following command:
su - root
rpm -ivh --nodeps kernel-2.4.20-18.10.1.i686.rpm
Note that the above command does not upgrade your existing kernel. This is my
preferred method since I always want to have the option to go back to the old kernel
in case the new kernel causes problems or doesn't come up.
To make sure that the right kernel is booted, check the /etc/grub.conf file if you use
GRUB, and change the "default" attribute if necessary. Here is an example:
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title Red Hat Linux (2.4.20-18.10.1)
root (hd0,0)
kernel /vmlinuz-2.4.20-18.10.1 ro root=/dev/hda1 hdc=ide-scsi
initrd /initrd-2.4.20-18.10.1.img
title Red Hat Linux Advanced Server-up (2.4.9-e.25)
root (hd0,0)
kernel /boot/vmlinuz-2.4.9-e.25 ro root=/dev/hda1 hdc=ide-scsi
initrd /boot/initrd-2.4.9-e.25.img
In this example, the "default" attribute is set to "0", which means that the
experimental FireWire kernel 2.4.20-18.10.1 will be booted. If the "default" attribute
were set to "1", the 2.4.9-e.25 kernel would be booted.
The private networks are critical components of a RAC cluster. The private networks
should only be used by Oracle to carry Cluster Manager and Cache Fusion inter-node
traffic. A RAC database does not require a separate private network, but using a
public network can degrade database performance (high latency, low bandwidth).
Therefore the private network should have high-speed NICs (preferably Gigabit
Ethernet or faster) and it should only be used by Oracle9i RAC and the Cluster Manager.
If you are trying to check out the advanced features of Oracle9i RAC on a cheap
system where you don't have two Ethernet adapters, you could assign both server
names (public and private) to the same IP address. For example:
192.168.1.1 rac1prv rac1pub   # RAC node 1 - server with a single network adapter
192.168.1.2 rac2prv rac2pub   # RAC node 2 - server with a single network adapter
192.168.1.3 rac3prv rac3pub   # RAC node 3 - server with a single network adapter
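With two network adapters per node, the private and public host names get separate
addresses instead. The following /etc/hosts entries are only an illustration (the public
subnet is an assumption; the private addresses match the interconnect addresses used
later in this article):
# Private interconnect
192.168.1.1 rac1prv   # RAC node 1
192.168.1.2 rac2prv   # RAC node 2
192.168.1.3 rac3prv   # RAC node 3
# Public network
192.168.2.1 rac1pub   # RAC node 1
192.168.2.2 rac2pub   # RAC node 2
192.168.2.3 rac3pub   # RAC node 3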
To configure the network interfaces, run the following command on each node:
su - root
redhat-config-network
For instructions on how to set up a shared storage device on Red Hat Advanced
Server, see the manufacturer's installation instructions.
First make sure the experimental kernel for FireWire was installed and the server has
been rebooted (see Upgrading the Linux Kernel for FireWire Shared Disks Only):
# uname -r
2.4.20-18.10.1
To load the kernel modules/drivers with the right options etc., add the following
entries to the /etc/modules.conf file:
options sbp2 sbp2_exclusive_login=0
post-install sbp2 insmod sd_mod
post-remove sbp2 rmmod sd_mod
It is important that the parameter sbp2_exclusive_login of the Serial Bus Protocol
module sbp2 is set to zero to allow multiple hosts to log into and access
the FireWire storage at the same time. The second line makes sure the SCSI disk
driver module sd_mod is loaded as well, since sbp2 needs the SCSI layer. The SCSI
core support module scsi_mod will be loaded automatically if sd_mod is loaded - there
is no need to make an entry for it.
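As a quick sanity check (not a required step, just a convenience), you can load the
modules manually and verify that they show up in the module list:
su - root
modprobe ohci1394
modprobe sbp2
lsmod | egrep 'ohci1394|ieee1394|sbp2|sd_mod'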
If the ieee1394 module was not loaded, then your FireWire adapter might not be
supported.
I'm using the StarTech 4 Port IEEE-1394 PCI Firewire Card which works great:
# lspci
...
00:0c.0 FireWire (IEEE 1394): VIA Technologies, Inc. OHCI Compliant IEEE 1394
Host Controller (rev 46)
...
Now run the script to rescan the SCSI bus and to add the FireWire drive to the system:
su - root
# /usr/local/bin/rescan-scsi-bus.sh
Host adapter 0 (sbp2_0) found.
Scanning for device 0 0 0 0 ...
NEW: Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Maxtor Model: OneTouch Rev: 0200
Type: Direct-Access ANSI SCSI revision: 06
1 new device(s) found.
0 device(s) removed.
#
When you run dmesg, you should see entries similar to this example:
# dmesg
...
ohci1394_0: OHCI-1394 1.0 (PCI): IRQ=[9] MMIO=[fedff000-fedff7ff] Max Packet=[2048]
ieee1394: Device added: Node[00:1023] GUID[0010b920008c85cb] [Maxtor]
ieee1394: Device added: Node[01:1023] GUID[00110600000032a0] [Linux OHCI-1394]
ieee1394: Host added: Node[02:1023] GUID[00110600000032c7] [Linux OHCI-1394]
ieee1394: Device added: Node[04:1023] GUID[00110600000032d0] [Linux OHCI-1394]
SCSI subsystem driver Revision: 1.00
scsi0 : SCSI emulation for IEEE-1394 SBP-2 Devices
ieee1394: sbp2: Query logins to SBP-2 device successful
ieee1394: sbp2: Maximum concurrent logins supported: 3
ieee1394: sbp2: Number of active logins: 0
ieee1394: sbp2: Logged into SBP-2 device
ieee1394: sbp2: Node[00:1023]: Max speed [S400] - Max payload [2048]
scsi singledevice 0 0 0 0
Vendor: Maxtor Model: OneTouch Rev: 0200
Type: Direct-Access ANSI SCSI revision: 06
blk: queue cc28b214, I/O limit 4095Mb (mask 0xffffffff)
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 490232832 512-byte hdwr sectors (250999 MB)
sda: sda1 sda2
scsi singledevice 0 0 1 0
scsi singledevice 0 0 2 0
scsi singledevice 0 0 3 0
scsi singledevice 0 0 4 0
scsi singledevice 0 0 5 0
scsi singledevice 0 0 6 0
scsi singledevice 0 0 7 0
...
The kernel reports that the FireWire drive can be shared concurrently by 3 servers (see
"Maximum concurrent logins supported:"). It is very important that you have a drive
whose chipset supports concurrent access for the nodes. "Number of active
logins:" shows how many servers were already sharing the drive before this server
added it to the system.
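A quick way to pull just these two lines out of the kernel log is:
su - root
dmesg | grep -iE 'concurrent logins|active logins'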
Problems:
If the rescan-scsi-bus.sh script says: "0 new device(s) found.", then try to
run rescan-scsi-bus.sh several times. If this doesn't work, try to run the
following commands:
su - root
/usr/local/bin/rescan-scsi-bus.sh -r
/usr/local/bin/rescan-scsi-bus.sh
If this doesn't work either, remove the modules, power the FireWire drive off and
on, and rerun the whole procedure. This always worked for me
when my FireWire drive was not recognized by the system. You could also use
the "fwocfs" service script to reload the modules and to rescan the SCSI
bus. See Automatic Scanning of FireWire-Based Shared Storage for more
information.
Note that if you have a USB device attached, the system might not be able to
recognize your FireWire drive!
If everything worked fine, you should now be able to see the partitions of
the FireWire drives:
su - root
# fdisk -l
Disk /dev/sda: 255 heads, 63 sectors, 30515 cylinders
Units = cylinders of 16065 * 512 bytes
To have the FireWire drives automatically added to the system after each reboot, I
wrote a small service script called "fwocfs" for FireWire drives. This service script
also mounts OCFS filesystems if configured, see Configuring the OCFS File Systems
to Mount Automatically at Startup for more information. Therefore, this service script
can be used for OCFS filesystems or for raw devices. It is also very useful for
reloading the kernel modules for the FireWire drives and for rescanning the SCSI bus
if your FireWire drives were not recognized. The "fwocfs" script can be downloaded
from here.
If for any reason the FireWire drives have not been recognized, try to restart the
"fwocfs" service script with the following command:
su - root
service fwocfs restart
For more information on the "oinstall" group account, see When to use
"OINSTALL" group during install of oracle.
In my test setup, the database name is "orcl" and the Oracle SIDs are "orcl1" for RAC
node one, "orcl2" for RAC node two, etc.
# Oracle Environment
export ORACLE_BASE=/opt/oracle
export ORACLE_HOME=/opt/oracle/product/9.2.0
export ORACLE_SID=orcl1    # Each RAC node must have a unique Oracle SID! E.g. orcl1, orcl2, ...
export ORACLE_TERM=xterm
# export TNS_ADMIN=        # Set if sqlnet.ora, tnsnames.ora, etc. are not in $ORACLE_HOME/network/admin
export NLS_LANG=AMERICAN;
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LD_LIBRARY_PATH
# Specify that native threads should be used when running Java software
export THREADS_FLAG=native
Native threads are implemented using pthreads (POSIX threads) which take full
advantage of multiprocessors. Java also supports green threads. Green threads are
user-level threads, implemented within a single Unix process, and run on a single
processor.
Make sure to add these environment settings to the ~oracle/.bash_profile file if you
use bash. This ensures that the Oracle environment variables are always set
when you log in as "oracle" or when you switch to the user "oracle" by running "su -
oracle".
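A quick check that the profile is picked up on each node (the expected output reflects
the settings shown above):
su - oracle
echo $ORACLE_HOME $ORACLE_SID
# Should print: /opt/oracle/product/9.2.0 orcl1 (orcl2 on the second node, and so on)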
To create the Oracle datafile directory for the ORCL database, run the following
commands:
su - oracle
mkdir -p /var/opt/oracle/oradata/orcl
chmod -R 775 /var/opt/oracle
Some directories are not replicated properly to other nodes when RAC is installed.
Therefore the following commands must be run on all cluster nodes:
su - oracle
General
Note that it is important for the Redo Log files to be on the shared disks as well.
If you use OCFS for database files and other Oracle files, you can create several
partitions on your shared storage for the OCFS filesystems. If you use a FireWire disk,
you could create one large partition on the disk, which should make things easier.
For more information on how to install OCFS and how to mount OCFS filesystems on
partitions, see Installing and Configuring Oracle Cluster File Systems (OCFS).
After you have finished creating the partitions, I recommend that you reboot all RAC
nodes to make sure all partitions are recognized by the kernel on every node:
su - root
reboot
In the following example I will show how to set up raw devices on FireWire disk(s) for
all Oracle files, including the Cluster Manager Quorum File and the Shared
Configuration File for srvctl. Using raw devices for Oracle datafiles requires more
administrative work. And using a FireWire drive makes it even more complicated,
because the FireWire drive uses the SCSI layer and there is a hard limit of 15
partitions per SCSI drive.
In the following example I will set up 19 partitions for an Oracle9i RAC database
using raw devices on two FireWire disks. Keep in mind that we can only have 14
partitions on a single FireWire drive, not including the extended partition.
Using 2 MB for the Cluster Manager Quorum File (raw device) and 20 MB for the
Shared Configuration File (raw device) for the Oracle Global Services daemon
should be more than sufficient.
After I created the following 19 partitions on one RAC node, I bound the raw devices
by running the raw command on all RAC nodes:
su - root
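# The device name below is only an illustrative sketch - bind each of your own
# 19 partitions to its raw device in the same way:
raw /dev/raw/raw1 /dev/sda2
# ... one raw command per device, up to /dev/raw/raw19 (the remaining partitions
# are spread across both FireWire disks)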
NOTE: It is important to make sure that the above binding commands are added to
the /etc/rc.local file! The binding for raw devices has to be done after each reboot.
Set the permissions and ownership for the 19 raw devices on all RAC nodes:
su - root
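# One way to do this (a sketch; adjust if your oracle user or oinstall group differ):
for i in `seq 1 19`; do
    chown oracle:oinstall /dev/raw/raw$i
    chmod 660 /dev/raw/raw$i
done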
Optionally, you can create soft links to the raw devices. If you do this, it will be
transparent to you whether you use OCFS or raw devices when you run the Database
Configuration Assistant. In the following example I use exactly the same file names
that the Database Configuration Assistant will use for the cluster system by default.
To create the soft links, run the following commands on all RAC nodes:
su - oracle
ln -s /dev/raw/raw1 /var/opt/oracle/oradata/orcl/CMQuorumFile
ln -s /dev/raw/raw2 /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
ln -s /dev/raw/raw3 /var/opt/oracle/oradata/orcl/spfileorcl.ora
ln -s /dev/raw/raw4 /var/opt/oracle/oradata/orcl/control01.ctl
ln -s /dev/raw/raw5 /var/opt/oracle/oradata/orcl/control02.ctl
ln -s /dev/raw/raw6 /var/opt/oracle/oradata/orcl/indx01.dbf
ln -s /dev/raw/raw7 /var/opt/oracle/oradata/orcl/system01.dbf
ln -s /dev/raw/raw8 /var/opt/oracle/oradata/orcl/temp01.dbf
ln -s /dev/raw/raw9 /var/opt/oracle/oradata/orcl/tools01.dbf
ln -s /dev/raw/raw10 /var/opt/oracle/oradata/orcl/undotbs01.dbf
ln -s /dev/raw/raw11 /var/opt/oracle/oradata/orcl/undotbs02.dbf
ln -s /dev/raw/raw12 /var/opt/oracle/oradata/orcl/undotbs03.dbf
ln -s /dev/raw/raw13 /var/opt/oracle/oradata/orcl/users01.dbf
ln -s /dev/raw/raw14 /var/opt/oracle/oradata/orcl/redo01.log
ln -s /dev/raw/raw15 /var/opt/oracle/oradata/orcl/redo02.log
ln -s /dev/raw/raw16 /var/opt/oracle/oradata/orcl/redo03.log
ln -s /dev/raw/raw17 /var/opt/oracle/oradata/orcl/orcl_redo2_2.log
ln -s /dev/raw/raw18 /var/opt/oracle/oradata/orcl/orcl_redo3_1.log
ln -s /dev/raw/raw19 /var/opt/oracle/oradata/orcl/orcl_redo3_2.log
After you have finished creating the partitions, I recommend that you reboot all RAC
nodes to make sure all partitions are recognized by the kernel on every node:
su - root
reboot
The Oracle Cluster File System (OCFS) was developed by Oracle to overcome the
limits of Raw Devices and Partitions. It also eases administration of database files
because it looks and feels just like a regular file system.
At the time of this writing, OCFS only supports Oracle Datafiles and a few other files:
- Redo Log files
- Archive log files
- Control files
- Database datafiles
- Shared quorum disk file for the cluster manager
- Shared init file (srv)
Oracle says that they will support shared Oracle Home installs in the latter part of
2003. So don't install the Oracle software on OCFS yet. See Oracle Cluster File
System for more information. In this article I'm creating a separate, individual
ORACLE_HOME directory on local server storage for each RAC node.
NOTE:
If files on the OCFS file system need to be moved, copied, tar'd, etc., or if directories
need to be created on OCFS, then the standard file system commands mv, cp,
tar, ... that come with the OS should not be used. These OS commands can have a
major performance impact when they are used on an OCFS file system.
Therefore, Oracle's patched file system commands should be used instead.
It is also important to note that some third-party backup tools make use of standard
OS commands like tar.
Installing OCFS
To find out which OCFS driver you need for your server, run:
$ uname -a
Linux rac1pub 2.4.9-e.25smp #1 Fri Oct 6 18:27:21 EDT 2003 i686 unknown
For my SMP servers with <=4GB RAM I downloaded the following OCFS RPMs
(make sure to use the latest OCFS version!):
ocfs-2.4.9-e-smp-1.0.9-9.i686.rpm # OCFS driver for SMP kernels
ocfs-tools-1.0.9-9.i686.rpm
ocfs-support-1.0.9-9.i686.rpm
To install the RPMs for SMP kernels on servers with <= 4 GB RAM, run:
su - root
rpm -ivh ocfs-2.4.9-e-smp-1.0.9-9.i686.rpm \
ocfs-tools-1.0.9-9.i686.rpm \
ocfs-support-1.0.9-9.i686.rpm
To install the OCFS RPMs for uniprocessor kernels, run:
su - root
rpm -ivh ocfs-2.4.9-e-1.0.9-9.i686.rpm \
    ocfs-tools-1.0.9-9.i686.rpm \
    ocfs-support-1.0.9-9.i686.rpm
NOTE: It is also very important to install an updated fileutils package that adds
support for the O_DIRECT flag, which enables direct (unbuffered) I/O on file
systems such as OCFS. This updated package is required for better performance. If
commands like cp, mv, dd, etc. don't support the O_DIRECT flag when used on OCFS file
systems, you can experience a big performance impact. For instance, some third-party
products use the dd command for backups.
For FireWire kernels, download the latest OCFS RPMs for RH AS 2.1
from http://oss.oracle.com/projects/firewire/files.
To find out which OCFS driver you need for your server, run:
$ uname -a
Linux rac1pub 2.4.9-e.25 #1 Fri Oct 6 18:27:21 EDT 2003 i686 unknown
To install the OCFS RPMs for uniprocessor kernels, run e.g. (make sure to use the
latest OCFS version!):
su - root
rpm -Uvh ocfs-2.4.20-18.10-1.0.10-2.i386.rpm \
ocfs-tools-1.0.10-2.i386.rpm \
ocfs-support-1.0.10-2.i386.rpm
I would also recommend updating the fileutils package; see the note above.
Configuring OCFS
To generate the /etc/ocfs.conf file, run the ocfstool tool. Before you run this
GUI tool, make sure you have set the DISPLAY environment variable. You can find a
short description of the DISPLAY environment variable here.
node_name = rac1prv
node_number =
ip_address = 192.168.1.1
ip_port = 7000
guid = 167045A6AD4E9EAB33620010B5C05E7F
The guid entry is the unique group user ID. This ID has to be unique for each node.
You can create the above file without the ocfstool tool by editing
the /etc/ocfs.conf file manually and by running ocfs_uid_gen -c to assign/update the
guid value in this file.
If FireWire storage is being used, the OCFS file systems won't mount automatically
at startup with the steps described above. Some additional changes need to be made.
When I run load_ocfs on a system with the experimental FireWire kernel, it returns
the following error message:
su - root
# load_ocfs
/sbin/insmod ocfs node_name=rac1prv ip_address=192.168.1.1 ip_port=7000
cs=1841 guid=BB669BEFEA6C470479D10050DA1A2424 comm_voting=1
insmod: ocfs: no module by that name found
load_ocfs: insmod failed
#
The ocfs.o module can be found here:
su - root
# rpm -ql ocfs-2.4.20
/lib/modules/2.4.20-ABI/ocfs
/lib/modules/2.4.20-ABI/ocfs/ocfs.o
#
So for the experimental kernel for FireWire drives, I manually created a link for the
ocfs.o file:
su - root
mkdir /lib/modules/`uname -r`/kernel/drivers/addon/ocfs
ln -s `rpm -qa | grep ocfs-2 | xargs rpm -ql | grep "/ocfs.o$"` \
/lib/modules/`uname -r`/kernel/drivers/addon/ocfs/ocfs.o
Now you should be able to load the OCFS module and the output will look similar to
this example:
su - root
# /sbin/load_ocfs
/sbin/insmod /lib/modules/2.4.20-ABI/ocfs/ocfs.o node_name=rac1prv
ip_address=192.168.1.1 ip_port=7000 cs=1833
guid=01A553F1FD7B719E9D290010B5C05E7F comm_voting=1
#
Before you continue with the next steps, make sure you've created all needed
partitions on your shared storage.
The following steps for creating the OCFS filesystem(s) should only be executed on
one RAC node!
Alternatively, you can run the "mkfs.ocfs" command to create the OCFS filesystems:
su - root
mkfs.ocfs -F -b 128 -L /var/opt/oracle/oradata/orcl \
    -m /var/opt/oracle/oradata/orcl \
    -u `id -u oracle` -g `id -g oracle` -p 0775 <device_name>
Cleared volume header sectors
Cleared node config sectors
Cleared publish sectors
Cleared vote sectors
Cleared bitmap sectors
Cleared data block
Wrote volume header
#
For SCSI disks (including FireWire disks), <device_name> stands for devices
like /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd, etc. Be careful to use the right device
name! For this article I created a large OCFS filesystem on /dev/sda1.
mkfs.ocfs options used above: -F forces the format, -b 128 sets the block size in KB,
-L sets the volume label, -m specifies the mount point, -u and -g set the owning user
and group, and -p sets the permissions.
As I mentioned previously, for this article I created one large OCFS filesystem
on /dev/sda1. To mount the OCFS filesystem, run:
su - root
# mount -t ocfs /dev/sda1 /var/opt/oracle/oradata/orcl
Now run the ls command on all RAC nodes to check the ownership:
# ls -ld /var/opt/oracle/oradata/orcl
drwxrwxr-x 1 oracle oinstall 131072 Aug 18 18:07
/var/opt/oracle/oradata/orcl
#
NOTE: If the above ls command does not display the same ownership on all RAC
nodes (oracle:oinstall), then the "oracle" UID and the "oinstall" GID are not the
same across the RAC nodes; see Creating Oracle User Accounts for more
information.
To ensure the OCFS filesystems are mounted automatically during a reboot, the
OCFS mount points need to be added to the /etc/fstab file.
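For example, the entry for the single OCFS partition used in this article might look
roughly like this (a sketch; the _netdev option matches the mount output shown further
below, and the device name must match your own layout):
/dev/sda1    /var/opt/oracle/oradata/orcl    ocfs    _netdev    0 0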
To make sure the ocfs.o kernel module is loaded and the OCFS file systems are
mounted during the boot process, run:
su - root
# chkconfig --list ocfs
ocfs 0:off 1:off 2:off 3:on 4:on 5:on 6:off
If the flags for runlevels 3, 4, and 5 are not set to "on" as shown above, run the following command:
su - root
# chkconfig ocfs on
You can also start the "ocfs" service manually by running:
su - root
# service ocfs start
When you run this command it will not only load the ocfs.o kernel module but it will
also mount the OCFS filesystems as configured in /etc/fstab.
If FireWire storage is being used, the OCFS file systems won't mount automatically
at startup with the "ocfs" service script. In Automatic Scanning of FireWire-Based
Shared Storage I already introduced the "fwocfs" service script, which can be used to
mount the OCFS file systems automatically for FireWire drives.
If you have not installed the "fwocfs" service script yet, follow the steps outlined
in Automatic Scanning of FireWire-Based Shared Storage. Since the "ocfs" service
script does not work for FireWire drives, disable the "ocfs" service script. The
"fwocfs" service script can now be used instead.
su - root
chkconfig ocfs off
Now start the new fwocfs service:
su - root
# service fwocfs start
Loading ohci1394: [ OK ]
Loading sbp2: [ OK ]
Rescanning SCSI bus: [ OK ]
Loading OCFS: [ OK ]
Mounting OCFS file systems: [ OK ]
#
To see the mounted OCFS file systems, you can run the following command:
su - root
# service fwocfs status
/dev/sda1 on /var/opt/oracle/oradata/orcl type ocfs (rw,_netdev)
#
Next time when you reboot the server, the OCFS filesystems should be mounted
automatically for the FireWire drives.
Originally, hangcheck-timer was shipped by Oracle, but this module now comes with
RH AS starting with kernel version 2.4.9-e.12. So hangcheck-timer is
now part of all newer RH AS kernels. If you upgraded the kernel as outlined
in Upgrading the Linux Kernel, then you should have the hangcheck-timer module on
your node:
# find /lib/modules -name "hangcheck-timer.o"
/lib/modules/2.4.9-e.25smp/kernel/drivers/char/hangcheck-timer.o
#
Therefore you don't need to install the Oracle Cluster Manager patch (e.g. patch
2594820) before you install the Oracle9i database patch set.
The two parameters hangcheck_tick and hangcheck_margin indicate how long a RAC
node must hang before the hangcheck-timer module will reset the system. A node
reset will occur when the following is true:
system hang time > (hangcheck_tick + hangcheck_margin)
For example, with hangcheck_tick=30 and hangcheck_margin=180, a node is reset
only after it has hung for more than 210 seconds.
To load the module with the right parameter settings, you can run the following
command:
# su - root
# /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
# grep Hangcheck /var/log/messages |tail -1
Oct 18 23:05:36 rac1prv kernel: Hangcheck: starting hangcheck timer (tick is
30 seconds, margin is 180 seconds).
#
But the right way to load modules with the correct parameters is to make entries in
the /etc/modules.conf file. To do that, add the following line to
the /etc/modules.conf file:
# su - root
# echo "options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180" >>
/etc/modules.conf
Now you can run modprobe to load the module with the configured parameters
in /etc/modules.conf:
# su - root
# modprobe hangcheck-timer
# grep Hangcheck /var/log/messages |tail -1
Oct 18 23:10:23 rac1prv kernel: Hangcheck: starting hangcheck timer (tick is
30 seconds, margin is 180 seconds).
#
Note: To ensure the hangcheck-timer module is loaded after each reboot, add
the modprobe command to the /etc/rc.local file.
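For example:
su - root
echo "/sbin/modprobe hangcheck-timer" >> /etc/rc.local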
The following steps show how I set up a trusted environment for the "oracle" account
on all RAC nodes.
First make sure the "rsh" RPMs are installed on all RAC nodes:
rpm -q rsh rsh-server
If rsh is not installed, run the following command:
su - root
rpm -ivh rsh-0.17-5.i386.rpm rsh-server-0.17-5.i386.rpm
To enable the "rsh" service, the "disable" attribute in the /etc/xinetd.d/rsh file must
be set to "no" and xinetd must be refreshed. This can be done by running the
following commands:
su - root
chkconfig rsh on
chkconfig rlogin on
service xinetd reload
To allow the "oracle" user account to be trusted among the RAC nodes, create
the /etc/hosts.equiv file:
su - root
touch /etc/hosts.equiv
chmod 600 /etc/hosts.equiv
chown root.root /etc/hosts.equiv
And add all RAC nodes to the /etc/hosts.equiv file similar to the following example:
$ cat /etc/hosts.equiv
+rac1prv oracle
+rac2prv oracle
+rac3prv oracle
+rac1pub oracle
+rac2pub oracle
+rac3pub oracle
In the preceding example, the second field permits only the oracle user account to run
rsh commands on the specified nodes. For security reasons, the /etc/hosts.equiv file
should be owned by root and its permissions should be set to 600. In fact, some
systems will only honor the content of this file if it is owned by root and its
permissions are set to 600.
Now you should be able to run rsh against each RAC node without having to provide
the password for the oracle account:
su - oracle
$ rsh rac1prv ls -l /etc/hosts.equiv
-rw------- 1 root root 49 Oct 19 13:18 /etc/hosts.equiv
$ rsh rac2prv ls -l /etc/hosts.equiv
-rw------- 1 root root 49 Oct 19 14:39 /etc/hosts.equiv
$ rsh rac3prv ls -l /etc/hosts.equiv
-rw------- 1 root root 49 Oct 19 14:42 /etc/hosts.equiv
$
The default and maximum socket buffer sizes can be changed via the proc file system
without a reboot:
su - root
sysctl -w net.core.rmem_default=262144   # Default socket receive buffer size in bytes
sysctl -w net.core.wmem_default=262144   # Default socket send buffer size in bytes
sysctl -w net.core.rmem_max=262144       # Maximum socket receive buffer size, settable via the SO_RCVBUF socket option
sysctl -w net.core.wmem_max=262144       # Maximum socket send buffer size, settable via the SO_SNDBUF socket option
To make the change permanent, add the following lines to the /etc/sysctl.conf file,
which is used during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
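To have the settings in /etc/sysctl.conf take effect immediately without a reboot, you
can also run:
su - root
sysctl -p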
Setting Semaphores
It is recommended to follow the steps as outlined in Setting Semaphores.
Before you continue, make sure the OCFS filesystems are mounted on all RAC nodes
and that rsh really works for the oracle account on all RAC nodes. Here is the output
of the df command on my RAC test system:
su - oracle
rsh rac1prv df | grep oradata
/dev/sda1 51205216 45152 51160064 1%
/var/opt/oracle/oradata/orcl
rsh rac2prv df | grep oradata
/dev/sda1 51205216 45152 51160064 1%
/var/opt/oracle/oradata/orcl
rsh rac3prv df | grep oradata
/dev/sda1 51205216 45152 51160064 1%
/var/opt/oracle/oradata/orcl
It is wise to first patch the Oracle 9iR2 software before creating the database. To patch
Oracle9iR2 (Cluster Manager, etc.), download the Oracle 9i Release 2 Patch Set 3
Version 9.2.0.4.0 for Linux x86 (patch number 3095277) from metalink.oracle.com.
Copy the downloaded "p3095277_9204_LINUX.zip" file to e.g. /tmp and run the
following commands:
su - oracle
$ unzip p3095277_9204_LINUX.zip
Archive: p3095277_9204_LINUX.zip
inflating: 9204_lnx32_release.cpio
inflating: README.html
inflating: patchnote.css
$
$ cpio -idmv < 9204_lnx32_release.cpio
Disk1/stage/locks
Disk1/stage/Patches/oracle.apache.isqlplus/9.2.0.4.0/1/DataFiles/bin.1.1.jar
Disk1/stage/Patches/oracle.apache.isqlplus/9.2.0.4.0/1/DataFiles/lib.1.1.jar
...
The Cluster Manager and Node Monitor oracm accepts registration of Oracle instances
to the cluster and sends ping messages to the Cluster Managers (Node Monitors) on
the other RAC nodes. If this heartbeat fails, oracm uses a quorum file or a quorum
partition on the shared disk to distinguish between a node failure and a network
failure. So if a node stops sending ping messages but continues writing to the quorum
file or partition, the other Cluster Managers can recognize it as a network failure.
The Cluster Manager (CM) now uses UDP instead of TCP for communication.
Once the Oracle Cluster Manager is running on all RAC nodes, the OUI will
automatically recognize all nodes in the cluster. When you run the Installer, you will
see the "Cluster Node Selection" screen if the oracm process is running on the RAC
nodes. This also means that you can launch runInstaller on one RAC node and have
the Oracle software automatically installed on all other RAC nodes as well.
If you have the quorum file on an OCFS (1.0.10-2 OCFS) file system on
the FireWire drive, then the Cluster Manager oracm will only come up on one RAC
node. If you start oracm on a second RAC node, it will crash. Until this bug is
resolved, a raw device needs to be used for FireWire drives.
For Raw Devices:
In Creating Partitions for Raw Devices I created a 2 MB partition (raw device) for
the quorum file on my external FireWire drive. The name of my quorum partition on
the FireWire drive is /dev/sda2, which is bound to /dev/raw/raw1.
Optionally, you can create a soft link to this raw device. If you haven't done so yet as
shown in Creating Partitions for Raw Devices, do it now by running the following
command on all RAC nodes:
su - oracle
ln -s /dev/raw/raw1 /var/opt/oracle/oradata/orcl/CMQuorumFile
To install the Oracle Cluster Manager, insert the Oracle 9i R2 Disk 1 and
launch /mnt/cdrom/runInstaller. These steps only need to be performed on one RAC
node, the node you are installing from.
It is not necessary to run the watchdogd daemon with Oracle Cluster Manager
9.2.0.2 or any newer version. Since watchdogd has been replaced with
the hangcheck-timer kernel module, some files need to be updated.
The following changes (commenting out the watchdogd lines in the ocmstart.sh
script) need to be made if the Oracle9i Cluster Manager 9.2.0.1.0 has
been patched to e.g. version 9.2.0.4.0 as described under Applying Oracle9i Cluster
Manager 9.2.0.4.0 Patch Set.
# Get arguments
#watchdogd_args=`grep '^watchdogd' $OCMARGS_FILE |\
# sed -e 's+^watchdogd *++'`
# Startup watchdogd
#echo watchdogd $watchdogd_args
#watchdogd $watchdogd_args
Starting and Stopping Oracle 9i Cluster Manager
To start the Cluster Manager (CM) and Node Monitor oracm, run the following
commands on all RAC nodes:
su - root
# . ~oracle/.bash_profile # Set Oracle environment
# $ORACLE_HOME/oracm/bin/ocmstart.sh
oracm </dev/null 2>&1 >/opt/oracle/product/9.2.0/oracm/log/cm.out &
#
# ps -ef |grep oracm
root 15249 1 0 Nov08 pts/2 00:00:00 oracm
root 15251 15249 0 Nov08 pts/2 00:00:00 oracm
root 15252 15251 0 Nov08 pts/2 00:00:00 oracm
root 15253 15251 0 Nov08 pts/2 00:00:00 oracm
root 15254 15251 0 Nov08 pts/2 00:00:04 oracm
root 15255 15251 0 Nov08 pts/2 00:00:00 oracm
root 15256 15251 0 Nov08 pts/2 00:00:00 oracm
root 15257 15251 0 Nov08 pts/2 00:00:00 oracm
root 15258 15251 0 Nov08 pts/2 00:00:00 oracm
root 15298 15251 0 Nov08 pts/2 00:00:00 oracm
root 15322 15251 0 Nov08 pts/2 00:00:00 oracm
root 15540 15251 0 Nov08 pts/2 00:00:00 oracm
#
NOTE:
Once the Cluster Manager is upgraded, oracm won't come up anymore if you use
OCFS; it will always die after a few seconds. To fix this, run the following command
on only one RAC node:
su - oracle
dd if=/dev/zero of=/var/opt/oracle/oradata/orcl/CMQuorumFile bs=4096 count=96
After that, restart the Cluster Manager.
If the procps RPM is installed, you will only see one oracm process in the process
table when you run the ps command. That's because the ps command that comes with
the procps RPM does not show threads as separate processes in its output.
rpm -qf /bin/ps
procps-2.0.7-11
Installing Oracle 9i Real Application Cluster
General
Before you install the Oracle9i Real Application Clusters 9.2.0.1.0 software (RAC
software + database software), you have to make sure that
the pdksh and ncurses4 RPMs are installed on all RAC nodes! If these RPMs are not
installed, you will get the following error message when you
run $ORACLE_HOME/root.sh on each RAC node during the software installation:
...
error: failed dependencies:
libncurses.so.4 is needed by orclclnt-nw_lssv.Build.71-1
error: failed dependencies:
orclclnt = nw_lssv.Build.71-1 is needed by orcldrvr-nw_lssv.Build.71-1
error: failed dependencies:
orclclnt = nw_lssv.Build.71-1 is needed by orclnode-nw_lssv.Build.71-1
orcldrvr = nw_lssv.Build.71-1 is needed by orclnode-nw_lssv.Build.71-1
libscsi.so is needed by orclnode-nw_lssv.Build.71-1
libsji.so is needed by orclnode-nw_lssv.Build.71-1
error: failed dependencies:
orclclnt = nw_lssv.Build.71-1 is needed by orclserv-nw_lssv.Build.71-1
orclnode = nw_lssv.Build.71-1 is needed by orclserv-nw_lssv.Build.71-1
/bin/ksh is needed by orclserv-nw_lssv.Build.71-1
package orclman-nw_lssv.Build.71-1 is already installed
A shared configuration file is needed for the srvctl utility, which is used to manage
Real Application Clusters instances and listeners.
For OCFS Filesystems:
To create the shared configuration file for srvctl, run the following command on one
RAC node:
su - oracle
touch /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
For Raw Devices:
In Creating Partitions for Raw Devices I created a 20 MB partition (raw device) for
the shared configuration file. The device name of the shared configuration file
is /dev/sda1, which is bound to /dev/raw/raw2.
Optionally, you can create a soft link to this raw device. If you haven't done so yet as
shown under Creating Partitions for Raw Devices, do it now by running the following
command on all RAC nodes:
su - oracle
ln -s /dev/raw/raw2 /var/opt/oracle/oradata/orcl/SharedSrvctlConfigFile
To install the Oracle9i Real Application Clusters 9.2.0.1.0 software, insert the
Oracle9iR2 Disk 1 and launch runInstaller. These steps only need to be
performed on one node, the node you are installing from.
When the Oracle Enterprise Manager Console comes up, don't install a
database yet.
When the Install window displays "Performing remote operations (99%)", you will
see a command like this one running on the RAC nodes:
bash -c /bin/sh -c cd /; cpio -idmuc
If this command is running, it shows that the Oracle software is currently being
installed on the RAC node(s).
NOTE: There are still bugs that sometimes prevent the Installer from installing the
Oracle software on all RAC nodes. If the Installer hangs at "Performing remote
operations (99%)" and the above bash command is no longer running on the Oracle
RAC nodes, then you need to abort the installation. One time I kept the Installer
running the whole night without success. And one time I had to run the Installer five
times because it was always hanging at "Performing remote operations (99%)". A
workaround is to run runInstaller on all RAC nodes to install the software on
each RAC node separately.
Before any other Oracle patches are applied, you first need to
patch runInstaller in $ORACLE_HOME/bin.
Dynamic libraries:
08048000-0804c000 r-xp 00000000 03:05 1044862
/opt/oracle/jre/1.3.1/bin/i386/native_threads/java
0804c000-0804d000 rw-p 00003000 03:05 1044862
/opt/oracle/jre/1.3.1/bin/i386/native_threads/java
40000000-40016000 r-xp 00000000 03:05 767065 /lib/ld-2.2.4.so
40016000-40017000 rw-p 00015000 03:05 767065 /lib/ld-2.2.4.so
40018000-40025000 r-xp 00000000 03:05 767061 /lib/i686/libpthread-0.9.so
40025000-40029000 rw-p 0000c000 03:05 767061 /lib/i686/libpthread-0.9.so
...
4bb27000-4bb2e000 r-xp 00000000 03:05 2415686
/opt/oracle/oui/bin/linux/libsrvm.so
4bb2e000-4bb2f000 rw-p 00006000 03:05 2415686
/opt/oracle/oui/bin/linux/libsrvm.so
4bf27000-4c823000 r-xp 00000000 03:05 1518276
/opt/oracle/product/9.2.0/lib/libclntsh.so.9.0
4c823000-4c8e5000 rw-p 008fb000 03:05 1518276
/opt/oracle/product/9.2.0/lib/libclntsh.so.9.0
To patch the Oracle Software to Oracle9iR2 Patch Set 3 9.2.0.4, launch the Installer.
These steps only need to be performed on one node, the node you are installing from.
su - oracle
cd $ORACLE_HOME/bin
$ ./runInstaller
- Welcome Screen: Click Next
- Inventory Location: Click Next
- Cluster Node Selection: Select/highlight all RAC nodes using the shift key
  and the left mouse button.
  Note: If not all RAC nodes are showing up, or if the Node Selection screen
  doesn't appear, then the Oracle Cluster Manager (Node Monitor) oracm is
  probably not running on all RAC nodes. See Starting and Stopping Oracle 9i
  Cluster Manager for more information.
- File Locations:
  - Click "Browse for the Source"
  - Navigate to the stage directory where the patch set is located.
    On my system it is "/tmp/Disk1/stage".
  - Select the "products.jar" file.
  - Click OK
- Available Products: Select "Oracle9iR2 Patch Set 3 9.2.0.4"
- Summary: Click Install
- When the installation has completed, click Exit.
To start the Global Services daemon gsd on all RAC nodes, run the following
command on all RAC nodes:
su - oracle
$ gsdctl start
Successfully started GSD on local node
$
The Database Configuration Assistant will create the Oracle database files
automatically on the OCFS filesystem if "dbca -datafileDestination
/var/opt/oracle/oradata" is executed.
Using Raw Devices for Database Files and Other Files
Optionally, you can create soft links to the raw devices. In my setup I created soft
links in /var/opt/oracle/oradata/orcl.
If you haven't created the soft links yet as shown in Creating Partitions for Raw
Devices, now run the following commands on all RAC nodes, since OCFS is not being
used here:
su - oracle
ln -s /dev/raw/raw3 /var/opt/oracle/oradata/orcl/spfileorcl.ora
ln -s /dev/raw/raw4 /var/opt/oracle/oradata/orcl/control01.ctl
ln -s /dev/raw/raw5 /var/opt/oracle/oradata/orcl/control02.ctl
ln -s /dev/raw/raw6 /var/opt/oracle/oradata/orcl/indx01.dbf
ln -s /dev/raw/raw7 /var/opt/oracle/oradata/orcl/system01.dbf
ln -s /dev/raw/raw8 /var/opt/oracle/oradata/orcl/temp01.dbf
ln -s /dev/raw/raw9 /var/opt/oracle/oradata/orcl/tools01.dbf
ln -s /dev/raw/raw10 /var/opt/oracle/oradata/orcl/undotbs01.dbf
ln -s /dev/raw/raw11 /var/opt/oracle/oradata/orcl/undotbs02.dbf
ln -s /dev/raw/raw12 /var/opt/oracle/oradata/orcl/undotbs03.dbf
ln -s /dev/raw/raw13 /var/opt/oracle/oradata/orcl/users01.dbf
ln -s /dev/raw/raw14 /var/opt/oracle/oradata/orcl/redo01.log
ln -s /dev/raw/raw15 /var/opt/oracle/oradata/orcl/redo02.log
ln -s /dev/raw/raw16 /var/opt/oracle/oradata/orcl/redo03.log
ln -s /dev/raw/raw17 /var/opt/oracle/oradata/orcl/orcl_redo2_2.log
ln -s /dev/raw/raw18 /var/opt/oracle/oradata/orcl/orcl_redo3_1.log
ln -s /dev/raw/raw19 /var/opt/oracle/oradata/orcl/orcl_redo3_2.log
Launch the Database Configuration Assistant with the following option whether you
use OCFS or raw devices (assuming you created soft links to the raw devices):
su - oracle
dbca -datafileDestination /var/opt/oracle/oradata
To test TAF on the newly installed RAC cluster, configure the tnsnames.ora file for
TAF on a non-RAC server where you have either the Oracle database software or the
Oracle client software installed.
Here is what my /opt/oracle/product/9.2.0/network/admin/tnsnames.ora file looks like:
ORCL =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1pub)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2pub)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac3pub)(PORT = 1521))
(LOAD_BALANCE = on)
(FAILOVER = on)
)
(CONNECT_DATA =
(SERVICE_NAME = orcl)
(FAILOVER_MODE =
(TYPE = session)
(METHOD = basic)
)
)
)
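To run the check below, connect through the ORCL alias from that non-RAC client;
for example (you will be prompted for the SYSTEM password):
su - oracle
sqlplus system@orcl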
The following SQL statement can be used to check the session's failover type,
failover method, and whether a failover has occurred:
select instance_name, host_name,
NULL AS failover_type,
NULL AS failover_method,
NULL AS failed_over
FROM v$instance
UNION
SELECT NULL, NULL, failover_type, failover_method, failed_over
FROM v$session
WHERE username = 'SYSTEM';
SQL>
Now do a "shutdown abort" on "rac1prv" for instance "orcl1". You can use
the srvctl utility to do this:
su - oracle
$ srvctl status database -d orcl
Instance orcl1 is running on node rac1pub
Instance orcl2 is running on node rac2pub
Instance orcl3 is running on node rac3pub
$
$ srvctl stop instance -d orcl -i orcl1 -o abort
$
$ srvctl status database -d orcl
Instance orcl1 is not running on node rac1pub
Instance orcl2 is running on node rac2pub
Instance orcl3 is running on node rac3pub
$
SQL>
The SQL statement shows that the session has now failed over to instance "orcl2".
Appendix
Oracle 9i RAC Problems and Errors
This section describes problems and errors pertaining to installing Oracle9i RAC on
Red Hat Advanced Server.
Raw device validation check for Control file "/var/opt/oracle/oradata/orcl/control01.ctl" failed,
stat for /var/opt/oracle/oradata/orcl/control01.ctl failed.
There could be many reasons for this problem. If you use raw devices, check if you
rebooted the server after you created new raw devices.
./runInstaller
An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at PC=0x40008e4a
Function name=_dl_lookup_symbol
Library=/lib/ld-linux.so.2

Current Java thread:
  at java.lang.ClassLoader$NativeLibrary.find(Native Method)
  at java.lang.ClassLoader.findNative(ClassLoader.java:1441)
  at ssOiGenClassux22.linkExists(Native Method)
  ...
SQL> startup
CMCLI ERROR: OpenCommPort: connect failed with error 111.
CMCLI ERROR: OpenCommPort: connect failed with error 111.
CMCLI ERROR: OpenCommPort: connect failed with error 111.
ORACLE instance started.

Total System Global Area   89199856 bytes
Fixed Size                   451824 bytes
Variable Size              62914560 bytes
Database Buffers           25165824 bytes
Redo Buffers                 667648 bytes
ORA-32700: error occurred in DIAG Group Service
The Oracle Cluster Manager oracm is not running.