Revision History

Revision Number   Revision Date   Summary of Changes                       Changed by (Acronym)
1.0               11/05/04        First draft                              SRO
1.1               09/08/04        Few updates; monitoring added            SRO
1.2               14/09/04        PDBnet; Restoring; Various additions     SRO
1.3               31/01/05        PDBnet information updated               SRO
1.4               16/02/05        Veritas Support Information              SRO
1.Introduction
As the Globe solution is being deployed, there is a rising need for highly available applications. For
instance, intranet websites need to be available to the users 24 hours a day, 7 days a week.
A server cluster provides failover support for applications and services that require high availability,
scalability and reliability. With clustering, organizations can make applications and data available
on multiple servers linked together in a cluster configuration. Back-end applications and services,
such as those provided by database servers, are ideal candidates for server clustering.
The concept of a cluster involves taking two or more computers and organizing them to work
together to provide higher availability, reliability and scalability than can be obtained by using a
single system. When failure occurs in a cluster, resources can be redirected and the workload can
be redistributed. Typically, the end user experiences a limited failure, and may only have to refresh
the browser or reconnect to an application to begin working again.
The Windows 2K Development Team located in Vevey has developed two possible
implementations of Windows clustering: simple cluster and geographically dispersed cluster (also
referred to as geocluster).
The solution provided by the Dev Team has been certified on the following hardware:
• IBM eSeries x360 connected to an IBM ESS E800 SAN (simple clusters and geoclusters,
only in regional datacenters)
• Dell PowerEdge connected to a Dell PowerVault PV-220s configured in cluster mode
(simple clusters only)
Concerning software requirements, the only version of Oasis 2 that is supported for clustering is
2226.
This document will focus on the IBM implementation, as it is the only one used in Z-EUR for the
moment.
[Figure: simple cluster. SERVER1 and SERVER2 in the same site are connected to one ESS holding a Q: drive (quorum) and an F: drive (data); the cluster is presented to clients as a virtual server (SERVERNAME / ipaddress).]
This cluster consists of two different servers located in the same site and connected to the same
SAN.
Each of the servers (but not at the same time) has access to two SAN volumes1:
- one volume, called the Quorum, holds the information on the cluster, like the name of the node
that owns the resources;
- the other volume holds the actual data.
Access to the volumes is managed by the cluster service in such a way that, at any given time, only
one node has access to a volume.
The two nodes of the cluster have at least two network connections:
- one connection is for the production network, allowing the clients/applications to connect to the
cluster (it is called the Public Network Connection);
- one connection is for the internal communication (like the heartbeat) between the two nodes (it is
called the Private Network Connection); in the simple cluster implementation, a crossover network
cable is used.
The cluster mechanism presents a virtual server with a virtual IP address to the clients or
applications. If the node currently owning the cluster resource group fails, the other node will take
over all resources.
1 These volumes are configured with special settings, in order to allow the two servers to recognize the same data.
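To illustrate what the virtual server means for clients, here is a minimal Python sketch that resolves the virtual server name and opens a TCP connection to it; the result is the same whichever physical node currently owns the resources. The server name and port below are placeholders, not values taken from a real cluster.

import socket

# Placeholder values: replace with the real virtual server name and the
# TCP port of the clustered application (e.g. 1433 for a default SQL instance).
VIRTUAL_SERVER = "SERVERNAME"
PORT = 1433

def check_virtual_server(name, port, timeout=5):
    """Resolve the virtual server name and try to open a TCP connection.

    Clients always address the virtual name/IP; the cluster service makes
    sure the node that currently owns the resources answers.
    """
    ip = socket.gethostbyname(name)           # resolves to the virtual IP address
    with socket.create_connection((ip, port), timeout=timeout):
        return ip                             # connection succeeded

if __name__ == "__main__":
    print("Virtual server reachable at", check_virtual_server(VIRTUAL_SERVER, PORT))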
4.GeoCluster Architecture
Before starting, note that geocluster is only supported in the Regional DataCenters (Frankfurt and
Mainz in our case).
[Figure: geocluster. SERVER1 in Datacenter 1 and SERVER2 in Datacenter 2 are each connected to the ESS in both datacenters; each ESS holds a Q: drive (quorum) and an F: drive (data), and the cluster is presented to clients as a virtual server (SERVERNAME / ipaddress).]
We will explain here the differences between the simple cluster and the geocluster.
Each node is located in a specific datacenter and connects to both ESS SANs: the one in its own
site and the one in the remote site.
As with the simple cluster, two SAN volumes are accessible by each node (but not at the same time).
One of the major differences between the simple cluster and the geocluster concerns the storage. In the
geocluster, each disk volume used by the cluster consists of 3 LUNs forming a mirrored
concatenated volume2: 1 on the local SAN and 2 on the remote SAN. This allows the cluster to
survive the failure of the whole SAN in the local site.
2 The FlashSnap feature of Veritas Volume Manager achieves synchronisation between the LUNs.
Another major difference with the simple cluster is the existence of an extra heartbeat connection.
The two nodes of the cluster have at least three network connections:
- one connection is for the production network, allowing the clients/applications to connect to the
cluster (it is called the Public Network Connection)3;
- two connections are used for the internal communication (like the heartbeat):
  - one goes via the MAN connection between the two datacenters;
  - one goes via a dedicated leased line; this enables the cluster to remain independent from a MAN
failure.
The following software components are installed on the cluster nodes:
- Tivoli Storage Manager client: used for the backup of the physical nodes (C$, D$ and system
objects) (especially in the RDC);
- Tivoli Data Protection agent for SQL Server: used for the backup of the SQL Server (if the cluster
runs SQL Server 2000) (in the future);
- Tivoli EndPoint: a physical endpoint is installed on each of the nodes and a logical endpoint is
installed on the cluster;
- Veritas Volume Manager with the FlashSnap option: used for the synchronisation between the LUNs on
the SAN (if it is a geocluster);
- Veritas Backup Exec: used to back up data in the markets (this solution is still in development).
3 Note that with the geocluster, the public network connection relies on the teaming of two different NICs.
6.Supported Applications
The cluster can be configured in two different ways, depending on the needs.
We will take SQL server as an example, but the explanation is valid for any other cluster-aware
application.
An active/passive configuration allows you to have a single instance of SQL Server running on one
of the physical servers in the cluster. The other nodes in the cluster are in standby mode until a
failure on the active node or a manual failover during maintenance occurs. Only one SQL Server
2000 virtual server is installed on an active/passive SQL Server cluster environment.
An active/active configuration allows you to have multiple instances of SQL Server running on both
nodes of a cluster. If one of the SQL Servers in the cluster fails, the failed instances of SQL Server
will automatically fail over to the other server. This means that both instances of SQL Server will be
running on one physical server instead of two separate servers. This could lead to some
performance degradation, but it is much better than a complete outage.
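In both configurations, applications must connect to the virtual SQL Server name (with the instance name if applicable), never to a physical node. The following sketch, assuming the pyodbc package, Windows authentication and placeholder server/instance names, demonstrates this: on a clustered instance, SELECT @@SERVERNAME returns the virtual server name, not the physical node name.

import pyodbc

# Placeholder names: virtual SQL Server and instance hosted by the cluster.
CONN_STR = (
    "DRIVER={SQL Server};"
    "SERVER=VIRTUALSQL\\INSTANCE01;"
    "Trusted_Connection=yes;"
)

conn = pyodbc.connect(CONN_STR, timeout=5)
# On a failover cluster, @@SERVERNAME reports the virtual server name,
# no matter which physical node currently hosts the instance.
row = conn.cursor().execute("SELECT @@SERVERNAME").fetchone()
print("Connected to virtual server:", row[0])
conn.close()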
This role information is gathered and inserted into the PDBnet application.
A cluster node is detected as having the mscs role, as shown in the following screenshot:
So, in case of a failure, when you have to recover a server, you can directly see in the inventory
that this server is a cluster node.
In addition to this, the PDBnet application has been amended in order to show the partnership between
servers.
For instance, if you look for DESDB017, you can see the following comment:
This clearly explains which server is the counterpart, which framework is using the cluster and the
names of the virtual servers. The first one is always the name of the MS cluster, while the following
names are the names of the virtual SQL cluster instances.
From there on, you can open one of the virtual servers, e.g. DESDB523:
So, from a physical node, you can obtain the names of the counterpart and the virtual servers.
From a logical node or virtual server, you can obtain the names of the two physical nodes.
Once you have started the Cluster Administrator console, it may ask for the cluster you want to manage. If you are logged on to a
node of the cluster, simply enter . (a single dot) as the cluster name4:
We will now see the things that can be done with this console.
4 You can enter the name of the cluster or the name of one of the nodes.
The different resources of the cluster are grouped together. If you expand the Groups folder on the
left pane, you can see the groups present on this cluster:
In the right pane, you can see the resources that are members of the selected group.
Because of the dependencies between resources, you cannot fail individual resources over; you must
always fail the whole group over. To do so, right-click on a group and select Move Group:
To take a resource offline, locate the resource, right-click on it and select Take Offline 5:
Note that the resource holding the quorum disk (usually the Q:) may not be taken offline.
5 Note that doing this will take offline the resources that depend on the resource you take offline.
To bring a resource online, locate the resource, right-click on it and select Bring Online:
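These actions can also be scripted with the cluster.exe command line tool installed together with the cluster service. The Python sketch below simply wraps it; the group and resource names are placeholders, and the exact cluster.exe options should be double-checked on the node before use.

import subprocess

def run(cmd):
    """Run a cluster.exe command and show its output."""
    print(">", cmd)
    print(subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout)

# Placeholder group/resource names; adjust to what Cluster Administrator shows.
run('cluster group')                                  # list groups, owners and states
run('cluster group "SQL Group" /move')                # fail the whole group over
run('cluster resource "SQL Server Agent" /offline')   # take one resource offline
run('cluster resource "SQL Server Agent" /online')    # bring it online again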
When planned maintenance has to be performed on the nodes (upgrade, service pack, hotfix, …),
follow these steps:
1. Make sure that all resources are owned by one single node (a scripted check is sketched after this list).
2. Perform the maintenance on the standby node (i.e. the node not owning any resources).
3. Reboot the standby node if needed.
4. When it is back, move all resource groups to the standby node.
5. Wait for the application to come online.
6. Have the users or application owners confirm that the application is still working as expected.
7. Perform the maintenance on the other node.
8. Reboot the other node if needed.
9. When it is back, move all resource groups back to it.
10. Have the users or application owners confirm that the application is still working as expected.
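As a helper for step 1, the following sketch parses the output of the cluster group command to check that a single node owns all resource groups before maintenance starts. The assumed output layout (group, owner node, status columns) and the status keywords are assumptions; verify them against what the command actually prints on the node.

import subprocess

def owning_nodes():
    """Return the set of nodes that currently own at least one resource group.

    Assumes 'cluster group' prints one row per group with the owner node as the
    second-to-last token and a one-word status as the last token; multi-word
    statuses such as 'Partially Online' would need extra handling.
    """
    out = subprocess.run("cluster group", shell=True, capture_output=True, text=True).stdout
    nodes = set()
    for line in out.splitlines():
        parts = line.split()
        # Skip headers, separators and blank lines; keep rows like: <group> <node> <status>
        if len(parts) >= 3 and parts[-1].lower() in ("online", "offline", "failed", "pending"):
            nodes.add(parts[-2])
    return nodes

nodes = owning_nodes()
if len(nodes) == 1:
    print("OK: all groups owned by", nodes.pop())
else:
    print("Move all groups to a single node first; current owners:", nodes)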
9.6 Recovering the service after a HW failure has occurred (BETA section)
You can have different cases:
9.6.1Shared data and / or quorum drive lost / corrupt
This is the situation where you have to restore the shared data (either on a SAN or on a shared
PV) before being able to restart the application.
1. Make sure that the disk subsystem has been repaired and that the partitions are created
and visible by both nodes (see the GLOBE ISIT OASIS2 Cluster Guide).
2. Shut down one node of the cluster in order to make sure that only one node is running.
3. Make sure that the application is stopped.
4. Restore the quorum partition and the shared data partition(s) (F:, G:, …) 6.
5. Restart the application via the Cluster Administrator tool.
6. Have the functionality of the application checked by the application responsible.
7. Restart the second node of the cluster.
6 Note that in the RDC, this has to be done with the help of the AIX Support team.
8. Fail the application over to the second node. Attention: this causes an application outage, so
please check with the application responsible beforehand.
9. Have the functionality of the application checked by the application responsible.
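Before launching the restore in step 4, some of the preconditions from steps 1 to 3 can be verified from the node that stays up. This sketch checks that the shared partitions are visible and that the application's Windows service is stopped; the drive letters and the service name are assumptions to adapt to the cluster concerned.

import os
import subprocess

# Assumptions: adapt the drive letters and the application's Windows service name.
SHARED_DRIVES = ["Q:\\", "F:\\"]
APP_SERVICE = "MSSQLSERVER"   # e.g. MSSQL$<instance> for a clustered named SQL instance

for drive in SHARED_DRIVES:
    state = "visible" if os.path.exists(drive) else "NOT visible"
    print(f"{drive} is {state} from this node")

# 'sc query' reports the service state (RUNNING / STOPPED) on the local node.
query = subprocess.run(f"sc query {APP_SERVICE}", shell=True, capture_output=True, text=True).stdout
print("Application service stopped:", "STOPPED" in query)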
10.Monitoring
Furthermore, on GeoClusters, the Veritas Volume Manager software is monitored as well. Please
refer to Appendix A – List of Veritas Volume Manager events monitored by Tivoli for the list of the
monitored events and Appendix B – Veritas Volume Manager Support Information for support
information.
In addition, there is a logical endpoint created for the monitoring of the application itself. The name
of the logical endpoint is VirtualServerName_ClusterGroupName-log
(e.g.: DESDB537_DESDB537\PSWWW50-log).
The configuration and the logfiles are stored on the shared drive used by the application (F: or G:)
at the following location: SharedDrive:\Program Files\Tivoli\lcfX, where X is 2 or 3,
depending on which cluster group is monitored by this endpoint.
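The naming and path rules above can be captured in a small helper, which is handy when you need to locate the endpoint files on the shared drive. It only encodes the convention described in this section; the example reuses the names quoted above.

def logical_endpoint(virtual_server, cluster_group, shared_drive, lcf_number):
    """Build the logical endpoint name and its configuration/log directory.

    Naming rule from this section: VirtualServerName_ClusterGroupName-log,
    stored under SharedDrive:\\Program Files\\Tivoli\\lcfX (X = 2 or 3).
    """
    name = f"{virtual_server}_{cluster_group}-log"
    path = f"{shared_drive}:\\Program Files\\Tivoli\\lcf{lcf_number}"
    return name, path

# Example from this section: group DESDB537\PSWWW50 on virtual server DESDB537.
print(logical_endpoint("DESDB537", r"DESDB537\PSWWW50", "F", 2))
# -> ('DESDB537_DESDB537\\PSWWW50-log', 'F:\\Program Files\\Tivoli\\lcf2')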
Note that the logical endpoint is a cluster resource and is member of the cluster group of the
application:
11.Backup
As with any server located in the Regional Data Center, each local resource of the server is backed up
by TSM. So, all the local partitions (C:, D:) and the system objects (registry, WMI repository, system
files, …) are protected.
The cluster resources (usually SAN partitions) are backed up via TSM as well. In fact, we install an
additional TSM Scheduler service for them. This scheduler is an extra resource in the cluster group:
Servers located in the markets are backed up by a special component of Backup Exec
(development of this solution is in progress).
12.Troubleshooting
The cluster service is relatively verbose; there are several ways to troubleshoot potential issues.
You can find valuable information (especially during the startup of the cluster and when nodes join) in the
following file:
C:\WINNT\cluster\Cluster.log
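A quick way to spot problems is to filter this file for error and warning entries. In the sketch below, the severity keywords (ERR, WARN) are assumptions about how entries appear in Cluster.log; adjust them to what the file actually contains.

# Minimal Cluster.log filter; the severity keywords are assumptions.
LOG_PATH = r"C:\WINNT\cluster\Cluster.log"
KEYWORDS = ("ERR", "WARN")

with open(LOG_PATH, "r", errors="replace") as log:
    for line in log:
        if any(keyword in line for keyword in KEYWORDS):
            print(line.rstrip())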
Always remember that users and/or applications rely on the applications hosted on the
cluster. Each action you perform can have serious consequences. Some applications need to be
restarted after the database has been restarted or failed over. In summary, always discuss with the
application responsible before doing anything on the database side.
Never reboot the two cluster nodes at the same time. Always reboot the first node, wait for it to
come back, and then reboot the second node.
One of the most sensitive parts of a cluster is the communication between the two nodes. If the
nodes do not communicate with each other, the resources will start on both nodes, causing
damage to the application data (this is referred to as the split-brain situation). Note that the only
way to recover from this situation is to take the application offline and restore the data from the
last backup tape.
Globe documents:
GLOBE ISIT OASIS2 Cluster Guide.doc
GLOBE ISIT OASIS2 GeoCluster Architecture.doc
GLOBE ISIT OASIS2 Geo Cluster Installation Document.doc
GLOBE ISIT OASIS2 TSM Backup and restore guidelines.doc
GLOBE ISIT SQL 2000 Operational guideline.doc
M250 Windows Naming Conventions.doc
Website:
www.microsoft.com
Appendix A – List of Veritas Volume Manager events monitored by Tivoli

ID Priority Description
584 P2 No disk was found for the cluster dynamic disk group.
150 P3 INTERNAL Error - No valid disk found containing dynamic disk group
541 P3 Dynamic disk group not found. Failed to start SCSI reservation thread.
542 P3 Unable to reserve a majority of dynamic disk group members. Failed to start SCSI reservation thread.
543 P3 Failed to start SCSI reservation thread for dynamic disk group.
544 P3 Failed to import dynamic disk group.
545 P3 Failed to stop SCSI reservation thread for dynamic disk group.
546 P3 Failed to release dynamic disk group reservations.
547 P3 Dynamic disk group not found. Failed to update SCSI reservation thread.
548 P3 Failed to obtain SCSI reservations for all members of dynamic disk group.
549 P3 Failed to update SCSI reservation thread for dynamic disk group.
580 P3 Failed to Recover the DiskGroup.
581 P3 Failed to lock the volume.
586 P3 Starting reservation thread on the cluster dynamic disk group failed.
589 P3 Import the cluster dynamic disk group failed.
740 P3 Dynamic disk group could not be found.
791 P3 Failed to resynchronize volume %1.
809 P3 Volume capacity reached error condition
813 P3 Capacity critical error on %1.
814 P3 The volume free space has reached the user defined critical condition.
8016 P3 Immediately back up your data and replace your hard disk. A failure may be imminent.
8020 P3 Failed to start SCSI reservation thread for dynamic disk group.
8022 P3 SCSI Reservation Thread Start Failure
8050 P3 The SCSI reservation thread update failed for dynamic disk group.
8052 P3 SCSI Reservation Thread Update Failure
8094 P3 Failed to reactivate Harddisk%1.
10231 P3 Import cluster dynamic disk group failed
48 P5 INTERNAL Error - No convergence between dynamic disk group and disk list
103 P5 INTERNAL Error - Log length too small for logging type
104 P5 INTERNAL Error - Log length too large for logging type
106 P5 INTERNAL Error - Field value is out of range
111 P5 The specified disk is not ready or usable
112 P5 INTERNAL Error - Volume is unusable
113 P5 The specified disk is not ready or usable
114 P5 INTERNAL Error - Device node not block special
124 P5 INTERNAL Error - Configuration too large for dynamic disk group log
125 P5 INTERNAL Error - Disk log too small for dynamic disk group configuration
127 P5 INTERNAL Error - Error in configuration record
146 P5 INTERNAL Error - Too many minor numbers for volume or plex
147 P5 INTERNAL Error - Invalid block number
148 P5 INTERNAL Error - Invalid magic number
149 P5 INTERNAL Error - Invalid block number
154 P5 INTERNAL Error - Temp and perm configurations does not match
156 P5 INTERNAL Error - Rootdg dynamic disk group has no configuration copies
157 P5 INTERNAL Error - Rootdg dynamic disk group has no log copies
161 P5 INTERNAL Error - Kernel and on-disk configurations does not match
162 P5 INTERNAL Error - Disk public region is too small
163 P5 INTERNAL Error - Disk private region is too small
164 P5 INTERNAL Error - Disk private region is full
170 P5 INTERNAL Error - Disks for dynamic disk group are inconsistent
177 P5 INTERNAL Error - Disk VTOC does not list public partition
178 P5 INTERNAL Error - Disk VTOC does not list private partition
182 P5 INTERNAL Error - Dynamic disk group has no valid configuration copies
184 P5 INTERNAL Error - Disk for dynamic disk group in other dynamic disk group
185 P5 INTERNAL Error - Stripe column number too large for plex
186 P5 INTERNAL Error - Volume does not have a RAID read policy
187 P5 INTERNAL Error - Volume has a RAID read policy
190 P5 INTERNAL Error - License has expired or is not available for operation
192 P5 INTERNAL Error - Volume does not have the storage attribute
203 P5 INTERNAL Error - Overlapping partitions detected
240 P5 Disk(s) can not be upgraded as there are too many partitions.
243 P5 The disk configuration appears to be changed by another system. Use Merge Foreign Disk to correct the problem.
244 P5 Disk upgrade aborted
264 P5 BOOT.INI should be modified because the partition number of a bootable volume might have been changed.
306 P5 Snap record not found.
323 P5 Operation failed. Diskgroup requires recovery. Please recover the diskgroup and retry this operation.
500 P5 Could not open a handle to vxload.sys
501 P5 Call to start vxio failed.
502 P5 Could not load vxconfig.dll.
503 P5 Could not get a proc address in vmconfig.
504 P5 DmConfig is not loaded.
505 P5 Vxio driver not started.
507 P5 Unable to read disk layout information.
509 P5 Disk partition not found.
528 P5 Unable to start or modify cluster dynamic disk group SCSI reservation thread.